HOW CLAUDE THINKS

Task 1 of 5

Tokens: The Currency of Claude

What is a token?

Tokens are the units of text Claude reads and writes. They are not characters or whole words; they are subword units drawn from the model's vocabulary, so a common word may be a single token while a rare word splits into several.

Tokenization Examples (approximate; exact counts depend on the tokenizer):

Text	Tokens
"Hello"	1
"Hello world"	2
"Hello, world!"	3
"antidisestablishment"	4
"The quick brown fox jumps over the lazy dog"	9
"Claude is an AI assistant created by Anthropic"	9

Token-to-Text Ratio:

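Language/Content	Approximate ratio (rules of thumb)
English text	~1.3 tokens per word (1,000 words ≈ 1,300 tokens; roughly 4 characters per token)
Code	~0.3 tokens per character (1,000 chars ≈ 300 tokens; varies by language and formatting)
Chinese/Japanese	~1-2 tokens per character
JSON	~0.3-0.4 tokens per character (quotes, braces, and punctuation add overhead)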

Why Tokens Matter:

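Cost - You pay per token, with separate prices for input and output
Context limits - Input and output together must fit in the context window (200K tokens in the examples here)
Speed - Longer inputs, and especially longer outputs, increase response time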

Cost Calculation Formula:

text

Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)

Example (Claude 3.5 Sonnet):

Input: 10,000 tokens × $3.00/1M = $0.03
Output: 2,000 tokens × $15.00/1M = $0.03
Total: $0.06 per conversation

Scale math:

1,000 conversations = $60
10,000 conversations = $600
1,000,000 conversations = $60,000
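
The formula and scale math above can be sketched as a small helper. The default prices are the per-million-token figures from the example above, and the function name `estimate_cost` is an illustrative assumption; check current pricing before relying on the numbers.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_m=3.00, output_price_per_m=15.00):
    """Return the estimated cost in dollars for one request.

    Prices are dollars per 1M tokens. The defaults mirror the example
    above and are assumptions, not a pricing reference.
    """
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Cost of one conversation, then the same figure at scale
per_conversation = estimate_cost(10_000, 2_000)
for n in (1_000, 10_000, 1_000_000):
    print(f"{n:>9,} conversations = ${per_conversation * n:,.2f}")
```

Keeping the prices as parameters makes it easy to compare models or rerun the numbers when pricing changes.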

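Pro Tip: Use prompt caching for large, repeated system prompts. The API is stateless, so the system prompt still has to be included with every request, but a 10,000-token prompt sent at full price each time burns money. Marking the prompt as cacheable lets repeated requests reuse it at a reduced input rate.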
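
One way to act on the caching tip is Anthropic's prompt caching, where the system prompt is sent as a content block marked with `cache_control`. The sketch below only builds the request arguments and does not call the API. The helper name `build_cached_request`, the model ID, and the `max_tokens` value are assumptions for illustration, and caching applies only to prompts above a minimum length, so confirm the details against the current documentation.

```python
def build_cached_request(system_prompt, user_message,
                         model="claude-3-5-sonnet-20241022"):
    """Build keyword arguments for client.messages.create().

    The system prompt is sent as a text block marked with cache_control
    so that later requests with the same prefix can reuse the cached copy.
    """
    return {
        "model": model,        # assumed model ID
        "max_tokens": 1024,    # assumed output cap
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }


request = build_cached_request("You are a helpful assistant...", "Hello")
# With the anthropic SDK installed and an API key configured:
# response = anthropic.Anthropic().messages.create(**request)
```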

Hands-On: Approximate Token Counter

python

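def approximate_tokens(text):
    """Rough heuristic for English text; the real tokenizer is more complex."""
    words = len(text.split())
    chars = len(text)
    # Two common rules of thumb: ~0.75 words per token and ~4 characters per token.
    # Average the two estimates instead of adding them, which would double-count.
    return round((words / 0.75 + chars / 4) / 2)

# Test it
sample = "Claude is an AI assistant created by Anthropic."
print(f"Approximate tokens: {approximate_tokens(sample)}")
# Output: Approximate tokens: 11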

Task 2 of 5

The Context Window: Claude's Working Memory

Anatomy of a Context Window:

[System Prompt]        → 5,000 tokens (instructions for behavior)
[Previous Messages]    → 50,000 tokens (conversation history)
[Retrieved Documents]  → 100,000 tokens (RAG results)
[Current User Input]   → 5,000 tokens (the current question)
[Assistant Response]   → Up to 40,000 tokens (Claude's reply)
─────────────────────────────────────────────────
TOTAL: 200,000 tokens max (1M in beta)
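
The breakdown above can be checked in code before a request is sent. This is a minimal sketch: the component names and the `check_budget` helper are hypothetical, and the token counts are the illustrative figures from the diagram, not measurements.

```python
CONTEXT_LIMIT = 200_000  # window size used in the breakdown above


def check_budget(components, limit=CONTEXT_LIMIT):
    """Return (total_tokens, remaining_tokens); raise if over the limit."""
    total = sum(components.values())
    if total > limit:
        raise ValueError(f"Context overflow: {total:,} tokens exceeds {limit:,}")
    return total, limit - total


budget = {
    "system_prompt": 5_000,
    "previous_messages": 50_000,
    "retrieved_documents": 100_000,
    "current_user_input": 5_000,
    "assistant_response": 40_000,  # space reserved for the reply
}
total, remaining = check_budget(budget)
print(f"Used {total:,} of {CONTEXT_LIMIT:,} tokens, {remaining:,} remaining")
```

Reserving space for the reply matters because input and output share the same window.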

What 200K Tokens Means Practically:

Entire "The Great Gatsby" (~72K tokens), with room to fit it about 2.7 times over
Average startup pitch deck + financials + market research
4 hours of transcribed conversation
Medium-sized codebase (5-10 files)
Full technical documentation for an API

What 1M Tokens Means Practically:

All three volumes of "The Lord of the Rings"
Full year of Slack conversations for a 20-person team
Complete codebase for a modest startup
Entire onboarding documentation suite

The "Lost in the Middle" Phenomenon:

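Research on long-context language models has found that they tend to use information at the beginning and end of the context window more reliably than information in the middle. The strength of the effect varies by model and task, so treat it as a tendency to design around rather than a fixed rule.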

text

ATTENTION DISTRIBUTION ACROSS 200K CONTEXT:

High ████████████████████████████████████████
Mid  ████████████████████░░░░░░░░░░░░░░░░░░░░
Low  ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
     0%                                     100%
     Beginning                              End

Strategic Placement:

BAD: Critical info in the middle (it gets lost)

context = [
    "Introduction",
    "Background",
    "CRITICAL_INSTRUCTION: Do not hallucinate",  # LOST!
    "Examples",
    "Current question"
]

GOOD: Critical info at beginning or end

context = [
    "CRITICAL_INSTRUCTION: Do not hallucinate",  # High attention
    "Introduction",
    "Background",
    "Examples",
    "Current question",
    "REMINDER: Do not hallucinate"  # High attention
]

Production Pattern: "Bookending"

System Prompt:
[Place most important instructions here - gets highest attention]

User Message:
[Detailed context, examples, data]

[IMPORTANT: Remember to [critical instruction] when responding]

[User's actual question]

REMINDER: [Repeat critical instruction before response]
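
The bookending layout above can be assembled with a small helper. This is a sketch that follows the template's ordering; the function name `build_bookended_message` and the sample strings are illustrative assumptions.

```python
def build_bookended_message(context, question, critical_instruction):
    """Assemble a user message with the critical instruction on both sides
    of the question, following the bookending template above."""
    parts = [
        context,
        f"IMPORTANT: Remember to {critical_instruction} when responding",
        question,
        f"REMINDER: {critical_instruction}",
    ]
    return "\n\n".join(parts)


message = build_bookended_message(
    context="[Detailed context, examples, data]",
    question="What were the Q3 results?",
    critical_instruction="cite only the provided documents",
)
print(message)
```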

Task 3 of 5

Memory: Claude Forgets Between Conversations

The Default Behavior:

Conversation 1:                    Conversation 2:
User: "My name is Alice"           User: "What's my name?"
Claude: "Nice to meet you, Alice"  Claude: "I don't know your name"

Claude remembers within chat       Claude starts fresh each chat
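
Within a chat, this "memory" typically comes from the application resending the full message history with every request. The sketch below shows that bookkeeping without calling the API; the `Conversation` class is a hypothetical helper, not part of any SDK.

```python
class Conversation:
    """Accumulates the message list that would be sent with every request."""

    def __init__(self):
        self.messages = []

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text):
        self.messages.append({"role": "assistant", "content": text})


# Conversation 1: the earlier turns travel with the final question
chat_1 = Conversation()
chat_1.add_user("My name is Alice")
chat_1.add_assistant("Nice to meet you, Alice")
chat_1.add_user("What's my name?")

# Conversation 2: a new list, so there is nothing to recall
chat_2 = Conversation()
chat_2.add_user("What's my name?")

print(len(chat_1.messages), len(chat_2.messages))
```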

Explicit Memory Implementation:

python

# external_memory.py
import sqlite3

class ClaudeMemory:
    def __init__(self, user_id):
        self.conn = sqlite3.connect('claude_memory.db')
        self.user_id = user_id
        self._init_db()

    def _init_db(self):
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS user_memory (
                user_id TEXT,
                key TEXT,
                value TEXT,
                timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
            )
        """)
        self.conn.commit()

    def remember(self, key, value):
        self.conn.execute(
            "INSERT INTO user_memory (user_id, key, value) VALUES (?, ?, ?)",
            (self.user_id, key, value)
        )
        self.conn.commit()

    def recall(self, key):
        cursor = self.conn.execute(
            "SELECT value FROM user_memory WHERE user_id=? AND key=? ORDER BY timestamp DESC LIMIT 1",
            (self.user_id, key)
        )
        row = cursor.fetchone()
        return row[0] if row else None

    def build_context(self):
        cursor = self.conn.execute(
            "SELECT key, value FROM user_memory WHERE user_id=? ORDER BY timestamp DESC LIMIT 20",
            (self.user_id,)
        )
        memories = cursor.fetchall()
        context = "User information I know:\n"
        for key, value in memories:
            context += f"- {key}: {value}\n"
        return context

# Usage
memory = ClaudeMemory(user_id="user_123")
memory.remember("name", "Alice")
memory.remember("preferred_language", "Python")

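# Inject into Claude prompt
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,  # required by the Messages API
    system=(
        "You are a helpful assistant. Here is what you know about the user:\n"
        f"{memory.build_context()}"
    ),
    messages=[{"role": "user", "content": "What's my name?"}],
)
# Claude cannot write to the database itself. The application has to call
# memory.remember() whenever the user shares new information.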
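
Because remember() above inserts a new row on every call, the same key can appear several times in the table. The sketch below keeps only the newest value per key, assuming the `user_memory` schema shown above. The helper name `latest_memories` is hypothetical, and it orders by SQLite's implicit `rowid` rather than `timestamp`, since CURRENT_TIMESTAMP has one-second resolution.

```python
import sqlite3


def latest_memories(conn, user_id):
    """Return one (key, value) pair per key, keeping the newest row."""
    cursor = conn.execute(
        """
        SELECT key, value FROM user_memory
        WHERE rowid IN (
            SELECT MAX(rowid) FROM user_memory
            WHERE user_id = ?
            GROUP BY key
        )
        ORDER BY key
        """,
        (user_id,),
    )
    return cursor.fetchall()


# In-memory database with the same schema as the example above
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_memory (user_id TEXT, key TEXT, value TEXT, "
    "timestamp DATETIME DEFAULT CURRENT_TIMESTAMP)"
)
rows = [
    ("user_123", "name", "Alice"),
    ("user_123", "name", "Alicia"),  # a later correction to the same key
    ("user_123", "preferred_language", "Python"),
]
conn.executemany(
    "INSERT INTO user_memory (user_id, key, value) VALUES (?, ?, ?)", rows
)
print(latest_memories(conn, "user_123"))
```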

Task 4 of 5

Hallucinations: When Claude Gets It Wrong

What is a hallucination?

A hallucination is when Claude confidently states something that is false, fabricated, or unsupported by available information.

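Hallucination Rate by Task Type (rough estimates with no cited benchmark; actual rates vary by model version and evaluation method):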

Task	Claude 3.5	GPT-4
Factual question (common knowledge)	<1%	~2%
Factual question (obscure)	5-10%	15-20%
Summarization (with source)	<1%	~3%
Summarization (no source)	8-12%	15-20%
Code generation	2-5%	5-8%
Mathematical reasoning	3-7%	8-12%
Name/date recall	10-15%	20-25%

Common Hallucination Patterns:

Pattern 1: "Confident Falsehood"

text

User: "Who wrote the book 'The Purple Planet'?"
Claude: "The Purple Planet was written by Dr. Sarah Chen in 2019"
Truth: The book doesn't exist.

Pattern 2: "Information Extrapolation"

text

User: "What are the key features of product X (released next month)?"
Claude: [Provides plausible-sounding but speculative features]

Pattern 3: "False Attribution"

text

User: "Summarize this document"
Claude: [Adds details not in the original document]

Mitigation Strategies:

Strategy 1: Force Citations

text

Prompt:
For every claim you make, cite the exact source text.

Format:
[Claim] (Source: "exact quote from document")

If information is not in the provided documents, say:
"I cannot find information about [topic] in the provided sources."

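Citations are most useful when the application checks them. The sketch below extracts each quoted source from a response written in the format above and flags any quote that does not appear verbatim in the document. The function name `find_unsupported_quotes` and the sample texts are illustrative assumptions, and exact substring matching is a deliberately strict choice.

```python
import re

# Matches the citation format from the prompt: (Source: "exact quote")
CITATION_PATTERN = re.compile(r'\(Source:\s*"([^"]+)"\)')


def find_unsupported_quotes(response_text, document_text):
    """Return cited quotes that do not appear verbatim in the document."""
    quotes = CITATION_PATTERN.findall(response_text)
    return [quote for quote in quotes if quote not in document_text]


document = "Revenue grew 12% in 2023. Headcount stayed flat."
response = (
    'Revenue rose (Source: "Revenue grew 12% in 2023") and '
    'profit doubled (Source: "Profit doubled year over year").'
)
print(find_unsupported_quotes(response, document))
```

A non-empty result is a signal to reject the response or route it to review.
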
Strategy 2: Temperature Control

python

# Lower temperature = less random sampling, which can reduce (not eliminate) hallucinations
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,  # required by the Messages API
    temperature=0.2,  # Low temperature for factual tasks
    messages=[...]
)

Strategy 3: Confidence Scoring

python

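import re

import anthropic

class ConfidenceScorer:
    def __init__(self, model="claude-3-5-sonnet-20241022"):
        self.client = anthropic.Anthropic()
        self.model = model

    def _get_answer(self, prompt, temperature=0.2):
        # Ask the question itself and return the reply text
        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            temperature=temperature,
            messages=[{"role": "user", "content": prompt}]
        )
        return response.content[0].text

    def ask_with_confidence(self, prompt):
        # First, get the answer
        answer = self._get_answer(prompt)

        # Second, ask Claude to self-evaluate.
        # Self-reported confidence is a rough signal, not a calibrated probability.
        confidence_prompt = f"""
        Question: {prompt}
        Answer: {answer}

        On a scale of 1-10, how confident are you that this answer is
        completely accurate? Respond with only a number.
        """

        confidence_response = self.client.messages.create(
            model="claude-3-haiku-20240307",
            max_tokens=10,
            messages=[{"role": "user", "content": confidence_prompt}]
        )

        # Parse defensively: the reply may contain extra text around the number
        match = re.search(r"\d+", confidence_response.content[0].text)
        confidence = int(match.group()) if match else 0

        if confidence < 7:
            # Get a second opinion
            alternative = self._get_answer(prompt, temperature=0.5)
            return {
                "answer": answer,
                "confidence": confidence,
                "alternative": alternative,
                "requires_review": True
            }

        return {"answer": answer, "confidence": confidence, "requires_review": False}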

Task 5 of 5

System Prompts: Steering Claude's Behavior

System Prompt Anatomy:

[Role Definition]
You are [specific role] with [expertise area] and [personality traits].

[Core Principles]
Always follow these principles in order of priority:
1. [Most important rule]
2. [Second most important]
3. [Third]

[Task Specifications]
When users ask for [type of task], you should:
- [Specific behavior A]
- [Specific behavior B]

[Constraints and Boundaries]
NEVER:
- [Prohibited action 1]
- [Prohibited action 2]

ALWAYS:
- [Required action 1]
- [Required action 2]

[Format Specifications]
Output format: [JSON | Markdown | Plain text | XML]

[Anti-Hallucination Instructions]
If you don't know something, say "I don't know" rather than guessing.
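
The anatomy above can be filled in programmatically, which keeps the section order consistent across prompts. This is a minimal sketch; the function `build_system_prompt`, its parameters, and the sample values are illustrative assumptions.

```python
def build_system_prompt(role, principles, never, always, output_format):
    """Assemble a system prompt from the sections in the anatomy above."""
    lines = [role, "", "Always follow these principles in order of priority:"]
    lines += [f"{i}. {rule}" for i, rule in enumerate(principles, start=1)]
    lines += ["", "NEVER:"] + [f"- {item}" for item in never]
    lines += ["", "ALWAYS:"] + [f"- {item}" for item in always]
    lines += ["", f"Output format: {output_format}"]
    lines += ["", "If you don't know something, say \"I don't know\" rather than guessing."]
    return "\n".join(lines)


prompt = build_system_prompt(
    role="You are a support engineer with expertise in billing systems.",
    principles=["Protect customer data", "Be accurate", "Be concise"],
    never=["Share account details without verification"],
    always=["Link to the relevant help article"],
    output_format="Markdown",
)
print(prompt)
```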

Real Production System Prompt (Financial Advisor Bot):

You are a certified financial advisor AI for "WealthGuard AI" (a Series A fintech).

CORE PRINCIPLES (in priority order):
1. NEVER give personalized investment advice without collecting risk profile
2. ALWAYS include disclaimer: "This is not financial advice. Consult a certified advisor."
3. NEVER guarantee returns or predict market movements
4. Base recommendations on standard financial principles (diversification, risk management)

TASK SPECIFICATIONS:
When users ask about investments:
- Step 1: Ask for their risk tolerance (1-10)
- Step 2: Ask for investment horizon (<1 yr, 1-5 yrs, 5+ yrs)
- Step 3: Provide general educational information only
- Step 4: Recommend they speak with an advisor for personalization

OUTPUT FORMAT:
<disclaimer>[Standard disclaimer]</disclaimer>
<analysis>[Step-by-step analysis]</analysis>
<recommendation>[General guidance only, no specific securities]</recommendation>

PROHIBITED CONTENT:
- Specific stock picks (AAPL, TSLA, etc.)
- Performance predictions ("will go up 20%")
- Timing recommendations ("buy now before...")
- Promises of any kind

REQUIRED DISCLAIMER (append to every response):
"DISCLAIMER: WealthGuard AI provides educational information only.
Past performance does not guarantee future results. Always consult with
a certified financial advisor before making investment decisions."

If you cannot answer within these constraints, say:
"I cannot provide that information under financial regulations.
Let me reframe my response..."

REMEMBER: Your responses are legally binding. The company is liable for your outputs.
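
A system prompt states the rules, but the application can still verify each response before showing it to a user. The sketch below checks for the three output tags, the required disclaimer wording, and the two sample tickers named in the prompt. The function `validate_response` and the ticker list are illustrative assumptions, not a compliance check.

```python
import re

REQUIRED_TAGS = ("disclaimer", "analysis", "recommendation")
# Only the sample tickers named in the prompt; a real list would be far longer
TICKER_PATTERN = re.compile(r"\b(AAPL|TSLA)\b")


def validate_response(text):
    """Return a list of problems found in a model response (empty if none)."""
    problems = []
    for tag in REQUIRED_TAGS:
        if not re.search(rf"<{tag}>.*?</{tag}>", text, flags=re.DOTALL):
            problems.append(f"missing <{tag}> section")
    if "educational information only" not in text:
        problems.append("missing required disclaimer")
    if TICKER_PATTERN.search(text):
        problems.append("mentions a specific ticker")
    return problems


good = (
    "<disclaimer>DISCLAIMER: WealthGuard AI provides educational information only."
    "</disclaimer>\n"
    "<analysis>Diversification spreads risk across assets.</analysis>\n"
    "<recommendation>Consider speaking with an advisor.</recommendation>"
)
print(validate_response(good))
print(validate_response("Buy TSLA now."))
```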

Why System Prompts Are Superpowers:

Without system prompt:

User: "Write a SQL query to delete all users"
Claude: "Here's how to delete all users: DELETE FROM users;" (dangerous!)

With system prompt:

System: "You are a database administrator. Never write destructive queries
without backups or WHERE clauses. Always warn about data loss risks."

User: "Write a SQL query to delete all users"

Claude: "WARNING: Deleting all users is destructive. Consider:
1. Backup first: CREATE TABLE users_backup AS SELECT * FROM users;
2. Add a WHERE clause to limit deletion
3. Would you like to archive inactive users instead?

If you still want to proceed: DELETE FROM users WHERE [condition];"
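
Prompt-level guardrails work best alongside a check in the application itself. The sketch below flags DELETE or UPDATE statements that have no WHERE clause before they reach the database. It is a crude text heuristic, not a SQL parser, and the function name `is_unbounded_write` is a hypothetical choice.

```python
import re


def is_unbounded_write(sql):
    """Return True for a DELETE or UPDATE statement with no WHERE clause."""
    statement = sql.strip().rstrip(";")
    if not re.match(r"(delete|update)\b", statement, flags=re.IGNORECASE):
        return False
    return re.search(r"\bwhere\b", statement, flags=re.IGNORECASE) is None


for query in (
    "DELETE FROM users;",
    "DELETE FROM users WHERE last_login < '2020-01-01';",
    "SELECT * FROM users;",
):
    print(f"{query!r}: unbounded write = {is_unbounded_write(query)}")
```

Statements such as DROP or TRUNCATE are outside this sketch and would need their own checks.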
