The Self-Reflection Loop:
PHASE 1: Generate initial response
User: [Question]
Assistant: [Initial answer]
PHASE 2: Self-critique
User: Now critique your previous answer. Identify any errors, omissions, or improvements.
PHASE 3: Revised response
Assistant: [Critique listing issues]
PHASE 4: Final verification
User: Now provide a revised answer addressing the critique.
Self-Reflection Prompt Template:
You are a careful AI that double-checks its work.
STEP 1: Provide your initial answer to this question:
[Question]
STEP 2: Review your answer against these criteria:
- Is it factually accurate? (Check each claim)
- Does it fully answer the question? (No missing parts)
- Is it well structured and clear?
- Could there be edge cases or exceptions?
- Is there a better approach?
List any issues you find:
<self_critique>
Issue 1: ...
Issue 2: ...
</self_critique>
STEP 3: Provide a revised answer that addresses all issues:
<revised_answer>
[Improved answer]
</revised_answer>
STEP 4: Rate your revised answer on a scale of 1-10 for quality:
<self_rating>[Rating]/10</self_rating>
Real Example:
User: Calculate 17% of 243.89, then add 5.5, then multiply by 0.92.
Claude (Self-Reflection):
STEP 1 - Initial answer:
17% of 243.89 = 41.4613
Add 5.5 = 46.9613
Multiply by 0.92 = 43.204396
Final: 43.20 (rounded)
STEP 2 - Self-critique:
<self_critique>
Issue 1: I rounded prematurely (41.4613 then final). Should carry full precision.
Issue 2: Didn't specify rounding rule (to 2 decimal places? nearest cent?)
Issue 3: Didn't show intermediate steps clearly
Issue 4: Didn't verify with alternative method
</self_critique>
STEP 3 - Revised answer:
Let me recalculate with full precision:
Step 1: 17% of 243.89 = 243.89 × 0.17 = 41.4613 (exact)
Step 2: Add 5.5: 41.4613 + 5.5 = 46.9613
Step 3: Multiply by 0.92: 46.9613 × 0.92 = 43.204396
Step 4: Round to 2 decimal places: 43.20
Verification: (243.89 × 0.17 + 5.5) × 0.92 = 43.204396
<revised_answer>43.20</revised_answer>
STEP 4 - Self-rating:
<self_rating>9/10</self_rating>
Beyond Chain of Thought:
Chain of Thought (CoT) explores one reasoning path.
Tree of Thought (ToT) explores multiple paths simultaneously, evaluates each, and chooses the best.
ToT Prompt Structure:
PROBLEM: [Complex problem requiring reasoning]
For this problem, explore 3 different reasoning paths:
PATH A: [First approach]
- Step 1: ...
- Step 2: ...
- Step 3: ...
- Conclusion A: ...
PATH B: [Second approach]
- Step 1: ...
- Step 2: ...
- Step 3: ...
- Conclusion B: ...
PATH C: [Third approach]
- Step 1: ...
- Step 2: ...
- Step 3: ...
- Conclusion C: ...
Now evaluate each path:
Path A quality (1-10): [score] because [reasoning]
Path B quality (1-10): [score] because [reasoning]
Path C quality (1-10): [score] because [reasoning]
Best path: [A/B/C]
Final answer based on best path: [answer]
Real ToT Example: Strategy Problem
User: A startup has $500k runway, 6 months to launch. They need to choose:
Option 1 - Build full product (9 months, $800k)
Option 2 - Build MVP (4 months, $300k then raise)
Option 3 - Outsource (6 months, $600k)
Which option maximizes success probability?
Claude (Tree-of-Thought):
PATH A - Build full product:
Month 1-3: Requirements and architecture
Month 4-6: Core development
Month 7-9: Testing, polishing, launch
Financial: Need $800k but have $500k → must raise $300k in month 3
Success probability: 30% (fundraising risk + time overrun + market risk)
PATH B - Build MVP then raise:
Month 1-2: Core features only (user auth, basic functionality)
Month 3-4: Launch MVP, get initial users
Month 5-6: Show traction, raise Series A
Success probability: 65% (lower upfront risk, traction-based fundraising)
PATH C - Outsource:
Month 1-2: Vendor selection and contracts
Month 3-5: Vendor development (limited control)
Month 6: Integration and testing
Financial: $600k, leaving -$100k (must raise or cut costs)
Success probability: 20% (quality risk, IP risk, cost overrun risk)
EVALUATION:
Path A score: 4/10 (too optimistic on timeline/funding)
Path B score: 9/10 (standard lean startup approach)
Path C score: 2/10 (outsourcing complex product rarely works)
Best path: Path B
Final answer: Option 2 - Build MVP (4 months, $300k) then raise.
Native Tool Use (Claude API):
tools = [
{
"name": "get_weather",
"description": "Get current weather for a location",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name or coordinates"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit"
}
},
"required": ["location"]
}
}
]
response = client.messages.create(
model="claude-3-sonnet-20241022",
messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
tools=tools
)
Manual Tool-Use Prompting (Without API):
You have access to these tools:
TOOL 1: calculate(expression)
- Use for mathematical calculations
- Example: calculate("25 * 4 + 10") returns 110
TOOL 2: get_current_time(timezone)
- Returns current date and time
When you need to use a tool, respond with:
<tool_call>
{"tool": "tool_name", "parameters": {"param1": "value1"}}
</tool_call>
Then I will execute the tool and give you the result.
Now answer: What's 15% of 247.89?
Manual Tool-Use in Practice:
Claude: Let me calculate that.
<tool_call>
{"tool": "calculate", "parameters": {"expression": "247.89 * 0.15"}}
</tool_call>
[System executes: calculate("247.89 * 0.15") → 37.1835]
Claude: 15% of 247.89 is 37.18 (rounded).
Final answer: 37.18
Why Compress Prompts?
- Save tokens (reduces cost)
- Faster processing (less context to attend to)
- Fit more into context window
Compression Strategies:
Strategy 1: Remove Redundancy
Before (75 tokens):
You are an expert software engineer with 10 years of experience in
Python, Django, FastAPI, Flask, SQLAlchemy, and PostgreSQL.
After (30 tokens):
Senior Python backend engineer (Django, FastAPI, PostgreSQL)
Strategy 2: Use Acronyms
Before:
You must never provide financial advice, never predict stock prices,
never guarantee returns, and never recommend specific investments.
After:
NEVER: financial advice | stock predictions | return guarantees | specific tickers
Strategy 3: Compact Formatting
Before (verbose):
Please output your response in valid JSON format. The JSON should have
three fields: 'name' which should be a string, 'age' which should be
an integer, and 'email' which should be a string.
After (compact):
Output: {"name": str, "age": int, "email": str}
Compression Results:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Tokens | 550 | 180 | -67% |
| Clarity | Good | Better | + |
| Accuracy | 96% | 95% | -1% (acceptable) |
| Cost per 1K requests | $1.65 | $0.54 | -67% |
Compressed Prompt Example:
Before (550 tokens - verbose classifier):
After (180 tokens - compressed):Classify e-commerce messages:
intents: refund|status|product|complaint|other
extract: order_id(#\\d{5}), product_name, urgency(high:urgent/asap|medium|low)
Examples:
"#12345 late urgent" → {"intent":"status","order_id":"12345","urgency":"high"}
"refund broken headphones #67890" → {"intent":"refund","order_id":"67890","product":"headphones"}
Output: {"intent":"str","order_id":"str|null","product":"str|null","urgency":"str"}
Classify: {message}