Extended Thinking
Give the model a private scratchpad to reason deeply before it responds.
Intent & Description
🎯 Intent
Unlock deeper, multi-step reasoning by giving the model a first-class internal monologue that doesn’t pollute the final output.
📋 Context
Standard CoT mixes reasoning and response in the same token stream, which creates pressure to produce clean, confident-looking output even mid-reasoning. Native thinking blocks remove that pressure — the model can be uncertain, wrong, and self-correcting in the scratchpad.
💡 Solution
Use models with native extended thinking support (Claude extended thinking, o1/o3 reasoning traces). Set a thinking_budget (token cap) appropriate to task complexity. The thinking block is returned separately or stripped from the user response depending on your UX needs. Pair with adaptive-compute-allocation to avoid paying for extended thinking on simple tasks. See also: chain-of-thought, scratchpad, large-reasoning-model-paradigm.
Real-world Use Case
- Hard reasoning tasks: multi-step math, complex code generation, strategic planning.
- Cases where you want the reasoning visible for audit but not shown to end users.
- Tasks where the model needs to explore multiple approaches before committing.
Source
📌 TL;DR
Let the model think privately first — the answer it gives after is meaningfully better.
Advantages
- Significantly improves accuracy on hard benchmarks vs. standard CoT.
- Thinking is isolated — model can be uncertain without undermining response confidence.
- Thinking budget is tunable — balance cost vs. reasoning depth per task.
Disadvantages
- Expensive — thinking tokens count against your bill.
- Not all models support it natively; prompting workarounds are imperfect substitutes.
- Thinking content can be verbose and hard to parse for downstream use.