Test-Time Compute Scaling
Spend more compute at inference time to get better answers, instead of relying only on bigger models.
Intent & Description
🎯 Intent
Trade inference compute for answer quality. At test time, generate more candidates, search deeper, or reason longer to improve the output without changing model weights.
📋 Context
Scaling model size has diminishing returns and high training costs. Scaling inference compute is more flexible — you can dial it up or down per request, pay for it per call, and apply it selectively to hard problems. OpenAI’s o1/o3, Google’s Gemini thinking, and Anthropic’s extended thinking are all commercial implementations of this idea.
💡 Solution
Concrete implementations:
- Best-of-N sampling: generate N answers and pick the best with a verifier or reward model.
- Extended thinking / reasoning tokens: give the model more steps to think before answering.
- Tree/graph search: explore multiple reasoning paths, prune the weak ones, and return the best.
- Iterative refinement: generate, critique, revise, repeat.
Scale compute up for hard tasks and down for easy ones. See also: adaptive-compute-allocation, extended-thinking, best-of-n-sampling, tree-of-thoughts.
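As a rough illustration of two of these strategies, the sketch below shows best-of-N sampling and iterative refinement as plain functions. The names `best_of_n`, `iterative_refine`, `toy_generate`, and `toy_score` are hypothetical, and the toy generator and scorer only stand in for a real model call and a real verifier; this is a minimal sketch under those assumptions, not a reference implementation.

```python
from typing import Callable, Iterator, Optional, Tuple


def best_of_n(
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    prompt: str,
    n: int,
) -> Tuple[str, float]:
    """Sample n candidates and return the top-scoring one with its score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt) for _ in range(n)]
    # The verifier (or reward model) decides which candidate wins.
    best = max(candidates, key=lambda c: score(prompt, c))
    return best, score(prompt, best)


def iterative_refine(
    draft: str,
    critique: Callable[[str], Optional[str]],
    revise: Callable[[str, str], str],
    max_rounds: int,
) -> str:
    """Critique-and-revise loop, capped at max_rounds revisions."""
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:  # the critic has no remaining objections
            break
        draft = revise(draft, feedback)
    return draft


# Toy stand-ins for a model and a verifier (hypothetical, illustration only).
_samples: Iterator[str] = iter(["389", "391", "392"])


def toy_generate(prompt: str) -> str:
    return next(_samples)


def toy_score(prompt: str, candidate: str) -> float:
    return -abs(int(candidate) - 17 * 23)  # 0 means exactly right


answer, quality = best_of_n(toy_generate, toy_score, "What is 17 * 23?", n=3)
print(answer, quality)  # 391 0
```

Both functions take the model and the scorer as arguments, so the compute budget (`n` or `max_rounds`) can be chosen per request rather than fixed in advance.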
Real-world Use Case
- Hard tasks where a single generation is unreliable and retrying is cheap.
- Any domain where correctness is verifiable and worth paying extra compute for.
- Agentic loops where the quality of a planning step multiplies through subsequent actions.
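The point about agentic loops can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that each step in a chain succeeds independently with the same probability; the function name `chain_success` and the figures 0.90 and 0.97 are hypothetical, not measurements.

```python
def chain_success(step_success: float, num_steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    if not 0.0 <= step_success <= 1.0:
        raise ValueError("step_success must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return step_success ** num_steps


# Illustrative numbers only: a modest per-step gain compounds over 10 steps.
for p in (0.90, 0.97):
    print(f"per-step {p:.2f} -> 10-step chain {chain_success(p, 10):.3f}")
# per-step 0.90 -> 10-step chain 0.349
# per-step 0.97 -> 10-step chain 0.737
```

Under this simplified model, extra inference compute spent on making each step more reliable roughly doubles the chance that the whole chain completes.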
📌 TL;DR
Better answers don’t always require bigger models — sometimes just more inference compute.
Advantages
- Quality improvement without retraining: sampling, search, and refinement strategies wrap an existing model with no weight changes.
- Granular control — spend compute exactly where it’s needed.
- Parallelizable strategies (best-of-N) can finish in roughly the wall-clock time of a single generation, given enough parallel capacity.
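The wall-clock claim for best-of-N can be sketched with a thread pool. Here `slow_generate` is a hypothetical stand-in for a network-bound model call that takes about 0.05 seconds, and `sample_parallel` is an illustrative helper name; a real deployment would also be bounded by rate limits and serving capacity.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def sample_parallel(generate: Callable[[str], str], prompt: str, n: int) -> List[str]:
    """Issue n generation calls concurrently and collect all candidates."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(generate, prompt) for _ in range(n)]
        return [f.result() for f in futures]


def slow_generate(prompt: str) -> str:
    """Hypothetical stand-in for a network-bound model call."""
    time.sleep(0.05)
    return f"candidate for {prompt!r}"


start = time.perf_counter()
candidates = sample_parallel(slow_generate, "hard question", n=8)
elapsed = time.perf_counter() - start
# Run one after another, eight calls would take at least 0.4 s.
print(f"{len(candidates)} candidates in {elapsed:.2f}s")
```

The token cost is still eight generations; only the waiting time is shared.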
Disadvantages
- Compute cost scales linearly with the number of samples: best-of-N at N=32 costs roughly 32x the generation tokens of a single answer, before counting the verifier.
- Best-of-N and search need a verifier or reward model to select among candidates, which adds system complexity.
- Latency grows with every sequential step in strategies such as iterative refinement and deep search.
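The cost and latency trade-offs above can be estimated with back-of-the-envelope arithmetic. The sketch below makes simplifying assumptions that are not from the source: every call produces the same number of output tokens, best-of-N candidates run fully in parallel, and each refinement round is one critique call plus one revise call. The names `Estimate`, `best_of_n_estimate`, and `refinement_estimate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    output_tokens: int     # total generated tokens, a proxy for cost
    sequential_calls: int  # calls that must run one after another, a proxy for latency


def best_of_n_estimate(n: int, tokens_per_call: int) -> Estimate:
    """Best-of-N: cost grows with n, latency stays at one call if parallel."""
    return Estimate(output_tokens=n * tokens_per_call, sequential_calls=1)


def refinement_estimate(rounds: int, tokens_per_call: int) -> Estimate:
    """Refinement: one initial draft, then a critique and a revise per round."""
    calls = 1 + 2 * rounds
    return Estimate(output_tokens=calls * tokens_per_call, sequential_calls=calls)


single = best_of_n_estimate(1, 500)
wide = best_of_n_estimate(32, 500)
deep = refinement_estimate(4, 500)
print(wide.output_tokens // single.output_tokens)  # 32
print(deep.sequential_calls)                       # 9
```

Verifier or reward-model tokens are left out of the estimate, so real best-of-N cost would be somewhat higher than this figure.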