Chain-of-Thought Prompting (CoT)
Prompt the model to show its reasoning steps before answering — dramatically improves accuracy on multi-step math, logic, and planning tasks without any fine-tuning.
Intent & Description
🎯 Intent
LLMs often reach wrong answers on multi-step reasoning tasks when asked to answer directly. Prompting them to reason step-by-step before answering puts the intermediate reasoning into the context the model attends to, and accuracy improves sharply.
📋 Context
Direct prompting (question → answer) works well for factual recall and simple classification but often fails on tasks requiring multi-step inference, such as math word problems, logical deduction, and planning. When the reasoning chain is written out, the model attends to its own earlier steps while generating later tokens, so the chain functions as working memory.
💡 Solution
Two modes: (1) Zero-shot CoT: Append "Let's think step by step." to the question. The model generates its reasoning before the answer. Simple, no examples needed. (2) Few-shot CoT: Provide 3–8 examples of (question, step-by-step reasoning, answer) before the target question. The model infers the expected format and reasoning pattern from the examples. For complex tasks, combine either mode with self-consistency: sample K independent chains and take the majority-vote final answer. On GSM8K (grade school math), few-shot CoT improved PaLM 540B accuracy from ~18% with standard prompting to ~57%.
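The two modes and the self-consistency vote can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Q:`/`A:` layout, the "The answer is X." convention used to find the final answer, and `sample_fn` (a hypothetical callable that sends a prompt to whatever model client you use, with sampling enabled, and returns the generated text) are all assumptions made for the example.

```python
import re
from collections import Counter
from typing import Callable, Optional

COT_TRIGGER = "Let's think step by step."


def zero_shot_cot(question: str) -> str:
    """Zero-shot CoT: append the trigger sentence to the question."""
    return f"Q: {question}\nA: {COT_TRIGGER}"


def few_shot_cot(examples: list[tuple[str, str, str]], question: str) -> str:
    """Few-shot CoT: prepend (question, reasoning, answer) demonstrations."""
    blocks = [
        f"Q: {q}\nA: {reasoning} The answer is {answer}."
        for q, reasoning, answer in examples
    ]
    blocks.append(f"Q: {question}\nA:")  # the target question is left open
    return "\n\n".join(blocks)


def extract_answer(chain: str) -> Optional[str]:
    """Return the last number following 'answer is' (assumed convention)."""
    matches = re.findall(
        r"answer is\s*\$?(-?[\d,]*\.?\d+)", chain, flags=re.IGNORECASE
    )
    return matches[-1].replace(",", "") if matches else None


def self_consistency(
    prompt: str, sample_fn: Callable[[str], str], k: int = 5
) -> Optional[str]:
    """Sample k chains via the hypothetical sample_fn and majority-vote."""
    answers = [extract_answer(sample_fn(prompt)) for _ in range(k)]
    answers = [a for a in answers if a is not None]  # drop unparseable chains
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    demo = [(
        "Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
        "How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.",
        "11",
    )]
    print(few_shot_cot(demo, "A baker has 12 eggs and uses 5. How many are left?"))
```

Answers are compared as strings here, so "18" and "18.0" would count as different votes; normalizing the extracted values before counting is a reasonable refinement if the task allows it.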
Real-world Use Case
📌 TL;DR
Add “Let’s think step by step.” and watch accuracy jump on reasoning tasks. Inspect the chain — it tells you not just what the model answered but whether the path there was coherent.
Advantages
- Large accuracy gains on reasoning tasks — often 20–40 percentage point improvements on benchmarks
- Zero-shot CoT works with just one sentence added to any prompt — near-zero implementation cost
- Intermediate steps are inspectable — you can verify reasoning, not just the final answer
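Because the steps are plain text, simple checks can be run over them. The sketch below is one hedged example of that idea: it scans a chain for binary arithmetic statements of the form `a op b = c` and recomputes each one. The regular expression and the `check_arithmetic` helper are illustrative assumptions; they handle only two-operand steps, so a longer expression such as `5 + 2 * 3 = 11` would be misread, and non-arithmetic reasoning is not checked at all.

```python
import operator
import re

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

NUM = r"-?\d+(?:\.\d+)?"
# Matches two-operand statements such as "5 + 6 = 11".
STEP = re.compile(rf"({NUM})\s*([+\-*/])\s*({NUM})\s*=\s*({NUM})")


def check_arithmetic(chain: str, tol: float = 1e-9) -> list[tuple[str, bool]]:
    """Return (step text, is_correct) for each 'a op b = c' found in the chain."""
    results = []
    for m in STEP.finditer(chain):
        a, op, b, claimed = m.groups()
        if op == "/" and float(b) == 0:
            results.append((m.group(0), False))  # division by zero never holds
            continue
        actual = OPS[op](float(a), float(b))
        results.append((m.group(0), abs(actual - float(claimed)) <= tol))
    return results


if __name__ == "__main__":
    chain = "2 * 3 = 6. 5 + 6 = 12. The answer is 12."
    for step, ok in check_arithmetic(chain):
        print("OK " if ok else "BAD", step)
```

A failed step does not prove the final answer is wrong, and a chain with no failed steps can still be wrong, but flagged steps are a cheap signal for which responses deserve a closer look.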
Disadvantages
- Increases token generation length — more tokens per response = higher latency and cost
- Chain quality degrades on smaller models (< 7B) — the reasoning steps themselves become incoherent
- Risk of confident-sounding wrong reasoning chains — plausible-looking steps leading to wrong answers
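The token-cost trade-off can be estimated with simple arithmetic. The sketch below assumes per-1,000-token pricing with separate input and output rates, and assumes each self-consistency chain is billed as a separate request; the token counts and prices in the usage example are hypothetical placeholders, not figures from any provider.

```python
def response_cost(
    prompt_tokens: int,
    output_tokens: int,
    in_price_per_1k: float,
    out_price_per_1k: float,
    k: int = 1,
) -> float:
    """Estimated cost of k independent responses (k > 1 for self-consistency)."""
    per_chain = (
        prompt_tokens / 1000 * in_price_per_1k
        + output_tokens / 1000 * out_price_per_1k
    )
    return k * per_chain


if __name__ == "__main__":
    # Hypothetical numbers: a direct answer emits ~10 tokens, a CoT answer ~200.
    direct = response_cost(100, 10, in_price_per_1k=0.001, out_price_per_1k=0.002)
    cot = response_cost(110, 200, in_price_per_1k=0.001, out_price_per_1k=0.002)
    voted = response_cost(110, 200, in_price_per_1k=0.001, out_price_per_1k=0.002, k=5)
    print(f"direct={direct:.5f} cot={cot:.5f} cot_k5={voted:.5f}")
```

Under these assumed numbers the output tokens dominate the bill, which is why the reasoning chain, and any K-way sampling on top of it, is where the extra cost and latency come from.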