STaR Bootstrapping
Bootstrap reasoning ability by fine-tuning on rationales the model got right.
Intent & Description
🎯 Intent
Have the model generate a reasoning trace and an answer for each training problem, keep only the traces that end in the correct answer, fine-tune on them, and repeat. The loop bootstraps reasoning capability from a small set of seed examples.
📋 Context
Writing high-quality CoT rationales at scale is expensive. STaR exploits the fact that when the model gets the right answer, its reasoning trace — however it got there — is a useful training signal. Fine-tune on those traces and the model learns to reason more reliably.
💡 Solution
(1) Prompt the model to generate a CoT rationale + answer for each training example. (2) Keep only the examples where the final answer matches the known correct answer. (3) For examples answered incorrectly, optionally re-prompt with the correct answer as a hint, and keep the resulting rationale only if it now reaches that answer. (4) Fine-tune on the collected rationales. (5) Repeat, generating the next batch of rationales with the improved model. Requires fine-tuning access. See also: ReST-EM, chain-of-thought, generate-and-test-strategy.
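The five steps above can be sketched as a short loop. This is a minimal sketch under stated assumptions, not a reference implementation: `generate` and `finetune` are hypothetical callables supplied by the caller, and `Example`, `Sample`, `collect_rationales`, and `star` are illustrative names. The sketch fine-tunes from the base model each round while generating with the latest model; continuing from the latest checkpoint is the other option, and which one works better is left as a design choice.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Example:
    question: str
    answer: str  # known correct answer


@dataclass(frozen=True)
class Sample:
    question: str
    rationale: str
    answer: str


# Hypothetical interfaces (assumptions, not a real library API):
#   generate(model, question, hint) -> (rationale, answer); hint is None or the correct answer
#   finetune(model, samples) -> new model
Generate = Callable[[object, str, Optional[str]], Tuple[str, str]]
Finetune = Callable[[object, List[Sample]], object]


def collect_rationales(model, dataset, generate: Generate, use_hints: bool = True) -> List[Sample]:
    """Steps 1-3: generate, filter by final answer, optionally retry with a hint."""
    kept = []
    for ex in dataset:
        rationale, answer = generate(model, ex.question, None)
        if answer != ex.answer and use_hints:
            # Step 3: re-prompt with the correct answer as a hint.
            rationale, answer = generate(model, ex.question, ex.answer)
        if answer == ex.answer:
            # Step 2: only rationales that end in the correct answer are kept.
            kept.append(Sample(ex.question, rationale, answer))
    return kept


def star(base_model, dataset, generate: Generate, finetune: Finetune,
         rounds: int = 3, use_hints: bool = True):
    """Steps 4-5: fine-tune on the kept rationales, then repeat."""
    model = base_model
    for _ in range(rounds):
        samples = collect_rationales(model, dataset, generate, use_hints)
        if not samples:
            break  # nothing to learn from this round
        # Assumption: restart from the base model each round to limit overfitting.
        model = finetune(base_model, samples)
    return model
```

In practice `generate` would wrap a few-shot CoT prompt plus answer parsing, and `finetune` would wrap whatever supervised fine-tuning setup is available; both are outside the scope of this sketch.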
Real-world Use Case
- Building reasoning capability into a smaller model from a handful of seed examples.
- Domain-specific reasoning (legal, medical, scientific) where human rationale writing is costly.
- Distillation: generate rationales from a large model, fine-tune a small one.
Source
📌 TL;DR
Collect rationales where the model was right, fine-tune on them, repeat — self-taught reasoning.
Advantages
- No human-written rationales needed — the model bootstraps from its own successes.
- Iterative improvement: each round's model can answer more problems correctly, so the next round has more rationales to train on.
- Works with very few seed examples to kick off the loop.
Disadvantages
- Requires fine-tuning access, so it does not apply when the model is reachable only through an inference API.
- The filter checks only the final answer. Wrong answers are always excluded, but correct answers reached through flawed or lucky rationales slip through and end up in the training set.
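- Hint-based rationale collection can introduce shortcut reasoning patterns: the model sees the answer before writing the rationale, so it may justify the answer backwards instead of deriving it.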
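The limits of answer-only filtering can be seen in a toy check. This is a minimal sketch: `keep` is a hypothetical filter that compares final answers only, and the three samples are hand-written for illustration, not real model output.

```python
def keep(predicted_answer: str, gold_answer: str) -> bool:
    # Answer-only filter: the rationale text is never inspected.
    return predicted_answer.strip() == gold_answer.strip()


# Illustrative samples for the question "What is 12 * 3 / 4?" (correct answer: 9).
sound = {"rationale": "12 * 3 = 36, then 36 / 4 = 9", "answer": "9"}
flawed = {"rationale": "12 * 3 = 35, then 35 / 4 = 9", "answer": "9"}  # wrong steps, right answer
wrong = {"rationale": "12 * 3 = 36, then 36 / 4 = 8", "answer": "8"}

kept = [s for s in (sound, flawed, wrong) if keep(s["answer"], "9")]
# `wrong` is dropped, but `flawed` is kept alongside `sound`.
```

Catching the flawed sample would need a check on the rationale itself, which the basic loop does not include.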