ReST-EM | designpattern.fyi


Advantages

Generates training data without human annotation — scales cheaply.
Each iteration can raise the model’s baseline reasoning accuracy, because fine-tuning uses only samples the verifier accepted.
Works well in domains with strong verifiers (code, math).

Disadvantages

Requires fine-tuning access — not applicable to API-only deployments.
Verifier quality gates everything; a weak verifier lets wrong answers into the training set.
Can reinforce confident-but-wrong reasoning patterns if the verifier has blind spots.
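
The verifier-related drawbacks above can be illustrated with a minimal sketch of the filtering step alone. Everything here is an assumption made for illustration: the names `filter_samples`, `strict_verifier`, and `weak_verifier` are hypothetical, the "problems" are toy additions, and the candidate list stands in for samples a real model would generate. It is not a full ReST-EM implementation, and no fine-tuning takes place.

```python
from typing import Callable, Iterable

Sample = tuple[str, str]  # (problem, candidate solution)


def filter_samples(
    samples: Iterable[Sample],
    verifier: Callable[[str, str], bool],
) -> list[Sample]:
    """Keep only the samples the verifier accepts."""
    return [(p, s) for p, s in samples if verifier(p, s)]


def strict_verifier(problem: str, solution: str) -> bool:
    # Hypothetical checker: the problem is a sum such as "2+3" and the
    # solution must equal its value exactly.
    a, b = problem.split("+")
    return solution.strip() == str(int(a) + int(b))


def weak_verifier(problem: str, solution: str) -> bool:
    # Hypothetical blind spot: only checks that the answer looks numeric.
    return solution.strip().isdigit()


# Stand-in for model-generated candidates (assumed data, not real outputs).
candidates = [
    ("2+3", "5"),
    ("2+3", "6"),
    ("10+7", "17"),
    ("10+7", "seventeen"),
]

strict_kept = filter_samples(candidates, strict_verifier)
weak_kept = filter_samples(candidates, weak_verifier)

print(len(strict_kept), "samples pass the strict verifier")
print(len(weak_kept), "samples pass the weak verifier")
```

With the strict checker only the two correct answers survive. The weak checker also admits `("2+3", "6")`, which would then be treated as a positive training example in the next fine-tuning round.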