Planner-Generator-Evaluator Harness
Intent & Description
🎯 Intent
Decompose a long-running coding or creative job into three role-isolated agents — a Planner that emits a structured feature list, a Generator that builds one chunk per fresh context, and an Evaluator that grades the artefact against a fixed rubric without seeing the Generator’s reasoning trace.
📋 Context
A team runs a coding-agent harness on multi-day creative work — building a new feature across a large application, conducting a large refactor, drafting a long design document. The job is too big to fit into a single model context window, so it must be split across many runs. There is a clear external artefact that can be evaluated on its own merits without inspecting how it was produced.
💡 Solution
- The Planner runs once (or rarely) and emits a structured feature-list artefact: ordered chunks, acceptance criteria, dependencies.
- The Generator is invoked per-chunk in a fresh context containing only the feature list, current artefact state, and the chunk to build; it produces a new artefact revision and exits.
- The Evaluator is invoked in its own fresh context with only the artefact and fixed rubric; it returns pass/fail plus structured findings, never seeing the Generator’s chain of thought.
- A small driver loop routes between the three: failed evaluation re-invokes the Generator with the findings as input.
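The routing described above can be sketched as a small driver loop. This is a minimal illustration, not a reference implementation: `Chunk`, `Verdict`, `run_harness`, the three callable signatures, and the `max_attempts` retry cap are all hypothetical names and choices introduced here. The stub roles stand in for model calls that a real harness would make in fresh contexts.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Chunk:
    """One entry in the Planner's feature list (hypothetical shape)."""
    name: str
    acceptance_criteria: List[str]
    depends_on: List[str] = field(default_factory=list)


@dataclass
class Verdict:
    """Evaluator output: pass/fail plus structured findings."""
    passed: bool
    findings: List[str]


def run_harness(
    job: str,
    plan: Callable[[str], List[Chunk]],
    generate: Callable[[List[Chunk], str, Chunk, List[str]], str],
    evaluate: Callable[[str, List[str]], Verdict],
    rubric: List[str],
    max_attempts: int = 3,  # assumption: the source does not specify a retry cap
) -> str:
    chunks = plan(job)  # the Planner runs once, up front
    artefact = ""
    for chunk in chunks:
        findings: List[str] = []
        for _ in range(max_attempts):
            # The Generator sees only the plan, the artefact, the chunk, and findings.
            candidate = generate(chunks, artefact, chunk, findings)
            # The Evaluator sees only the artefact and the fixed rubric.
            verdict = evaluate(candidate, rubric)
            if verdict.passed:
                artefact = candidate
                break
            findings = verdict.findings  # fed back into the next Generator call
        else:
            raise RuntimeError(f"chunk {chunk.name!r} failed {max_attempts} attempts")
    return artefact


# Stub roles for demonstration only.
def stub_plan(job: str) -> List[Chunk]:
    return [Chunk("a", ["a is present"]), Chunk("b", ["b is present"], ["a"])]


def stub_generate(chunks, artefact, chunk, findings) -> str:
    # Leaves a TODO on the first attempt and removes it once findings arrive.
    suffix = "" if findings else " TODO"
    return artefact + f"[{chunk.name}{suffix}]"


def stub_evaluate(artefact: str, rubric: List[str]) -> Verdict:
    findings = [r for r in rubric if r == "no TODO markers" and "TODO" in artefact]
    return Verdict(passed=not findings, findings=findings)


if __name__ == "__main__":
    print(run_harness("demo", stub_plan, stub_generate, stub_evaluate,
                      ["no TODO markers"]))  # prints "[a][b]"
```

Passing the roles in as plain callables keeps the isolation visible in the signatures: nothing from the Generator reaches the Evaluator except the artefact itself.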
Real-world Use Case
- A single agent run cannot fit the job into one context window.
- There is a clear external artefact that can be evaluated without inspecting how it was produced.
- A stable rubric exists or can be authored.
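One way to make "stable" concrete is to treat the rubric as immutable, versioned data and fingerprint it, so the driver can notice if it changes mid-run. The sketch below is an assumption-laden illustration: `Criterion`, `Rubric`, `fingerprint`, `passes`, the weights, and the threshold are hypothetical and not part of the pattern as described.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Criterion:
    id: str
    description: str
    weight: float


@dataclass(frozen=True)
class Rubric:
    version: str
    criteria: Tuple[Criterion, ...]
    pass_threshold: float  # fraction of total weight required to pass

    def fingerprint(self) -> str:
        """Stable hash of the rubric contents, recorded alongside each evaluation."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def passes(rubric: Rubric, met_ids: Set[str]) -> bool:
    """Pass when the weight of the met criteria reaches the threshold."""
    total = sum(c.weight for c in rubric.criteria)
    if total <= 0:
        raise ValueError("rubric has no weighted criteria")
    earned = sum(c.weight for c in rubric.criteria if c.id in met_ids)
    return earned / total >= rubric.pass_threshold


RUBRIC = Rubric(
    version="1",
    criteria=(
        Criterion("tests", "All existing tests still pass", 2.0),
        Criterion("docs", "Public functions are documented", 1.0),
    ),
    pass_threshold=0.6,
)
```

Storing `RUBRIC.fingerprint()` with every verdict lets a later audit confirm that all chunks were graded against the same rubric.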
Advantages
- Each role’s context stays small and bounded.
- Evaluator isolation makes scores harder to game from inside the Generator.
- Fresh-context generation per chunk avoids long-trace attention rot.
- Plans are durable artefacts that survive crashes and resumption.
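The durability claim can be illustrated by keeping the plan in a JSON file with a per-chunk status, so a restarted driver picks up at the first unfinished chunk. This is a sketch under stated assumptions: the file layout, the `status` field, and the helper names `save_plan`, `load_plan`, `next_pending`, and `mark_done` are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_plan(path: Path, plan: dict) -> None:
    """Write to a temp file in the same directory, then swap it into place,
    so a crash mid-write does not leave a truncated plan behind."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(plan, fh, indent=2)
    os.replace(tmp, path)


def load_plan(path: Path) -> dict:
    return json.loads(path.read_text(encoding="utf-8"))


def next_pending(plan: dict) -> Optional[dict]:
    """First chunk that is not done and whose dependencies are all done."""
    done = {c["name"] for c in plan["chunks"] if c["status"] == "done"}
    for chunk in plan["chunks"]:
        if chunk["status"] != "done" and all(d in done for d in chunk["depends_on"]):
            return chunk
    return None


def mark_done(path: Path, name: str) -> None:
    """Record a passed chunk on disk immediately after evaluation succeeds."""
    plan = load_plan(path)
    for chunk in plan["chunks"]:
        if chunk["name"] == name:
            chunk["status"] = "done"
    save_plan(path, plan)
```

On restart, the driver would call `next_pending(load_plan(path))` and carry on from there; no in-memory state from the crashed run is needed.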
Disadvantages
- Three-agent orchestration adds significant harness complexity over single-agent loops.
- Inter-role hand-offs through files add latency.
- A weak or mis-specified rubric makes the Evaluator useless or actively harmful.
- Planner errors propagate through the whole run because the Generator trusts the plan.
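Because the Generator trusts the plan, a cheap structural check before the first Generator run can catch some Planner errors early. The sketch below is one possible mitigation, not something the pattern prescribes: `validate_plan` and the chunk dictionary shape are hypothetical, and it only detects structural problems (duplicate names, unknown dependencies, missing criteria, cycles), not a plan that is well-formed but wrong.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def validate_plan(chunks: List[Dict]) -> List[str]:
    """Return a list of structural problems; an empty list means none were found."""
    problems: List[str] = []
    names = [c["name"] for c in chunks]
    known = set(names)

    for dupe in sorted({n for n in names if names.count(n) > 1}):
        problems.append(f"duplicate chunk name: {dupe}")

    for chunk in chunks:
        for dep in chunk.get("depends_on", []):
            if dep not in known:
                problems.append(f"{chunk['name']} depends on unknown chunk {dep}")
        if not chunk.get("acceptance_criteria"):
            problems.append(f"{chunk['name']} has no acceptance criteria")

    # Map each chunk to its known predecessors and look for a cycle.
    graph = {c["name"]: set(c.get("depends_on", [])) & known for c in chunks}
    try:
        tuple(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        problems.append(f"dependency cycle: {exc.args[1]}")
    return problems
```

A driver could refuse to start, or send the problems back to the Planner, whenever the returned list is non-empty.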