Agentic Context Engineering Playbook
Evolve the agent's long-lived playbook through small, auditable delta updates (add/edit/remove items) — never full rewrites that collapse hard-won specifics.
Intent & Description
🎯 Intent
Let the agent accumulate tactics across runs without context collapse erasing what it learned.
📋 Context
Your agent has a long-lived system prompt or memory file that accumulates tactics, heuristics, and worked examples across weeks of runs. Every time you ask the agent to reflect and update it in place, another batch of specific tactics gets paraphrased into oblivion.
💡 Solution
Store the playbook as an ordered list of items with stable IDs. Each item has a short tactic, an optional worked example, and provenance. After each run: a Generator reads the trajectory and proposes new candidate items. A Reflector reviews proposed and existing items against outcomes, scoring what to keep, edit, or drop. A Curator applies the resulting delta set — strictly add/edit/remove operations against item IDs. Whole-playbook rewrites are forbidden.
Real-world Use Case
- The agent has a long-lived prompt or memory that accumulates tactics across many runs.
- Full-prompt rewrites have measurably degraded specificity (context collapse is real).
- Outcomes are observable per run and can score playbook items.
Source
📌 TL;DR
Playbook as a delta-updated item list. Generator proposes, Reflector scores, Curator applies. No rewrites allowed. Specific tactics survive; context collapse doesn’t.
Advantages
- Specific tactics survive across many runs instead of being paraphrased away.
- Item-level provenance makes the playbook auditable and rollback-able.
- Separating Generator, Reflector, and Curator prevents generation from pre-empting evaluation.
- Small deltas are cheap; full rewrites are expensive — cost per improvement drops.
Disadvantages
- Three-role loop is more machinery than a single reflection pass.
- Item IDs must be stable — adds storage and bookkeeping overhead.
- The Curator’s dedup logic can silently drop items it should have kept; needs its own audit.
- Playbook can still grow unbounded without a separate retention policy.