Episodic Summaries
Compress blocks of past episodes into compact summaries on a schedule — preserve the gist, shed the token cost, consult originals only on demand.
Intent & Description
🎯 Intent
Compress past episodes into summaries that preserve gist while shedding token cost.
📋 Context
A long-running agent has accumulated more conversation history, tool results, and intermediate reasoning than fits in the model’s context window. Replaying the raw history on every turn is infeasible at scale, and even when it fits it is wasteful: most past turns are irrelevant to the next step.
💡 Solution
On a schedule (or when history crosses a size threshold), summarize completed blocks of past thoughts and conversation into compact representations. Store the summaries in a higher tier and archive the originals rather than discarding them. Reads consult the summaries first and fall back to the originals on demand.
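The compaction loop and tiered read can be sketched in Python. This is a minimal, single-process sketch under stated assumptions: `EpisodicMemory`, `naive_summarizer`, `block_size`, and `max_summaries` are hypothetical names, the size trigger counts episodes rather than tokens, and the summarizer is a string-truncating stand-in for what would normally be a model call.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical type: a function that condenses a block of episodes into one string.
Summarizer = Callable[[list[str]], str]


def naive_summarizer(episodes: list[str]) -> str:
    """Stand-in summarizer: keeps the first 40 characters of each episode.
    A real agent would prompt a model to extract the gist instead."""
    return " | ".join(e[:40] for e in episodes)


@dataclass
class EpisodicMemory:
    summarize: Summarizer
    block_size: int = 4  # size trigger: compact once this many raw episodes accumulate
    raw: list[str] = field(default_factory=list)  # recent, uncompressed tier
    summaries: list[str] = field(default_factory=list)  # higher, compact tier
    archive: dict[int, list[str]] = field(default_factory=dict)  # summary id -> originals

    def add(self, episode: str) -> None:
        self.raw.append(episode)
        if len(self.raw) >= self.block_size:
            self.compact()

    def compact(self) -> None:
        # Move the whole raw block out; archive the originals, keep only the summary in the compact tier.
        block, self.raw = self.raw, []
        summary_id = len(self.summaries)
        self.archive[summary_id] = block
        self.summaries.append(self.summarize(block))

    def context(self, max_summaries: int = 8) -> list[str]:
        # Reads consult summaries first (capped so context stays bounded), then raw episodes.
        start = max(0, len(self.summaries) - max_summaries)
        tagged = [f"[summary {i}] {self.summaries[i]}" for i in range(start, len(self.summaries))]
        return tagged + self.raw

    def expand(self, summary_id: int) -> list[str]:
        # On-demand fallback: return the archived originals behind one summary.
        return self.archive[summary_id]


if __name__ == "__main__":
    mem = EpisodicMemory(summarize=naive_summarizer)
    for i in range(10):
        mem.add(f"turn {i}: ...")
    print(len(mem.context()))  # 4: two summaries plus two raw episodes
    print(mem.expand(0))       # the four original episodes behind summary 0
```

One summary per block still grows linearly with history, which is why `context` caps how many summaries it loads; a fuller version might instead re-summarize old summaries into a further tier.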
Real-world Use Case
- Conversation or thought history grows without bound and needs compaction.
- Summaries can preserve the gist while meaningfully reducing token cost.
- A tiered read strategy (summaries first, originals on demand) is feasible.
📌 TL;DR
Summarize old episodes into compact tiers on a schedule — bounded context size, faster search, originals still available when you need the full picture.
Advantages
- Effective context size stays bounded despite unbounded history.
- Summaries are smaller, cheaper to embed, and faster to search than raw episodes.
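To illustrate why a summary tier is faster to search, here is a hedged Python sketch of the tiered read: rank the few summaries first, then scan originals only behind the one summary that matched, and only when detail is requested. Word overlap stands in for embedding similarity (an assumption, not how a production system would score), and `tiered_search`, `score`, and `tokens` are hypothetical names.

```python
import re
from typing import Optional


def tokens(text: str) -> set[str]:
    # Hypothetical stand-in for an embedding: a lowercase bag of words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, text: str) -> int:
    # Word overlap as a crude similarity; a real system would compare embeddings.
    return len(tokens(query) & tokens(text))


def tiered_search(
    query: str,
    summaries: dict[int, str],
    archive: dict[int, list[str]],
    need_detail: bool = False,
) -> Optional[str]:
    """Search the small summary tier first; open the archive only on demand."""
    if not summaries:
        return None
    best_id = max(summaries, key=lambda sid: score(query, summaries[sid]))
    if score(query, summaries[best_id]) == 0:
        return None  # no summary is relevant, so no originals are scanned either
    if not need_detail:
        return summaries[best_id]  # the gist suffices
    # Drill down: rank only the originals behind the winning summary.
    return max(archive[best_id], key=lambda ep: score(query, ep))


if __name__ == "__main__":
    summaries = {
        0: "user set up postgres database credentials",
        1: "debugged failing unit tests in parser",
    }
    archive = {
        0: ["user: connect to postgres at db.internal", "agent: stored password in vault"],
        1: ["agent: parser test fails on empty input", "agent: fixed off-by-one in tokenizer"],
    }
    print(tiered_search("which parser test failed", summaries, archive))
    # debugged failing unit tests in parser
    print(tiered_search("which parser test failed", summaries, archive, need_detail=True))
    # agent: parser test fails on empty input
```

The search cost scales with the number of summaries plus one block of originals, rather than with every raw episode ever recorded.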
Disadvantages
- Summary errors are sticky — the agent reasons over the summary, not the original, so mistakes compound.
- Compaction policy (what to summarize, when, how) is its own configuration and tuning burden.
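One way to make the what, when, and how of compaction explicit is a small policy object. This is a sketch only: `CompactionPolicy`, its fields, and every default value are hypothetical and illustrative, not recommendations, and the token count is assumed to come from whatever tokenizer the agent already uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompactionPolicy:
    """Hypothetical policy object; every default below is illustrative, not a recommendation."""
    max_raw_tokens: int = 4000     # when (size): compact once the raw tier exceeds this
    interval_turns: int = 20       # when (schedule): compact at least every N turns
    keep_recent: int = 5           # what: never summarize the newest N episodes
    summary_max_tokens: int = 300  # how: target length handed to the summarizer

    def should_compact(self, raw_tokens: int, turns_since_last: int) -> bool:
        # Either trigger fires compaction.
        return raw_tokens > self.max_raw_tokens or turns_since_last >= self.interval_turns

    def select_block(self, episodes: list[str]) -> list[str]:
        # Everything older than the protected recent window is eligible.
        cutoff = max(0, len(episodes) - self.keep_recent)
        return episodes[:cutoff]


if __name__ == "__main__":
    policy = CompactionPolicy()
    print(policy.should_compact(raw_tokens=5200, turns_since_last=3))  # True: size trigger
    episodes = [f"e{i}" for i in range(8)]
    print(policy.select_block(episodes))  # ['e0', 'e1', 'e2']
```

Keeping the policy in one immutable object makes it easy to record alongside each summary, which can help when tracing a sticky summary error back to the settings that produced it.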