Tool-Result Eviction
Once a tool's raw output is consumed, replace it in context with a one-line marker — reclaim tokens without losing the fact that the call happened.
Intent & Description
🎯 Intent
Stop paying context cost for tool outputs the model already extracted and moved on from.
📋 Context
Your agent calls search, file reads, and API queries, each returning bulky JSON or file contents. The model reads the payload, extracts what it needs, acts — and then that raw payload sits in context for the rest of the session, consuming tokens and attention for no reason.
💡 Solution
After a tool result is consumed, replace the raw payload in context with a short marker that records the call and its conclusion, for example 'read config.yaml: 3 services defined' or 'searched docs: no rate-limit setting found'. Keep the marker so the agent doesn’t re-issue the call. Offload the full payload to external storage if it might be needed verbatim again. Choose the eviction policy by window pressure: evict eagerly (immediately after extraction) when the window is tight, or lazily (oldest-consumed first, only as space runs short) when it is not.
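The mechanics can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `ToolMessage`, `Context`, the `[evicted <call_id>]` marker prefix, and the in-memory `offload` dict are all hypothetical names chosen for the example, and character count stands in for a real token count. The agent loop is assumed to set `consumed` and to supply the marker text (for instance from the model's own summary of the result).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolMessage:
    """One tool result held in the agent's context (hypothetical shape)."""
    call_id: str
    content: str
    consumed: bool = False  # set by the agent loop once the model has used the result
    evicted: bool = False


@dataclass
class Context:
    messages: list = field(default_factory=list)  # ToolMessage items, oldest first
    offload: dict = field(default_factory=dict)   # stand-in for external storage

    def size(self) -> int:
        # Character count as a crude proxy for tokens; swap in a real tokenizer.
        return sum(len(m.content) for m in self.messages)

    def evict(self, msg: ToolMessage, marker: str) -> None:
        """Eager path: offload the raw payload, then leave a one-line marker."""
        if msg.evicted:
            return
        self.offload[msg.call_id] = msg.content
        msg.content = f"[evicted {msg.call_id}] {marker}"
        msg.evicted = True

    def evict_lazily(self, budget: int, markers: dict) -> None:
        """Lazy path: evict consumed results, oldest first, until under budget."""
        for msg in self.messages:
            if self.size() <= budget:
                break
            marker: Optional[str] = markers.get(msg.call_id)
            # Skip results still in use, and results with no faithful marker.
            if msg.consumed and not msg.evicted and marker is not None:
                self.evict(msg, marker)


ctx = Context()
ctx.messages.append(ToolMessage("call_1", "services:\n" + "  - name: svc\n" * 300, consumed=True))
ctx.messages.append(ToolMessage("call_2", "{...search hits...}" * 100, consumed=False))
ctx.evict_lazily(budget=2500, markers={"call_1": "read config.yaml: 3 services defined"})
```

After the call, `call_1` holds only its marker and its payload sits in `ctx.offload`, while the unconsumed `call_2` is untouched. Calling `evict` directly right after extraction gives the eager variant.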
Real-world Use Case
- Tool observations are large relative to the context window.
- Most results are consumed once and not needed verbatim again.
- Window pressure or per-call cost is a binding constraint.
- You can write a faithful one-line marker for each consumed result.
📌 TL;DR
Tool result read? Compress it to a one-liner, keep the payload in external storage. Context stays lean. Don’t evict until you’re sure the model is done with it.
Advantages
- Window pressure from bulky observations drops sharply.
- Cost and latency per call fall — dead payloads stop being re-sent.
- The trace of what was called and concluded survives in the marker.
- Signal-to-noise in the window improves.
Disadvantages
- Evicting a result that’s still needed forces a redundant re-call.
- A marker that loses a key value can mislead later reasoning.
- Deciding when an observation is truly ‘consumed’ is error-prone.
- Without offload, an evicted payload needed verbatim later is gone.
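The last disadvantage is the one offload is meant to address. As a rough sketch, assuming markers carry a `[evicted <call_id>]` prefix and payloads were saved under the same `call_id` (both conventions are assumptions made for this example, as is the `rehydrate` helper name), the original payload can be restored on demand instead of re-issuing the tool call:

```python
import re

# Assumed marker convention: "[evicted <call_id>] <one-line summary>"
MARKER_RE = re.compile(r"^\[evicted (?P<call_id>[^\]]+)\] ")


def rehydrate(content: str, offload: dict) -> str:
    """Return the original payload for an evicted marker, else the content as-is."""
    match = MARKER_RE.match(content)
    if match is None:
        return content  # not a marker, nothing to restore
    call_id = match.group("call_id")
    if call_id not in offload:
        # Evicted without offload: the payload is gone and the call must be re-run.
        raise KeyError(f"no offloaded payload for {call_id}")
    return offload[call_id]


store = {"call_1": "services:\n  - api\n  - db\n  - cache\n"}
restored = rehydrate("[evicted call_1] read config.yaml: 3 services defined", store)
```

Raising on a missing payload, rather than silently returning the marker, makes the "evicted without offload" case visible to the agent loop so it can decide whether a re-call is worth the cost.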