MemGPT-Style Paging
Treat the context window as RAM and external storage as disk — the model issues tool calls to page memory in and out at its own discretion.
Through read_recall, write_archival, and search_archival tool calls, the model decides what to page in and out, treating the window budget as a first-class constraint to manage.
Intent & Description
🎯 Intent
Treat the LLM context window as RAM and external storage as disk, with the model issuing tool calls to page memory in and out.
📋 Context
A long-running agent’s conversation or document state grows past the model’s context window. The team needs to keep the agent useful over interactions spanning thousands of turns, or over documents larger than any provider’s context window.
💡 Solution
Use two memory tiers. Main context holds the system prompt, the working set, and recent messages. External context holds recall storage (raw history) and archival storage (a vector store). The model is given three tools: read_recall, write_archival, and search_archival. Paging happens at the agent’s discretion: the model treats main context as RAM and external context as disk.
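The two tiers and three tools described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MemGPT implementation: the PagedMemory class, the word-count tokenizer, the substring match for recall, and the word-overlap ranking (standing in for a vector store) are all hypothetical choices made to keep the example self-contained.

```python
from collections import deque
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word (stand-in for a real tokenizer).
    return len(text.split())


@dataclass
class PagedMemory:
    budget: int                                   # token budget for messages in main context
    main: deque = field(default_factory=deque)    # "RAM": recent messages
    recall: list = field(default_factory=list)    # "disk": full raw history
    archival: list = field(default_factory=list)  # "disk": long-term notes

    def append(self, message: str) -> None:
        """Record a message; evict the oldest from main context while over budget."""
        self.recall.append(message)
        self.main.append(message)
        while len(self.main) > 1 and sum(count_tokens(m) for m in self.main) > self.budget:
            self.main.popleft()  # evicted messages remain retrievable from recall

    # --- tools exposed to the model ---

    def read_recall(self, query: str, limit: int = 3) -> list:
        """Page in raw history: the most recent messages containing the query."""
        hits = [m for m in self.recall if query.lower() in m.lower()]
        return hits[-limit:]

    def write_archival(self, note: str) -> int:
        """Page out a note to archival storage and return its index."""
        self.archival.append(note)
        return len(self.archival) - 1

    def search_archival(self, query: str, limit: int = 3) -> list:
        """Rank notes by word overlap with the query (stand-in for vector similarity)."""
        q = set(query.lower().split())
        scored = [(len(q & set(n.lower().split())), n) for n in self.archival]
        scored = [(s, n) for s, n in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [n for _, n in scored[:limit]]


if __name__ == "__main__":
    mem = PagedMemory(budget=8)
    mem.append("user: my cat is named Miso")
    mem.append("assistant: noted, Miso it is")
    mem.append("user: what should I cook tonight")
    print(list(mem.main))          # only what still fits in the budget
    print(mem.read_recall("cat"))  # evicted message paged back in
```

In a real agent the three methods would be registered as tools, and the model, not the host code, would decide when to call them. Eviction here is oldest-first for simplicity; other policies (summarize, then evict) fit the same interface.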
Real-world Use Case
- Long-running agents need state that exceeds the model’s context window.
- The model can be trusted to manage memory via tool calls (read, write, search).
- External recall and archival storage tiers are available and queryable.
📌 TL;DR
Give the model RAM/disk semantics — context window as RAM, external storage as disk, tool calls to page in and out — and let it manage its own memory budget.
Advantages
- Conversation continuity far beyond the context window limit.
- Inspectable memory tiers — archival is queryable independently for debugging.
Disadvantages
- Tool definitions themselves consume context budget — you pay for the RAM/disk metaphor.
- Page-fault tool calls add latency on every memory access that misses main context.
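Both costs listed above can be measured before committing to the pattern. The sketch below is one possible way to do it, with several assumptions: the tool schemas are written in a generic JSON-schema style (the exact shape varies by provider), the 4-characters-per-token ratio is a rough heuristic rather than a real tokenizer, and PageFaultMeter is a hypothetical helper name.

```python
import json
import time

# Hypothetical tool schemas; real providers each have their own format.
TOOLS = [
    {"name": "read_recall",
     "description": "Search raw conversation history.",
     "parameters": {"type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"]}},
    {"name": "write_archival",
     "description": "Store a note in archival memory.",
     "parameters": {"type": "object",
                    "properties": {"note": {"type": "string"}},
                    "required": ["note"]}},
    {"name": "search_archival",
     "description": "Search archival memory by similarity.",
     "parameters": {"type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"]}},
]


def approx_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token; use the provider's tokenizer for real numbers.
    return max(1, len(text) // 4)


def tool_overhead(tools: list) -> int:
    """Approximate tokens spent on tool definitions in every request."""
    return sum(approx_tokens(json.dumps(t)) for t in tools)


def usable_budget(window: int, tools: list) -> int:
    """Context left for the system prompt, working set, and messages."""
    return window - tool_overhead(tools)


class PageFaultMeter:
    """Counts memory tool calls and the wall-clock time they add."""

    def __init__(self) -> None:
        self.faults = 0
        self.seconds = 0.0

    def call(self, tool, *args, **kwargs):
        start = time.perf_counter()
        try:
            return tool(*args, **kwargs)
        finally:
            self.faults += 1
            self.seconds += time.perf_counter() - start


if __name__ == "__main__":
    print("tool overhead (approx tokens):", tool_overhead(TOOLS))
    print("usable budget of an 8192-token window:", usable_budget(8192, TOOLS))
```

Wrapping each memory tool in PageFaultMeter.call during a trial run gives a fault count and cumulative latency, which helps judge whether the main-context budget is set too tight.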