Augmented LLM
The foundational agent building block: an LLM wired to retrieval, tools, and memory — where the model decides when to use each.
Intent & Description
🎯 Intent
Build a reusable agent unit that every higher-level workflow can compose without reinventing the basics.
📋 Context
You’re building a support assistant, coding agent, or workflow runner. Every team that builds agents ends up wiring the same three capabilities: retrieval, tool calls, and memory. The question is whether you do it ad hoc every time or build a consistent block once.
💡 Solution
Wire the model with three model-driven capabilities: (1) retrieval queries the model issues against external corpora; (2) tool calls the model emits, with each result fed back into its context; (3) memory the model reads from and writes to across turns. The model, not the surrounding code, decides which capability to invoke at each step. Chains, routers, orchestrators, and multi-agent loops all compose instances of this block.
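The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `Action`, `AugmentedLLM`, and `scripted_model` are hypothetical names, and the scripted model stands in for a real provider call, which would parse the model's output into an `Action`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    """What the model emits at each step (hypothetical shape)."""
    kind: str       # "retrieve" | "tool" | "remember" | "answer"
    arg: str
    name: str = ""  # tool name, used only when kind == "tool"


# Assumed model interface: transcript so far in, next Action out.
Model = Callable[[list[str]], Action]


@dataclass
class AugmentedLLM:
    model: Model
    retrieve: Callable[[str], list[str]]
    tools: dict[str, Callable[[str], str]]
    memory: list[str] = field(default_factory=list)
    max_steps: int = 8

    def run(self, user_input: str) -> str:
        # Memory written in earlier turns is read back into the context.
        transcript = [f"memory: {m}" for m in self.memory]
        transcript.append(f"user: {user_input}")
        for _ in range(self.max_steps):
            action = self.model(transcript)  # the model picks the capability
            if action.kind == "retrieve":
                transcript.append(f"retrieved: {self.retrieve(action.arg)}")
            elif action.kind == "tool":
                result = self.tools[action.name](action.arg)
                transcript.append(f"tool {action.name}: {result}")
            elif action.kind == "remember":
                self.memory.append(action.arg)
                transcript.append(f"remembered: {action.arg}")
            elif action.kind == "answer":
                return action.arg
            else:
                raise ValueError(f"unknown action kind: {action.kind}")
        raise RuntimeError("step budget exhausted without an answer")


def scripted_model(transcript: list[str]) -> Action:
    """Stub that replays a fixed plan; a real model would decide freely."""
    last = transcript[-1]
    if last.startswith("user:"):
        return Action("retrieve", "refund policy")
    if last.startswith("retrieved:"):
        return Action("tool", "order-42", name="lookup_order")
    if last.startswith("tool"):
        return Action("remember", "customer asked about order-42")
    return Action("answer", "Order-42 is eligible for a refund.")


agent = AugmentedLLM(
    model=scripted_model,
    retrieve=lambda query: ["Refunds are accepted within 30 days."],
    tools={"lookup_order": lambda order_id: f"{order_id}: delivered 5 days ago"},
)
print(agent.run("Can I get a refund?"))
```

The surrounding code only dispatches on whatever the model emits; it never chooses a capability itself. The `max_steps` budget is one possible guard against a model that never answers.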
Real-world Use Case
- You need a consistent building block for any agent system.
- The model, not hard-coded logic, should choose when to retrieve, call tools, or use memory.
- Higher-level workflows need a uniform unit to compose.
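One way to picture the uniform unit: if every block exposes the same `run(text) -> text` method, higher-level workflows compose blocks without knowing what is inside them. The sketch below assumes that interface; `Block`, `Chain`, `Router`, and `StubBlock` are hypothetical names, and the stub stands in for a real augmented block.

```python
from typing import Callable, Protocol


class Block(Protocol):
    """Assumed uniform interface: text in, text out."""
    def run(self, user_input: str) -> str: ...


class Chain:
    """Feeds each block's output into the next block."""
    def __init__(self, *blocks: Block) -> None:
        self.blocks = blocks

    def run(self, user_input: str) -> str:
        text = user_input
        for block in self.blocks:
            text = block.run(text)
        return text


class Router:
    """Sends the input to one block, chosen by a classifier."""
    def __init__(self, classify: Callable[[str], str],
                 routes: dict[str, Block]) -> None:
        self.classify = classify
        self.routes = routes

    def run(self, user_input: str) -> str:
        return self.routes[self.classify(user_input)].run(user_input)


class StubBlock:
    """Stand-in for an augmented block; it only tags its input."""
    def __init__(self, label: str) -> None:
        self.label = label

    def run(self, user_input: str) -> str:
        return f"{self.label}({user_input})"


pipeline = Chain(StubBlock("draft"), StubBlock("review"))
router = Router(
    # In practice the classifier would itself be a model call.
    classify=lambda text: "code" if "bug" in text else "support",
    routes={"code": pipeline, "support": StubBlock("support")},
)
print(router.run("fix this bug"))
```

Because `Chain` and `Router` also expose `run`, they satisfy the same interface and can be nested inside each other.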
📌 TL;DR
Augmented LLM = model + retrieval + tools + memory, where the model picks what to use. Build this block once and compose it everywhere.
Advantages
- One indivisible building block — higher-level patterns compose it without re-implementing basics.
- Model-driven augmentation adapts to each request; no brittle if-else routing code.
- Provider-agnostic — swap the underlying model without touching the augmentation surface.
Disadvantages
- Easy to underspecify when each augmentation should fire; the model may retrieve when it should tool-call.
- Cost compounds when every block calls all three augmentations on every request.
- Debugging touches three subsystems at once; you need observability across all paths.
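The cost and debugging concerns above both come down to knowing which augmentation fired, with what argument, and how long it took. A minimal sketch of one approach is to wrap each capability in a recorder before handing it to the block. `Trace` and its methods are hypothetical names, and the lambdas stand in for real retrieval and tool backends.

```python
import time
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Trace:
    # Each event is (kind, argument, elapsed seconds).
    events: list[tuple[str, str, float]] = field(default_factory=list)

    def wrap(self, kind: str, fn: Callable[[str], str]) -> Callable[[str], str]:
        """Return fn with every call recorded, including calls that raise."""
        def traced(arg: str) -> str:
            start = time.perf_counter()
            try:
                return fn(arg)
            finally:
                self.events.append((kind, arg, time.perf_counter() - start))
        return traced

    def counts(self) -> Counter[str]:
        """How many times each augmentation fired."""
        return Counter(kind for kind, _, _ in self.events)


trace = Trace()
retrieve = trace.wrap("retrieve", lambda query: f"docs for {query}")
lookup = trace.wrap("tool", lambda order_id: f"{order_id}: shipped")

retrieve("refund policy")
lookup("order-42")
retrieve("shipping times")
print(trace.counts())
```

Per-request counts make it visible when a block calls every augmentation on every request, and the ordered event list shows whether the model retrieved where a tool call was expected.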