Prompt Caching
Order your prompt so the unchanging prefix gets cached by the provider — cutting per-call cost by 70–90% and TTFT roughly in half.
Intent & Description
🎯 Intent
Stop paying to re-process the same system prompt, tool definitions, and rules on every single call.
📋 Context
Your agent sends a large stable prefix (system prompt, tool definitions, charter, code-style rules) on every call, and only a small suffix varies (current user message, latest tool result). The provider’s API caches byte-identical prefixes.
💡 Solution
Put all stable content at the top of the prompt. Put variable content at the bottom. Mark the cache breakpoint at the boundary. Audit prompt construction to ensure nothing accidentally mutates the prefix — timestamps, UUIDs, and dynamically reordered tool definitions are the classic footguns.
Real-world Use Case
- The same long prefix (system prompt, tools, charter) goes out on every call.
- The provider exposes a prompt cache keyed on byte-stable prefixes.
- Variable content can be cleanly placed at the end of the prompt.
Source
📌 TL;DR
Stable stuff first, variable stuff last, cache breakpoint in between. 70–90% cost reduction. Just don’t let timestamps leak into your prefix.
Advantages
- 70–90% input-cost reduction on long-running agents.
- TTFT roughly halves for the cached portion.
Disadvantages
- Cache misses are silent and expensive — you won’t know without monitoring.
- Prompt assembly code must be disciplined; any prefix mutation invalidates the cache.
- Common footguns: tool-definition reordering, timestamps leaking into the cached prefix, provider-specific breakpoint limits (Anthropic: max 4 breakpoints, 1024-token minimum).