KV Cache | designpattern.fyi

Advantages

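Reduces per-token generation compute: keys and values for earlier tokens are reused instead of recomputed, so each step projects only the new token (O(1) K/V projections instead of O(N)) and attends once over the N cached entries (O(N) instead of recomputing full O(N^2) attention). This makes long-context generation practical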
Enables efficient multi-turn conversation without reprocessing the full context each turn
Foundation for prefix caching optimizations (cache system prompts across requests)
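
The reuse behind the per-token saving can be sketched with a single attention head in plain Python. This is a minimal illustration, not any library's implementation: the names `KVCache`, `decode_step_cached`, and `decode_step_uncached` are hypothetical, and real systems use batched tensors across many heads and layers.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    # W is a list of rows; returns W @ x.
    return [dot(row, x) for row in W]

def attend(q, keys, values):
    # Scaled dot-product attention of one query over all keys/values.
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]

class KVCache:
    """Hypothetical per-head cache: one key and one value per past token."""
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

def decode_step_cached(x, Wq, Wk, Wv, cache):
    # Project only the new token, then attend over everything cached so far.
    cache.append(matvec(Wk, x), matvec(Wv, x))
    return attend(matvec(Wq, x), cache.keys, cache.values)

def decode_step_uncached(xs, Wq, Wk, Wv):
    # Baseline: re-project every token in the sequence on every step.
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return attend(matvec(Wq, xs[-1]), keys, values)
```

Both functions return the same attention output for the newest token; the cached version does one K/V projection per step, while the baseline does one per token in the sequence.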

Disadvantages

KV cache memory grows linearly with sequence length and batch size — major VRAM pressure at scale
Long sequences or large batches can cause OOM if KV cache isn’t managed carefully
Allocating, sharing, and freeing cache memory across requests requires careful memory management (PagedAttention addresses this by storing the cache in fixed-size blocks)
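
The linear growth in sequence length and batch size can be estimated with simple arithmetic. The sketch below assumes a standard transformer decoder that stores one key and one value vector per token, per layer, per KV head. The function name `kv_cache_bytes` and the example model shape (32 layers, 32 KV heads, head dimension 128, 2-byte elements) are illustrative assumptions, not figures from the source.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, bytes_per_elem=2):
    # Factor of 2 covers keys plus values; everything else scales linearly.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)

# Hypothetical model shape, 4096-token context, one sequence, fp16 elements.
size = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                      seq_len=4096, batch_size=1)
print(size / 2**30)  # 2.0 (GiB)
```

Under these assumptions, doubling either the context length or the batch size doubles the cache, which is why large batches of long sequences exhaust VRAM quickly.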