Continuous Batching | designpattern.fyi

Advantages

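Near-continuous GPU utilization: slots freed by finished sequences are refilled at the next iteration instead of idling until the longest sequence in the batch completes
Higher request throughput than static batching at the same hardware cost, with the largest gains when output lengths vary widely across requests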
Reduced tail latency for short requests that would otherwise wait behind long ones
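
To make the throughput claim concrete, here is a toy simulation that counts decode iterations under both policies. It is a sketch under simplifying assumptions: every active sequence emits one token per iteration, prefill cost is ignored, and the function names and request lengths are invented for illustration.

```python
from collections import deque

def static_batching_steps(lengths, batch_size):
    """Decode steps when each batch runs until its longest sequence ends."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps

def continuous_batching_steps(lengths, batch_size):
    """Decode steps when freed slots are refilled at every iteration."""
    waiting = deque(lengths)
    running = []  # remaining tokens for each active sequence
    steps = 0
    while waiting or running:
        # Admit waiting requests into any free slots before this iteration.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # One decode iteration: every active sequence emits one token.
        running = [r - 1 for r in running]
        # Drop finished sequences so their slots are free next iteration.
        running = [r for r in running if r > 0]
        steps += 1
    return steps

# Hypothetical workload: mostly short requests with two long ones.
lengths = [2, 2, 2, 10, 2, 2, 2, 10]
static_steps = static_batching_steps(lengths, batch_size=4)          # 20
continuous_steps = continuous_batching_steps(lengths, batch_size=4)  # 14
print(static_steps, continuous_steps)
```

In this workload the short requests in the second group start at iteration 3 rather than iteration 11, which is where the tail-latency benefit for short requests comes from.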

Disadvantages

More complex scheduling logic: the scheduler must add and retire sequences at every decoding iteration rather than once per batch
Memory management complexity increases with dynamic batch composition (PagedAttention addresses this by allocating the KV cache in fixed-size blocks)
Requires careful integration with KV cache management to avoid memory fragmentation
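
The memory concerns above can be illustrated with a minimal block-pool sketch in the spirit of paged KV cache allocation. `BlockPool` and its methods are hypothetical names for illustration, not the API of vLLM or any other library, and the sketch tracks only block bookkeeping, not actual tensors.

```python
class BlockPool:
    """Hypothetical fixed-size KV cache block pool (not any library's API)."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size         # tokens per block
        self.free = list(range(num_blocks))  # ids of unused blocks
        self.tables = {}                     # seq_id -> list of block ids
        self.lengths = {}                    # seq_id -> tokens stored

    def append_token(self, seq_id):
        """Reserve space for one more token; return False if the pool is full."""
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:         # first token, or last block is full
            if not self.free:
                return False                 # caller must wait or preempt
            self.tables.setdefault(seq_id, []).append(self.free.pop())
        self.lengths[seq_id] = n + 1
        return True

    def release(self, seq_id):
        """Return all blocks of a finished sequence to the pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

pool = BlockPool(num_blocks=4, block_size=2)
for _ in range(3):
    pool.append_token("a")                   # 3 tokens occupy 2 blocks
for _ in range(4):
    pool.append_token("b")                   # 4 tokens occupy 2 blocks
admitted_when_full = pool.append_token("c")  # no free block left
pool.release("a")                            # sequence "a" finishes
admitted_after_release = pool.append_token("c")
```

Because every block has the same size, blocks released by a finished sequence can be reused by any new sequence, so the scheduler can decide admission from the free-block count alone. A sequence wastes at most one partially filled block.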