Sleep-Time Compute | designpattern.fyi

Advantages

Test-time latency drops sharply on cache hits, because the answer is already computed.
Cost shifts from peak (test-time) to trough (idle) capacity pricing.
Distilled summaries also speed up cold queries by serving as compact retrieval targets.
Speculative coverage improves over time as the prediction model learns from misses.
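
The hit and miss paths described above can be sketched as follows. This is a minimal illustration, not the pattern's reference implementation: `SleepTimeCache`, `normalize`, and the `live_infer` callable are hypothetical names, and exact-match lookup on a normalized query string stands in for whatever matching a real system would use.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivially different queries share a key."""
    return " ".join(query.lower().split())


@dataclass
class SleepTimeCache:
    # Hypothetical stand-in for the expensive live inference call.
    live_infer: Callable[[str], str]
    answers: Dict[str, str] = field(default_factory=dict)
    misses: List[str] = field(default_factory=list)

    def precompute(self, predicted_queries: Iterable[str]) -> None:
        """Idle-time step: answer predicted queries before anyone asks them."""
        for query in predicted_queries:
            self.answers[normalize(query)] = self.live_infer(query)

    def answer(self, query: str) -> Tuple[str, bool]:
        """Test-time step: return (answer, was_cache_hit)."""
        key = normalize(query)
        if key in self.answers:
            return self.answers[key], True  # hit: no inference at test time
        self.misses.append(key)  # recorded so a predictor can learn from it
        return self.live_infer(query), False
```

The `misses` list is the feedback signal: a prediction step run during the next idle window could read it to decide what to pre-answer next.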

Disadvantages

Offline compute is a real cost: pre-answers to predicted queries that are never asked are wasted spend.
Stale pre-answers can mislead if invalidation lags corpus changes.
Privacy implication: pre-answering means the system holds and reasons over user data during idle periods.
Quality can regress if speculative pre-answers are produced with less effort than live inference and the agent does not detect the gap.
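
Two of these risks, stale pre-answers and wasted offline spend, can be made measurable with a small amount of bookkeeping. The sketch below assumes the corpus exposes a monotonically increasing version number. `PreAnswer`, `PreAnswerStore`, and `wasted_fraction` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PreAnswer:
    text: str
    corpus_version: int  # corpus version the answer was computed against
    served: int = 0      # how many times this pre-answer was actually used


class PreAnswerStore:
    def __init__(self) -> None:
        self.entries: Dict[str, PreAnswer] = {}

    def put(self, query: str, text: str, corpus_version: int) -> None:
        self.entries[query] = PreAnswer(text, corpus_version)

    def get(self, query: str, current_version: int) -> Optional[str]:
        """Return a pre-answer only if it matches the current corpus version."""
        entry = self.entries.get(query)
        if entry is None:
            return None
        if entry.corpus_version != current_version:
            # Stale: drop it so the caller falls back to live inference.
            del self.entries[query]
            return None
        entry.served += 1
        return entry.text

    def wasted_fraction(self) -> float:
        """Share of stored pre-answers that have never been served."""
        if not self.entries:
            return 0.0
        unused = sum(1 for e in self.entries.values() if e.served == 0)
        return unused / len(self.entries)
```

Checking the version at read time means a lagging invalidation job cannot cause a stale answer to be served, at the cost of one comparison per lookup. Evicted stale entries no longer count toward `wasted_fraction`, so a fuller accounting would log evictions separately.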