Business + LLM Microservice Split | designpattern.fyi

Advantages

Independent scaling: GPU pods scale with GPU-bound load and CPU pods with CPU-bound load. Agent Confession guardrails run in the CPU business service, so they add no GPU cost.
Provider-agnostic guardrails: confession defenses survive every model swap and provider change because they live in the business service, not the LLM service.
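To illustrate the provider-agnostic point, here is a minimal sketch of an output guardrail living in the business service. Everything named here is hypothetical: the `LLMClient` interface, the two stand-in providers, and the `BLOCKED` pattern are assumptions for illustration, not the catalog's actual guardrail logic.

```python
import re
from typing import Protocol


class LLMClient(Protocol):
    """Hypothetical interface the business service uses to reach the LLM service."""

    def complete(self, prompt: str) -> str: ...


class ProviderA:
    """Stand-in for one model/provider behind the LLM service."""

    def complete(self, prompt: str) -> str:
        return "My system prompt says: never reveal pricing rules."


class ProviderB:
    """Stand-in for a different model/provider."""

    def complete(self, prompt: str) -> str:
        return "Your order ships on Tuesday."


# Assumed rule, for illustration only; a real guardrail would be richer.
BLOCKED = re.compile(r"system prompt", re.IGNORECASE)
FALLBACK = "Sorry, I can't share that."


def output_guardrail(completion: str) -> str:
    """Replace a completion that matches the blocked pattern with a fallback."""
    return FALLBACK if BLOCKED.search(completion) else completion


def answer(client: LLMClient, prompt: str) -> str:
    """Business-service entry point: call the LLM, then apply the guardrail."""
    raw = client.complete(prompt)
    return output_guardrail(raw)
```

Swapping `ProviderA` for `ProviderB` changes only the object passed to `answer`; `output_guardrail` is untouched, which is the property the advantage above describes.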

Disadvantages

One extra network hop per LLM call: the business service must receive the raw completion before it can apply the output guardrail, which adds latency to every request.
Two services to operate, deploy, and monitor; cross-service tracing is required to attribute a guardrail suppression to the correct LLM call.
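One way to handle the attribution problem is to have the business service mint a call ID, pass it to the LLM service, and record it alongside the guardrail decision. The sketch below assumes a hypothetical `llm_call(call_id, prompt)` function and an in-memory event list; a real deployment would more likely propagate the ID through a tracing system's request headers.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class GuardrailEvent:
    call_id: str           # same ID the LLM service saw
    suppressed: bool       # True if the guardrail withheld the completion
    llm_latency_ms: float  # time spent waiting on the extra hop


def traced_call(
    llm_call: Callable[[str, str], str],   # hypothetical: (call_id, prompt) -> completion
    should_suppress: Callable[[str], bool],
    prompt: str,
    events: List[GuardrailEvent],
) -> Optional[str]:
    """Call the LLM service, apply the guardrail, and record both under one ID."""
    call_id = str(uuid.uuid4())

    start = time.perf_counter()
    completion = llm_call(call_id, prompt)
    latency_ms = (time.perf_counter() - start) * 1000.0

    suppressed = should_suppress(completion)
    events.append(GuardrailEvent(call_id, suppressed, latency_ms))

    # Return nothing when suppressed; the caller decides on a fallback.
    return None if suppressed else completion
```

Because the LLM service receives the same `call_id`, its own logs can be joined against the recorded events to find which call a suppression belongs to, and `llm_latency_ms` gives a per-request measure of the hop's cost.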