Unified Voice Interface | designpattern.fyi

Advantages

Switching providers is a configuration change, not a code change, and capability flags ensure that guardrails are not silently dropped when the new provider lacks a feature.
The uniform speak() interception point applies Agent Confession defenses consistently across all TTS providers, preventing audio exfiltration of directive content.
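The two advantages above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Provider` adapter, the `UnifiedVoice` class, the config keys `provider` and `required_capabilities`, and the marker-based `guard_text` check, which only stands in for whatever detection the Agent Confession defenses actually use.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Provider:
    """Hypothetical TTS adapter; `synthesize` stands in for a real SDK call."""
    name: str
    capabilities: FrozenSet[str]
    synthesize: Callable[[str], bytes]


# Assumption: a fixed marker list is a placeholder for real directive detection.
BLOCKED_MARKERS = ("system prompt", "ignore previous instructions")


def guard_text(text: str) -> str:
    """Reject text that appears to carry directive content."""
    lowered = text.lower()
    for marker in BLOCKED_MARKERS:
        if marker in lowered:
            raise PermissionError(f"refusing to speak directive content: {marker!r}")
    return text


class UnifiedVoice:
    """Single speak() entry point shared by every configured provider."""

    def __init__(self, providers: Dict[str, Provider], config: Dict[str, Any]):
        provider = providers[config["provider"]]
        required = frozenset(config.get("required_capabilities", ()))
        missing = required - provider.capabilities
        if missing:
            # Fail loudly instead of silently dropping a feature on swap.
            raise ValueError(
                f"provider {provider.name!r} lacks: {sorted(missing)}"
            )
        self.provider = provider

    def speak(self, text: str) -> bytes:
        # The guardrail runs here, before any provider sees the text.
        return self.provider.synthesize(guard_text(text))
```

In this sketch, changing `config["provider"]` is the only step needed to switch backends, and the guardrail cannot be bypassed by a swap because it lives in `speak()` rather than in any adapter.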

Disadvantages

The abstraction is pulled toward the lowest common denominator: provider-specific voices and effects need explicit capability flags, or they are lost on swap.
Realtime speech-to-speech (STS) bidirectional framing is hard to emulate when only TTS and STT are available. In STS mode the guardrail must operate on audio tokens rather than text, which is significantly harder.
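Both disadvantages can be made visible in code rather than discovered at runtime. The sketch below is hypothetical: the capability names (`sts`, `stt`, `tts`), the feature strings, and both helper functions are illustrative assumptions, not part of any provider's API.

```python
from typing import FrozenSet, Iterable, List, Tuple


def lost_on_swap(requested: Iterable[str], target_caps: FrozenSet[str]) -> List[str]:
    """List requested provider-specific features the target provider lacks."""
    return sorted(set(requested) - target_caps)


def plan_pipeline(caps: FrozenSet[str]) -> Tuple[str, str]:
    """Pick a conversation mode and the layer the guardrail must inspect."""
    if "sts" in caps:
        # Native speech-to-speech: no text stage exists, so the guardrail
        # has to work on audio, which is the harder case.
        return ("native_sts", "audio")
    if {"stt", "tts"} <= caps:
        # Emulated turn-taking: STT -> text guardrail -> TTS.
        return ("emulated_turns", "text")
    raise ValueError("provider supports neither STS nor STT+TTS")
```

Calling `lost_on_swap` before a provider change reports which voices or effects would disappear, and `plan_pipeline` shows explicitly when a session has moved from a text-level guardrail to an audio-level one.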