Multi-Model Routing
Intent & Description
🎯 Intent
Send each request to the cheapest model that can handle it well.
📋 Context
A team is building a production agent and has access to several language models from one or more providers, typically a small cheap model, a mid-tier model, and a frontier model whose per-token price is an order of magnitude higher. The traffic mix is uneven: many requests are simple extractions, classifications, or rephrasings, while a smaller share genuinely needs the frontier model's depth. The team has to decide which model handles each kind of request.
💡 Solution
Combine routing (classifying each request) with a per-class model preference. For example, routing and filter extraction go to the cheap model, while the screen-aware dialog or the final answer goes to the strong model. Optionally add a cascade: try the cheap model first, and fall back to the strong model if its confidence is low.
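The classify-then-map step can be sketched as a lookup table. This is a minimal illustration, not a reference implementation: the model identifiers, the class names, and the keyword-based `classify` function are all assumptions made for the example. A real router would more likely use a small model or a trained classifier.

```python
from typing import Dict

# Hypothetical model identifiers; substitute the names your provider uses.
CHEAP_MODEL = "small-model"
STRONG_MODEL = "frontier-model"

# Per-class model preference. The class names are illustrative assumptions.
MODEL_FOR_CLASS: Dict[str, str] = {
    "routing": CHEAP_MODEL,
    "filter_extraction": CHEAP_MODEL,
    "screen_dialog": STRONG_MODEL,
    "final_answer": STRONG_MODEL,
}


def classify(request: str) -> str:
    """Toy keyword classifier standing in for a real one (assumption)."""
    text = request.lower()
    if text.startswith("filter:"):
        return "filter_extraction"
    if text.startswith("route:"):
        return "routing"
    if "screen" in text:
        return "screen_dialog"
    return "final_answer"


def pick_model(request: str) -> str:
    """Return the preferred model; unknown classes default to the strong model."""
    return MODEL_FOR_CLASS.get(classify(request), STRONG_MODEL)


if __name__ == "__main__":
    for req in ["filter: price under 50", "What is on this screen?"]:
        print(req, "->", pick_model(req))
```

Defaulting unknown classes to the strong model trades cost for safety: a misclassified request costs more but is less likely to get a poor answer.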
Real-world Use Case
- Cost and quality goals diverge across request types.
- A classifier can route requests to a cheap or strong model with acceptable accuracy.
- A cascade with low-confidence fallback to the strong model is feasible.
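A cascade with low-confidence fallback might look like the sketch below. It assumes each model call returns an answer together with a confidence score in [0, 1]. How that score is obtained (token log-probabilities, a self-rating, a separate verifier) is left open, and the stub models and the 0.8 threshold are placeholders rather than recommendations.

```python
from typing import Callable, Tuple

# A model call returns (answer, confidence in [0, 1]). The source of the
# confidence score is an assumption of this sketch.
ModelFn = Callable[[str], Tuple[str, float]]


def cascade(request: str, cheap: ModelFn, strong: ModelFn,
            threshold: float = 0.8) -> Tuple[str, str]:
    """Try the cheap model; escalate to the strong model on low confidence.

    Returns (answer, name of the tier that produced it).
    """
    answer, confidence = cheap(request)
    if confidence >= threshold:
        return answer, "cheap"
    answer, _ = strong(request)
    return answer, "strong"


# Stub models so the sketch runs without a provider SDK.
def stub_cheap(request: str) -> Tuple[str, float]:
    confident = len(request.split()) <= 5  # pretend short requests are easy
    return f"cheap:{request}", 0.95 if confident else 0.4


def stub_strong(request: str) -> Tuple[str, float]:
    return f"strong:{request}", 0.99


if __name__ == "__main__":
    print(cascade("classify this ticket", stub_cheap, stub_strong))
```

Note that an escalated request pays for both calls, so a cascade only saves money when most requests clear the threshold on the first try.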
Advantages
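- The bill can drop by a factor of 5-10 without quality loss when class boundaries match cost boundaries, that is, when the requests the cheap model handles well are also the bulk of the traffic.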
- Dev/test runs naturally on cheap models.
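To see where a savings factor comes from, the arithmetic can be worked through with made-up numbers. The prices below are hypothetical (a 20x gap between tiers, chosen only for illustration), and the sketch ignores the cost of the classifier itself and of any cascade retries.

```python
# Hypothetical prices per million tokens; not taken from any provider.
CHEAP_PRICE = 0.5
STRONG_PRICE = 10.0


def blended_price(cheap_share: float,
                  cheap_price: float = CHEAP_PRICE,
                  strong_price: float = STRONG_PRICE) -> float:
    """Average price when cheap_share of tokens go to the cheap model."""
    return cheap_share * cheap_price + (1.0 - cheap_share) * strong_price


def savings_factor(cheap_share: float) -> float:
    """How many times cheaper routing is than sending everything to the strong model."""
    return STRONG_PRICE / blended_price(cheap_share)


if __name__ == "__main__":
    for share in (0.5, 0.8, 0.9):
        print(f"cheap share {share:.0%}: {savings_factor(share):.1f}x cheaper")
```

Under these assumed prices, routing 90% of tokens to the cheap model gives roughly a 6.9x reduction, while routing only 50% gives less than 2x. The share of traffic that still needs the strong model caps the saving.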
Disadvantages
- Two-model debug surface: a bad answer may originate in the classifier, the cheap model, or the strong model.
- Vendor lock-in when models diverge in tool calling.