Language Models
general
LLM Router
Dynamically routes user queries to different specialized LLMs or static handlers based on complexity, intent, and cost.
Intent & Description
An LLM Router analyzes each incoming request and directs it to the most suitable engine. The analysis can use hand-written rules, a classifier model, or a fast embedding-similarity check. Easy questions go to small, cheap models; hard questions go to advanced frontier models.
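Of the three approaches, the embedding check is the least obvious. One common form compares the query against a few labeled example queries ("prototypes") and picks the route whose closest example is most similar. The sketch below makes several assumptions, all hypothetical: the `embed` function is a toy bag-of-words stand-in for a real sentence-embedding model, and the route names and prototype queries are invented for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    # Hypothetical stand-in for a real embedding model: word counts.
    # A production router would call a sentence-embedding model instead.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical labeled example queries for each route
PROTOTYPES = {
    "fast": ["hi there", "what time is it", "thanks for the help"],
    "heavy": ["write a thread-safe singleton in c++",
              "explain memory barriers in detail",
              "prove this algorithm runs in linear time"],
}
# Embed the prototypes once, up front
PROTOTYPE_VECTORS = {route: [embed(e) for e in examples]
                     for route, examples in PROTOTYPES.items()}

def embedding_route(query):
    """Return the route whose nearest prototype is most similar to the query."""
    q = embed(query)
    best_route, best_score = None, -1.0
    for route, vectors in PROTOTYPE_VECTORS.items():
        score = max(cosine(q, v) for v in vectors)
        # Strict ">" means ties (including no overlap at all) go to the first route listed
        if score > best_score:
            best_route, best_score = route, score
    return best_route

print(embedding_route("hi there"))                          # fast
print(embedding_route("Can you explain memory barriers?"))  # heavy
```

With this ordering, a query that matches nothing falls back to the cheap route. Listing `"heavy"` first would instead make unmatched queries fail safe toward the stronger model.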
Real-world Use Case
Optimizing LLM usage costs and latency in production applications.
Advantages
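- Can drastically lower API token costs (reductions of 50-70% are often cited), though actual savings depend on how much of the traffic the cheaper model can handle.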
- Reduces latency by using faster, smaller models for simple queries.
- Improves reliability and specialization: each query type is sent to the model or static handler best suited to it.
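The size of the cost saving follows from simple arithmetic. Suppose a fraction p of queries goes to the cheap model, whose average per-query cost is a fraction r of the heavy model's. Compared with sending everything to the heavy model, the saving is then p * (1 - r), minus any cost of the routing step itself. The sketch below assumes per-query costs can be treated as averages. The function and parameter names are illustrative, and the example figures are hypothetical, not measurements.

```python
def routing_savings(fast_share, price_ratio, router_overhead=0.0):
    """Fraction of cost saved versus sending every query to the heavy model.

    fast_share:      fraction of queries routed to the cheap model (0 to 1)
    price_ratio:     average cheap-model cost / average heavy-model cost per query
    router_overhead: average routing cost per query, as a fraction of the
                     heavy-model cost (0 for simple rule-based routing)
    """
    if not 0.0 <= fast_share <= 1.0:
        raise ValueError("fast_share must be between 0 and 1")
    # Average routed cost, normalized so the heavy model costs 1 per query
    routed_cost = fast_share * price_ratio + (1.0 - fast_share) + router_overhead
    return 1.0 - routed_cost

# Hypothetical traffic mixes with a 20x price gap (price_ratio = 0.05)
print(round(routing_savings(0.7, 0.05), 3))  # 0.665
print(round(routing_savings(0.3, 0.05), 3))  # 0.285
```

The fast share dominates the result. Even a very cheap small model saves little if only a minority of queries can safely be routed to it.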
Disadvantages
- The routing step adds latency overhead. This is negligible for rule-based checks but larger when a classifier or embedding model must run first.
- Classifier errors might route complex tasks to weaker models.
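One way to limit misrouting is to use the cheap model only when the classifier is confident the query is simple, and escalate everything else. The sketch below assumes a hypothetical `classify` function that returns a `(label, confidence)` pair. The `toy_classifier`, its word-count thresholds, and the 0.8 default are invented for illustration.

```python
from typing import Callable, Tuple

def conservative_route(query: str,
                       classify: Callable[[str], Tuple[str, float]],
                       threshold: float = 0.8) -> str:
    """Return 'fast' only for confidently simple queries, otherwise 'heavy'."""
    label, confidence = classify(query)
    if label == "simple" and confidence >= threshold:
        return "fast"
    # Complex predictions and low-confidence "simple" predictions both escalate
    return "heavy"

def toy_classifier(query: str) -> Tuple[str, float]:
    # Hypothetical classifier for illustration: judges by word count only
    words = len(query.split())
    if words <= 3:
        return ("simple", 0.95)
    if words <= 8:
        return ("simple", 0.6)  # borderline: labeled simple, but not confidently
    return ("complex", 0.9)

print(conservative_route("Hi there", toy_classifier))                       # fast
print(conservative_route("What is the capital of France", toy_classifier))  # heavy
```

Raising `threshold` sends more traffic to the heavy model. This trades some of the cost savings for fewer complex tasks landing on the weaker model.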
Implementation Example
```python
# Dynamic query routing based on simple query attributes
class LLMRouter:
    def __init__(self, fast_cheap_model, advanced_heavy_model):
        self.fast_model = fast_cheap_model
        self.heavy_model = advanced_heavy_model

    def route(self, query):
        # Rule-based complexity check: long queries, or queries mentioning
        # certain keywords, are treated as complex
        q = query.lower()
        is_complex = len(query) > 50 or "code" in q or "explain" in q
        if is_complex:
            print("Routing to ADVANCED HEAVY model...")
            return self.heavy_model.generate(query)
        print("Routing to FAST CHEAP model...")
        return self.fast_model.generate(query)


class Model:
    """Stub standing in for a real LLM client."""

    def __init__(self, name):
        self.name = name

    def generate(self, q):
        return f"[{self.name}] Response to: {q}"


# Usage
router = LLMRouter(Model("GPT-4o-mini"), Model("GPT-4o"))
print(router.route("Hi there"))  # short, no keywords: fast model
print(router.route("Write a thread-safe Singleton in C++ and explain memory barriers"))  # heavy model
```
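The example routes only between two models. Routers can also answer some intents with static handlers and skip the model call entirely. The sketch below assumes keyword-based intent detection, and the intents, patterns, and canned replies are all hypothetical. The `llm_route` parameter is any callable that takes a query, for example `router.route` from the example.

```python
import re

# Hypothetical static handlers: canned answers that need no model call
STATIC_HANDLERS = {
    "greeting": lambda q: "Hello! How can I help you today?",
    "hours": lambda q: "We are open 9:00-17:00, Monday to Friday.",
}

# Hypothetical keyword rules mapping each intent to a regex
INTENT_PATTERNS = {
    "greeting": re.compile(r"^\s*(hi|hello|hey)\b", re.IGNORECASE),
    "hours": re.compile(r"\bopening hours\b|\bwhen are you open\b", re.IGNORECASE),
}

def detect_intent(query):
    # Return the first intent whose pattern matches, or None
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(query):
            return intent
    return None

def route_with_static(query, llm_route):
    """Answer from a static handler when an intent matches; otherwise defer to llm_route."""
    intent = detect_intent(query)
    if intent in STATIC_HANDLERS:
        return STATIC_HANDLERS[intent](query)
    return llm_route(query)

fake_llm = lambda q: f"[LLM] Response to: {q}"  # stand-in for a real model router
print(route_with_static("Hi there", fake_llm))
print(route_with_static("Explain memory barriers", fake_llm))
```

Placing the static check in front of the model router keeps the cheapest path first. The regex check costs almost nothing compared with even the smallest model call.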