RL-Trained Conductor Orchestrator | designpattern.fyi


Advantages

Routing improves from experience, so rules do not have to be hand-edited on each model release.
A cheap meta-model sits on the hot path; frontier models are called only as workers when selected.
Recursive self-dispatch handles decomposable subtasks without a separate planner agent.
Worker pool churn is absorbed by retraining the conductor, not by rewriting routing logic.
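
The first and last points above can be illustrated with a toy router. This is a minimal sketch, not the pattern's actual training method: it swaps in an epsilon-greedy bandit for a full RL pipeline, and every name in it (`BanditConductor`, `TRUE_REWARD`, `worker_a`, and so on) is hypothetical. The reward table stands in for whatever reward signal a real deployment would measure.

```python
import random
from collections import defaultdict

# Hypothetical reward table: stands in for a measured reward signal.
TRUE_REWARD = {
    ("code", "worker_a"): 0.9, ("code", "worker_b"): 0.4,
    ("summarize", "worker_a"): 0.3, ("summarize", "worker_b"): 0.8,
}

class BanditConductor:
    """Toy routing policy: a running mean reward per (task type, worker)."""

    def __init__(self, workers, epsilon=0.2, seed=0):
        self.workers = list(workers)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = defaultdict(float)  # (task_type, worker) -> mean reward
        self.count = defaultdict(int)    # (task_type, worker) -> updates seen

    def add_worker(self, worker):
        # Pool churn: the action set grows, the routing code does not change.
        self.workers.append(worker)

    def select(self, task_type, explore=True):
        # Explore with probability epsilon, otherwise pick the best-known worker.
        if explore and self.rng.random() < self.epsilon:
            return self.rng.choice(self.workers)
        return max(self.workers, key=lambda w: self.value[(task_type, w)])

    def update(self, task_type, worker, reward):
        # Incremental mean update.
        key = (task_type, worker)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]

def train(conductor, task_types=("code", "summarize"), steps=2000):
    for _ in range(steps):
        task_type = conductor.rng.choice(task_types)
        worker = conductor.select(task_type)
        conductor.update(task_type, worker, TRUE_REWARD[(task_type, worker)])

def greedy_routes(conductor, task_types=("code", "summarize")):
    return {t: conductor.select(t, explore=False) for t in task_types}

conductor = BanditConductor(["worker_a", "worker_b"])
train(conductor)
before = greedy_routes(conductor)

# A new worker joins the pool: extend the reward source and retrain.
TRUE_REWARD[("code", "worker_c")] = 0.95
TRUE_REWARD[("summarize", "worker_c")] = 0.2
conductor.add_worker("worker_c")
train(conductor)
after = greedy_routes(conductor)

print(before)
print(after)
```

In this toy setup the only change needed when `worker_c` appears is a call to `add_worker` followed by more training; the selection code is untouched. A production conductor would condition on richer task features and learn from noisy rewards, which this sketch deliberately leaves out.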

Disadvantages

Requires a reward signal and an RL training pipeline, which most teams do not have in-house.
Conductor policy can be opaque; a learned routing tree is harder to audit than a written one.
Recursive self-dispatch needs strict depth and budget caps or it can fan out aggressively.
Worker drift (a vendor updates a model behind the same name) silently changes what each of the policy's actions actually does.
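
The depth and budget caps mentioned for recursive self-dispatch could look roughly like the following. This is a sketch under stated assumptions: `Budget`, `BudgetExceeded`, `split`, `run_worker`, and `dispatch` are hypothetical names, the "task" is just a list of numbers to sum, and the decomposition rule (halve the list) is a placeholder for whatever the conductor would actually decide.

```python
from dataclasses import dataclass

class BudgetExceeded(RuntimeError):
    """Raised when the shared worker-call budget is exhausted."""

@dataclass
class Budget:
    max_depth: int       # hard cap on recursion depth
    max_calls: int       # hard cap on total worker calls across the whole tree
    calls_used: int = 0

    def charge(self):
        if self.calls_used >= self.max_calls:
            raise BudgetExceeded(f"worker-call budget of {self.max_calls} exhausted")
        self.calls_used += 1

def split(task):
    # Placeholder decomposition: halve the list.
    mid = len(task) // 2
    return [task[:mid], task[mid:]]

def run_worker(task, budget):
    # Placeholder worker call; every call is charged against the shared budget.
    budget.charge()
    return sum(task)

def dispatch(task, budget, depth=0):
    # Stop recursing when the task is small or the depth cap is reached.
    if len(task) <= 2 or depth >= budget.max_depth:
        return run_worker(task, budget)
    return sum(dispatch(sub, budget, depth + 1) for sub in split(task))

capped = Budget(max_depth=2, max_calls=10)
result = dispatch(list(range(16)), capped)
print(result, capped.calls_used)
```

The budget object is shared by every branch of the recursion, so the call cap bounds total fan-out rather than fan-out per branch. Raising on exhaustion is one choice; a real system might instead fall back to answering with the partial results it already has.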