Parallel Tool Calls
Allow the model to emit several independent tool calls in one assistant turn; the host executes them in parallel.
Intent & Description
🎯 Intent
Allow the model to emit several independent tool calls in one assistant turn; the host executes them in parallel.
📋 Context
A tool-using agent is working on a task whose next step naturally splits into several independent lookups or actions: fetching three records from different tables, reading four files, or querying two unrelated APIs. The provider’s chat API supports a single assistant turn that contains more than one tool call, and the model can identify these independent calls up front and emit them together rather than issuing them one turn at a time.
💡 Solution
The provider’s API allows the assistant turn to contain multiple tool calls. The host fans them out concurrently (with bounded concurrency and rate-limit handling). Each result comes back as its own tool message, and the next assistant turn sees all of them together.
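The host-side fan-out described above can be sketched roughly as follows. This is a minimal illustration, not any provider's actual API: `ToolCall`, `run_tool_calls`, the `registry` mapping, and the shape of the returned tool messages are all assumptions made for the example.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class ToolCall:
    """Hypothetical representation of one tool call from an assistant turn."""
    id: str
    name: str
    arguments: dict[str, Any]


# Hypothetical registry type: tool name -> async function implementing it.
ToolFn = Callable[..., Awaitable[Any]]


async def run_tool_calls(
    calls: list[ToolCall],
    registry: dict[str, ToolFn],
    max_concurrency: int = 4,
) -> list[dict[str, str]]:
    """Run all calls concurrently, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(call: ToolCall) -> dict[str, str]:
        async with semaphore:
            try:
                result = await registry[call.name](**call.arguments)
                content = str(result)
            except Exception as exc:
                # Report the failure to the model instead of aborting the batch.
                content = f"error: {exc!r}"
        # Assumed message shape; real providers differ in field names.
        return {"role": "tool", "tool_call_id": call.id, "content": content}

    # gather() returns results in the same order as the input calls.
    return await asyncio.gather(*(run_one(call) for call in calls))
```

The semaphore gives the bounded concurrency, and catching exceptions per call means one failing tool does not discard the results of the others. A host would append the returned messages to the conversation before requesting the next assistant turn.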
Real-world Use Case
- The model frequently issues multiple independent tool calls per turn.
- The provider’s API supports multiple tool calls in one assistant message.
- The host can fan out concurrent calls with bounded concurrency and rate-limit handling.
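One possible shape for the rate-limit handling mentioned above is a retry wrapper with exponential backoff and jitter around each tool function. The sketch below is an assumption-laden example: `RateLimitError` and `call_with_backoff` are hypothetical names, and a real host would catch whatever throttling error its tool backends actually raise.

```python
import asyncio
import random
from typing import Any, Awaitable, Callable


class RateLimitError(Exception):
    """Hypothetical error raised by a tool when its backend throttles the host."""


async def call_with_backoff(
    fn: Callable[..., Awaitable[Any]],
    *args: Any,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    **kwargs: Any,
) -> Any:
    """Call `fn`, retrying on RateLimitError with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await fn(*args, **kwargs)
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            delay = base_delay * (2 ** attempt)
            # Jitter spreads out retries from calls that were throttled together.
            await asyncio.sleep(delay + random.uniform(0, delay))
```

Jitter matters here because parallel calls that hit the same limit at the same moment would otherwise all retry at the same moment too.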
Advantages
- Lower wall-clock latency on parallelisable steps.
- Simpler than full DAG planning.
Disadvantages
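- Behaviour is provider-specific: whether and how one assistant message can carry multiple tool calls depends on the API.
- The host takes on concurrency control: bounded fan-out and rate-limit handling add complexity.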
- Silent correctness bugs when accidentally-dependent calls are parallelised.
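One conservative way a host might reduce the risk of parallelising accidentally-dependent calls is to run only read-only tools concurrently and keep state-changing tools in their emitted order. The sketch below is a heuristic offered under stated assumptions, not part of the pattern itself: `MUTATING_TOOLS`, the tool names in it, and `plan_batches` are hypothetical, and calls are assumed to be dicts with a `name` key.

```python
from typing import Any

# Hypothetical set of tools that change state; the host would maintain this.
MUTATING_TOOLS = {"write_file", "update_record"}


def plan_batches(calls: list[dict[str, Any]]) -> list[list[dict[str, Any]]]:
    """Split calls into ordered batches.

    Calls within a batch may run in parallel; batches run one after another.
    Each mutating call gets a batch of its own, so it never overlaps with
    the reads emitted before or after it.
    """
    batches: list[list[dict[str, Any]]] = []
    current: list[dict[str, Any]] = []
    for call in calls:
        if call["name"] in MUTATING_TOOLS:
            if current:
                batches.append(current)  # Flush the reads collected so far.
                current = []
            batches.append([call])  # The mutating call runs alone.
        else:
            current.append(call)
    if current:
        batches.append(current)
    return batches
```

For example, the sequence read, read, write, read would be planned as three batches: the first two reads together, then the write on its own, then the final read. This trades some latency for safety and does not detect every dependency, only those that pass through a tool marked as mutating.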