Tool Explosion
Exposing every available tool to the agent on every request — and watching function-calling accuracy collapse past ~20 tools.
Intent & Description
🎯 Intent
Defaulting to “expose all tools” because registration is cheap — then discovering that selection accuracy drops measurably as the tool list grows.
📋 Context
MCP servers, plugin ecosystems, and tool registries make it trivial to expose dozens or hundreds of tools. Teams expose everything “so the agent can reach for anything.” Past ~20 tools, function-calling accuracy measurably degrades — and the token cost of carrying large tool definitions in every prompt compounds the problem.
💡 Solution
Use a tool-loadout: curate the relevant subset per task type. Cap exposed tools at a tested threshold. Measure function-calling accuracy as a release gate.
Real-world Use Case
- Never use this; past about 20 tools, function-calling accuracy drops sharply.
- Use tool-loadout to select per-task subsets and cap exposed tools at a tested threshold.
- Measure function-calling accuracy as a release gate.
Source
📌 TL;DR
Cap exposed tools per request to a tested threshold — quality drops measurably past ~20 tools.
Disadvantages
- Selection accuracy degrades — the model picks the wrong tool or hallucinates one
- Token cost rises from large tool definitions carried in every prompt
- Cache misses on every tool list change, adding latency