Exception Handling and Recovery
Intent & Description
🎯 Intent
Catch and react to predictable failure modes (tool errors, rate limits, validation failures) with structured recovery paths.
📋 Context
A team runs a production agent that calls many tools in a loop: search APIs, internal databases, third-party services, model endpoints. In real traffic those tools fail in predictable, repeating ways: the API is briefly down, the caller hit a rate limit, the response came back malformed, the credential was rejected, the request timed out. Each of those failure modes calls for a different response from the agent.
💡 Solution
Catalogue the failure modes you expect. For each one, define three things: how it is detected (a typed error), how the agent responds (retry, fall back, surface to the user, or replan), and what gets logged. The tool layer then hands the agent a structured error message, and the agent loop branches on its type.
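As a minimal sketch of such a catalogue, the Python below pairs each failure mode with a typed error and a default recovery response, then turns a caught error into a structured message. The class names, error codes, and the `to_agent_message` helper are all hypothetical and chosen for illustration; a real catalogue would reflect the failures observed in your own traffic.

```python
from enum import Enum


class Recovery(Enum):
    """The four responses named in the catalogue."""
    RETRY = "retry"
    FALLBACK = "fallback"
    SURFACE = "surface_to_user"
    REPLAN = "replan"


class ToolError(Exception):
    """Base class for catalogued tool failures (hypothetical hierarchy)."""
    code = "tool_error"
    recovery = Recovery.SURFACE  # uncatalogued failures go to the user


class RateLimited(ToolError):
    code = "rate_limited"
    recovery = Recovery.RETRY


class ToolUnavailable(ToolError):
    code = "unavailable"
    recovery = Recovery.FALLBACK


class ValidationFailed(ToolError):
    code = "validation_failed"
    recovery = Recovery.REPLAN


class AuthRejected(ToolError):
    code = "auth_rejected"
    recovery = Recovery.SURFACE


def to_agent_message(err: ToolError, tool: str) -> dict:
    """Convert a typed error into the structured message the agent receives."""
    return {
        "type": "tool_error",
        "tool": tool,
        "code": err.code,
        "recovery": err.recovery.value,
        "detail": str(err),
    }
```

For example, `to_agent_message(RateLimited("429 from search API"), tool="search")` yields a message whose `code` is `"rate_limited"` and whose suggested `recovery` is `"retry"`. Keeping the default response on the error class means the catalogue lives in one place instead of being scattered across call sites.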
Real-world Use Case
- Tool errors, rate limits, or validation failures occur often enough that random retries waste effort.
- Failure modes can be catalogued with typed errors and structured recovery responses.
- The agent loop can branch on typed error messages.
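The last condition, a loop that branches on typed error messages, might look like the following sketch. It assumes each tool call returns a dict with a `type` field and, on failure, a `code` field; those field names, the code values, and both function names are assumptions made for illustration, not part of any particular framework.

```python
def next_action(result: dict, attempt: int, max_attempts: int = 3) -> str:
    """Pick the agent's next move from a structured tool result."""
    if result.get("type") != "tool_error":
        return "continue"
    code = result.get("code")
    if code in ("rate_limited", "timeout"):
        # Transient: retry, but only while attempts remain.
        return "retry" if attempt < max_attempts else "surface_to_user"
    if code == "unavailable":
        return "fallback"
    if code == "validation_failed":
        return "replan"
    # Rejected credentials and anything uncatalogued go to the user.
    return "surface_to_user"


def run_tool_step(call_tool, max_attempts: int = 3):
    """Call a tool, retrying only when the typed branch says so."""
    for attempt in range(1, max_attempts + 1):
        result = call_tool()
        action = next_action(result, attempt, max_attempts)
        if action != "retry":
            return action, result
    # Not reached: the final attempt never yields "retry".
    raise RuntimeError("retry budget exhausted without a decision")
```

The caller receives both the chosen action and the last result, so a planner can act on `"fallback"` or `"replan"` with the original error detail still in hand.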
Advantages
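- Failure modes become first-class: each has a typed error, a defined response, and a log entry.
- Reliability under partial failure improves, because each catalogued failure has a recovery path.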
Disadvantages
- Exception-handling code is its own surface to maintain.
- Hidden retries can mask deeper issues.
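One way to keep retries from hiding deeper issues is to bound them and log every attempt. The sketch below assumes a hypothetical `TransientToolError` marker class and a hypothetical `call_with_visible_retries` helper; the logger name, the attempt limit, and the backoff values are illustrative defaults, not recommendations.

```python
import logging
import time

logger = logging.getLogger("agent.recovery")


class TransientToolError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_visible_retries(fn, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run fn with bounded exponential backoff, logging each failed attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientToolError as exc:
            # Every failed attempt leaves a trace, so repeated retries
            # show up in logs instead of silently succeeding later.
            logger.warning(
                "tool retry attempt=%d/%d error=%s", attempt, max_attempts, exc
            )
            if attempt == max_attempts:
                raise  # budget spent: let the caller's typed branch decide
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper testable without real delays. Non-transient exceptions are not caught here, so they propagate immediately rather than being retried.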