Agent Output Alert Fatigue
Your agent raises so many low-quality findings that reviewers stop reading — and the human oversight you built in quietly disappears.
Intent & Description
🎯 Intent
High-volume, low-signal output trains humans to ignore the agent — including when it’’s actually right.
📋 Context
An agent deployed as an assistive reviewer (code review, anomaly detection, QA) errs toward “surface everything that might matter.” Most of it doesn’t. Reviewers adapt fast: they skim, they mute, they auto-approve. The oversight control silently disappears while still appearing on the org chart.
💡 Solution
Gate output on a confidence threshold so the agent raises fewer, higher-precision findings. Track usefulness-per-finding, not findings-per-run. Monitor reviewer engagement (resolve rate, mute rate, time-to-skim) as a first-class health signal. If comment count rises while usefulness stays flat — that’’s an alarm, not progress. See cross-encoder reranking, verifier stages, confidence-gated output.
Real-world Use Case
- A code-review agent posting 8+ comments per PR at ~35% usefulness.
- Reviewer mute or auto-approve rates rising over time.
- The agent is measured by output volume, not by whether findings get acted on.
Source
📌 TL;DR
Optimize for usefulness-per-finding, not findings-per-run — or reviewers will mute the agent and your oversight disappears.
Disadvantages
- The human-in-the-loop safeguard vanishes in practice while still existing on paper
- The agent’’s correct findings get buried along with the noise — real issues reach production
- Reviewer disengagement is sticky; hard to rebuild once it sets in