Degenerate-Output Detection
Detect when the agent is about to emit a near-duplicate of its own recent output and either drop, replace, or escalate to a stronger model rather t...
Intent & Description
🎯 Intent
Detect when the agent is about to emit a near-duplicate of its own recent output and either drop, replace, or escalate to a stronger model rather than ship the loop.
📋 Context
A team runs an agent on a smaller or locally-hosted model that has a habit of falling into shallow filler loops under context pressure — repeating the same greeting, asking the same clarifying question, or returning the same generic prompt back to the user across multiple turns. This happens in user-facing chat replies and in unprompted background ticks for long-running agents. Each model generation is independent, so the model has no built-in awareness that it just said the same thing two turns ago.
💡 Solution
Maintain a small ring buffer (e.g. last 8 outgoing messages). Before publishing a new reply, normalize (lowercase, strip punctuation) and compare: exact normalized match → duplicate; high Jaccard token overlap (≥0.7) on short replies → near-duplicate. On hit: replace the body with a transparent marker (‘I caught myself looping — switching to
Real-world Use Case
- The agent produces outputs in a loop where consecutive replies can be compared.
- Near-duplicate outputs are observable failure mode (model wedged, decoding loop, prompt collapse).
- Cost of detection (similarity check) is small relative to cost of shipping the duplicate.
Source
Advantages
- Visible loops never reach the user.
- Auto-recovery via provider escalation rather than human intervention.
- Self-correction signal to the model in the conversation history.
Disadvantages
- False positives on legitimately repeated short answers (‘yes’, ’thanks’).
- Threshold tuning is per-domain.
- Escalation has cost; budget for repeated triggers.