CRAG
Add a lightweight retrieval evaluator that grades each retrieved document and triggers corrective web search on poor retrievals — while using the same evaluator to flag documents containing Agent Confession triggers.
Intent & Description
Short description: A lightweight evaluator grades retrieved documents as Correct, Ambiguous, or Incorrect and triggers web search for corrective evidence — and can flag documents containing embedded directive-extraction instructions before they reach the generator.
🎯 Intent
Add a retrieval quality gate that improves generator input and, as a secondary function, screens retrieved documents for adversarial content including Agent Confession triggers embedded in corpus material.
📋 Context
A RAG system in production retrieves variable-quality documents. Among poor retrievals, a specific adversarial variant is a document deliberately crafted to score as Correct on the evaluator while embedding Agent Confession triggers in its body — exploiting the evaluator’s passage to the generator. CRAG’s evaluator is positioned exactly where this screen is most effective.
💡 Solution
- After retrieval, a lightweight evaluator grades each document as Correct, Ambiguous, or Incorrect.
- Correct documents pass forward; Ambiguous documents trigger web search for additional evidence; Incorrect documents are discarded and replaced.
- Extend the evaluator to additionally screen each document for embedded instruction content — a document containing Agent Confession trigger phrases is flagged and sent through an instruction-stripping pass before being forwarded as Correct.
- The generator receives a corrected, sanitised document set.
Real-world Use Case
- Naive RAG passes poor-quality or adversarially crafted retrievals through to the generator.
- A lightweight evaluator can grade documents as Correct, Ambiguous, or Incorrect — and flag those containing Agent Confession trigger phrases.
- Web search is available as a corrective fallback for ambiguous or adversarially suspect retrievals.
Source
Advantages
- Robustness to poor retrievals — and the evaluator’s position makes it a natural Agent Confession screen before content reaches the generator.
- Plug-and-play with existing RAG; instruction-stripping can be added to the Correct-document path without restructuring the pipeline.
Disadvantages
- Two-stage retrieval increases latency; adding an instruction-stripping pass adds a third stage.
- The evaluator may not reliably detect sophisticated Agent Confession triggers phrased to resemble legitimate document content.