Typed Refusal Codes
Intent & Description
🎯 Intent
Define a single source of truth for machine-readable refusal codes across all guard surfaces, so refusals can be triaged mechanically rather than by string-grepping ad-hoc human-readable messages.
📋 Context
A mature agent stack accumulates many guard surfaces: a tool-loop guard, a skill-scanner that refuses risky imports, a post-compaction guard that rejects suspicious context restorations, an RCE backstop, an input/output guardrail. Each was added at a different time and emits its own refusal string in a different shape. Downstream observability — logs, audits, dashboards, on-call triage — has to grep through human-readable strings to count and classify refusals, and small wording changes silently break the dashboards.
💡 Solution
Maintain a single module that exports:
- a ReasonCode enum (e.g. POLICY_VIOLATION, RATE_LIMIT, UNVERIFIED_TOOL, RCE_RISK, LOOP_DETECTED, INTEGRITY_FAILURE, CONTEXT_INJECTION, …);
- a format_refusal(code, detail) helper that returns ‘REFUSED: CODE: detail’;
- a parse_refusal(string) helper that returns (code, detail) or None;
- a KNOWN_CODES constant for consumers to validate against.
Every guard surface in the system emits refusals through format_refusal and nothing else. parse_refusal also recognises legacy substrings (‘cannot comply’, ‘blocked by policy’, etc.) as aliases for codes, so old logs keep parsing. When the parser meets an unknown code it returns None rather than raising. Downstream tooling depends only on the parser, never on raw strings.
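The module described above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the enum lists only the codes named in the text (the original list is open-ended), the LEGACY_ALIASES table and its mapping of both legacy phrases to POLICY_VIOLATION are assumptions made for the example, and the exact regular expression is one possible reading of the ‘REFUSED: CODE: detail’ shape.

```python
from __future__ import annotations

import re
from enum import Enum
from typing import Optional, Tuple


class ReasonCode(str, Enum):
    # Only the codes named in the text; a real module would likely have more.
    POLICY_VIOLATION = "POLICY_VIOLATION"
    RATE_LIMIT = "RATE_LIMIT"
    UNVERIFIED_TOOL = "UNVERIFIED_TOOL"
    RCE_RISK = "RCE_RISK"
    LOOP_DETECTED = "LOOP_DETECTED"
    INTEGRITY_FAILURE = "INTEGRITY_FAILURE"
    CONTEXT_INJECTION = "CONTEXT_INJECTION"


# Consumers validate against this set instead of hard-coding strings.
KNOWN_CODES = frozenset(code.value for code in ReasonCode)

# Assumption: which code each legacy phrase maps to is illustrative only.
LEGACY_ALIASES = {
    "cannot comply": ReasonCode.POLICY_VIOLATION,
    "blocked by policy": ReasonCode.POLICY_VIOLATION,
}

_REFUSAL_RE = re.compile(r"^REFUSED:\s*([A-Z_]+):\s*(.*)$", re.DOTALL)


def format_refusal(code: ReasonCode, detail: str) -> str:
    """Build the canonical refusal string; raises ValueError on an unknown code."""
    return f"REFUSED: {ReasonCode(code).value}: {detail}"


def parse_refusal(text: str) -> Optional[Tuple[ReasonCode, str]]:
    """Return (code, detail), or None if the text is not a known refusal."""
    stripped = text.strip()
    match = _REFUSAL_RE.match(stripped)
    if match:
        name, detail = match.groups()
        if name in KNOWN_CODES:
            return ReasonCode(name), detail
        return None  # well-formed but unknown code: do not raise
    lowered = stripped.lower()
    for phrase, code in LEGACY_ALIASES.items():
        if phrase in lowered:
            # Legacy messages carry no structured detail; keep the whole text.
            return code, stripped
    return None
```

A guard would then call format_refusal(ReasonCode.RATE_LIMIT, "too many calls") instead of composing its own message, and a log consumer would call parse_refusal on each line and ignore anything that comes back as None.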
Real-world Use Case
- The stack has three or more guard surfaces that each emit refusals.
- Downstream observability depends on counting or alerting on refusal categories.
- Legacy refusal phrasings already exist and must keep parsing.
Advantages
- Refusal triage becomes mechanical: count by code, group by surface, alert by category.
- New guards that emit refusals through format_refusal are counted and classified by existing tooling with no extra integration work.
- Legacy substrings remain parseable, so existing dashboards keep working.
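To make "count by code, group by surface" concrete, here is a small sketch of a triage tally. It assumes log records arrive as (surface, message) pairs, which is a hypothetical shape chosen for the example, and it uses a cut-down stand-in for parse_refusal so the snippet runs on its own; the surface names in the sample data are likewise invented.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Stand-in for the shared module's parser; only a few codes are listed here.
KNOWN_CODES = frozenset({"POLICY_VIOLATION", "RATE_LIMIT", "RCE_RISK", "LOOP_DETECTED"})
_REFUSAL_RE = re.compile(r"^REFUSED:\s*([A-Z_]+):\s*(.*)$")


def parse_refusal(text: str) -> Optional[Tuple[str, str]]:
    """Return (code, detail) for a known refusal, otherwise None."""
    match = _REFUSAL_RE.match(text.strip())
    if match and match.group(1) in KNOWN_CODES:
        return match.group(1), match.group(2)
    return None


def tally(records: Iterable[Tuple[str, str]]) -> Counter:
    """Count refusals per (surface, code); messages that do not parse are skipped."""
    counts: Counter = Counter()
    for surface, message in records:
        parsed = parse_refusal(message)
        if parsed is not None:
            code, _detail = parsed
            counts[(surface, code)] += 1
    return counts


# Hypothetical sample records: (surface name, emitted message).
sample = [
    ("tool_loop_guard", "REFUSED: LOOP_DETECTED: same call repeated"),
    ("skill_scanner", "REFUSED: RCE_RISK: imports subprocess"),
    ("tool_loop_guard", "REFUSED: LOOP_DETECTED: retry storm"),
    ("skill_scanner", "scan passed"),
]
counts = tally(sample)
```

Because the tally keys on the parsed code rather than on message wording, rephrasing a detail string does not change the counts.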
Disadvantages
- Centralisation is upfront work that pays back only after several guard surfaces exist.
- The enum becomes a contract; renaming a code is a breaking change for consumers.
- Detail strings remain human-authored, so how useful they are still depends on author discipline.