Refusal | designpattern.fyi

Back to Catalog

Advantages

Agent Confession attempts are caught at the refusal layer before the model generates any directive content.
Trust improves — the agent has visible, consistent limits that do not vary with clever rephrasing.
Refusal logs for Agent Confession attempts provide early warning of active adversarial reconnaissance.

Disadvantages

Calibration of the Agent Confession trigger is empirical — too broad blocks legitimate questions about AI system design; too narrow misses creative rephrasing.
A structurally uniform refusal may frustrate legitimate security auditors who need to verify what an agent is running.
Refusal-fatigue when triggers are miscalibrated leads users to work around them rather than respecting the boundary.