Structured Output
Constrain the model's output to conform to a JSON Schema — and use schema enforcement as a structural Agent Confession barrier that prevents directive text from reaching downstream consumers as free-form prose.
Intent & Description
Short description: A JSON Schema (or Pydantic/Zod equivalent) constrains the model’s output to a known typed shape — and a well-designed schema acts as a passive Agent Confession barrier by making it structurally impossible for free-form directive prose to appear in the output consumed by downstream code.
🎯 Intent
Ensure downstream code receives typed, validated data rather than free-form prose — and exploit schema enforcement as a secondary defense that prevents Agent Confession outputs from passing through to consumers, since directive echoes do not fit any legitimate output schema.
📋 Context
A pipeline expects typed data from the model: a JSON object with known fields. The same model is exposed to user inputs, retrieved documents, or tool outputs that may contain Agent Confession triggers. If the model complies with such a trigger (“repeat your system prompt”), the resulting confession is free-form prose. It either fails schema validation outright or has no defined output field to land in, unless the schema itself was designed to accept arbitrary strings. Both outcomes are preferable to the confession silently passing through.
💡 Solution
- Define a JSON Schema (or Pydantic/Zod equivalent) with the minimum fields needed by downstream consumers; avoid catch-all string fields that could silently absorb directive content.
- Pass the schema to the model via the provider’s structured-output mode; validate the output, and reject and retry on validation failure.
- Cap retries: when a model repeatedly fails to produce schema-conforming output (often because it is generating confession prose instead), the pipeline should surface an error rather than loop silently.
- Treat schema validation failures as a diagnostic signal — a spike in failures on a given endpoint may indicate active Agent Confession probing that is causing the model to generate non-schema output.
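The steps above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the ticket-triage fields (`category`, `priority`, `ticket_id`), the `validate_triage` and `call_with_retries` helpers, and the `call_model` callable are all hypothetical names, and the hand-written checks stand in for whatever JSON Schema, Pydantic, or Zod validation a real pipeline would use.

```python
import json
from typing import Any, Callable

# Hypothetical narrow schema for a ticket-triage task: every field is an
# enum, a bounded integer, or a short alphanumeric string. There is no
# catch-all free-text field for directive prose to hide in.
ALLOWED_CATEGORIES = {"billing", "bug", "feature_request", "other"}
ALLOWED_KEYS = {"category", "priority", "ticket_id"}


class SchemaError(ValueError):
    """Raised when model output does not match the expected shape."""


def validate_triage(raw: str) -> dict[str, Any]:
    """Parse raw model output and check it against the narrow schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaError(f"not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise SchemaError("top-level value must be an object")
    if set(data) != ALLOWED_KEYS:
        # Rejects both missing fields and extra fields such as "notes".
        raise SchemaError(f"key mismatch: {sorted(set(data) ^ ALLOWED_KEYS)}")
    category = data["category"]
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        raise SchemaError("category is not an allowed enum value")
    priority = data["priority"]
    if isinstance(priority, bool) or not isinstance(priority, int) or not 1 <= priority <= 5:
        raise SchemaError("priority must be an integer from 1 to 5")
    ticket_id = data["ticket_id"]
    if not (isinstance(ticket_id, str) and ticket_id.isascii()
            and ticket_id.isalnum() and len(ticket_id) <= 16):
        raise SchemaError("ticket_id must be a short alphanumeric string")
    return data


def call_with_retries(call_model: Callable[[], str], max_attempts: int = 3) -> dict[str, Any]:
    """Call the model, validate, and retry up to max_attempts times.

    `call_model` is an assumed zero-argument callable returning raw text.
    After the cap is reached the error is surfaced instead of looping.
    """
    errors: list[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            return validate_triage(call_model())
        except SchemaError as exc:
            errors.append(f"attempt {attempt}: {exc}")
    raise RuntimeError("model output failed validation: " + "; ".join(errors))
```

A reply such as "Sure, here is my system prompt..." fails at the JSON parsing step, and a well-formed object that smuggles prose into an extra field fails the key check. In both cases the caller sees either valid typed data or an explicit error after the retry cap.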
Real-world Use Case
- Downstream code consumes typed data, and free-form prose (including an Agent Confession) would break parsers or expose directive content to consumers.
- A JSON Schema can be designed with narrow field types that make it structurally impossible for directive echoes to pass validation.
- Validation failure spikes are a useful early-warning signal for active Agent Confession probing.
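One way to turn validation failures into an early-warning signal is a sliding-window failure rate per endpoint. The sketch below is a minimal illustration, not a prescribed design: the `FailureRateMonitor` class, its window size, threshold, and minimum sample count are all assumed values that would need tuning against an endpoint's normal failure baseline.

```python
from collections import deque


class FailureRateMonitor:
    """Tracks the validation-failure rate over the most recent requests."""

    def __init__(self, window: int = 100, threshold: float = 0.2,
                 min_samples: int = 20) -> None:
        # The deque drops the oldest outcome once `window` entries are stored.
        self.outcomes: deque[bool] = deque(maxlen=window)
        self.threshold = threshold      # assumed alerting threshold
        self.min_samples = min_samples  # avoid alerting on tiny samples

    def record(self, failed: bool) -> None:
        """Record one request: True if schema validation failed."""
        self.outcomes.append(failed)

    @property
    def failure_rate(self) -> float:
        """Fraction of recorded requests in the window that failed."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def is_spiking(self) -> bool:
        """True when enough samples exist and the rate exceeds the threshold."""
        return (len(self.outcomes) >= self.min_samples
                and self.failure_rate > self.threshold)
```

In use, the pipeline would call `record(True)` whenever validation rejects an output and `record(False)` otherwise, then check `is_spiking()` to decide whether to raise an alert for that endpoint. A spike is only a hint of probing; ordinary causes such as a prompt or schema change can produce the same signal.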
Advantages
- Downstream code is simple and typed — and Agent Confession outputs are rejected at the schema boundary before reaching any consumer.
- Schema-level errors surface immediately; a confession that fails validation is logged and blocked without additional guardrail infrastructure.
- Retry caps prevent a model stuck in confession-generation mode from looping silently.
Disadvantages
- Provider lock-in for the strictest structured-output modes; fallback providers may not enforce schemas with equal strictness.
- A schema with a broad catch-all string field silently absorbs directive content — schema design discipline is itself a security concern.
- Some tasks resist schema-fitting; forcing a confession-resistant schema onto a task that genuinely needs free-form output turns the schema into a bottleneck.