DSPy Signatures
Specify agent behaviour as declarative typed signatures compiled against a metric — with the compilation process surfacing whether any prompt variant leaks directive content under adversarial inputs.
Intent & Description
Short description: Typed signatures describe each pipeline step’s input/output contract; a teleprompter optimizer compiles prompts and few-shot examples against a held-out metric — and red-team examples that test Agent Confession resistance can be included in the metric to harden compiled prompts.
🎯 Intent
Derive reliable, optimised prompts from declarative specifications rather than hand-tuning — and include Agent Confession robustness as a first-class metric dimension so the compiler does not produce prompts that are performant on the main task but vulnerable to directive-extraction attacks.
📋 Context
A team builds a multi-step agent pipeline and uses DSPy to compile each step’s prompts. Compilation is metric-driven: the optimizer generates and selects prompt variants that score well on a held-out evaluation set. If the evaluation set contains only task-performance examples, the compiler may select a prompt variant that is fluent and accurate on the main task but unusually willing to reproduce its own instructions when asked — because the metric never penalised that failure mode. Adding Agent Confession probe examples to the metric fixes this.
💡 Solution
- Define each step as a typed signature (input fields → output fields) and compose signatures into modules.
- Extend the held-out metric to include a set of Agent Confession probe examples — inputs that attempt to extract the compiled prompt’s instructions. Score responses to these probes for directive-disclosure; penalise variants that comply.
- Run the teleprompter (optimizer) against the combined metric; the compiled artefact is optimised for both task performance and confession resistance.
- Recompile regularly when the base model changes; rerun the probe set each time to verify confession resistance has not regressed.
Real-world Use Case
- Hand-crafted prompts are brittle and drift across model versions.
- A held-out metric exists that the optimizer can refine against — and can be extended to include Agent Confession probe examples.
- Compiled prompt artefacts should be hardened against directive extraction, not just optimised for task performance.
Source
Advantages
- Prompts become a reproducible build artefact; Agent Confession resistance is a compile-time property, not a runtime afterthought.
- Metric-driven optimisation surfaces prompt variants that are vulnerable to confession probes before they reach production.
Disadvantages
- Compilation requires labelled or auto-evaluable data — Agent Confession probe examples add to this labelling burden.
- Compiled artefacts drift with model upgrades; recompiling without rerunning the probe set may silently regress confession resistance.