Lethal Trifecta Threat Model
Intent & Description
🎯 Intent
Block prompt-injection-driven exfiltration by ensuring no single agent execution path holds all three of: access to private data, exposure to untrusted content, and an outbound communication channel.
📋 Context
A team builds a tool-using agent that combines three capabilities in the same execution. It reads data the operator wants to keep private (tokens, customer records, internal files). It ingests content from sources the operator does not control (emails, fetched web pages, third-party API responses, Model Context Protocol (MCP) servers from unknown providers). It can also call tools that transmit information outside the trust boundary (public HTTP requests, image-URL renders, link previews, chat webhooks, even error reports). This combination is extremely common: it appears in email assistants, browsing agents, coding agents with MCP servers, and any large language model that can both query internal systems and reach the public internet.
💡 Solution
Treat the three capabilities (private-data read, untrusted-content ingest, and outbound communication) as a tagged capability set on every tool and data source. For each agent execution path, enforce at orchestration time that at least one of the three is missing. Concrete moves: split the agent into two runs (one that reads private data, one that reads untrusted content), remove outbound network access from any run that touches both, or sanitise untrusted content into typed fields before it reaches private-data context. The host performs the check in code; it is not delegated to guardrail prompts, which injected instructions may be able to override.
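The host-side check described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Capability` flags, the `Tool` record, `check_execution_path`, and the tool names are all hypothetical, and a real orchestrator would attach the tags to its own tool registry.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Capability(Flag):
    NONE = 0
    PRIVATE_DATA = auto()       # reads data the operator wants kept private
    UNTRUSTED_CONTENT = auto()  # ingests content the operator does not control
    OUTBOUND = auto()           # can transmit outside the trust boundary


TRIFECTA = (
    Capability.PRIVATE_DATA | Capability.UNTRUSTED_CONTENT | Capability.OUTBOUND
)


@dataclass(frozen=True)
class Tool:
    name: str                  # hypothetical tool identifier
    capabilities: Capability   # tags assigned by the operator


class TrifectaError(Exception):
    """Raised when one execution path would hold all three capabilities."""


def check_execution_path(tools):
    """Return the combined capability set, or raise if it is the full trifecta."""
    combined = Capability.NONE
    for tool in tools:
        combined |= tool.capabilities
    if (combined & TRIFECTA) == TRIFECTA:
        names = ", ".join(tool.name for tool in tools)
        raise TrifectaError(f"path holds all three capabilities: {names}")
    return combined


# Hypothetical catalogue entries.
read_crm = Tool("read_crm", Capability.PRIVATE_DATA)
fetch_url = Tool("fetch_url", Capability.UNTRUSTED_CONTENT | Capability.OUTBOUND)
summarise = Tool("summarise", Capability.NONE)

# Splitting the work into two runs: each run is missing at least one capability.
private_run = check_execution_path([read_crm, summarise])
untrusted_run = check_execution_path([fetch_url, summarise])
```

Passing `[read_crm, fetch_url]` as a single run raises `TrifectaError`, because the host rejects the tool set before the model is invoked. The check looks only at tags, so it is only as accurate as the tagging of the catalogue.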
Real-world Use Case
The pattern applies when all three of the following hold:
- The agent processes content the operator does not control.
- The same agent has access to data or credentials the operator wants to keep private.
- The tool catalogue includes any tool that can reach a destination the operator does not control.
Advantages
- Eliminates an entire class of exfiltration attacks by construction, not by classifier accuracy.
- Forces explicit capability tagging — surfaces tools that combine too much authority.
- Composable with other safety patterns (dual-LLM, egress lockdown, sandbox isolation).
Disadvantages
- Restricts powerful single-agent designs that read everything and act anywhere.
- Requires disciplined capability tagging across the tool catalogue; missing tags create silent gaps.
- Does not address injection by other paths (poisoned tool output, supply-chain prompts, model weights).
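One possible way to reduce the silent-gap risk from missing tags, noted in the disadvantages above, is to fail closed: treat any untagged tool as if it held all three capabilities, and report untagged tools during a catalogue audit. The sketch below assumes a simple name-to-tag mapping; `TAGS`, `capabilities_of`, `untagged`, and the tool names are hypothetical.

```python
from enum import Flag, auto


class Capability(Flag):
    NONE = 0
    PRIVATE_DATA = auto()
    UNTRUSTED_CONTENT = auto()
    OUTBOUND = auto()


ALL = Capability.PRIVATE_DATA | Capability.UNTRUSTED_CONTENT | Capability.OUTBOUND

# Hypothetical catalogue: tool name -> explicitly reviewed tag.
TAGS = {
    "read_crm": Capability.PRIVATE_DATA,
    "format_date": Capability.NONE,
}


def capabilities_of(tool_name, tags=TAGS):
    """Fail closed: an untagged tool is assumed to hold all three capabilities."""
    return tags.get(tool_name, ALL)


def untagged(tool_names, tags=TAGS):
    """Return the tools that still need an explicit tag, sorted by name."""
    return sorted(name for name in tool_names if name not in tags)
```

With this default, a newly added tool that nobody has reviewed blocks any execution path it joins until it is tagged, which trades some convenience for not silently passing the check.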