Sandbox Escape Monitoring
Instrument the sandbox boundary — log every out-of-scope syscall, unauthorized network egress, and unexpected filesystem write — and alert or kill on threshold breaches.
Intent & Description
🎯 Intent
Treat sandbox boundary violations as telemetry — alert on syscalls, network egress, or filesystem writes outside expected scope.
📋 Context
An agent that executes generated code or manipulates files runs inside an isolation boundary (container, microVM, syscall-filtered sandbox). The boundary confines what the agent can read, write, and reach over the network. Real-world sandboxes nevertheless have known escape vectors and may contain undiscovered (zero-day) vulnerabilities, so isolation is necessary but not sufficient: the boundary also has to be watched.
💡 Solution
Instrument the sandbox: log every syscall outside the allowed set, every network egress not on the allowlist, and every filesystem write outside the working directory. Stream these events to the safety telemetry pipeline and alert when violation counts breach a threshold. Pair the monitor with a kill-switch that halts the agent automatically on a confirmed escape.
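The three checks and the alert/kill split can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the policy values (`ALLOWED_SYSCALLS`, `EGRESS_ALLOWLIST`, `WORKDIR`), the `BoundaryEvent` shape, and the `emit`, `alert`, and `kill` callbacks are all hypothetical names. It also assumes that the tracer reports whether the sandbox blocked each operation, and it treats an out-of-scope operation that was not blocked as a confirmed escape.

```python
import posixpath
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical policy values, chosen only for illustration.
ALLOWED_SYSCALLS = frozenset({"read", "write", "openat", "close", "mmap", "exit_group"})
EGRESS_ALLOWLIST = frozenset({("pypi.org", 443)})
WORKDIR = "/workspace"


@dataclass(frozen=True)
class BoundaryEvent:
    kind: str      # "syscall", "egress", or "fs_write"
    detail: str    # syscall name, "host:port", or absolute path
    blocked: bool  # assumed: the tracer reports whether the sandbox blocked it


def violation(event: BoundaryEvent) -> Optional[str]:
    """Return a reason string if the event is out of scope, else None."""
    if event.kind == "syscall":
        if event.detail in ALLOWED_SYSCALLS:
            return None
        return f"syscall outside allowed set: {event.detail}"
    if event.kind == "egress":
        host, _, port = event.detail.rpartition(":")
        if port.isdigit() and (host, int(port)) in EGRESS_ALLOWLIST:
            return None
        return f"egress not on allowlist: {event.detail}"
    if event.kind == "fs_write":
        # Collapse ".." segments lexically; symlinks are assumed to be
        # resolved by the tracer before the path reaches this check.
        path = posixpath.normpath(event.detail)
        if path == WORKDIR or path.startswith(WORKDIR + "/"):
            return None
        return f"write outside working directory: {path}"
    # Fail closed on event kinds the policy does not know about.
    return f"unknown event kind: {event.kind}"


@dataclass
class SandboxMonitor:
    emit: Callable[[dict], None]   # telemetry sink (assumed interface)
    alert: Callable[[str], None]   # paging/alerting hook (assumed interface)
    kill: Callable[[str], None]    # kill-switch (assumed interface)
    alert_threshold: int = 3
    violations: int = 0
    halted: bool = False

    def observe(self, event: BoundaryEvent) -> None:
        reason = violation(event)
        if reason is None:
            return
        self.violations += 1
        # Every violation becomes a telemetry record, blocked or not.
        self.emit({"kind": event.kind, "detail": event.detail,
                   "blocked": event.blocked, "reason": reason})
        # Alert once, when the running count first reaches the threshold.
        if self.violations == self.alert_threshold:
            self.alert(f"{self.violations} boundary violations; latest: {reason}")
        # An out-of-scope operation that succeeded is treated as a confirmed escape.
        if not event.blocked and not self.halted:
            self.halted = True
            self.kill(reason)
```

In this sketch the telemetry record is emitted before any alert or kill decision, so the forensic trail survives even if the halt itself fails. A real deployment would feed `observe` from whatever tracing mechanism the sandbox exposes.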
Real-world Use Case
- The agent executes code or operates a filesystem inside a sandbox.
- Sandbox boundaries can be instrumented to log syscalls, egress, and filesystem writes.
- A safety telemetry pipeline and kill-switch already exist or are being built.
📌 TL;DR
Treat every sandbox boundary violation as a telemetry event — stream out-of-scope syscalls, egress, and writes to safety monitoring and kill automatically on confirmed escapes.
Advantages
- Surfaces both blocked escape attempts and successful escapes early, ideally before they cause downstream damage.
- Provides a forensic trail for incident investigation.
Disadvantages
- High telemetry volume, which requires efficient streaming and storage.
- Alert fatigue if thresholds are mis-tuned: thresholds set too tight produce noise, while thresholds set too loose leave blind spots.
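One way to make the threshold trade-off concrete is to count violations inside a sliding time window instead of over the lifetime of the sandbox. The sketch below is only an illustration: the class name `SlidingWindowThreshold` and its parameters are hypothetical, and it assumes timestamps arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindowThreshold:
    """Trip when at least `limit` violations fall within `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._times = deque()

    def record(self, timestamp: float) -> bool:
        """Record one violation; return True if the threshold is breached."""
        self._times.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop violations that have aged out of the window.
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()
        return len(self._times) >= self.limit
```

A small `limit` or a long `window_seconds` makes the alert fire more readily (more noise). A large `limit` or a short window lets slow, low-rate probing pass unnoticed (blind spots). Tuning both against recorded telemetry is one plausible way to balance the two.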