Interruptible Agent Execution
Intent & Description
🎯 Intent
Treat pause, resume, and cancel as a first-class control surface on every long-running agent so users can halt expensive or off-track trajectories mid-task while state is preserved for resumption.
📋 Context
An agent runs for minutes, hours, or longer on a single user task — a deep-research loop, a code-agent session, an autonomous browser flow. The user is watching it work and forms a judgment mid-run: it has gone off-track, it is burning tokens unnecessarily, or the task is no longer wanted. The user expects to stop it like any other long-running application — pause and inspect, cancel cleanly, or resume after a check.
💡 Solution
Build the runtime so that every step boundary is a snapshot point, which makes state durable across pause and resume. Pause stops further model and tool calls without killing the process. Resume rehydrates state from the latest snapshot. Cancel runs compensating actions for in-flight side effects (marking drafts as discarded, releasing locks, ending provider sessions) before tearing the run down. Expose all three as visible UX controls, not hidden APIs. This pattern is distinct from a kill switch, which is an operator-level emergency halt.
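The control loop described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the names `InterruptibleAgent`, `AgentState`, `Control`, and `_do_step` are hypothetical, the "work" in each step is a stand-in for a real model or tool call, and the snapshot is a single JSON file. Compensating actions are kept in memory only, which a real runtime would likely need to persist alongside the snapshot.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from pathlib import Path
from typing import Callable, List, Optional


class Control(Enum):
    RUN = "run"
    PAUSE = "pause"
    CANCEL = "cancel"


@dataclass
class AgentState:
    step: int = 0
    notes: List[str] = field(default_factory=list)


class InterruptibleAgent:
    """Hypothetical runtime that snapshots state at every step boundary."""

    def __init__(self, snapshot_path: Path, total_steps: int):
        self.snapshot_path = snapshot_path
        self.total_steps = total_steps
        self.control = Control.RUN
        self.discarded: List[str] = []  # records what compensation undid
        self._compensations: List[Callable[[], None]] = []  # in memory only
        self.state = self._load()

    def _load(self) -> AgentState:
        # Rehydrate from the last snapshot if one exists, else start fresh.
        if self.snapshot_path.exists():
            return AgentState(**json.loads(self.snapshot_path.read_text()))
        return AgentState()

    def _snapshot(self) -> None:
        # Write to a temp file, then rename, so a crash never leaves a
        # half-written snapshot behind.
        tmp = self.snapshot_path.with_suffix(".tmp")
        tmp.write_text(json.dumps(asdict(self.state)))
        tmp.replace(self.snapshot_path)

    def pause(self) -> None:
        self.control = Control.PAUSE

    def cancel(self) -> None:
        self.control = Control.CANCEL

    def resume(self, on_step: Optional[Callable] = None) -> str:
        self.control = Control.RUN
        return self.run(on_step)

    def _do_step(self) -> None:
        # Stand-in for a model or tool call that creates a side effect
        # (here, an imaginary draft) and registers how to undo it.
        n = self.state.step
        self.state.notes.append(f"result-{n}")
        self._compensations.append(lambda n=n: self.discarded.append(f"draft-{n}"))
        self.state.step += 1

    def run(self, on_step: Optional[Callable] = None) -> str:
        while self.state.step < self.total_steps:
            # Control signals are honoured only at step boundaries.
            if self.control is Control.PAUSE:
                return "paused"  # process stays alive, snapshot is on disk
            if self.control is Control.CANCEL:
                for undo in reversed(self._compensations):
                    undo()  # compensate newest side effect first
                self._compensations.clear()
                self.snapshot_path.unlink(missing_ok=True)
                return "cancelled"
            self._do_step()
            self._snapshot()
            if on_step is not None:
                on_step(self)  # hook where a UI could call pause()/cancel()
        return "done"
```

In this sketch, calling `resume()` on a paused agent, or constructing a new `InterruptibleAgent` with the same `snapshot_path`, continues from the last completed step. Checking the control flag only between steps is a deliberate simplification: it never interrupts a tool call halfway, at the cost of pause and cancel taking effect only once the current step finishes.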
Real-world Use Case
- Agent runs are long enough that users will form mid-run judgments.
- In-flight side effects can be compensated cleanly.
- State is small enough to snapshot at step boundaries without prohibitive cost.
Advantages
- User trust survives long runs because the user retains control throughout.
- Pause-and-inspect becomes a debugging affordance during development.
- Cancel with compensating actions limits blast radius of mistakes.
Disadvantages
- Snapshotting at every step boundary is invasive and touches the whole runtime.
- In-flight tool calls without idempotency hooks make pause and cancel unsafe.
- Resuming from a stale snapshot can produce a Frankenstein run, stitched together from pre-pause assumptions and a changed world, if external state has moved on in the meantime.
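One way to surface the stale-snapshot risk before resuming is to record, in each snapshot, a fingerprint of every external resource the agent has read, and to compare those fingerprints against the current world at resume time. The sketch below is an assumption about how such a check might look, not something the pattern prescribes; `Snapshot`, `fingerprint`, `stale_resources`, and the `read_current` callback are all hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List


def fingerprint(content: str) -> str:
    """Content hash used to detect that a resource changed."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass
class Snapshot:
    step: int
    # Resource name -> fingerprint observed when the snapshot was taken.
    observed: Dict[str, str]


def stale_resources(
    snapshot: Snapshot, read_current: Callable[[str], str]
) -> List[str]:
    """Return the names of resources whose content changed since the snapshot."""
    return sorted(
        name
        for name, old_fp in snapshot.observed.items()
        if fingerprint(read_current(name)) != old_fp
    )
```

A runtime could call `stale_resources` on resume and, if the list is non-empty, show it to the user or re-read those resources instead of continuing blindly. The check only covers resources the agent explicitly recorded, so it narrows the problem without eliminating it.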