Partial-Output Salvage
Stream every model token to an atomically updated partial file so that a mid-stream crash leaves a consistent, salvageable record, then surface the recovery status to the model on the next prompt.
Intent & Description
🎯 Intent
Without a partial-output mechanism, a SIGKILL mid-inference loses all tokens that were streaming — minutes of model time and real context gone with no trace.
📋 Context
The agent runs on hardware where the process is occasionally killed: by the OOM killer, a watchdog timer, or a deploy restart mid-stream. Each inference call is long enough that losing a half-finished stream is meaningful. The existing resumption pattern restores only durably written state, not the tokens that were streaming when the kill signal landed.
💡 Solution
The mechanism is a simple finite-state machine. On stream start: open partial.tmp and write a start marker containing the thought ID, timestamp, and model ID. On each chunk: append to the tmp file and periodically os.rename(tmp, partial) so that readers only ever see a complete snapshot. On normal stream end: rename to the canonical thought path and delete the partial. On startup: scan for orphan partial.* files and finalize each with a typed RecoveryStatus enum (RECOVERED_FROM_PARTIAL for a hard kill, TIMEOUT_PARTIAL for a watchdog timeout). Include last_partial_recovery:
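The states described above could be sketched roughly as follows. This is one possible reading, not the original implementation: the class name PartialWriter, the function recover_orphans, the file naming scheme (partial.<id>, thought.<id>.txt), the flush_every parameter, and the timed_out flag used to choose between the two statuses are all assumptions made for illustration. The sketch also keeps the stream in memory and rewrites the whole snapshot on each flush, which is simpler than appending but does more I/O. It uses os.replace, which behaves like os.rename on POSIX and also overwrites an existing target on Windows.

```python
import enum
import json
import os
import time
from pathlib import Path


class RecoveryStatus(enum.Enum):
    RECOVERED_FROM_PARTIAL = "recovered_from_partial"  # hard kill
    TIMEOUT_PARTIAL = "timeout_partial"                # watchdog timeout


class PartialWriter:
    """Snapshots a growing stream to a partial file via write-temp-then-rename."""

    def __init__(self, workdir, thought_id, model_id, flush_every=8):
        workdir = Path(workdir)
        # Hypothetical naming scheme; the source only mentions partial.tmp and partial.*
        self.tmp = workdir / f"partial.{thought_id}.tmp"
        self.partial = workdir / f"partial.{thought_id}"
        self.final = workdir / f"thought.{thought_id}.txt"
        self.flush_every = flush_every
        header = {"thought_id": thought_id, "started_at": time.time(), "model_id": model_id}
        self.chunks = [json.dumps(header) + "\n"]  # start marker is the first line
        self.pending = 0
        self._snapshot()

    def _snapshot(self):
        # Write the full buffer to tmp, force it to disk, then swap it in atomically.
        with open(self.tmp, "w", encoding="utf-8") as f:
            f.write("".join(self.chunks))
            f.flush()
            os.fsync(f.fileno())
        os.replace(self.tmp, self.partial)
        self.pending = 0

    def append(self, chunk):
        self.chunks.append(chunk)
        self.pending += 1
        if self.pending >= self.flush_every:
            self._snapshot()

    def finish(self):
        # Normal stream end: final snapshot, then move it to the canonical path.
        self._snapshot()
        os.replace(self.partial, self.final)
        return self.final


def recover_orphans(workdir, timed_out=False):
    """Finalize leftover partial.* files; returns {thought_id: RecoveryStatus}."""
    # Assumption: the caller knows whether the previous run ended in a watchdog timeout.
    status = RecoveryStatus.TIMEOUT_PARTIAL if timed_out else RecoveryStatus.RECOVERED_FROM_PARTIAL
    recovered = {}
    for path in sorted(Path(workdir).glob("partial.*")):
        if path.suffix == ".tmp":
            path.unlink()  # possibly torn write; the last good snapshot is the partial file
            continue
        thought_id = path.name[len("partial."):]
        os.replace(path, path.with_name(f"thought.{thought_id}.{status.value}.txt"))
        recovered[thought_id] = status
    return recovered
```

With flush_every=8, a hard kill loses at most the last seven chunks that had not yet been snapshotted. Surviving a power loss, as opposed to a process kill, would also need an fsync on the containing directory, which this sketch leaves out.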
Real-world Use Case
- The runtime can SIGKILL the agent mid-stream and that loses meaningful work.
- Inference is long enough per call that a partial stream has real salvage value.
- The filesystem supports atomic rename in the working directory.
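The last condition can be partly checked at startup. The sketch below is only a rough probe under stated assumptions: same_filesystem and rename_works_in are hypothetical helper names, and passing both checks shows that a rename within the directory succeeds, not that the underlying filesystem guarantees atomicity (some network filesystems do not). A rename across filesystems fails outright, which is why the device comparison matters when the tmp file and its target could live in different directories.

```python
import os
import tempfile


def same_filesystem(path_a, path_b):
    """True when both paths report the same device ID."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def rename_works_in(directory):
    """Probe: create a temp file in `directory` and rename it onto a sibling name."""
    fd, src = tempfile.mkstemp(dir=directory, prefix="probe.", suffix=".tmp")
    os.close(fd)
    dst = os.path.join(directory, "probe.target")
    try:
        os.replace(src, dst)
        return True
    except OSError:
        return False
    finally:
        # Remove whichever probe files are left behind.
        for p in (src, dst):
            if os.path.exists(p):
                os.remove(p)
```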
Advantages
- Mid-stream tokens are not lost on hard crash — minutes of inference are recoverable
- Typed recovery marker preserves debuggability — the salvage isn’t hidden from the model
- Atomic rename keeps the partial file readable and consistent at every moment
Disadvantages
- Renaming every N chunks has non-zero overhead; N (the flush interval) needs tuning
- Partials add filesystem clutter if not periodically cleaned up
- Recovery status surfaced in the prompt costs tokens every time it fires
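The clutter concern could be handled with a periodic sweep. The following is a minimal sketch, assuming that files matching partial.* which are older than some retention window are safe to delete; the function name prune_old_partials, the pattern default, and the retention policy itself are illustrative assumptions, and the window should be long enough that startup recovery has already had a chance to salvage anything worth keeping.

```python
import time
from pathlib import Path


def prune_old_partials(workdir, max_age_seconds, pattern="partial.*", now=None):
    """Delete matching files whose mtime is older than max_age_seconds; return their names."""
    now = time.time() if now is None else now
    removed = []
    for path in sorted(Path(workdir).glob(pattern)):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return removed
```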