Scaffold Ablation on Model Upgrade

Agentic AI Governance & Observability

On each model upgrade, treat every harness component as an encoded assumption about a past model weakness — ablate the ones the new model no longer needs, gated by evals.

Scaffold Ablation on Model Upgrade is the discipline of actively removing harness components that were built for weaker models. Each component carries the assumption it encodes, and each assumption is stress-tested against the new model by temporarily removing the component and running the eval suite. If the eval results hold without the component, the assumption has expired and the component comes out.

Intent & Description

🎯 Intent

On each model upgrade, treat every harness component as an encoded assumption about a model weakness — ablate the components the new model no longer needs, gated by evals.

📋 Context

An agent harness accretes over several model generations: retry wrappers, decomposition scaffolds, format-coercion steps, guardrails, planning constructs. Each was added to compensate for something a past model couldn’t do reliably. A stronger model arrives, and the harness is carried over wholesale because it “works.” The result: scaffolding that was designed to patch weaknesses is now constraining strengths.

💡 Solution

Make each harness component carry the assumption it encodes (“the model cannot keep a long plan straight,” “the model will not emit valid JSON”). On model upgrade, walk the components and stress-test each assumption against the new model: temporarily remove the component and run the eval suite. If the eval holds, the assumption has expired and the component comes out; if it regresses, the assumption survives and the component stays. The eval suite is the gate; the anti-pattern is carrying everything over by default.
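The review loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `Component`, `Decision`, `ablation_review`, the `run_evals` hook, and the `tolerance` parameter are all hypothetical names, and the eval suite is assumed to return a single score where higher is better.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Component:
    name: str
    assumption: str  # the model weakness this component compensates for


@dataclass(frozen=True)
class Decision:
    component: str
    assumption: str
    baseline: float  # eval score with the full harness
    ablated: float   # eval score with this one component removed
    keep: bool       # True if removing the component caused a regression


# Hypothetical eval hook: receives the names of the enabled components and
# returns one aggregate score (assumption: higher is better).
EvalFn = Callable[[FrozenSet[str]], float]


def ablation_review(
    components: List[Component],
    run_evals: EvalFn,
    tolerance: float = 0.0,
) -> List[Decision]:
    """Remove each component in turn and compare against the full harness."""
    enabled = frozenset(c.name for c in components)
    baseline = run_evals(enabled)
    decisions = []
    for c in components:
        ablated = run_evals(enabled - {c.name})
        # The assumption survives only if the score drops by more than tolerance.
        keep = ablated < baseline - tolerance
        decisions.append(Decision(c.name, c.assumption, baseline, ablated, keep))
    return decisions


if __name__ == "__main__":
    harness = [
        Component("json_coercion", "the model will not emit valid JSON"),
        Component("plan_decomposer", "the model cannot keep a long plan straight"),
    ]

    # Stand-in for a real eval suite: here the new model still needs the
    # plan decomposer but no longer needs JSON coercion.
    def fake_evals(enabled: FrozenSet[str]) -> float:
        return 0.90 if "plan_decomposer" in enabled else 0.80

    for d in ablation_review(harness, fake_evals):
        print(d.component, "keep" if d.keep else "remove", f"({d.assumption})")
```

With the stand-in eval, `json_coercion` is marked for removal and `plan_decomposer` is kept. The sketch tests each component in isolation against the full harness; since components can interact, one more eval run with all expired components removed together would be a reasonable final check before committing the change.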

Real-world Use Case

The pattern applies when all of the following hold:
A harness has accreted scaffolding across several model generations.
A model upgrade is being adopted and the team owns an eval suite to gate changes.
There is evidence or suspicion that carried-over scaffolding is suppressing the new model’s capability.

📌 TL;DR

When you upgrade the model, audit the scaffolding too: remove every component that compensated for a weakness the new model doesn’t have, with each removal gated by evals. A stale harness suppresses capability.

Advantages

Harness complexity tracks the current model’s real weaknesses instead of accumulating across generations.
Capability suppression from scaffolding built for weaker models is removed, not inherited.
Each removal is evidence-backed — the review is auditable, not a matter of taste.
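One way to make the "auditable" claim concrete is to persist each ablation decision as a structured record. The sketch below is an assumption about what such a record might contain; `AblationRecord`, its field names, and `to_audit_line` are hypothetical and not taken from the source.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AblationRecord:
    component: str         # harness component under review
    assumption: str        # the weakness it was built to compensate for
    model: str             # identifier of the model being adopted (hypothetical)
    baseline_score: float  # eval score with the component in place
    ablated_score: float   # eval score with the component removed
    decision: str          # "removed" or "kept"


def to_audit_line(record: AblationRecord) -> str:
    """Serialize one decision as a single JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    record = AblationRecord(
        component="json_coercion",
        assumption="the model will not emit valid JSON",
        model="model-v2",  # placeholder name, not a real model
        baseline_score=0.90,
        ablated_score=0.90,
        decision="removed",
    )
    print(to_audit_line(record))
```

Keeping the assumption text and both scores next to the decision lets a later reviewer see why a component was removed or kept without rerunning the evals.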

Disadvantages

Ablating a component whose assumption hasn’t fully expired causes a regression when the eval suite misses the edge case that still needs it.
The review is only as trustworthy as the eval suite gating it.
Per-release review is recurring work that a carry-everything-over approach avoids.
