Hypothesis Tracking
Persist the agent's provisional answers as a typed ledger with confidence, status, and a next-test condition — so guesses survive sessions and stay distinguishable from open questions.
Intent & Description
🎯 Intent
Without a typed store, provisional answers live only in the current prompt window and dissolve at turn end. This pattern makes them first-class, revisable, and falsifiable.
📋 Context
A long-running agent maintains an open-question ledger and observes patterns of evidence that point toward provisional answers. When it commits enough weight to a guess to act on it, that guess stops being a question. Without a dedicated store, the guess silently rejoins the prompt blur.
💡 Solution
Maintain a hypothesis store keyed by short ID. Each record carries: a one-line summary, a numeric confidence (0..1), a status (active/confirmed/disconfirmed/superseded/abandoned), a next-test sentence (the observation that would move confidence), and an evidence list with sources. When the agent commits a guess, write it with status active. As evidence arrives, append it to the evidence list and adjust confidence. If the next-test observation occurs, transition the record to confirmed or disconfirmed. If a better hypothesis subsumes it, mark it superseded. Render active records into the agent’s daily working context.
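The record and its transitions can be sketched as follows. This is a minimal in-memory illustration, not a reference implementation: the class and method names (`HypothesisStore`, `commit`, `add_evidence`, `resolve`, `supersede`, `render_active`), the `superseded_by` field, and the additive `delta` confidence update are all assumptions made for the example. The pattern itself does not prescribe how confidence should be adjusted or how the store is persisted.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    ACTIVE = "active"
    CONFIRMED = "confirmed"
    DISCONFIRMED = "disconfirmed"
    SUPERSEDED = "superseded"
    ABANDONED = "abandoned"


@dataclass
class Evidence:
    note: str
    source: str


@dataclass
class Hypothesis:
    id: str
    summary: str
    confidence: float  # kept within 0..1
    next_test: str     # observation that would move confidence
    status: Status = Status.ACTIVE
    evidence: List[Evidence] = field(default_factory=list)
    superseded_by: Optional[str] = None  # hypothetical field, not in the pattern text


def _clamp(x: float) -> float:
    return max(0.0, min(1.0, x))


class HypothesisStore:
    """In-memory store keyed by short ID (persistence is left out of this sketch)."""

    def __init__(self) -> None:
        self._records: Dict[str, Hypothesis] = {}

    def get(self, hid: str) -> Hypothesis:
        return self._records[hid]

    def commit(self, hid: str, summary: str, confidence: float, next_test: str) -> Hypothesis:
        # A committed guess always starts as active.
        if hid in self._records:
            raise ValueError(f"hypothesis {hid!r} already exists")
        record = Hypothesis(hid, summary, _clamp(confidence), next_test)
        self._records[hid] = record
        return record

    def add_evidence(self, hid: str, note: str, source: str, delta: float) -> Hypothesis:
        # Assumption: confidence moves by a caller-supplied delta, then is clamped.
        record = self._records[hid]
        record.evidence.append(Evidence(note, source))
        record.confidence = _clamp(record.confidence + delta)
        return record

    def resolve(self, hid: str, confirmed: bool) -> None:
        # Called when the next-test observation has occurred.
        record = self._records[hid]
        record.status = Status.CONFIRMED if confirmed else Status.DISCONFIRMED

    def supersede(self, old_id: str, new_id: str) -> None:
        # The replacing hypothesis must already be in the store.
        if new_id not in self._records:
            raise KeyError(new_id)
        old = self._records[old_id]
        old.status = Status.SUPERSEDED
        old.superseded_by = new_id

    def render_active(self) -> str:
        # One line per active record, for inclusion in the working context.
        return "\n".join(
            f"[{h.id}] {h.summary} (confidence {h.confidence:.2f}; next test: {h.next_test})"
            for h in self._records.values()
            if h.status is Status.ACTIVE
        )
```

Records that are confirmed, disconfirmed, or superseded stay in the store but drop out of `render_active`, which is what keeps the paper trail while leaving the working context uncluttered.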
Real-world Use Case
- The agent runs over weeks and accumulates partial evidence about persistent questions.
- Provisional answers need to be defensible and revisable across sessions, not just remembered.
- An existing open-question store already separates pulls of curiosity from active commitments.
Advantages
- Provisional answers survive across sessions with their confidence carried forward
- Disconfirmed hypotheses leave a paper trail rather than being silently re-spawned
- Next-test fields keep hypotheses falsifiable rather than free-floating beliefs
Disadvantages
- Two-store discipline (questions vs. hypotheses) is harder than one undifferentiated note pile
- Confidence numbers are seductive — they’re the agent’s temperature, not the world’s truth
- Hypothesis stores grow without bound unless stale hypotheses are periodically swept and marked abandoned
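One way to handle the last point is a periodic sweep that marks long-idle active records as abandoned. The sketch below is only illustrative: the `updated_at` timestamp, the dictionary record shape, and the 30-day idle threshold are assumptions for the example, not part of the pattern as described.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def sweep_stale(records: List[Dict], now: datetime,
                max_idle: timedelta = timedelta(days=30)) -> List[str]:
    """Mark active records idle for longer than max_idle as abandoned.

    Returns the IDs that were swept. Records are kept, not deleted,
    so the abandonment itself remains part of the trail.
    """
    swept = []
    for rec in records:
        if rec["status"] == "active" and now - rec["updated_at"] > max_idle:
            rec["status"] = "abandoned"
            swept.append(rec["id"])
    return swept
```

Only active records are touched, so confirmed and disconfirmed hypotheses keep their final status regardless of age.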