Salience Attention Mechanism
Score every candidate memory item with a weighted salience function and attend to only the top-k per tick — bounded attention cost regardless of how large the memory store grows.
Salience score: alpha * novelty + beta * goal_relevance + gamma * recency + delta * prediction_error - epsilon * fatigue. The top-k scoring items enter the working set for the next tick, and the fatigue term breaks rumination loops by penalizing over-attended items.
Intent & Description
🎯 Intent
Score every candidate memory item with a weighted salience function so each tick attends to a small, relevant top-k subset rather than re-reading all memory.
📋 Context
A long-running agent’s memory store has grown past what fits in a single call’s context. The agent has accumulated thoughts, summaries, insights, and observations over hours or days, and on every tick only a small, currently relevant slice should drive the next step.
💡 Solution
Score each candidate memory item m with a weighted sum: alpha * novelty(m) + beta * goal_relevance(m) + gamma * recency(m) + delta * prediction_error(m) - epsilon * fatigue(m). Select the k highest-scoring items as the working set for the next tick. Persist the weights in a tunable config so a reflection pass can adjust them. The fatigue term penalizes items that have already been attended to many times within a recent window of ticks, which breaks rumination loops.
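The weighted sum and top-k selection described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `Weights`, `MemoryItem`, `salience`, and `select_working_set` are hypothetical, the weight values of 1.0 are placeholders, and each feature is assumed to be precomputed and normalized to roughly the range 0 to 1.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    # Placeholder values; real weights are tuned per deployment.
    alpha: float = 1.0    # novelty
    beta: float = 1.0     # goal relevance
    gamma: float = 1.0    # recency
    delta: float = 1.0    # prediction error
    epsilon: float = 1.0  # fatigue penalty


@dataclass
class MemoryItem:
    # Assumption: features are precomputed elsewhere and roughly in [0, 1].
    item_id: str
    novelty: float
    goal_relevance: float
    recency: float
    prediction_error: float
    fatigue: float


def salience(m: MemoryItem, w: Weights) -> float:
    """Weighted salience score; fatigue is subtracted, everything else added."""
    return (
        w.alpha * m.novelty
        + w.beta * m.goal_relevance
        + w.gamma * m.recency
        + w.delta * m.prediction_error
        - w.epsilon * m.fatigue
    )


def select_working_set(items, w: Weights, k: int):
    """Return the k highest-scoring items, best first."""
    # heapq.nlargest avoids sorting the whole store when k is small.
    return heapq.nlargest(k, items, key=lambda m: salience(m, w))
```

Calling `select_working_set(items, Weights(), k=5)` once per tick yields that tick's working set. Raising `epsilon` makes heavily attended items drop out sooner.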
Real-world Use Case
- The persistent memory store is too large to read in full at every tick.
- Memory items have features (recency, importance, frequency, similarity) that can be combined into a salience score.
- The agent needs predictable, bounded per-tick read cost.
📌 TL;DR
Score every memory item with a weighted salience function, attend to the top-k, and penalize over-attended items with a fatigue term — bounded attention cost, no rumination loops.
Advantages
- Bounded attention cost per tick regardless of memory store size.
- Salience scores are inspectable and tunable — operators can see what’s driving attention.
- Fatigue term breaks repetitive attention loops without manual intervention.
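To make the fatigue advantage concrete, the sketch below simulates several ticks with two items whose non-fatigue score is fixed. It rests on one assumption the text does not specify: fatigue is taken to be the fraction of the last `window` ticks in which the item was in the working set. The function `run_ticks` and its parameters are hypothetical names used only for this illustration.

```python
from collections import Counter, deque


def run_ticks(base_scores, epsilon, ticks, window=4, k=1):
    """Simulate top-k selection over several ticks with a fatigue penalty.

    base_scores maps item id -> combined non-fatigue score (held constant here).
    Assumption: fatigue = (times attended in the last `window` ticks) / window.
    """
    recent = deque(maxlen=window)  # working sets of the most recent ticks
    chosen_per_tick = []
    for _ in range(ticks):
        counts = Counter(item for ws in recent for item in ws)

        def score(item):
            fatigue = counts[item] / window
            return base_scores[item] - epsilon * fatigue

        working_set = sorted(base_scores, key=score, reverse=True)[:k]
        recent.append(working_set)
        chosen_per_tick.append(working_set)
    return chosen_per_tick


scores = {"loud": 1.0, "quiet": 0.7}
without_fatigue = run_ticks(scores, epsilon=0.0, ticks=6)
with_fatigue = run_ticks(scores, epsilon=1.0, ticks=6)
```

With `epsilon=0.0` the higher-scoring item wins every tick, which is the rumination loop. With `epsilon=1.0` its score decays as it keeps being selected, so the lower-scoring item periodically takes the slot.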
Disadvantages
- Weight tuning is empirical and per-deployment — no universal defaults.
- A poorly calibrated scoring function can suppress genuinely relevant items: misjudging even one dimension can push an item out of the top-k.
- Salience scoring itself costs compute, so it has to stay cheap enough to run every tick.
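Because weight tuning is empirical, it helps to keep the weights in a small persisted config that a reflection pass can read, adjust, and write back. The following is one possible sketch using a JSON file. The file layout, the `DEFAULT_WEIGHTS` values, the clamping bounds, and the helper names `load_weights`, `save_weights`, and `nudge` are all assumptions made for illustration.

```python
import json
from pathlib import Path

# Placeholder starting point only; there are no universal defaults.
DEFAULT_WEIGHTS = {
    "alpha": 1.0,    # novelty
    "beta": 1.0,     # goal relevance
    "gamma": 1.0,    # recency
    "delta": 1.0,    # prediction error
    "epsilon": 1.0,  # fatigue penalty
}


def load_weights(path):
    """Load weights from JSON, falling back to defaults for missing keys."""
    path = Path(path)
    if not path.exists():
        return dict(DEFAULT_WEIGHTS)
    stored = json.loads(path.read_text())
    known = {k: float(v) for k, v in stored.items() if k in DEFAULT_WEIGHTS}
    return {**DEFAULT_WEIGHTS, **known}


def save_weights(path, weights):
    """Persist weights so the next tick (or process) picks them up."""
    Path(path).write_text(json.dumps(weights, indent=2, sort_keys=True))


def nudge(weights, name, step, lo=0.0, hi=5.0):
    """Return a copy with one weight shifted by `step`, clamped to [lo, hi]."""
    if name not in weights:
        raise KeyError(f"unknown weight: {name}")
    updated = dict(weights)
    updated[name] = min(hi, max(lo, updated[name] + step))
    return updated
```

A reflection pass that detects repeated attention to the same items might call `save_weights(path, nudge(load_weights(path), "epsilon", 0.1))`. Clamping keeps a single bad adjustment from driving any weight to an extreme.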