Reflexion
After each episode, the agent writes a verbal lesson about what went wrong. Future episodes retrieve the relevant lessons and start with them in context, so the agent improves without any weight updates.
Intent & Description
🎯 Intent
Let the agent stop repeating the same mistakes across episodes, without fine-tuning model weights.
📋 Context
Your agent solves many similar tasks over time: coding problems, research queries, workflow steps. Each task is a separate episode, and the agent carries nothing over between episodes, so it keeps making the same errors. RL fine-tuning is too expensive to rerun every time a new failure mode shows up.
💡 Solution
After each episode, the agent reflects on success/failure and writes a verbal lesson. Lessons are stored in long-term memory keyed by task type. Future episodes retrieve relevant lessons and prepend them to context.
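The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference Reflexion implementation: `LessonStore`, `build_prompt`, and `run_episode` are hypothetical names, `act` and `reflect` are stubs standing in for the LLM calls and the evaluator that judges success, and retrieval is simply "the k most recent lessons for this task type" (a real system might rank by embedding similarity instead). Following the failure-driven framing of this pattern, a lesson is written only when an episode fails.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical signatures: `act` stands in for the agent plus its evaluator
# (prompt -> (answer, succeeded)); `reflect` stands in for an LLM call that
# turns a failed attempt into a short verbal lesson.
Act = Callable[[str], tuple[str, bool]]
Reflect = Callable[[str, str], str]


class LessonStore:
    """Long-term memory of verbal lessons, keyed by task type."""

    def __init__(self) -> None:
        self._lessons: defaultdict[str, list[str]] = defaultdict(list)

    def add(self, task_type: str, lesson: str) -> None:
        self._lessons[task_type].append(lesson)

    def retrieve(self, task_type: str, k: int = 3) -> list[str]:
        # Simplest retrieval: the k most recent lessons stored for this task type.
        if k <= 0:
            return []
        return self._lessons.get(task_type, [])[-k:]


def build_prompt(task: str, lessons: list[str]) -> str:
    """Prepend retrieved lessons to the task description."""
    if not lessons:
        return task
    header = "Lessons from previous attempts:\n" + "\n".join(f"- {lesson}" for lesson in lessons)
    return f"{header}\n\nTask:\n{task}"


def run_episode(store: LessonStore, task_type: str, task: str,
                act: Act, reflect: Reflect, k: int = 3) -> tuple[str, bool]:
    """One episode: retrieve lessons, act, and write a new lesson if the attempt failed."""
    prompt = build_prompt(task, store.retrieve(task_type, k))
    answer, succeeded = act(prompt)
    if not succeeded:
        store.add(task_type, reflect(task, answer))
    return answer, succeeded


# Usage with stubs in place of real LLM calls.
def fake_act(prompt: str) -> tuple[str, bool]:
    # Pretend the agent only avoids the bug once the relevant lesson is in context.
    ok = "off-by-one" in prompt
    return ("fixed loop" if ok else "buggy loop"), ok


def fake_reflect(task: str, answer: str) -> str:
    return "Check loop bounds for off-by-one errors before submitting."


store = LessonStore()
first = run_episode(store, "coding", "Sum a list", fake_act, fake_reflect)
second = run_episode(store, "coding", "Reverse a list", fake_act, fake_reflect)
print(first, second)  # ('buggy loop', False) ('fixed loop', True)
```

The first episode fails and leaves a lesson behind; the second episode of the same task type retrieves it and succeeds, with no model weights changed.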
Real-world Use Case
- Stateless agents repeat the same errors across episodes.
- Linguistic lessons from past failures can be retrieved and prepended in future runs.
- Full RL fine-tuning is too expensive for the setting.
📌 TL;DR
Agent fails → writes a lesson → future runs retrieve it. Better performance over time, no weight updates. Curate the lesson store or it rots.
Advantages
- Improvement without fine-tuning weights — lessons are cheap to generate and store.
- Lessons are human-readable and editable — you can curate the knowledge base.
Disadvantages
- Single-agent Reflexion tends to repeat its own blind spots, because the same model that made the mistake also diagnoses it and writes the lesson.
- Lesson stores grow; without curation they become noise that hurts more than it helps.
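One way to curate a lesson store is sketched below, under explicitly hypothetical assumptions that are not part of the pattern as described here. Each lesson counts how many episodes it was retrieved into (`uses`) and how many of those succeeded (`wins`). `curate` works on one task type's list: it drops lessons tried at least `min_uses` times with a success rate below `min_rate`, drops duplicates after normalizing case and whitespace, and keeps at most `cap` lessons. The thresholds are illustrative, and the success rate only shows correlation: a lesson that tends to be retrieved into hard tasks may look worse than it is.

```python
from dataclasses import dataclass


@dataclass
class Lesson:
    text: str
    uses: int = 0  # episodes in which this lesson was in the prompt
    wins: int = 0  # of those episodes, how many succeeded

    def success_rate(self) -> float:
        return self.wins / self.uses if self.uses else 0.0


def record_outcome(retrieved: list[Lesson], success: bool) -> None:
    """Credit every lesson that was in context for this episode's outcome."""
    for lesson in retrieved:
        lesson.uses += 1
        if success:
            lesson.wins += 1


def _rank(lesson: Lesson) -> float:
    # Untested lessons get a neutral prior so new lessons are not starved.
    return lesson.success_rate() if lesson.uses else 0.5


def curate(lessons: list[Lesson], min_uses: int = 5,
           min_rate: float = 0.3, cap: int = 20) -> list[Lesson]:
    """Prune proven-unhelpful lessons, deduplicate, and cap the list (illustrative thresholds)."""
    seen: set[str] = set()
    kept: list[Lesson] = []
    for lesson in lessons:
        if lesson.uses >= min_uses and lesson.success_rate() < min_rate:
            continue  # tried often enough and rarely associated with success
        key = " ".join(lesson.text.lower().split())  # normalize case and whitespace
        if key in seen:
            continue  # duplicate of a lesson already kept (first occurrence wins)
        seen.add(key)
        kept.append(lesson)
    kept.sort(key=_rank, reverse=True)
    return kept[:cap]


# Usage: one task type's lessons before curation.
lessons = [
    Lesson("Check loop bounds.", uses=10, wins=8),
    Lesson("check   loop  bounds.", uses=2, wins=1),   # duplicate after normalization
    Lesson("Always use recursion.", uses=10, wins=1),  # tried often, rarely helps
    Lesson("Read the error message first."),           # not yet tried
]
curated = curate(lessons)
print([lesson.text for lesson in curated])
# ['Check loop bounds.', 'Read the error message first.']
```

Because lessons are plain text, the same list can also be reviewed and edited by a human before or after an automatic pass like this one.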