Agentic AI
Planning & Control Flow
Language Agent Tree Search
Intent & Description
🎯 Intent
Lift the agent loop into a search tree with a learned value function and backtracking.
📋 Context
A team gives an agent a problem where several reasoning paths are plausible at the start — a coding bug with multiple possible root causes, a puzzle with several candidate frames, an investigation that could go in three directions. The first plausible path is often not the best one, and committing to it produces confidently wrong answers when it dead-ends. Some signal (test suite, verifier, heuristic scorer) can rate a partial trajectory.
💡 Solution
- Apply Monte Carlo tree search (MCTS) to the agent loop: each node is a partial trajectory.
- Expansion samples next thoughts/actions from the current node.
- Evaluation scores the node via a learned or heuristic value function.
- Backpropagation updates value estimates up the tree.
- Selection chooses the next node to expand by UCT (upper confidence bounds applied to trees).
- The agent can backtrack from a failing branch instead of committing to it.
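The steps above can be sketched as a small search loop. This is a minimal illustration under stated assumptions, not a reference implementation: `Node`, `tree_search`, `propose`, `evaluate`, and `is_solved` are hypothetical names, the model call is replaced by a deterministic toy `propose`, and the value function is a simple heuristic. It omits any rollout or reflection step a fuller implementation might add. The toy task is to reach a sum of exactly 6 using steps of 4 or 3, where the higher-scoring first step (4) is a dead end.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

State = Tuple[int, ...]  # a partial trajectory: the actions taken so far


@dataclass
class Node:
    state: State
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0
    dead_end: bool = False

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def uct(child: Node, c: float) -> float:
    """Mean value plus an exploration bonus that shrinks with visits."""
    if child.visits == 0:
        return math.inf
    return child.mean_value() + c * math.sqrt(
        math.log(child.parent.visits) / child.visits
    )


def select(root: Node, c: float) -> Node:
    """Selection: descend by UCT until reaching a node with no children."""
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: uct(ch, c))
    return node


def backpropagate(node: Optional[Node], value: float) -> None:
    """Backpropagation: push a value estimate up to the root."""
    while node is not None:
        node.visits += 1
        node.value_sum += value
        node = node.parent


def tree_search(
    root_state: State,
    propose: Callable[[State], List[int]],   # stand-in for sampling from a model
    evaluate: Callable[[State], float],      # learned or heuristic value in [0, 1]
    is_solved: Callable[[State], bool],
    iterations: int = 100,
    c: float = 1.0,
) -> Optional[State]:
    root = Node(root_state)
    for _ in range(iterations):
        leaf = select(root, c)
        actions = [] if leaf.dead_end else propose(leaf.state)
        if not actions:
            # Failing branch: score it 0 so UCT drifts back to its siblings.
            leaf.dead_end = True
            backpropagate(leaf, 0.0)
            continue
        for action in actions:  # Expansion
            child = Node(leaf.state + (action,), parent=leaf)
            leaf.children.append(child)
            if is_solved(child.state):
                return child.state
            backpropagate(child, evaluate(child.state))  # Evaluation
    return None


# Toy task (assumption for illustration): reach exactly TARGET with given steps.
TARGET, STEPS = 6, (4, 3)


def propose(state: State) -> List[int]:
    return [a for a in STEPS if sum(state) + a <= TARGET]


def evaluate(state: State) -> float:
    return sum(state) / TARGET  # closer to the target scores higher


def is_solved(state: State) -> bool:
    return sum(state) == TARGET


print(tree_search((), propose, evaluate, is_solved))  # prints (3, 3)
```

In this run the search first tries the higher-scoring step 4, finds no legal continuation, records a zero value for that branch, and then backtracks to 3, where a single-chain loop that commits to the best-looking first step would have stalled.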
Real-world Use Case
- Single-chain agent loops commit too early on ambiguous problems.
- A learned or heuristic value function can score partial trajectories.
- Backtracking from failing branches is worth the search overhead.
Advantages
- Higher answer quality than a single-chain loop on hard and ambiguous tasks.
- Explicit exploration/exploitation trade-off via UCT.
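For reference, the UCT score that governs this trade-off is commonly written as below. The symbols are assumptions introduced here for illustration and do not come from the original entry: \bar{V}(s) is the mean backed-up value of node s, N(s) its visit count, p(s) its parent, and c a tunable exploration constant.

```latex
\mathrm{UCT}(s) = \bar{V}(s) + c \sqrt{\frac{\ln N\bigl(p(s)\bigr)}{N(s)}}
```

The first term favors branches that have scored well so far (exploitation); the second grows for rarely visited branches (exploration). A larger c pushes the search to spread across more alternatives before deepening any one of them.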
Disadvantages
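- Token cost can reach 5–10x that of a single-chain ReAct loop, because every expanded and evaluated node requires additional model calls.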
- The value function is hard to train without adequate supervision signals.