Adaptive Branching Tree Search
At each node of an inference-time search tree, use Thompson sampling to decide adaptively whether to deepen an existing answer or branch a fresh attempt, optionally choosing per-node which underlying LLM to invoke as a third search axis.
Intent & Description
🎯 Intent
At each node of an inference-time search tree, use Thompson sampling to decide adaptively whether to deepen an existing answer or branch a fresh attempt, optionally choosing per-node which underlying LLM to invoke as a third search axis.
📋 Context
A team is using a large language model to attack problems whose outputs can be scored — running code against tests, checking a math answer, or grading an abstract-reasoning puzzle. They have a fixed budget of model calls to spend at inference time and want to spend it better than a flat sampling pass would. Several models with different strengths may be available at once, and the controller can choose which to call at each step.
💡 Solution
- Each node in the search tree maintains posterior estimates over the value of its possible actions: refine the current candidate (deepen), generate a fresh sibling (branch), and — in the multi-LLM variant — which model to call. - At each step the controller draws a Thompson sample from the per-action posterior and picks the highest sampled value. - The resulting rollout’s score updates the posterior. - Over many rollouts the tree concentrates compute on the branches and models that are paying off. - The score function must be either verifiable (compiler, test, oracle) or a trusted evaluator. - The framework runs until a budget or success threshold is hit.
Real-world Use Case
- A reliable score function (verifier, tests, oracle) is available.
- The task benefits from a mix of refinement and fresh attempts.
- Multiple LLMs are available and their strengths differ across the input distribution.
Source
Advantages
- Adaptive width/depth balance outperforms fixed-shape search on hard problems.
- Per-node model choice exploits complementary strengths of multiple LLMs.
- Thompson sampling gives a principled exploration-exploitation trade-off.
- Compute concentrates on payoff-rich branches automatically.
Disadvantages
- Requires a usable score function; without one, the posteriors are noise.
- Bookkeeping is heavier than plain MCTS or best-of-N.
- Inference cost is still high; the pattern reduces waste but does not make search cheap.
- Multi-LLM variant adds operational complexity (different APIs, latencies, pricing).