Cross-Encoder Reranking
Intent & Description
🎯 Intent
After a cheap bi-encoder or BM25 retrieval pass, rescore the top-N candidates with a cross-encoder that jointly attends over each (query, candidate) pair.
📋 Context
A team is using a two-stage retrieval pipeline. The first stage is a fast bi-encoder that embeds the query and each document independently and compares their vectors; an approximate nearest-neighbour index returns a top-N candidate set from a large corpus. Because the encoder sees the query and the document separately, it cannot model fine-grained interactions between them, and because the index is tuned for recall, the top-N list mixes truly relevant candidates with topically similar but unhelpful ones.
💡 Solution
Two-stage retrieval. Stage 1: cheap retrieval (BM25, dense, or hybrid) returns the top-N candidates. Stage 2: a cross-encoder scores each (query, candidate) pair jointly. Return the top-K (K ≪ N) to the generator.
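The two stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `first_stage`, `rerank`, `toy_pair_scorer`, and the `PairScorer` type are hypothetical names, the first stage is a toy token-overlap ranker standing in for BM25 or dense retrieval, and the pair scorer is a placeholder for a real cross-encoder model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: takes (query, candidate) pairs, returns one score per pair.
PairScorer = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def first_stage(query: str, corpus: Sequence[str], top_n: int) -> List[str]:
    """Toy stand-in for BM25/dense retrieval: rank by number of shared tokens."""
    q_tokens = set(query.lower().split())
    scored = [(len(q_tokens & set(doc.lower().split())), doc) for doc in corpus]
    # Stable sort: documents with equal overlap keep their corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


def rerank(query: str, candidates: Sequence[str],
           score_pairs: PairScorer, top_k: int) -> List[str]:
    """Stage 2: score every (query, candidate) pair and keep the best top_k."""
    pairs = [(query, doc) for doc in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]


def toy_pair_scorer(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Placeholder for a cross-encoder (assumption, not a real model).

    It looks at both texts together: shared adjacent-word pairs count 1.0 each,
    shared single tokens count 0.1 each. A real deployment would replace this
    with a trained model, for example by wrapping
    sentence_transformers.CrossEncoder(...).predict(pairs).
    """
    scores = []
    for query, doc in pairs:
        q, d = query.lower().split(), doc.lower().split()
        shared_bigrams = set(zip(q, q[1:])) & set(zip(d, d[1:]))
        shared_tokens = set(q) & set(d)
        scores.append(len(shared_bigrams) + 0.1 * len(shared_tokens))
    return scores


if __name__ == "__main__":
    corpus = [
        "how to reset a router password",
        "password reset for email accounts",
        "reset your router to factory settings",
        "router buying guide",
    ]
    query = "reset router password"
    candidates = first_stage(query, corpus, top_n=3)      # Stage 1: top-N
    print(rerank(query, candidates, toy_pair_scorer, 1))  # Stage 2: top-K
```

Because the scorer is passed in as a plain callable, a different reranker can be substituted without touching the first-stage index.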
Real-world Use Case
- Initial retrieval returns a noisy top-100, and the accuracy of the top-5 matters.
- Inference budget can afford a cross-encoder pass on each candidate.
- Downstream LLM context can only fit a small number of chunks.
Advantages
- Largest single quality win on top of contextual embeddings (Anthropic ablation).
- Reranker can be swapped without re-indexing.
Disadvantages
- Adds latency: the cross-encoder must score every (query, candidate) pair, one at a time or in batches.
- Reranker scores can be poorly calibrated on out-of-domain content.
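The latency cost can be estimated before committing to a candidate count. The sketch below is a rough back-of-the-envelope model, assuming candidates are scored in sequential batches and that each batch takes about the same time regardless of how full it is. The function names and all numbers are hypothetical; real per-batch timings have to be measured on the target hardware.

```python
import math


def rerank_latency_ms(n_candidates: int, batch_size: int, ms_per_batch: float) -> float:
    """Estimated rerank latency: number of sequential batches times time per batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if n_candidates <= 0:
        return 0.0
    return math.ceil(n_candidates / batch_size) * ms_per_batch


def max_candidates(budget_ms: float, batch_size: int, ms_per_batch: float) -> int:
    """Largest N whose estimated rerank latency still fits inside the budget."""
    full_batches = int(budget_ms // ms_per_batch)
    return full_batches * batch_size


if __name__ == "__main__":
    # Hypothetical figures: 100 candidates, batches of 32, 20 ms per batch.
    print(rerank_latency_ms(100, 32, 20.0))
    # How many candidates fit in a 100 ms reranking budget under the same assumptions?
    print(max_candidates(100.0, 32, 20.0))
```

Under this model the cost grows in steps of one batch, so trimming N only helps when it removes a whole batch.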