Latent-Space Reasoning
Reason in the model's embedding space, not in token space.
Intent & Description
🎯 Intent
Decouple reasoning from language generation. Reasoning in a high-dimensional continuous space can be faster, and potentially richer, than forcing every intermediate thought through the token bottleneck.
📋 Context
Standard chain-of-thought (CoT) prompting forces every reasoning step to be a discrete token sequence. This is expensive (tokens = cost), lossy (nuance gets flattened into words), and slow (each token needs its own sequential decoding step). Latent-space reasoning is an emerging research direction (e.g. Coconut, “Chain of Continuous Thought”) that instead represents reasoning steps as continuous vectors.
💡 Solution
Primarily a research/fine-tuning concern today, not something you implement via prompting. If you are using a model trained with continuous thought (e.g. Coconut-style), feed each reasoning state back in as the next input embedding between forward passes rather than decoding it to a token. For most practitioners this is a “watch this space” pattern: the practical approximation is to use extended thinking or a scratchpad and then compress the trace. See also: extended-thinking, scratchpad, chain-of-thought.
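To make the mechanism concrete, here is a minimal sketch of the two loops side by side. Everything in it is an illustrative assumption: the "model" is a toy stand-in (a random matrix plus tanh) rather than a trained transformer, and the names `forward`, `decode`, `reason_in_tokens`, and `reason_in_latent_space` are hypothetical. It is not Coconut's implementation. It only shows where the decode step is skipped and the hidden state is fed straight back in as the next input.

```python
import math
import random

D, V = 8, 16  # toy hidden size and vocabulary size (assumed values)
rng = random.Random(0)
E = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(V)]  # token embedding table
W = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(D)]  # stand-in for the model weights


def forward(inputs):
    """Toy forward pass: map a sequence of input vectors to one last hidden state."""
    mean = [sum(vec[j] for vec in inputs) / len(inputs) for j in range(D)]
    return [math.tanh(sum(W[i][j] * mean[j] for j in range(D))) for i in range(D)]


def decode(hidden):
    """Greedy decoding: return the token id whose embedding scores highest."""
    logits = [sum(e[j] * hidden[j] for j in range(D)) for e in E]
    return max(range(V), key=lambda t: logits[t])


def reason_in_tokens(prompt_ids, steps):
    """Token-space loop: each intermediate step is decoded, then re-embedded."""
    inputs = [E[t] for t in prompt_ids]
    trace = []
    for _ in range(steps):
        token = decode(forward(inputs))  # hidden state collapsed to one token id
        trace.append(token)
        inputs.append(E[token])          # only the token's embedding survives
    return decode(forward(inputs)), trace


def reason_in_latent_space(prompt_ids, steps):
    """Latent loop: each hidden state is fed back as the next input, undecoded."""
    inputs = [E[t] for t in prompt_ids]
    thoughts = []
    for _ in range(steps):
        hidden = forward(inputs)
        thoughts.append(hidden)
        inputs.append(hidden)            # continuous vector passed along as-is
    return decode(forward(inputs)), thoughts  # decode only the final answer


answer_tok, trace = reason_in_tokens([1, 2, 3], steps=4)
answer_lat, thoughts = reason_in_latent_space([1, 2, 3], steps=4)
print(trace)                            # four readable token ids
print(len(thoughts), len(thoughts[0]))  # 4 8: four opaque vectors of size D
```

The token loop leaves a readable trace of ids, while the latent loop leaves only float vectors. With a real model, the equivalent step would need an inference loop that accepts input embeddings directly, which is why this pattern assumes you control training and inference.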
Real-world Use Case
- Cutting-edge research pipelines where token-level reasoning overhead is a bottleneck.
- Fine-tuning scenarios where you control the model’s training and inference loop.
- Long-horizon reasoning tasks where token-space traces fill context windows.
Source
📌 TL;DR
Reasoning in embedding space can be faster and potentially richer than reasoning in tokens, but it’s a research frontier, not a prod pattern yet.
Advantages
- Potentially much faster reasoning: intermediate steps are never decoded to tokens.
- Can represent richer intermediate states than natural language tokens allow.
- Reduces context window pressure from verbose reasoning traces.
Disadvantages
- Not available in standard API models today — requires custom training.
- Intermediate states are not human-readable, so the reasoning cannot be audited by reading the trace.
- Still largely experimental; production readiness is uncertain.