Special Token Design
Use typed special tokens exactly as trained — system, user, assistant, tool_call, tool_result — in the exact positions the model learned during fine-tuning.
Intent & Description
🎯 Intent
LLMs learn conversation structure from the special tokens present during fine-tuning. Misusing or omitting them at inference degrades instruction-following without raising any error.
📋 Context
A chat model fine-tuned with specific role delimiters (e.g. <|im_start|>system, <|im_start|>user) expects those exact tokens at inference. Call the model with raw text, wrong delimiters, or invented tokens and it cannot reliably locate the system prompt boundary, the user turn, or the start of the assistant response. Instruction following then degrades quietly.
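To make the difference visible, here is a minimal sketch that renders the same conversation twice: once with ChatML-style delimiters like those named above, and once as raw concatenated text. The helper names (render_chatml, render_raw) are hypothetical, and the delimiter layout is an assumption about one common format, not the template of any particular model. It illustrates what a template produces; it is not a substitute for the model’s own template.

```python
# Illustrative only: ChatML-style delimiters as used by some chat models.
# Real delimiters are model-specific; check the model's own chat template.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages):
    """Render messages in a ChatML-style layout and open the assistant turn."""
    parts = [
        f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages
    ]
    # The trailing, unclosed assistant header marks where the reply begins.
    parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


def render_raw(messages):
    """Anti-pattern: plain concatenation with no role boundaries at all."""
    return "\n".join(m["content"] for m in messages)


messages = [
    {"role": "system", "content": "Answer in one sentence."},
    {"role": "user", "content": "What is a tokenizer?"},
]

print(render_chatml(messages))
print("---")
print(render_raw(messages))
```

In the raw version nothing separates the system instruction from the user question, so a model trained on delimited turns has no marker for where either one starts or ends.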
💡 Solution
Study the model’s official chat template and reproduce it exactly, using tokenizer.apply_chat_template() (Hugging Face) or the documented API format. Define explicit typed roles for every message boundary. For tool-calling models, use the documented tool_call and tool_result token types rather than ad-hoc JSON crammed into user messages. Never introduce special tokens at inference time that the model wasn’t trained on.
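One way to apply this is sketched below: check that every message carries a known role, then hand the whole conversation to the tokenizer’s own template instead of formatting strings by hand. The function name build_prompt and the DEFAULT_ROLES set are hypothetical. The role names are copied from this card’s list, and the strings a given template actually accepts may differ (many Hugging Face templates, for example, use "tool" for tool results), so treat the set as a placeholder to be replaced from the model’s documentation.

```python
# Assumption: role names taken from this card; replace them with the roles
# the target model's documentation actually lists.
DEFAULT_ROLES = frozenset(
    {"system", "user", "assistant", "tool_call", "tool_result"}
)


def build_prompt(tokenizer, messages, allowed_roles=DEFAULT_ROLES):
    """Validate message roles, then render with the tokenizer's chat template.

    `tokenizer` is any object exposing apply_chat_template(), such as a
    Hugging Face tokenizer loaded for the target model.
    """
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in allowed_roles:
            # Fail loudly here: a bad role would otherwise degrade output silently.
            raise ValueError(f"message {index}: unknown role {role!r}")
        if "content" not in message:
            raise ValueError(f"message {index}: missing 'content'")

    # tokenize=False returns the rendered string so it can be inspected or logged;
    # add_generation_prompt=True appends the header that opens the assistant turn.
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
```

With Hugging Face Transformers, the tokenizer would typically come from AutoTokenizer.from_pretrained() for the same checkpoint that serves the model. The validation step turns one class of silent failure, a mistyped or invented role, into an immediate exception.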
Real-world Use Case
📌 TL;DR
Use the model’s exact chat template, with every special token in its documented position. Any deviation degrades quality without raising an error.
Advantages
- Gives the model the structure it was trained to expect, which maximizes instruction-following quality
- Role separation makes multi-turn context unambiguous to the model
- Documented chat templates make prompts reproducible for a given model version
Disadvantages
- Chat templates are model-specific and change between versions — must be tracked per deployment
- A wrong chat template degrades output with no error signal; responses simply get worse
- Custom fine-tuning with different special tokens requires updating all downstream inference code
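Because a template change produces no error, one possible mitigation is to pin a fingerprint of the template per deployment and fail at startup when it drifts. The sketch below is a minimal version of that idea under stated assumptions: the function names are hypothetical, and the template is assumed to be available as a single string (Hugging Face tokenizers usually expose it as tokenizer.chat_template, which may be None for models without one).

```python
import hashlib
from typing import Optional


def template_fingerprint(chat_template: str) -> str:
    """Return a stable SHA-256 hex digest of the template text."""
    return hashlib.sha256(chat_template.encode("utf-8")).hexdigest()


def check_template(chat_template: Optional[str], pinned: str) -> None:
    """Raise if the loaded template is missing or differs from the pinned one."""
    if chat_template is None:
        raise RuntimeError("tokenizer has no chat template; refusing to serve")
    actual = template_fingerprint(chat_template)
    if actual != pinned:
        # Turn a silent quality regression into a visible deployment failure.
        raise RuntimeError(
            f"chat template changed: expected {pinned[:12]}, got {actual[:12]}"
        )
```

The pinned value would be recorded once, when a model version is validated, and stored alongside the deployment configuration. Any later upgrade that alters the template then stops at startup and forces a deliberate review.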