Special Token Design | designpattern.fyi

Special Token Design

Use the model’s typed roles and special tokens (system, user, assistant, tool_call, tool_result) exactly as trained, in the exact positions the model learned during fine-tuning.

Intent & Description

🎯 Intent

LLMs learn conversation structure from the special tokens present during fine-tuning. Misusing or omitting them at inference silently breaks instruction-following without any error signal.

📋 Context

A chat model fine-tuned with specific role delimiters (e.g. <|im_start|>system, <|im_start|>user) expects those exact tokens at inference. If you call the model with raw text, the wrong delimiters, or invented tokens, it cannot locate the system prompt boundary, the user turn, or the start of the assistant response, and instruction following degrades quietly.
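
To make the delimiters concrete, here is a minimal sketch of how a ChatML-style layout separates turns, compared with plain concatenation. It assumes the <|im_start|> / <|im_end|> convention mentioned above. The function render_chatml and the sample messages are hypothetical names used only for illustration, and a real model’s template may differ in whitespace, default system text, or token names.

```python
from typing import Dict, List

# Delimiters assumed for this sketch (ChatML-style); check the model's own template.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages in a ChatML-style layout (illustrative only)."""
    parts = []
    for message in messages:
        # Each turn is wrapped as: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Open the assistant turn so the model knows it should respond next.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "Answer in one sentence."},
    {"role": "user", "content": "What is a tokenizer?"},
]

# Structured prompt: every role boundary is marked explicitly.
structured_prompt = render_chatml(messages)

# Raw concatenation: no delimiters, so the role boundaries are invisible to the model.
raw_prompt = "\n".join(m["content"] for m in messages)

print(structured_prompt)
print(raw_prompt)
```

The hand-written renderer is only there to show what the model expects to see. In a real deployment the rendering should come from the model’s own template, and the delimiters should end up encoded as the tokenizer’s special token IDs rather than as ordinary text.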

💡 Solution

Study the model’s official chat template and reproduce it exactly using tokenizer.apply_chat_template() (Hugging Face Transformers) or the documented API format. Give every message an explicit typed role so that each message boundary is marked. For tool-calling models, use the documented tool_call and tool_result token types rather than ad-hoc JSON crammed into user messages. Never introduce special tokens at inference time that the model was not trained on.
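
A small wrapper can enforce typed roles before delegating to the tokenizer’s template. The sketch below is one possible shape, not a definitive implementation: build_prompt and ALLOWED_ROLES are hypothetical names, the allowed role set is an assumption that should be replaced with whatever the target model documents, and the tokenizer argument is any object exposing apply_chat_template with the tokenize and add_generation_prompt parameters, as Hugging Face tokenizers do.

```python
from typing import Any, Dict, List

# Assumed role set for this sketch; the real set depends on the model's documented template.
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def build_prompt(tokenizer: Any, messages: List[Dict[str, Any]]) -> str:
    """Validate message roles, then render with the tokenizer's own chat template."""
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            # Fail loudly here, because the model itself would fail silently.
            raise ValueError(
                f"message {index} has unsupported role {role!r}; "
                f"allowed: {sorted(ALLOWED_ROLES)}"
            )
    # Delegate the actual layout to the template shipped with the model.
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,              # return the rendered string, not token IDs
        add_generation_prompt=True,  # append the opening of the assistant turn
    )


# Usage sketch (requires the transformers package; the model name is a placeholder):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("your-org/your-chat-model")
# prompt = build_prompt(tokenizer, [
#     {"role": "system", "content": "You are a concise assistant."},
#     {"role": "user", "content": "Summarize the release notes."},
# ])
```

Validating roles up front turns a silent quality regression into an explicit error, which is the main reason to wrap the call at all.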

Real-world Use Case

Any model served through a chat completions API. Multi-turn conversation systems. Tool-calling and function-calling agents. Any deployment where the way the system prompt is inserted and the separation of role boundaries affect output quality.

📌 TL;DR

Use the model’s exact chat template, with every special token in its documented position. Deviating degrades quality silently, with no error or warning.

Advantages

Provides the structure the model was trained to expect, which maximizes instruction-following quality
Role separation makes multi-turn context unambiguous to the model
Documented chat templates are reproducible and stable within a given model version

Disadvantages

Chat templates are model-specific and can change between model versions, so they must be tracked per deployment
A wrong chat template degrades output with no error signal; responses simply get worse
Custom fine-tuning with different special tokens requires updating all downstream inference code
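
One way to track a template per deployment is to pin a fingerprint of it and check that fingerprint at startup. The sketch below assumes the template is available as a single string (in Hugging Face Transformers this is typically the tokenizer’s chat_template attribute, treated here as an assumption). The names template_fingerprint and assert_template_unchanged are hypothetical.

```python
import hashlib


def template_fingerprint(chat_template: str) -> str:
    """Return a stable SHA-256 hex digest of the chat template text."""
    return hashlib.sha256(chat_template.encode("utf-8")).hexdigest()


def assert_template_unchanged(chat_template: str, expected_fingerprint: str) -> None:
    """Raise if the deployed template differs from the one that was validated."""
    actual = template_fingerprint(chat_template)
    if actual != expected_fingerprint:
        raise RuntimeError(
            "chat template changed: "
            f"expected {expected_fingerprint[:12]}..., got {actual[:12]}..."
        )


# Usage sketch: record the fingerprint when the deployment is validated,
# then verify it whenever the model or tokenizer is reloaded.
validated_template = "<|im_start|>{role}\n{content}<|im_end|>\n"  # stand-in text
pinned = template_fingerprint(validated_template)
assert_template_unchanged(validated_template, pinned)  # passes silently
```

A check like this does not prove the template is correct. It only converts an unnoticed template change after a model or tokenizer upgrade into a visible failure.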

© 2026 designpattern.fyi. Crafted with ❤️ for modern software engineers by OpenAGI