
Chat Templates can improve LM inferencing.
What are chat templates? They are like little “scripts” or blueprints that tell a language model how to handle a conversation. Think of them as a recipe or puppet-show script that formats all parts of a chat (system instructions, user messages, assistant replies) in a consistent way. How do they work? In Hugging Face’s Transformers library, chat templates are written in Jinja (a templating language). The tokenizer uses a template to combine the system prompt, user messages, and assistant prompts into one formatted string, which it then tokenizes for the model. This means the model always sees the conversation in the right format without us manually concatenating text. Key parts: Chat templates use placeholders and roles . For example, {{user}} might be replaced with the user’s message, and {{assistant}} with the model’s reply. Common roles include system (instructions), user , and assistant . The template defines how these pieces are ordered and separated. we worked with SmolLM3 (3B mo
Continue reading on Dev.to
Opens in a new tab



