September 19, 2025

Customizing Chat Templates in LLMs

Introduction

Large language models (LLMs) generate text by predicting what comes next, but they don’t inherently understand conversational roles or separate inputs from outputs. To support chat-like interactions, chat templates are used to organize the dialogue into a specific pattern the model can interpret effectively.

Chat templates serve as predefined structures that guide the flow of conversation, ensuring that LLMs can effectively interpret and respond to user inputs. Without these templates, LLMs struggle to maintain context, identify roles, or execute complex tasks.

Each model typically expects a certain template format to perform optimally. While built-in templates usually cover most use cases, there are situations where customizing the template becomes necessary. This could happen if you’re working with a model that doesn’t have an existing template, or if you want finer control over how information is presented to the model beyond what simple system instructions allow.

To empower advanced users who want to override the model’s default chat template, FriendliAI now supports custom chat templates that can be set per model or even per adapter (coming soon!). This flexibility lets you tailor the conversational structure exactly how you want it, improving the model’s understanding and responses across diverse use cases. We made it easy to override the model’s default chat template right within the Dedicated Endpoint creation page on Friendli Suite.

The Necessity of Chat Templates

Chat templates play a vital role in four ways.

1. Maintaining Consistent Conversation Structure

In multi-turn interactions, it's crucial to maintain a consistent structure to ensure that the model can follow the conversation's progression. Chat templates provide a standardized format that helps in organizing messages, making it easier for the model to process and respond appropriately.

2. Ensuring Proper Role Identification

Clearly defining roles such as "user," "assistant," and "system" helps the model understand the context and intent behind each message. This role-based structuring allows the model to differentiate between user inputs and system instructions, leading to more accurate responses.

3. Managing Context Across Multiple Turns

LLMs often require context from previous interactions to generate coherent responses. Chat templates help in structuring this context, enabling the model to access relevant information from earlier in the conversation, thereby improving the quality of its responses.

4. Supporting Tool Calling

Modern LLMs can perform more complex tasks by calling external tools. Chat templates make this possible by defining how tool definitions, tool calls, and tool results are serialized into the prompt, so the model sees them in a structure it can follow.

Popular Chat Template Formats

LLMs utilize a variety of chat template formats. Understanding these formats is essential for ensuring compatibility.

1. ChatML (e.g., Qwen3)

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hi!<|im_end|>
<|im_start|>assistant
Hi, what can I help you with?<|im_end|>

2. Alpaca

### Instruction:
Hey!

### Response:
Hey, how are you?

3. Mistral

<s>[INST] You are a helpful assistant. [/INST]Hi! How can I help you?</s>[INST] Hi! [/INST]

4. Llama 3

<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>Hey, what's up?<|eot_id|><|start_header_id|>assistant<|end_header_id|>Hey! What's up?<|eot_id|>

5. OpenAI Harmony (e.g., gpt-oss)

<|start|>system<|message|>You are a helpful assistant.<|end|><|start|>user<|message|>Hello, how are you?<|end|><|start|>assistant<|channel|>final<|message|>I'm doing great. How can I help you today?<|end|><|start|>user<|message|>Can you tell me a joke?<|end|><|start|>assistant

6. DeepSeek R1

You are a helpful assistant.<｜User｜>Hello, how are you?<｜Assistant｜>I'm doing great. How can I help you today?<｜end▁of▁sentence｜><｜User｜>Can you tell me a joke?<｜Assistant｜>

Each format has its own requirements, and understanding them is crucial for ensuring that the model processes the input correctly. While it’s best to use the chat template that comes with the model, you might need to override its default behavior.
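To make the structure concrete, here is a minimal Python sketch of what a ChatML-style template does: it walks the message list and wraps each turn in role markers. The function name render_chatml and the add_generation_prompt flag are illustrative choices, not part of any library, and real model templates usually handle more cases (tools, default system prompts, and so on).

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Serialize {"role", "content"} messages into ChatML-style text (illustrative)."""
    parts = []
    for message in messages:
        # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>\n.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hi, what can I help you with?"},
]
print(render_chatml(messages))
```

Rendering the three messages reproduces the ChatML example shown earlier. Swapping the markers (for instance, to the Llama 3 header tokens) changes the format without touching the message list, which is the separation a chat template provides.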

Common Use Cases for Custom Chat Templates

Custom chat templates can be especially useful in the following scenarios:

Lack of Tool Call Support: Some models don’t support tool calls natively. With a custom chat template, you can work around this limitation and enable tool call functionality.
Missing Templates: In some cases, a model may be published without a chat template due to oversight. You can manually add one to enable chat-style interactions.
Legacy Models: Older models often lack default chat templates. By providing a well-structured custom template, you can significantly improve their performance in conversational contexts.
Non-Standard Template Formats: Certain models use unconventional chat formats that may not be fully compatible with your inference engine. These may require additional customization to function properly.
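As an illustration of the first scenario, the sketch below shows the idea behind a tool-call workaround: describe the tools in the system message and ask the model to answer in a parseable format. It is written in plain Python for readability; in practice the same logic would live inside the Jinja chat template. The `<tool_call>` tag convention, the tool schema, and the function names inject_tools and parse_tool_call are all hypothetical choices for this example, not a FriendliAI or model-specific API.

```python
import json
import re
from typing import Any, Dict, List, Optional

# Hypothetical convention: the model is asked to wrap tool calls in these tags.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def inject_tools(messages: List[Dict[str, str]], tools: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Return a new message list whose system message describes the tools."""
    tool_lines = "\n".join(json.dumps(tool, sort_keys=True) for tool in tools)
    instructions = (
        "You can call the following tools:\n"
        f"{tool_lines}\n"
        "To call a tool, reply only with JSON wrapped in "
        '<tool_call></tool_call> tags, containing "name" and "arguments".'
    )
    if messages and messages[0]["role"] == "system":
        # Append to the existing system message instead of adding a second one.
        merged = {"role": "system", "content": f"{messages[0]['content']}\n\n{instructions}"}
        return [merged, *messages[1:]]
    return [{"role": "system", "content": instructions}, *messages]


def parse_tool_call(text: str) -> Optional[Dict[str, Any]]:
    """Extract the first tool call from model output, or None if there is none."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return None
    # json.loads raises ValueError on malformed JSON; callers may want to catch it.
    return json.loads(match.group(1))


tools = [{"name": "get_weather", "description": "Look up the weather", "parameters": {"city": "string"}}]
prompt_messages = inject_tools([{"role": "user", "content": "Weather in Seoul?"}], tools)
call = parse_tool_call('<tool_call>{"name": "get_weather", "arguments": {"city": "Seoul"}}</tool_call>')
print(prompt_messages[0]["content"])
print(call)
```

How reliably a model follows such instructions depends on the model, so output parsing should tolerate replies that contain no tool call at all.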

How to Customize Chat Templates on FriendliAI

Here’s how you can easily override the model’s base chat template:

1. Go to the endpoint creation page.

Figure 1: Creating a Dedicated Endpoint.

2. Click the “+ Custom Chat Template” tab.

Figure 2: Configuring custom chat templates.

3. Copy-paste or upload your Jinja chat templates.

4. Deploy.

This allows you to experiment with various templates to see which format best suits your application’s needs.

Writing Chat Templates

Writing chat templates involves structuring the conversation in a way that aligns with the model's expected format. For instance, using the appropriate tags and ensuring that messages are correctly ordered can significantly impact the model's performance.

Libraries like Hugging Face's transformers provide tools for applying chat templates. For example, a tokenizer's apply_chat_template method formats a list of messages into the structure a specific model expects. This automation reduces errors and streamlines the preparation of inputs for LLMs.
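As a rough sketch of what such a template looks like, the example below defines a ChatML-style Jinja template and renders it directly with the jinja2 package to preview the resulting prompt. This only approximates what apply_chat_template does with a tokenizer's stored template. The template text is illustrative rather than any model's official template, and the variable names messages and add_generation_prompt follow the convention used by Hugging Face chat templates.

```python
from jinja2 import Environment

# Illustrative ChatML-style template. Text is emitted through {{ ... }}
# expressions so that the whitespace-control markers ({%- and -%}) cannot
# strip the newlines that belong to the prompt itself.
CHATML_TEMPLATE = r"""
{%- for message in messages -%}
    {{- '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' -}}
{%- endfor -%}
{%- if add_generation_prompt -%}
    {{- '<|im_start|>assistant\n' -}}
{%- endif -%}
"""

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
]

template = Environment().from_string(CHATML_TEMPLATE)
prompt = template.render(messages=messages, add_generation_prompt=True)
print(prompt)
```

Previewing the rendered string this way makes it easy to compare a custom template against the format the model expects before deploying it.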

Best Practices

To effectively utilize Jinja chat templates, consider the following best practices:

Use the Default: Whenever possible, use the default chat template provided with the model.
Consistent Formatting: Always use the same template format throughout your application to avoid confusion and errors.
Robust Validation & Error Handling: Implement proper error handling for tool calls and multimodal inputs to prevent unexpected behaviors. Jinja supports conditional statements, and chat template environments such as Hugging Face's transformers expose a raise_exception(message) helper that a template can call to reject invalid input.
Trim Whitespace: By default, Jinja keeps the whitespace and newlines surrounding template tags, which can leak unintended tokens into the prompt. It’s recommended to trim this whitespace explicitly, for example with the {%- and -%} whitespace-control markers.
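The validation and whitespace advice above can be sketched with the jinja2 package. Plain Jinja does not ship a raise_exception function, so this example registers one by hand to mirror the helper that Hugging Face's transformers makes available to chat templates; whether a given serving stack exposes the same helper is an assumption worth verifying. The templates and role list are illustrative.

```python
from jinja2 import Environment
from jinja2.exceptions import TemplateError


def raise_exception(message):
    # Surfaces template-level validation failures as a rendering error.
    raise TemplateError(message)


env = Environment()
env.globals["raise_exception"] = raise_exception

# Without whitespace control, the newlines around each tag end up in the output.
UNTRIMMED = """
{% for message in messages %}
{{ message['role'] }}: {{ message['content'] }}
{% endfor %}
"""

# With {%- ... -%} and {{- ... -}}, only the text emitted on purpose survives.
TRIMMED = r"""
{%- for message in messages -%}
    {%- if message['role'] not in ['system', 'user', 'assistant'] -%}
        {{- raise_exception('Unsupported role: ' + message['role']) -}}
    {%- endif -%}
    {{- message['role'] + ': ' + message['content'] + '\n' -}}
{%- endfor -%}
"""

messages = [{"role": "user", "content": "Hi!"}]
untrimmed = env.from_string(UNTRIMMED).render(messages=messages)
trimmed = env.from_string(TRIMMED).render(messages=messages)
print(repr(untrimmed))  # contains extra newlines around the message
print(repr(trimmed))

try:
    env.from_string(TRIMMED).render(messages=[{"role": "narrator", "content": "Once upon a time"}])
except TemplateError as error:
    print(f"Template rejected input: {error}")
```

Failing loudly inside the template is usually preferable to silently rendering a prompt the model was never trained to read.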

Challenges and Considerations

While chat templates are beneficial, they also present certain challenges:

Model Compatibility: Not all models support all template formats, and using an incompatible format can lead to poor performance or errors.
Security Concerns: Malicious users may exploit template structures to bypass safety mechanisms through attacks such as format mismatch and message overflow (see also https://arxiv.org/abs/2406.12935).
Complexity: Implementing advanced features like tool usage and multimodal inputs requires careful structuring to ensure that the model can handle these tasks effectively.

Addressing these challenges requires ongoing research and development to create more robust and flexible chat template systems.

Conclusion

Chat templates are a key component in enabling smooth and effective interactions with large language models. By defining a structured format, they help models interpret and manage multi-turn conversations more accurately. When implementing chat templates, it's important to factor in model compatibility, potential security risks, and the added complexity of advanced features.

FriendliAI’s new chat template feature makes it easy to customize these structures for each model, giving you the flexibility to optimize performance across different use cases.

Ready to get started? Jump right into building with chat templates on Friendli Suite.

Written by

FriendliAI Tech & Research
