Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Run this model inference with full control and performance in your environment.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0Qwen3-14B-Instruct Highlights
OpenPipe/Qwen3-14B-Instruct is a finetune friendly instruct variant of Qwen3-14B. Qwen3 release does not include a 14B Instruct (non-thinking) model, this fork introduces an updated chat template that makes Qwen3-14B non-thinking by default and be highly compatible with OpenPipe and other finetuning frameworks.
The default Qwen3 chat template does not render <think></think> tags on the previous assistant message, which can lead to inconsistencies between training and generation. This version resolves that issue by adding <think></think> tags to all assistant prompts and generation templates to ensure message format consistency during both training and inference.
The model retains the strong general capabilities of Qwen3-14B while providing a more finetuning friendly chat template.
Model Overview
Qwen3-14B has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 14.8B
- Number of Paramaters (Non-Embedding): 13.2B
- Number of Layers: 40
- Number of Attention Heads (GQA): 40 for Q and 8 for KV
- Context Length: 32,768 natively and 131,072 tokens with YaRN.
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation.
Model provider
skilledu
Model tree
Base
Qwen/Qwen3-14B-Base
Fine-tuned
this model
Modalities
Input
Text
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information