Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Qwen3-14B-Instruct Highlights

OpenPipe/Qwen3-14B-Instruct is a finetune friendly instruct variant of Qwen3-14B. Qwen3 release does not include a 14B Instruct (non-thinking) model, this fork introduces an updated chat template that makes Qwen3-14B non-thinking by default and be highly compatible with OpenPipe and other finetuning frameworks.

The default Qwen3 chat template does not render <think></think> tags on the previous assistant message, which can lead to inconsistencies between training and generation. This version resolves that issue by adding <think></think> tags to all assistant prompts and generation templates to ensure message format consistency during both training and inference.

The model retains the strong general capabilities of Qwen3-14B while providing a more finetuning friendly chat template.

Model Overview

Qwen3-14B has the following features:

  • Type: Causal Language Models
  • Training Stage: Pretraining & Post-training
  • Number of Parameters: 14.8B
  • Number of Paramaters (Non-Embedding): 13.2B
  • Number of Layers: 40
  • Number of Attention Heads (GQA): 40 for Q and 8 for KV
  • Context Length: 32,768 natively and 131,072 tokens with YaRN.

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation.

Model provider

skilledu

Model tree

Base

Qwen/Qwen3-14B-Base

Fine-tuned

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today