sachin9879

Ministral-3-14B-Base-2512

Key Features

Ministral 3 14B consists of two main architectural components:

13.5B Language Model
0.4B Vision Encoder
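
As a rough sanity check on the sizes above, the two components sum to about 13.9B parameters, which the model name rounds to 14B. The sketch below also estimates the raw weight footprint. The 2 bytes per parameter figure assumes BF16 storage, which is an assumption made here for illustration rather than something stated for this checkpoint, and the estimate ignores activations and the KV cache. The helper name `weight_footprint_gb` is hypothetical.

```python
# Parameter counts taken from the component list above (in billions).
LANGUAGE_MODEL_B = 13.5
VISION_ENCODER_B = 0.4


def weight_footprint_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes).

    bytes_per_param=2 assumes BF16 weights; this is an illustrative
    assumption, not a documented property of this checkpoint.
    """
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billion * bytes_per_param


total_b = LANGUAGE_MODEL_B + VISION_ENCODER_B
print(f"total parameters: {total_b:.1f}B")
print(f"approx. weights:  {weight_footprint_gb(total_b):.1f} GB")
```

Memory needed at serving time is higher than this figure, since long contexts add a KV cache on top of the weights.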

The Ministral 3 14B Base model offers the following capabilities:

Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic.
Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
Large Context Window: Supports a 256k context window.

Use Cases

Private AI deployments where advanced capabilities meet practical hardware constraints:

Private/custom chat and AI assistant deployments in constrained environments
Advanced local agentic use cases
Fine-tuning and specialization
And more...

Bringing advanced AI capabilities to most environments.

Ministral 3 Family

Model Name	Type	Precision	Link
Ministral 3 3B Base 2512	Base pre-trained	BF16	Hugging Face
Ministral 3 3B Instruct 2512	Instruct post-trained	FP8	Hugging Face
Ministral 3 3B Reasoning 2512	Reasoning capable	BF16

Other formats available here.

Benchmark Results

We compare Ministral 3 to similarly sized models.

Reasoning

Model	AIME25	AIME24	GPQA Diamond	LiveCodeBench
Ministral 3 14B	0.850	0.898	0.712	0.646
Qwen3-14B (Thinking)	0.737	0.837	0.663	0.593
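
To make the gaps in the Reasoning table easier to read, the following sketch computes the per-benchmark margin between the two rows. The scores are copied from the table; the helper name `margins` is hypothetical and only used for illustration.

```python
# Scores copied from the Reasoning table above.
BENCHMARKS = ["AIME25", "AIME24", "GPQA Diamond", "LiveCodeBench"]
MINISTRAL_3_14B = [0.850, 0.898, 0.712, 0.646]
QWEN3_14B_THINKING = [0.737, 0.837, 0.663, 0.593]


def margins(ours, theirs):
    """Return per-benchmark differences (ours - theirs), rounded to 3 places."""
    return [round(a - b, 3) for a, b in zip(ours, theirs)]


for name, delta in zip(BENCHMARKS, margins(MINISTRAL_3_14B, QWEN3_14B_THINKING)):
    print(f"{name:<14} {delta:+.3f}")
```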

Instruct

Model	Arena Hard	WildBench	MATH Maj@1	MM MTBench
Ministral 3 14B	0.551	68.5	0.904	8.49
Qwen3 14B (Non-Thinking)	0.427	65.1	0.870	NOT MULTIMODAL
Gemma3-12B-Instruct	0.436	63.2	0.854	6.70

Base

Model	Multilingual MMLU	MATH CoT 2-Shot	AGIEval 5-shot	MMLU Redux 5-shot	MMLU 5-shot	TriviaQA 5-shot
Ministral 3 14B	0.742	0.676	0.648	0.820	0.794	0.749
Qwen3 14B Base	0.754	0.620	0.661	0.837	0.804

Usage

The model can be used with the following frameworks:

vllm: See here
transformers: See here

vLLM

We recommend using this model with vLLM.

Installation

Make sure to install vllm >= 0.12.0:

bash
pip install vllm --upgrade

Doing so should automatically install mistral_common >= 1.8.6.

To check:

bash
python -c "import mistral_common; print(mistral_common.__version__)"
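
If you prefer to check the minimum version programmatically, a minimal sketch using only the standard library could look like the following. The helpers `parse_release` and `meets_minimum` are hypothetical names introduced here, and the comparison is deliberately simple: it only looks at the leading numeric components, so a pre-release such as `1.8.6rc1` is treated conservatively as older than `1.8.6`.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple:
    """Turn '1.8.6' into (1, 8, 6); stops at the first non-numeric part."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings by their leading numeric components."""
    return parse_release(installed) >= parse_release(minimum)


try:
    installed = version("mistral-common")
    status = "OK" if meets_minimum(installed, "1.8.6") else "too old"
    print(f"mistral-common {installed}: {status}")
except PackageNotFoundError:
    print("mistral-common is not installed")
```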

You can also use a ready-to-go Docker image, or pull one from Docker Hub.

Serve

To fully exploit Ministral-3-14B-Base-2512, we recommend deploying on 2xH200 GPUs because of its large context window. If you don't need a large context, you can fall back to a single GPU.

A simple launch command is:

bash
vllm serve mistralai/Ministral-3-14B-Base-2512 --tensor-parallel-size 2 \
  --tokenizer_mode mistral --config_format mistral --load_format mistral

Additional flags:

You can set --max-model-len to save memory. By default it is set to 262144, which is quite large and not necessary for most scenarios.
You can set --max-num-batched-tokens to balance throughput and latency: a higher value gives higher throughput at the cost of higher latency.
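
As an illustration of how these flags combine with the launch command above, the sketch below assembles a single-GPU variant with a reduced context length. The helper `build_serve_command` is hypothetical, and the values 32768 and 8192 are placeholders chosen for the example, not tuned recommendations.

```python
import shlex


def build_serve_command(model, tensor_parallel_size=2,
                        max_model_len=None, max_num_batched_tokens=None):
    """Assemble a `vllm serve` argument list from the flags described above."""
    cmd = [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--tokenizer_mode", "mistral",
        "--config_format", "mistral",
        "--load_format", "mistral",
    ]
    # Optional flags are appended only when a value is given.
    if max_model_len is not None:
        cmd += ["--max-model-len", str(max_model_len)]
    if max_num_batched_tokens is not None:
        cmd += ["--max-num-batched-tokens", str(max_num_batched_tokens)]
    return cmd


# Illustrative values only: one GPU, a 32k context, and 8192 batched tokens.
cmd = build_serve_command(
    "mistralai/Ministral-3-14B-Base-2512",
    tensor_parallel_size=1,
    max_model_len=32768,
    max_num_batched_tokens=8192,
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` directly.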

Usage of the model

Here we assume that the model mistralai/Ministral-3-14B-Base-2512 is being served and is reachable at localhost on port 8000, which is the default for vLLM.

Quick test with the base model.

python
from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 256

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

response = client.completions.create(
    model=model,
    prompt="What is the best thing in the universe ?",
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

print(response.choices[0].text)
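
Because this is a base model, it continues text rather than following instructions, so a few worked examples in the prompt often steer the output better than a bare question. The sketch below builds such a few-shot prompt. The helper `build_few_shot_prompt`, the Q/A layout, and the example pairs are all assumptions made for illustration, not a format prescribed by the model.

```python
def build_few_shot_prompt(examples, query, q_label="Q", a_label="A"):
    """Format (question, answer) pairs followed by an open-ended final question."""
    blocks = [f"{q_label}: {q}\n{a_label}: {a}" for q, a in examples]
    # Leave the last answer empty so the model completes it.
    blocks.append(f"{q_label}: {query}\n{a_label}:")
    return "\n\n".join(blocks)


examples = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]
prompt = build_few_shot_prompt(examples, "What is the capital of Italy?")
print(prompt)

# To send it to the served model, reuse the client from the quick test, e.g.:
# response = client.completions.create(model=model, prompt=prompt,
#                                      temperature=0.15, max_tokens=16, stop=["\n"])
```

Passing a stop sequence such as a newline keeps the model from generating further question and answer pairs after the first answer.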

Transformers

You can also use Ministral 3 14B Base 2512 with Transformers. Make sure to install Transformers from its first v5 release candidate or from "main":

bash
pip install transformers==5.0.0rc0

To make the best use of our model with Transformers, make sure mistral-common >= 1.8.6 is installed so that our tokenizer is used.

bash
pip install mistral-common --upgrade

Then load our tokenizer along with the model and generate:

python
from transformers import Mistral3ForConditionalGeneration, MistralCommonBackend

model_id = "mistralai/Ministral-3-14B-Base-2512"

# Load the model; device_map="auto" places the weights on the available device(s).
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",
)
# Load the mistral-common tokenizer backend.
tokenizer = MistralCommonBackend.from_pretrained(model_id)

# A base model continues text, so give it the start of a passage.
input_ids = tokenizer.encode("Once upon a time, France was a", return_tensors="pt")
input_ids = input_ids.to("cuda")

output = model.generate(
    input_ids,
    max_new_tokens=30,
)[0]

# Drop the prompt tokens and decode only the newly generated ones.
decoded_output = tokenizer.decode(output[len(input_ids[0]):])
print(decoded_output)

License

This model is licensed under the Apache 2.0 License.

You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.
