Key Features
Ministral 3 14B consists of two main architectural components:
- 13.5B Language Model
- 0.4B Vision Encoder
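As a rough sizing aid, the two component sizes above can be turned into a weight-memory estimate. The sketch below assumes the standard 2 bytes per parameter for BF16 and 1 byte for FP8, and counts weights only; KV cache, activations, and framework overhead are not included. The function and variable names are illustrative.

```python
# Parameter counts stated above, in billions.
LANGUAGE_MODEL_B = 13.5
VISION_ENCODER_B = 0.4

# Bytes per parameter for common precisions (standard sizes, not model-specific).
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * BYTES_PER_PARAM[precision]


total_b = LANGUAGE_MODEL_B + VISION_ENCODER_B
print(f"total parameters: {total_b:.1f}B")
print(f"bf16 weights: ~{weight_memory_gb(total_b, 'bf16'):.1f} GB")
print(f"fp8 weights:  ~{weight_memory_gb(total_b, 'fp8'):.1f} GB")
```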
The Ministral 3 14B Base model offers the following capabilities:
- Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
- Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic.
- Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
- Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- Large Context Window: Supports a 256k context window.
Use Cases
Private AI deployments where advanced capabilities meet practical hardware constraints:
- Private/custom chat and AI assistant deployments in constrained environments
- Advanced local agentic use cases
- Fine-tuning and specialization
- And more...
Bringing advanced AI capabilities to most environments.
Ministral 3 Family
| Model Name | Type | Precision | Link |
|---|---|---|---|
| Ministral 3 3B Base 2512 | Base pre-trained | BF16 | Hugging Face |
| Ministral 3 3B Instruct 2512 | Instruct post-trained | FP8 | Hugging Face |
| Ministral 3 3B Reasoning 2512 | Reasoning capable | BF16 | |
Other formats are also available.
Benchmark Results
We compare Ministral 3 to similarly sized models.
Reasoning
| Model | AIME25 | AIME24 | GPQA Diamond | LiveCodeBench |
|---|---|---|---|---|
| Ministral 3 14B | 0.850 | 0.898 | 0.712 | 0.646 |
| Qwen3-14B (Thinking) | 0.737 | 0.837 | 0.663 | 0.593 |
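To make the gaps in the Reasoning table easier to read, the sketch below computes per-benchmark differences between the two rows. The scores are copied from the table above; the variable and function names are illustrative.

```python
# Scores copied from the Reasoning table above.
ministral_3_14b = {
    "AIME25": 0.850, "AIME24": 0.898, "GPQA Diamond": 0.712, "LiveCodeBench": 0.646,
}
qwen3_14b_thinking = {
    "AIME25": 0.737, "AIME24": 0.837, "GPQA Diamond": 0.663, "LiveCodeBench": 0.593,
}


def score_deltas(a: dict, b: dict) -> dict:
    """Return a[k] - b[k] for each shared benchmark, rounded to 3 decimals."""
    return {k: round(a[k] - b[k], 3) for k in a if k in b}


for name, delta in score_deltas(ministral_3_14b, qwen3_14b_thinking).items():
    print(f"{name}: {delta:+.3f}")
```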
Instruct
| Model | Arena Hard | WildBench | MATH Maj@1 | MM MTBench |
|---|---|---|---|---|
| Ministral 3 14B | 0.551 | 68.5 | 0.904 | 8.49 |
| Qwen3 14B (Non-Thinking) | 0.427 | 65.1 | 0.870 | NOT MULTIMODAL |
| Gemma3-12B-Instruct | 0.436 | 63.2 | 0.854 | 6.70 |
Base
| Model | Multilingual MMLU | MATH CoT 2-Shot | AGIEval 5-shot | MMLU Redux 5-shot | MMLU 5-shot | TriviaQA 5-shot |
|---|---|---|---|---|---|---|
| Ministral 3 14B | 0.742 | 0.676 | 0.648 | 0.820 | 0.794 | 0.749 |
| Qwen3 14B Base | 0.754 | 0.620 | 0.661 | 0.837 | 0.804 |
Usage
The model can be used with the following frameworks:
vLLM
We recommend using this model with vLLM.
Installation
Make sure to install vllm >= 0.12.0:
pip install vllm --upgrade
Doing so should automatically install mistral_common >= 1.8.6.
To check:
python -c "import mistral_common; print(mistral_common.__version__)"
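The same check can be scripted. The sketch below reads the installed version with the standard library's `importlib.metadata` and compares it against the minimum stated above. The parsing helper is a simplified assumption: it keeps only the leading digits of each dot-separated part and ignores pre-release suffixes.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Turn '1.8.6' into (1, 8, 6), keeping the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_package(dist_name: str, minimum: str) -> str:
    """Report whether a distribution is installed and recent enough."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        return f"{dist_name}: not installed"
    status = "ok" if meets_minimum(installed, minimum) else f"needs >= {minimum}"
    return f"{dist_name} {installed}: {status}"


print(check_package("mistral-common", "1.8.6"))
```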
You can also use a ready-to-go Docker image, which is also available on Docker Hub.
Serve
To fully exploit Ministral-3-14B-Base-2512, we recommend using 2xH200 GPUs for deployment because of its large context window. However, if you don't need a large context, you can fall back to a single GPU.
A simple launch command is:
vllm serve mistralai/Ministral-3-14B-Base-2512 --tensor-parallel-size 2 \
--tokenizer_mode mistral --config_format mistral --load_format mistral
Additional flags:
- You can set --max-model-len to save memory. By default it is set to 262144, which is quite large and not necessary for most scenarios.
- You can set --max-num-batched-tokens to balance throughput and latency: a higher value means higher throughput but also higher latency.
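The launch command and the two optional flags above can be assembled programmatically, for example when switching between a dual-GPU, full-context setup and a single-GPU, reduced-context one. This is a minimal sketch: the function name and the example values (32768 and 8192) are illustrative assumptions, and only flags that appear in this section are used.

```python
import shlex
from typing import List, Optional

MODEL_ID = "mistralai/Ministral-3-14B-Base-2512"


def build_serve_command(
    tensor_parallel_size: int = 2,
    max_model_len: Optional[int] = None,
    max_num_batched_tokens: Optional[int] = None,
) -> List[str]:
    """Build the argument list for `vllm serve` using flags from this section."""
    cmd = [
        "vllm", "serve", MODEL_ID,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--tokenizer_mode", "mistral",
        "--config_format", "mistral",
        "--load_format", "mistral",
    ]
    # Optional flags are added only when a value is given.
    if max_model_len is not None:
        cmd += ["--max-model-len", str(max_model_len)]
    if max_num_batched_tokens is not None:
        cmd += ["--max-num-batched-tokens", str(max_num_batched_tokens)]
    return cmd


# Hypothetical single-GPU setup with a reduced context window.
single_gpu = build_serve_command(
    tensor_parallel_size=1, max_model_len=32768, max_num_batched_tokens=8192
)
print(shlex.join(single_gpu))
```

The resulting list can be passed to `subprocess.run`, or printed and pasted into a shell.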
Usage of the model
Here we assume that the model mistralai/Ministral-3-14B-Base-2512 is being served and is reachable at localhost on port 8000, which is the default for vLLM.
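Before sending requests, it can help to confirm that the server is reachable. The sketch below polls the `/v1/models` route of the OpenAI-compatible API (the route behind `client.models.list()` in the quick test) using only the standard library. The function names, timeout, and polling interval are illustrative assumptions.

```python
import time
import urllib.error
import urllib.request


def models_url(api_base: str) -> str:
    """Return the model-listing URL for an OpenAI-compatible base URL."""
    return api_base.rstrip("/") + "/models"


def wait_for_server(api_base: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll the model-listing route until it answers with HTTP 200 or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(models_url(api_base), timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Server not up yet; retry after a short pause.
        time.sleep(interval_s)
    return False


print(models_url("http://localhost:8000/v1"))
```

Calling `wait_for_server("http://localhost:8000/v1")` then returns `True` once the server answers, or `False` if the timeout expires first.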
Quick test with the base model.
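from openai import OpenAI

# Placeholder key; vLLM only checks it if the server was started with an API key.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 256

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# Use the first (and here, only) model exposed by the server.
models = client.models.list()
model = models.data[0].id

# Base models are queried through the plain completions endpoint.
response = client.completions.create(
    model=model,
    prompt="What is the best thing in the universe ?",
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

print(response.choices[0].text)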
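Because this is a base (pre-trained) checkpoint rather than an instruct model, it continues text instead of following chat-style instructions, so a few-shot completion prompt is often a more reliable way to steer it. The sketch below builds such a prompt; the `Q:`/`A:` layout and the example pairs are illustrative assumptions, not a format prescribed by this model card.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Format worked examples followed by an open question for the model to complete."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    # The final block ends with "A:" so the model's continuation is the answer.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


examples = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]
prompt = build_few_shot_prompt(examples, "What is the capital of Italy?")
print(prompt)
```

The resulting string can be passed as `prompt` to `client.completions.create(...)` from the quick test above; adding `stop=["\n\n"]` is one way to end generation after a single answer.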
You can also use Ministral 3 14B Base 2512 with Transformers!
Make sure to install Transformers from its first v5 release candidate or from "main":
pip install transformers==5.0.0rc0
To make the best use of our model with Transformers, make sure mistral-common >= 1.8.6 is installed so that our tokenizer can be used.
pip install mistral-common --upgrade
Then load our tokenizer along with the model and generate:
from transformers import Mistral3ForConditionalGeneration, MistralCommonBackend

model_id = "mistralai/Ministral-3-14B-Base-2512"

# device_map="auto" places the weights on the available device(s).
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",
)
tokenizer = MistralCommonBackend.from_pretrained(model_id)

input_ids = tokenizer.encode("Once upon a time, France was a", return_tensors="pt")
# Move the inputs to the device the model was loaded on.
input_ids = input_ids.to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=30,
)[0]

# Drop the prompt tokens so only the newly generated text is decoded.
decoded_output = tokenizer.decode(output[len(input_ids[0]):])
print(decoded_output)
License
This model is licensed under the Apache 2.0 License.
You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.