
License: apache-2.0

Key Features

Ministral 3 8B consists of two main architectural components:

  • 8.4B Language Model
  • 0.4B Vision Encoder

The Ministral 3 8B Base model offers the following capabilities:

  • Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
  • Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic.
  • Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
  • Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
  • Large Context Window: Supports a 256k context window.

Use Cases

Perfect for balanced performance in local or embedded systems, combining versatility with efficiency.

  • Chat interfaces in constrained environments
  • Local daily-driver AI assistant
  • Image/document description and understanding
  • Translation and content generation
  • Specialized agentic use cases
  • Fine-tuning and specialization
  • And more...

Bringing advanced AI capabilities to resource-constrained environments.

Ministral 3 Family

| Model Name | Type | Precision | Link |
| --- | --- | --- | --- |
| Ministral 3 3B Base 2512 | Base pre-trained | BF16 | Hugging Face |
| Ministral 3 3B Instruct 2512 | Instruct post-trained | FP8 | Hugging Face |
| Ministral 3 3B Reasoning 2512 | Reasoning capable | BF16 | Hugging Face |
| Ministral 3 8B Base 2512 | Base pre-trained | BF16 | Hugging Face |
| Ministral 3 8B Instruct 2512 | Instruct post-trained | FP8 | Hugging Face |
| Ministral 3 8B Reasoning 2512 | Reasoning capable | BF16 | Hugging Face |
| Ministral 3 14B Base 2512 | Base pre-trained | BF16 | Hugging Face |
| Ministral 3 14B Instruct 2512 | Instruct post-trained | FP8 | Hugging Face |
| Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | Hugging Face |

Other formats available here.

Benchmark Results

We compare Ministral 3 to similarly sized models.

Reasoning

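| Model | AIME25 | AIME24 | GPQA Diamond | LiveCodeBench |
| --- | --- | --- | --- | --- |
| Ministral 3 14B | 0.850 | 0.898 | 0.712 | 0.646 |
| Qwen3-14B (Thinking) | 0.737 | 0.837 | 0.663 | 0.593 |
| Ministral 3 8B | 0.787 | 0.860 | 0.668 | 0.616 |
| Qwen3-VL-8B-Thinking | 0.798 | 0.860 | 0.671 | 0.580 |
| Ministral 3 3B | 0.721 | 0.775 | 0.534 | 0.548 |
| Qwen3-VL-4B-Thinking | 0.697 | 0.729 | 0.601 | 0.513 |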

Instruct

| Model | Arena Hard | WildBench | MATH Maj@1 | MM MTBench |
| --- | --- | --- | --- | --- |
| Ministral 3 14B | 0.551 | 68.5 | 0.904 | 8.49 |
| Qwen3 14B (Non-Thinking) | 0.427 | 65.1 | 0.870 | Not multimodal |
| Gemma3-12B-Instruct | 0.436 | 63.2 | 0.854 | 6.70 |
| Ministral 3 8B | 0.509 | 66.8 | 0.876 | 8.08 |
| Qwen3-VL-8B-Instruct | 0.528 | 66.3 | 0.946 | 8.00 |
| Ministral 3 3B | 0.305 | 56.8 | 0.830 | 7.83 |
| Qwen3-VL-4B-Instruct | 0.438 | 56.8 | 0.900 | 8.01 |
| Qwen3-VL-2B-Instruct | 0.163 | 42.2 | 0.786 | 6.36 |
| Gemma3-4B-Instruct | 0.318 | 49.1 | 0.759 | 5.23 |

Base

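| Model | Multilingual MMLU | MATH CoT 2-Shot | AGIEval 5-shot | MMLU Redux 5-shot | MMLU 5-shot | TriviaQA 5-shot |
| --- | --- | --- | --- | --- | --- | --- |
| Ministral 3 14B | 0.742 | 0.676 | 0.648 | 0.820 | 0.794 | 0.749 |
| Qwen3 14B Base | 0.754 | 0.620 | 0.661 | 0.837 | 0.804 | 0.703 |
| Gemma 3 12B Base | 0.690 | 0.487 | 0.587 | 0.766 | 0.745 | 0.788 |
| Ministral 3 8B | 0.706 | 0.626 | 0.591 | 0.793 | 0.761 | 0.681 |
| Qwen 3 8B Base | 0.700 | 0.576 | 0.596 | 0.794 | 0.760 | 0.639 |
| Ministral 3 3B | 0.652 | 0.601 | 0.511 | 0.735 | 0.707 | 0.592 |
| Qwen 3 4B Base | 0.677 | 0.405 | 0.570 | 0.759 | 0.713 | 0.530 |
| Gemma 3 4B Base | 0.516 | 0.294 | 0.430 | 0.626 | 0.589 | 0.640 |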

Usage

The model can be used with the following frameworks:

vLLM

We recommend using this model with vLLM.

Installation

Make sure to install vllm >= 0.12.0:

bash

pip install vllm --upgrade

Doing so should automatically install mistral_common >= 1.8.6.

To check:

bash

python -c "import mistral_common; print(mistral_common.__version__)"
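
If you would rather have a setup script flag an outdated install, the check above can be sketched as a small comparison against the 1.8.6 minimum. The helper names `parse_version`, `meets_minimum`, and `installed_version` are hypothetical, and the parsing assumes plain dotted numeric versions (any pre-release suffix is simply dropped).

```python
import re
from importlib import metadata

MIN_MISTRAL_COMMON = "1.8.6"  # minimum version stated in this README


def parse_version(version: str) -> tuple:
    """Turn '1.8.6' (or '1.8.6rc1') into (1, 8, 6); any suffix is dropped."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_MISTRAL_COMMON) -> bool:
    """Return True if `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package: str = "mistral_common"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("mistral_common is not installed")
    else:
        status = "OK" if meets_minimum(found) else "too old"
        print(f"mistral_common {found}: {status}")
```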

You can also use a ready-to-go Docker image, which is also available on Docker Hub.

Serve

Thanks to their size and the BF16 format of their weights, Ministral-3-3B-Base-2512 and Ministral-3-8B-Base-2512 can each run on a single H200 GPU.
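
As a rough sanity check of that claim for the 8B model, the weight footprint can be estimated from the parameter counts listed under Key Features. This is only a back-of-the-envelope sketch: the 141 GB figure is the publicly listed H200 memory size (an assumption not taken from this README), and the KV cache, activations, and framework overhead need memory on top of the weights.

```python
BYTES_PER_BF16_PARAM = 2  # BF16 stores each parameter in 16 bits
H200_MEMORY_GB = 141      # assumption: public H200 spec, not from this README


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_BF16_PARAM) -> float:
    """Approximate memory needed for the weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


language_model_params = 8.4e9  # from Key Features
vision_encoder_params = 0.4e9  # from Key Features

total_gb = weight_memory_gb(language_model_params + vision_encoder_params)
print(f"Weights: {total_gb:.1f} GB")  # prints "Weights: 17.6 GB"
print(f"Left for KV cache and activations: {H200_MEMORY_GB - total_gb:.1f} GB")
```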

A simple launch command is:

bash

vllm serve mistralai/Ministral-3-8B-Base-2512 \
  --tokenizer_mode mistral --config_format mistral --load_format mistral

Additional flags:

  • Set --max-model-len to reduce memory usage. It defaults to 262144 tokens, which is larger than most scenarios need.
  • Set --max-num-batched-tokens to balance throughput and latency: a higher value increases throughput but also latency.
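
To illustrate how these flags combine with the launch command, here is a minimal sketch that assembles the argument list in Python. The helper `build_serve_command` is hypothetical, and the values 32768 and 8192 are arbitrary illustrations rather than recommended settings.

```python
import shlex


def build_serve_command(model, max_model_len=None, max_num_batched_tokens=None):
    """Assemble a `vllm serve` argument list with the optional flags above."""
    cmd = [
        "vllm", "serve", model,
        "--tokenizer_mode", "mistral",
        "--config_format", "mistral",
        "--load_format", "mistral",
    ]
    if max_model_len is not None:
        # Cap the context length to shrink the memory reserved for the KV cache.
        cmd += ["--max-model-len", str(max_model_len)]
    if max_num_batched_tokens is not None:
        # Higher values favor throughput, lower values favor latency.
        cmd += ["--max-num-batched-tokens", str(max_num_batched_tokens)]
    return cmd


if __name__ == "__main__":
    cmd = build_serve_command(
        "mistralai/Ministral-3-8B-Base-2512",
        max_model_len=32768,          # illustrative value
        max_num_batched_tokens=8192,  # illustrative value
    )
    print(shlex.join(cmd))
```

The resulting list can be pasted into a shell as printed, or passed directly to `subprocess.run(cmd, check=True)`.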

Usage of the model

Here we assume that the model mistralai/Ministral-3-8B-Base-2512 is being served and is reachable at localhost on port 8000, the vLLM default.

Quick test with the base model.

python

from openai import OpenAI

# Point the OpenAI client at the local vLLM server.
openai_api_key = "EMPTY"  # dummy value; the client requires a non-empty key
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 256

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# Use the first model exposed by the server.
models = client.models.list()
model = models.data[0].id

# A base model is queried through the plain completions endpoint.
response = client.completions.create(
    model=model,
    prompt="What is the best thing in the universe ?",
    temperature=TEMP,
    max_tokens=MAX_TOK,
)
print(response.choices[0].text)
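
For environments without the openai package, the same request can be sketched with the standard library alone, since the server exposes an OpenAI-compatible /v1/completions route. The helper names `build_completion_payload` and `post_completion` are hypothetical, and the defaults mirror the values used in the quick test.

```python
import json
import urllib.request


def build_completion_payload(model, prompt, temperature=0.15, max_tokens=256):
    """Build the JSON body for an OpenAI-compatible /v1/completions request."""
    return {
        "model": model,
        "prompt": prompt,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def post_completion(payload, base_url="http://localhost:8000/v1"):
    """Send the payload to the server and return the generated text."""
    request = urllib.request.Request(
        f"{base_url}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Usage, assuming the server from the Serve section is running:
# payload = build_completion_payload(
#     "mistralai/Ministral-3-8B-Base-2512",
#     "What is the best thing in the universe ?",
# )
# print(post_completion(payload))
```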

Transformers

You can also use Ministral 3 8B Base 2512 with Transformers! Make sure to install Transformers from its first v5 release candidate or from "main":

bash

pip install transformers==5.0.0rc0

To make the best use of our model with Transformers, make sure mistral-common >= 1.8.6 is installed so that our tokenizer can be used.

bash

pip install mistral-common --upgrade

Then load our tokenizer along with the model and generate:

python

from transformers import Mistral3ForConditionalGeneration, MistralCommonBackend

model_id = "mistralai/Ministral-3-8B-Base-2512"

# Load the model; device_map="auto" places the weights on the available device(s).
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",
)
tokenizer = MistralCommonBackend.from_pretrained(model_id)

# Tokenize the prompt and move it to the GPU.
input_ids = tokenizer.encode("Once upon a time, France was a", return_tensors="pt")
input_ids = input_ids.to("cuda")

# Generate up to 30 new tokens; [0] selects the single sequence in the batch.
output = model.generate(
    input_ids,
    max_new_tokens=30,
)[0]

# Decode only the newly generated tokens, skipping the prompt.
decoded_output = tokenizer.decode(output[len(input_ids[0]):])
print(decoded_output)

License

This model is licensed under the Apache 2.0 License.

You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.
