XiaomiMiMo
MiMo-7B-SFT
Model provider
XiaomiMiMo
Model tree
Base
this model
Modalities
Input
-
Output
-
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information
[2025.05.30] By scaling the SFT dataset from approximately 500K to 6M instances and continuously expanding the RL training window size from 32K to 48K, we continuously improved the performance of MiMo-7B-RL-0530 on AIME24, which eventually surpassed that of DeepSeek R1 (79.8).
Currently, most successful RL work, including open-source research, relies on relatively large base models (e.g., 32B models), particularly for enhancing code reasoning capabilities. Moreover, it has been widely considered challenging to achieve uniform and simultaneous improvements in both mathematical and code capabilities within a small model. Nonetheless, we believe that the effectiveness of an RL-trained reasoning model relies on the inherent reasoning potential of the base model. To fully unlock the reasoning potential of language models, efforts must focus not only on post-training but also on pre-training strategies tailored to reasoning.
In this work, we present MiMo-7B, a series of models trained from scratch and born for reasoning tasks. Our RL experiments from MiMo-7B-Base show that our model possesses extraordinary reasoning potential, even surpassing much larger 32B models. Additionally, we perform RL training on a cold-started SFT model, resulting in MiMo-7B-RL, which demonstrates superior performance on both mathematics and code reasoning tasks, matching the performance of OpenAI o1-mini.
We open-source the MiMo-7B series, including checkpoints of the base model, the SFT model, the RL model trained from the base model, and the RL model trained from the SFT model. We believe this report, along with the models, will provide valuable insights for developing powerful reasoning LLMs that benefit the larger community.
Pre-Training: Base Model Born for Reasoning
Post-Training Recipe: Pioneering Reasoning Model
RL Infrastructure
The MTP layers of MiMo-7B are tuned during pretraining and SFT and frozen during RL. With one MTP layer used for speculative decoding, the acceptance rate is about 90%.
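To get a feel for what an acceptance rate of about 90% implies, the sketch below estimates how many tokens are emitted per forward pass of the main model. It is a back-of-the-envelope illustration, not the MiMo team's measurement method: the function name is hypothetical, each draft token is assumed to be accepted independently with the same probability, and the extra cost of running the MTP layer is ignored.

```python
def expected_tokens_per_step(acceptance_rate: float, num_draft_tokens: int = 1) -> float:
    """Expected tokens emitted per forward pass of the main model.

    Simplifying assumptions (illustration only): every draft token is
    accepted independently with the same probability, and draft token k
    only counts if draft tokens 1..k-1 were accepted as well.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    # The main model always contributes one token per step.
    expected = 1.0
    # Draft token k survives with probability acceptance_rate ** k.
    for k in range(1, num_draft_tokens + 1):
        expected += acceptance_rate ** k
    return expected


if __name__ == "__main__":
    # One MTP layer (one draft token) with the ~90% acceptance rate quoted above.
    print(f"{expected_tokens_per_step(0.9, 1):.2f} tokens per forward pass")
```

Under these assumptions, a single draft token accepted 90% of the time yields roughly 1.9 tokens per forward pass; the real speedup is lower once the draft cost is included.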
Models are available at https://huggingface.co/XiaomiMiMo and https://www.modelscope.cn/organization/XiaomiMiMo
| Model | Description | Download (HuggingFace) | Download (ModelScope) |
|---|---|---|---|
| MiMo-7B-Base | Base model with extraordinary reasoning potential | 🤗 XiaomiMiMo/MiMo-7B-Base | 🤖️ XiaomiMiMo/MiMo-7B-Base |
| MiMo-7B-RL-Zero | RL model trained from base model | 🤗 XiaomiMiMo/MiMo-7B-RL-Zero | 🤖️ XiaomiMiMo/MiMo-7B-RL-Zero |
| Benchmark | GPT-4o-0513 | Claude-3.5-Sonnet-1022 | OpenAI o1-mini | QwQ-32B-Preview | R1-Distill-Qwen-14B | R1-Distill-Qwen-7B | MiMo-7B-RL |
|---|---|---|---|---|---|---|---|
| General | | | | | | | |
| GPQA Diamond(Pass@1) | 49.9 | 65.0 | 60.0 | 54.5 | 59.1 | 49.1 | 54.4 |

MiMo-7B series
| Benchmark | MiMo-7B-Base | MiMo-7B-RL-Zero | MiMo-7B-SFT | MiMo-7B-RL |
|---|---|---|---|---|
| Mathematics | ||||
| MATH500(Pass@1) | 37.4 | 93.6 | 93.0 | 95.8 |
| AIME 2024(Pass@1) | 32.9 | 56.4 | 58.7 | 68.2 |
> [!IMPORTANT]
> The evaluations are conducted with `temperature=0.6`.
>
> AIME24 and AIME25 scores are averaged over 32 repetitions. LiveCodeBench v5 (20240801-20250201), LiveCodeBench v6 (20250201-20250501), GPQA-Diamond and IF-Eval scores are averaged over 8 repetitions. MATH500 and SuperGPQA are evaluated with a single run.
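The averaging over repetitions described above can be sketched as follows. This is only an illustration of the arithmetic, not the evaluation harness used for the reported numbers: the function name and the toy results are hypothetical.

```python
from statistics import mean
from typing import Sequence


def averaged_pass_at_1(results: Sequence[Sequence[bool]]) -> float:
    """Average Pass@1 (in percent) over repeated sampling runs.

    results[r][p] is True when problem p was solved in repetition r.
    """
    if not results or not results[0]:
        raise ValueError("need at least one repetition and one problem")
    if any(len(run) != len(results[0]) for run in results):
        raise ValueError("every repetition must cover the same problems")
    # Pass@1 of a single run is the fraction of problems solved in that run.
    per_run = [sum(run) / len(run) for run in results]
    return 100.0 * mean(per_run)


if __name__ == "__main__":
    # Hypothetical toy data: 4 repetitions over 3 problems.
    toy_results = [
        [True, False, True],
        [True, True, False],
        [False, False, True],
        [True, False, True],
    ]
    print(f"Averaged Pass@1: {averaged_pass_at_1(toy_results):.1f}")
```

Averaging over many sampled runs reduces the variance that a nonzero temperature introduces on small benchmarks such as AIME.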
Thanks to the SGLang team's contribution of MiMo model and MTP support, MiMo is now supported in the SGLang main branch.
Example Script
```bash
# Install the latest SGLang from the main branch
python3 -m uv pip install "sglang[all] @ git+https://github.com/sgl-project/sglang.git/@main#egg=sglang&subdirectory=python"

# Launch SGLang Server
python3 -m sglang.launch_server --model-path XiaomiMiMo/MiMo-7B-SFT --host 0.0.0.0 --trust-remote-code

# Launch MTP Server
python3 -m sglang.launch_server --model-path XiaomiMiMo/MiMo-7B-SFT --trust-remote-code \
--speculative-algorithm EAGLE --speculative-num-steps 1 --speculative-eagle-topk 1 \
--speculative-num-draft-tokens 2 --mem-fraction 0.5
```
Detailed usage can be found in SGLang documents.
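Once a server is running, it can be queried over HTTP. The sketch below assumes the server listens locally on port 30000 and exposes an OpenAI-style `/v1/chat/completions` endpoint; check both against the SGLang documentation for your version. The helper names `build_chat_request` and `chat` and the `max_tokens` default are hypothetical, while the temperature follows the evaluation setting quoted above.

```python
import json
import urllib.request

# Assumption: the server was launched locally and listens on port 30000.
BASE_URL = "http://localhost:30000"


def build_chat_request(prompt: str, temperature: float = 0.6, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-style chat completion payload for a single user turn."""
    return {
        "model": "XiaomiMiMo/MiMo-7B-SFT",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def chat(prompt: str, base_url: str = BASE_URL) -> str:
    """Send one chat request to the server and return the generated text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style responses carry the text in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server up, calling `chat("What is 17 * 24? Think step by step.")` returns the model's reply as a string; the call raises a `urllib.error.URLError` if nothing is listening at `BASE_URL`.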
Example script
```py
from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"
llm = LLM(
    model=model_path,
    trust_remote_code=True,
    num_speculative_tokens=1,
    disable_log_stats=False
)
sampling_params = SamplingParams(temperature=0.6)

conversation = [
    {
        "role": "system",
        "content": ""
    },
    {
        "role": "user",
        "content": "Write an essay about the importance of higher education.",
    },
]

outputs = llm.chat(conversation,
                   sampling_params=sampling_params,
                   use_tqdm=False)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

print("=" * 80)
```
You can copy the registry/register_mimo_in_vllm.py to your directory and import it with
```py
import register_mimo_in_vllm

from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"
llm = LLM(
    model=model_path,
    trust_remote_code=True,
    # num_speculative_tokens=1,
    disable_log_stats=False
)
sampling_params = SamplingParams(temperature=0.6)
```
Example script
```py
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-7B-SFT"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer(["Today is"], return_tensors='pt')
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output.tolist()[0]))
```
We haven't verified MiMo with other inference engines and welcome contributions based on the model definition in the Huggingface repo 💻.
```bibtex
@misc{coreteam2025mimounlockingreasoningpotential,
      title={MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining},
      author={LLM-Core-Team Xiaomi},
      year={2025},
      eprint={2505.07608},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.07608},
}
```
Please contact us at mimo@xiaomi.com or open an issue if you have any questions.

| Model | Description | Download (HuggingFace) | Download (ModelScope) |
|---|---|---|---|
| MiMo-7B-SFT | SFT model trained from base model | 🤗 XiaomiMiMo/MiMo-7B-SFT | 🤖️ XiaomiMiMo/MiMo-7B-SFT |
| MiMo-7B-RL | RL model trained from SFT model, superior performance matching OpenAI o1-mini | 🤗 XiaomiMiMo/MiMo-7B-RL | 🤖️ XiaomiMiMo/MiMo-7B-RL |

| Benchmark | GPT-4o-0513 | Claude-3.5-Sonnet-1022 | OpenAI o1-mini | QwQ-32B-Preview | R1-Distill-Qwen-14B | R1-Distill-Qwen-7B | MiMo-7B-RL |
|---|---|---|---|---|---|---|---|
| General | | | | | | | |
| GPQA Diamond(Pass@1) | 49.9 | 65.0 | 60.0 | 54.5 | 59.1 | 49.1 | 54.4 |
| SuperGPQA(Pass@1) | 42.4 | 48.2 | 45.2 | 43.6 | 40.6 | 28.9 | 40.5 |
| DROP(3-shot F1) | 83.7 | 88.3 | 83.9 | 71.2 | 85.5 | 77.0 | 78.7 |
| MMLU-Pro(EM) | 72.6 | 78.0 | 80.3 | 52.0 | 68.8 | 53.5 | 58.6 |
| IF-Eval(Prompt Strict) | 84.3 | 86.5 | 84.8 | 40.4 | 78.3 | 60.5 | 61.0 |
| Mathematics | | | | | | | |
| MATH-500(Pass@1) | 74.6 | 78.3 | 90.0 | 90.6 | 93.9 | 92.8 | 95.8 |
| AIME 2024(Pass@1) | 9.3 | 16.0 | 63.6 | 50.0 | 69.7 | 55.5 | 68.2 |
| AIME 2025(Pass@1) | 11.6 | 7.4 | 50.7 | 32.4 | 48.2 | 38.8 | 55.4 |
| Code | | | | | | | |
| LiveCodeBench v5(Pass@1) | 32.9 | 38.9 | 53.8 | 41.9 | 53.1 | 37.6 | 57.8 |
| LiveCodeBench v6(Pass@1) | 30.9 | 37.2 | 46.8 | 39.1 | 31.9 | 23.9 | 49.3 |

MiMo-7B series

| Benchmark | MiMo-7B-Base | MiMo-7B-RL-Zero | MiMo-7B-SFT | MiMo-7B-RL |
|---|---|---|---|---|
| Mathematics | | | | |
| AIME 2025(Pass@1) | 24.3 | 46.3 | 44.3 | 55.4 |
| Code | | | | |
| LiveCodeBench v5(Pass@1) | 32.9 | 49.1 | 52.3 | 57.8 |
| LiveCodeBench v6(Pass@1) | 29.1 | 42.9 | 45.5 | 49.3 |