MiMo-VRPRM-7B API & Inference Endpoint

Model Details

Model family: VRPRM
Release variant: MiMo-7B
Serialized architecture: Qwen2_5_VLForConditionalGeneration
Model type: qwen2_5_vl
Weights format: sharded safetensors
Recommended library: transformers

Training Summary

The VRPRM paper trains the model with a two-stage recipe:

Supervised fine-tuning cold start on high-quality CoT-PRM data. Open-sourced on VRPRM3.6K.
Reinforcement learning scaling on lower-cost non-CoT PRM data.

Intended Use

This model is intended for research on:

Visual process reward modeling
Multimodal reasoning evaluation
Step-level scoring of visual question answering rationales
Best-of-N selection for vision-language model responses

This model is not intended to be used as a standalone assistant.

Usage

Load the model with Hugging Face Transformers from the repository root:

python
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "YOUR_USERNAME/VRPRM-MiMo-7B"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

For the complete inference and evaluation pipeline, use the VRPRM project code.

Citation

bibtex
@misc{chen2026vrprmprocessrewardmodeling,
      title={VRPRM: Process Reward Modeling via Visual Reasoning}, 
      author={Xinquan Chen and Chongying Yue and Bangwei Liu and Xuhong Wang and Yingchun Wang and Chaochao Lu},
      year={2026},
      eprint={2508.03556},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.03556}, 
}

Model Details

Model family: VRPRM
Release variant: MiMo-7B
Serialized architecture: Qwen2_5_VLForConditionalGeneration
Model type: qwen2_5_vl
Weights format: sharded safetensors
Recommended library: transformers

Training Summary

The VRPRM paper trains the model with a two-stage recipe:

Supervised fine-tuning cold start on high-quality CoT-PRM data. Open-sourced on VRPRM3.6K.
Reinforcement learning scaling on lower-cost non-CoT PRM data.

Intended Use

This model is intended for research on:

Visual process reward modeling
Multimodal reasoning evaluation
Step-level scoring of visual question answering rationales
Best-of-N selection for vision-language model responses

This model is not intended to be used as a standalone assistant.

Usage

Load the model with Hugging Face Transformers from the repository root:

python
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "YOUR_USERNAME/VRPRM-MiMo-7B"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

For the complete inference and evaluation pipeline, use the VRPRM project code.

Citation

bibtex
@misc{chen2026vrprmprocessrewardmodeling,
      title={VRPRM: Process Reward Modeling via Visual Reasoning}, 
      author={Xinquan Chen and Chongying Yue and Bangwei Liu and Xuhong Wang and Yingchun Wang and Chaochao Lu},
      year={2026},
      eprint={2508.03556},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.03556}, 
}

MiMo-VRPRM-7B

README

Model Details

Training Summary

Intended Use

Usage

Citation

Explore FriendliAI today

README

Model Details

Training Summary

Intended Use

Usage

Citation