README

License: apache-2.0

1.1. Install packages

bash

# For `transformers` backend
pip install "mineru-vl-utils[transformers]"
# For `vllm-engine` and `vllm-async-engine` backend
pip install "mineru-vl-utils[vllm]"
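After installing, it can help to confirm which backends are importable in the current environment. The following is a minimal sketch, not part of `mineru-vl-utils`: the `BACKEND_MODULES` mapping and the `available_backends` helper are hypothetical names, and the mapping from backend to module is an assumption based on the two install commands above.

```python
from importlib.util import find_spec

# Assumed mapping from backend name to the top-level module it needs.
# This is inferred from the install extras above, not read from package metadata.
BACKEND_MODULES = {
    "transformers": "transformers",
    "vllm-engine": "vllm",
    "vllm-async-engine": "vllm",
}


def available_backends(modules=BACKEND_MODULES):
    """Return the backend names whose required module can be found."""
    return [
        name
        for name, module in modules.items()
        if find_spec(module) is not None
    ]


# Prints the backends usable in this environment (may be an empty list).
print(available_backends())
```

`find_spec` only locates the module without importing it, so the check stays fast even when `vllm` is installed.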

1.2. transformers Example

python

from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from PIL import Image
from mineru_vl_utils import MinerUClient

# `dtype` is for transformers>=4.56.0 (older versions are expected to take `torch_dtype` instead)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "opendatalab/MinerU2.5-Pro-2604-1.2B", dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(
    "opendatalab/MinerU2.5-Pro-2604-1.2B", use_fast=True
)
client = MinerUClient(
    backend="transformers", model=model, processor=processor,
    image_analysis=False,  # default False; set True to enable image/chart analysis
)

# Run two-step extraction on a single page image and print the result
print(client.two_step_extract(Image.open("/path/to/page.png")))
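The example above parses one page. For a multi-page document, the same call can be looped over a list of page images. This is a minimal sketch under stated assumptions: `extract_pages` is a hypothetical helper, not part of `mineru-vl-utils`, and it relies only on the `two_step_extract` method shown above. The image loader is passed in so the helper itself has no dependency on PIL.

```python
from typing import Any, Callable, Dict, Iterable


def extract_pages(
    client: Any,
    image_paths: Iterable[str],
    load_image: Callable[[str], Any],
) -> Dict[str, Any]:
    """Call client.two_step_extract on each page image, keyed by its path.

    `client` is any object exposing `two_step_extract(image)`;
    `load_image` turns a path into an image (for example `PIL.Image.open`).
    """
    results: Dict[str, Any] = {}
    for path in image_paths:
        image = load_image(path)
        results[str(path)] = client.two_step_extract(image)
    return results
```

With the client from the example above, usage would look like `extract_pages(client, ["/path/to/page1.png", "/path/to/page2.png"], Image.open)`. Pages are processed one at a time here; whether the client offers a batched call is not covered by this sketch.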

1.3. vllm-engine Example (Recommended!)

python

from vllm import LLM
from PIL import Image
from mineru_vl_utils import MinerUClient
from mineru_vl_utils import MinerULogitsProcessor  # if vllm>=0.10.1

llm = LLM(
    model="opendatalab/MinerU2.5-Pro-2604-1.2B",
    logits_processors=[MinerULogitsProcessor],  # if vllm>=0.10.1
)
client = MinerUClient(
    backend="vllm-engine", vllm_llm=llm,
    image_analysis=False,  # default False; set True to enable image/chart analysis
)

# Run two-step extraction on a single page image and print the result
print(client.two_step_extract(Image.open("/path/to/page.png")))
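The `# if vllm>=0.10.1` comments above imply that `logits_processors` should only be passed on newer vLLM releases. One way to make that condition explicit is to build the `LLM` keyword arguments from the installed version. The sketch below is an illustration only: `parse_release` and `llm_kwargs` are hypothetical helpers, and the version parsing is deliberately simple (it ignores pre-release ordering, so `0.10.1rc1` is treated like `0.10.1`).

```python
def parse_release(version_string):
    """Return the leading numeric components, e.g. '0.10.1.post1' -> (0, 10, 1)."""
    parts = []
    # Drop any local version suffix such as '+cu121' before splitting.
    for piece in version_string.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def llm_kwargs(model_name, vllm_version, logits_processor=None):
    """Build keyword arguments for vllm.LLM, adding logits_processors
    only when the given vLLM version is at least 0.10.1."""
    kwargs = {"model": model_name}
    if logits_processor is not None and parse_release(vllm_version) >= (0, 10, 1):
        kwargs["logits_processors"] = [logits_processor]
    return kwargs
```

In practice the version string could come from `importlib.metadata.version("vllm")`, and the result would be passed as `LLM(**llm_kwargs("opendatalab/MinerU2.5-Pro-2604-1.2B", vllm_version, MinerULogitsProcessor))`, with the `MinerULogitsProcessor` import itself guarded by the same version check.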

1.4. JSON result to Markdown (enable truncated paragraph merging)

python

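from PIL import Image
from mineru_vl_utils.post_process import json2md

# ... client initialization omitted (see the examples above)
content_list = client.two_step_extract(Image.open("/path/to/page.png"))
md_res = json2md(content_list)  # convert the extracted content list to Markdown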

🚧 Cross-Page Table Merging: Currently under integration. Stay tuned!
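Once `md_res` is available, it is often written next to the source image. The following is a minimal sketch that assumes `json2md` returns a Markdown string, which the example above suggests but does not state. `save_markdown` is a hypothetical helper, not part of `mineru-vl-utils`.

```python
from pathlib import Path


def save_markdown(md_text, image_path, out_dir="."):
    """Write md_text to <out_dir>/<image stem>.md and return the output path.

    Assumes md_text is a str (assumed return type of json2md).
    """
    out_path = Path(out_dir) / (Path(image_path).stem + ".md")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(md_text, encoding="utf-8")
    return out_path
```

For example, `save_markdown(md_res, "/path/to/page.png", "out")` would write `out/page.md`.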

2. Performance

2.1. End-to-End Document Parsing on OmniDocBench v1.6

2.2. Text Recognition

2.3. Formula Recognition

2.4. Table Recognition

3. Showcase

3.1. Basic Parsing Capability

3.2. Extra Supported Features

4. Acknowledgement & Citation

We would like to thank the Qwen Team, vLLM, OmniDocBench, PaddleOCR, UniMERNet, and DocLayout-YOLO for providing valuable code and models. We also appreciate everyone's contributions to this open-source project!

If you find our work useful in your research, please consider giving it a star ⭐ and citing it 📝:

BibTeX

@misc{wang2026mineru25propushinglimitsdatacentric,
  title={MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale},
  author={Bin, Wang and Tianyao, He and Linke, Ouyang and Fan, Wu and Zhiyuan, Zhao and Tao, Chu and Yuan, Qu and Zhenjiang, Jin and Weijun, Zeng and Ziyang, Miao and Bangrui, Xu and Junbo, Niu and others},
  year={2026},
  eprint={2604.04771},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.04771},
}

Model provider: opendatalab

Model tree: Base (this model)

Modalities: Input: Text, Image; Output: Text