License: apache-2.0
1.1. Install packages
bash
# For `transformers` backend
pip install "mineru-vl-utils[transformers]"

# For `vllm-engine` and `vllm-async-engine` backend
pip install "mineru-vl-utils[vllm]"
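The extra you install determines which `backend` value you can pass to `MinerUClient`. The following is a minimal sketch of choosing a backend at runtime, assuming the extras provide the top-level modules `vllm` and `transformers`. `pick_backend` and its `is_installed` hook are hypothetical helpers for illustration, not part of `mineru-vl-utils`.

```python
import importlib.util


def pick_backend(is_installed=None):
    """Return a MinerUClient backend name based on which packages are present.

    `is_installed` is a hypothetical hook (useful for testing); by default it
    checks whether a top-level module can be located without importing it.
    """
    if is_installed is None:
        is_installed = lambda name: importlib.util.find_spec(name) is not None
    if is_installed("vllm"):
        return "vllm-engine"  # the backend this README marks as recommended
    if is_installed("transformers"):
        return "transformers"
    raise RuntimeError(
        'No backend found; install "mineru-vl-utils[transformers]" '
        'or "mineru-vl-utils[vllm]"'
    )
```

The returned string only selects the backend name; the model, processor, or `LLM` object still has to be built as in the backend-specific examples.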
1.2. transformers Example
python
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from PIL import Image
from mineru_vl_utils import MinerUClient

# for transformers>=4.56.0
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "opendatalab/MinerU2.5-Pro-2604-1.2B",
    dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    "opendatalab/MinerU2.5-Pro-2604-1.2B",
    use_fast=True,
)
client = MinerUClient(
    backend="transformers",
    model=model,
    processor=processor,
    image_analysis=False,  # default False, set True to enable image/chart analysis
)
print(client.two_step_extract(Image.open("/path/to/page.png")))
1.3. vllm-engine Example (Recommended!)
python
from vllm import LLM
from PIL import Image
from mineru_vl_utils import MinerUClient
from mineru_vl_utils import MinerULogitsProcessor  # if vllm>=0.10.1

llm = LLM(
    model="opendatalab/MinerU2.5-Pro-2604-1.2B",
    logits_processors=[MinerULogitsProcessor],  # if vllm>=0.10.1
)
client = MinerUClient(
    backend="vllm-engine",
    vllm_llm=llm,
    image_analysis=False,  # default False, set True to enable image/chart analysis
)
print(client.two_step_extract(Image.open("/path/to/page.png")))
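The `# if vllm>=0.10.1` comments mean the logits processor should only be passed on sufficiently new vLLM versions. One way to make that check explicit is sketched below using only the standard library. The helper names are hypothetical, and the parser deliberately ignores pre-release and local suffixes, so treat it as an approximation rather than a full version comparison.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Parse the leading numeric release, e.g. '0.10.1rc1' -> (0, 10, 1).

    Suffixes such as 'rc1' or '+cu128' are ignored by this sketch.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def supports_logits_processor(vllm_version, minimum=(0, 10, 1)):
    """True when vllm_version is at least the threshold named in the comments."""
    return version_tuple(vllm_version) >= minimum


def installed_vllm_version():
    """Return the installed vllm version string, or None if vllm is absent."""
    try:
        return version("vllm")
    except PackageNotFoundError:
        return None


# Usage (requires vllm and mineru-vl-utils to be installed):
# kwargs = {"model": "opendatalab/MinerU2.5-Pro-2604-1.2B"}
# if supports_logits_processor(installed_vllm_version() or "0"):
#     from mineru_vl_utils import MinerULogitsProcessor
#     kwargs["logits_processors"] = [MinerULogitsProcessor]
# llm = LLM(**kwargs)
```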
1.4. JSON result to Markdown (enable truncated paragraph merging)
python
from PIL import Image
from mineru_vl_utils.post_process import json2md

# ... client initialization omitted
content_list = client.two_step_extract(Image.open("path/to/page.png"))
md_res = json2md(content_list)
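For a multi-page document, the same two calls can be looped over the pages and the results written to one file. The sketch below assumes that `json2md` returns a string, which this README does not state explicitly. `pages_to_markdown` is a hypothetical helper, `client` is any object exposing `two_step_extract`, and `to_markdown` stands in for `json2md`.

```python
from pathlib import Path


def pages_to_markdown(client, images, to_markdown, out_path):
    """Extract each page, convert it to Markdown, and write one combined file.

    client      -- object with a two_step_extract(image) method (e.g. MinerUClient)
    images      -- iterable of already-opened page images
    to_markdown -- callable such as json2md; assumed here to return a str
    out_path    -- destination .md file
    """
    parts = []
    for image in images:
        content_list = client.two_step_extract(image)
        parts.append(to_markdown(content_list).strip())
    # Separate pages with a blank line; paragraphs that continue across
    # pages are not merged by this sketch.
    text = "\n\n".join(part for part in parts if part)
    Path(out_path).write_text(text + "\n", encoding="utf-8")
    return text
```

Calling `pages_to_markdown(client, [Image.open(p) for p in page_paths], json2md, "out.md")` would then produce a single Markdown file, where `page_paths` is your own list of page image paths.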
🚧 Cross-Page Table Merging: Currently under integration. Stay tuned!
2. Performance
2.1. End-to-End Document Parsing on OmniDocBench v1.6
2.2. Text Recognition
2.3. Formula Recognition
2.4. Table Recognition
3. Showcase
3.1. Basic Parsing Capability
3.2. Extra Supported Features
4. Acknowledgement & Citation
We would like to thank Qwen Team, vLLM, OmniDocBench, PaddleOCR, UniMERNet, DocLayout-YOLO for providing valuable code and models. We also appreciate everyone's contribution to this open-source project!
If you find our work useful in your research, please consider giving the project a star ⭐ and citing it 📝:
BibTeX
@misc{wang2026mineru25propushinglimitsdatacentric,
  title={MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale},
  author={Bin, Wang and Tianyao, He and Linke, Ouyang and Fan, Wu and Zhiyuan, Zhao and Tao, Chu and Yuan, Qu and Zhenjiang, Jin and Weijun, Zeng and Ziyang, Miao and Bangrui, Xu and Junbo, Niu and others},
  year={2026},
  eprint={2604.04771},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.04771},
}
Model provider: opendatalab
Model tree: Base (this model)
Modalities: Input: Text, Image; Output: Text