1.1. Install packages
# For `transformers` backend
pip install "mineru-vl-utils[transformers]"
# For `vllm-engine` and `vllm-async-engine` backends
pip install "mineru-vl-utils[vllm]"
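Since the two extras install different inference stacks, it can be handy to check at runtime which one is available before building a client. The helper below is a minimal sketch using only the standard library; `pick_backend` is a hypothetical name (not part of `mineru-vl-utils`), and the preference for vLLM simply mirrors the recommendation in this document.

```python
import importlib.util


def pick_backend(find_spec=importlib.util.find_spec):
    """Return a backend name based on which optional dependency is importable.

    Hypothetical helper for illustration; it only checks whether the
    top-level packages can be found and does not import or validate them.
    """
    if find_spec("vllm") is not None:
        return "vllm-engine"      # preferred when the [vllm] extra is installed
    if find_spec("transformers") is not None:
        return "transformers"     # fallback when only [transformers] is installed
    return None                   # neither extra is installed


if __name__ == "__main__":
    print(pick_backend())
```

The returned string is meant to be passed as the `backend` argument of `MinerUClient`; the model or engine object still has to be created as in the examples that follow.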
1.2. transformers Example
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from PIL import Image
from mineru_vl_utils import MinerUClient
# Load the model weights and the matching processor from the Hugging Face Hub.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "opendatalab/MinerU2.5-Pro-2605-1.2B", dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(
    "opendatalab/MinerU2.5-Pro-2605-1.2B", use_fast=True
)
client = MinerUClient(
    backend="transformers", model=model, processor=processor,
    image_analysis=False,
)
# Parse a single page image and print the extracted content list.
print(client.two_step_extract(Image.open("/path/to/page.png")))
1.3. vllm-engine Example (Recommended!)
from vllm import LLM
from PIL import Image
from mineru_vl_utils import MinerUClient, MinerULogitsProcessor
# Start an in-process vLLM engine with the MinerU logits processor.
llm = LLM(
    model="opendatalab/MinerU2.5-Pro-2605-1.2B",
    logits_processors=[MinerULogitsProcessor],
)
client = MinerUClient(
    backend="vllm-engine", vllm_llm=llm,
    image_analysis=False,
)
# Parse a single page image and print the extracted content list.
print(client.two_step_extract(Image.open("/path/to/page.png")))
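The examples above parse a single page image. For a folder of page images, one possible pattern is to loop over the files and collect one result per page. The sketch below is backend-agnostic: it assumes only that `extract` is a callable you supply, for example `lambda p: client.two_step_extract(Image.open(p))`. The helper name `extract_pages` and the suffix list are illustrative and not part of `mineru-vl-utils`.

```python
from pathlib import Path


def extract_pages(page_dir, extract, suffixes=(".png", ".jpg", ".jpeg")):
    """Apply `extract` to every image file in `page_dir`.

    Returns a dict mapping file name to whatever `extract` returned.
    Files are visited in sorted path order, which is lexicographic, so
    zero-pad page numbers (page_01, page_02, ...) if order matters.
    """
    results = {}
    for path in sorted(Path(page_dir).iterdir()):
        # Skip sub-directories and files that are not page images.
        if path.is_file() and path.suffix.lower() in suffixes:
            results[path.name] = extract(path)
    return results
```

With a client from either example, a call could look like `extract_pages("/path/to/pages", lambda p: client.two_step_extract(Image.open(p)))`.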
1.4. JSON result to Markdown (enable truncated paragraph merging)
from PIL import Image
from mineru_vl_utils.post_process import json2md
# `client` is a MinerUClient created as in the examples above.
content_list = client.two_step_extract(Image.open("/path/to/page.png"))
md_res = json2md(content_list)
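To make the idea of truncated paragraph merging concrete, here is a small standalone sketch. It is not the `json2md` implementation: the block shape (dicts with `type` and `content` keys), the function names, and the heuristic (previous text block lacks terminal punctuation and the next one starts with a lowercase letter) are all assumptions chosen for illustration.

```python
# Characters treated as "the paragraph ended here" (illustrative choice).
TERMINAL = (".", "!", "?", ":", ";", "。", "！", "？")


def _is_continuation(prev, block):
    """Heuristic: does `block` look like the rest of the paragraph in `prev`?"""
    if prev.get("type") != "text" or block.get("type") != "text":
        return False
    head = (prev.get("content") or "").rstrip()
    tail = (block.get("content") or "").lstrip()
    if not head or not tail:
        return False
    return not head.endswith(TERMINAL) and tail[0].islower()


def merge_truncated_paragraphs(blocks):
    """Merge adjacent text blocks that look like one split paragraph."""
    merged = []
    for block in blocks:
        block = dict(block)  # shallow copy so the input list is not mutated
        prev = merged[-1] if merged else None
        if prev is not None and _is_continuation(prev, block):
            head = prev["content"].rstrip()
            tail = block["content"].lstrip()
            if head.endswith("-"):
                # Treat a trailing hyphen as a word broken across blocks.
                prev["content"] = head[:-1] + tail
            else:
                prev["content"] = head + " " + tail
        else:
            merged.append(block)
    return merged
```

A real merger would likely also consider layout cues such as bounding boxes and reading order; this sketch looks at the text alone.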
🚧 Cross-Page Table Merging: Currently under integration. Stay tuned!
2.1. End-to-End Document Parsing on OmniDocBench v1.6
2.2. Text Recognition
2.4. Table Recognition
3. Showcase
3.1. Basic Parsing Capability
4. Acknowledgement & Citation
We would like to thank the Qwen Team, vLLM, OmniDocBench, PaddleOCR, UniMERNet, and DocLayout-YOLO for providing valuable code and models. We also appreciate everyone's contributions to this open-source project!
If you find our work useful in your research, please consider starring the project ⭐ and citing our work 📝:
@misc{wang2026mineru25propushinglimitsdatacentric,
title={MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale},
author={Wang, Bin and He, Tianyao and Ouyang, Linke and Wu, Fan and Zhao, Zhiyuan and Chu, Tao and Qu, Yuan and Jin, Zhenjiang and Zeng, Weijun and Miao, Ziyang and Xu, Bangrui and Niu, Junbo and others},
year={2026},
eprint={2604.04771},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2604.04771},
}