Sawfwair

Infinity-Parser2-Pro-Int8

License: apache-2.0

Changes From The Base Model

- Quantized the eligible language-model linear weights to int8.
- Used MLX affine quantization with a group size of 64, storing the matching quantization metadata.
- Split the fused expert tensors of Infinity-Parser2-Pro into the native switch_mlp layout that mere.run expects.
- Preserved the vision tower and the tokenizer sidecar files required by the native Qwen-family OCR runtime.
- Added mererun_model.json metadata for managed local installation.
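For reference, the sketch below shows group-wise affine (min/max) quantization in the form given in the `mlx.core.quantize` documentation. Each group of $g = 64$ consecutive weights $w_i$ gets its own scale $s$ and bias $\beta$, with $b = 8$ bits per code. Exact rounding and clipping details may differ between MLX versions.

$$
\alpha = \max_{i} w_i, \qquad
\beta = \min_{i} w_i, \qquad
s = \frac{\alpha - \beta}{2^{b} - 1}, \qquad
\hat{w}_i = \operatorname{round}\!\left(\frac{w_i - \beta}{s}\right), \qquad
w_i \approx s\,\hat{w}_i + \beta
$$

The storage cost per quantized weight follows from one 8-bit code plus each group's share of one scale and one bias. Assuming 16-bit scales and biases:

$$
\text{bits per weight} = 8 + \frac{16 + 16}{64} = 8.5
$$

Against a 16-bit baseline, that is roughly $8.5 / 16 \approx 53\%$ of the original size for the quantized layers only. This estimate does not cover any non-quantized parts of the model, such as the preserved vision tower.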

Intended Use

Use this model as the quality-focused native Infinity-Parser2 Pro OCR option in mere.run:

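```bash
# Download the quantized Pro model for local use.
mere.run model pull vision-ocr-infinity-pro-int8

# OCR a single page with the Infinity backend using the doc2md task.
mere.run vision ocr ./page.png \
  --backend infinity \
  --infinity-model vision-ocr-infinity-pro-int8 \
  --infinity-task doc2md \
  --temperature 0
```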
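For multi-page documents, you can wrap the single-page command in a loop. The following is a minimal sketch, not an official mere.run workflow. It relies on these assumptions:

- `mere.run vision ocr` prints its doc2md result to stdout. The model card does not say where output goes.
- The function name `ocr_dir` is hypothetical.
- The `MERE_RUN` variable is a hypothetical convenience of this script, not a mere.run setting.

Only the flags shown in the original command are used.

```bash
#!/usr/bin/env bash
# Sketch: OCR every PNG page in a directory with the Pro int8 model,
# writing one Markdown file next to each page.
# ASSUMPTION: `mere.run vision ocr` prints its doc2md result to stdout;
# the model card does not say where output goes.
set -euo pipefail

# Hypothetical override for the binary name (a convenience of this script,
# not a mere.run setting); defaults to the mere.run CLI.
MERE_RUN="${MERE_RUN:-mere.run}"

# Hypothetical helper: OCR each *.png in the given directory.
ocr_dir() {
  local dir="$1"
  local page
  for page in "$dir"/*.png; do
    # An unmatched glob stays literal, so skip it when no PNGs exist.
    [ -e "$page" ] || continue
    "$MERE_RUN" vision ocr "$page" \
      --backend infinity \
      --infinity-model vision-ocr-infinity-pro-int8 \
      --infinity-task doc2md \
      --temperature 0 > "${page%.png}.md"
  done
}
```

Usage would be `ocr_dir ./pages`. Keeping `--temperature 0` from the original command should make reruns over the same pages easier to compare when you evaluate the model.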

mere.run keeps LightOnOCR as the default OCR backend because it is smaller and more predictable across the local smoke-test set. Choose this quantized Pro model for document types where its layout and parsing quality justify the higher latency and memory use.

Local Evaluation Notes

On local samples, this int8 model outperformed Infinity-Parser2-Flash on some metadata-heavy article layouts, while LightOnOCR remained stronger on the tested default OCR mix. Treat this model as an evaluation target rather than a universal default.