Sawfwair
Infinity-Parser2-Pro-Int8
License: apache-2.0
Changes From The Base Model
- Quantized eligible language-model linear weights to int8.
- Used group size 64 with MLX affine quantization metadata.
- Split Infinity-Parser2-Pro fused expert tensors into the native switch_mlp layout expected by mere.run.
- Preserved vision tower and tokenizer sidecar files required by the native Qwen-family OCR runtime.
- Added mererun_model.json metadata for managed local installation.
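To make the quantization settings in the list above concrete, here is a minimal pure-Python sketch of group-wise affine quantization with group size 64 and 8-bit codes. It illustrates the general scheme (one scale and one bias per group, `w ≈ q * scale + bias`) and is not MLX's actual kernel or the conversion script used for this model. The function names are hypothetical, and the 16-bit size assumed for each scale and bias is an assumption, not something this card states.

```python
from typing import List, Tuple

GROUP_SIZE = 64  # group size stated in this model card
BITS = 8         # int8 codes

Group = Tuple[List[int], float, float]  # (codes, scale, bias)


def quantize_group(weights: List[float]) -> Group:
    """Affine-quantize one group so that w ≈ q * scale + bias, q in [0, 255]."""
    levels = (1 << BITS) - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels
    if scale == 0.0:
        scale = 1.0  # constant group: every code is 0 and the bias carries the value
    bias = w_min
    codes = [round((w - bias) / scale) for w in weights]
    return codes, scale, bias


def dequantize_group(codes: List[int], scale: float, bias: float) -> List[float]:
    """Reconstruct approximate weights from codes plus per-group metadata."""
    return [q * scale + bias for q in codes]


def quantize_row(row: List[float]) -> List[Group]:
    """Quantize one weight row in consecutive groups of GROUP_SIZE values."""
    if len(row) % GROUP_SIZE != 0:
        raise ValueError("row length must be a multiple of the group size")
    return [quantize_group(row[i:i + GROUP_SIZE])
            for i in range(0, len(row), GROUP_SIZE)]


def effective_bits_per_weight(metadata_bits: int = 16) -> float:
    """Storage cost per weight, counting one scale and one bias per group.

    metadata_bits=16 is an assumption about how scale and bias are stored.
    """
    return BITS + 2 * metadata_bits / GROUP_SIZE
```

Under the 16-bit metadata assumption, each weight costs 8 + 32/64 = 8.5 bits, so the per-group scales and biases add roughly 6% on top of the raw int8 codes.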
Intended Use
Use this model as the quality-focused native Infinity-Parser2 Pro OCR option in
mere.run:
```bash
mere.run model pull vision-ocr-infinity-pro-int8
mere.run vision ocr ./page.png \
  --backend infinity \
  --infinity-model vision-ocr-infinity-pro-int8 \
  --infinity-task doc2md \
  --temperature 0
```
mere.run keeps LightOnOCR as the default OCR backend because it is smaller and
more predictable across the local smoke set. This quantized Pro model is for
document types where Pro's layout and parsing quality justify higher latency and
memory use.
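For batch runs, the OCR command shown above can be assembled from a script. The sketch below is a hypothetical Python wrapper that uses only the flags appearing in this card. The helper names are illustrative, and it assumes that mere.run is on `PATH` and writes the parsed document to standard output, which this card does not confirm.

```python
import subprocess
from typing import List

MODEL_ID = "vision-ocr-infinity-pro-int8"  # model id from the pull command in this card


def build_ocr_command(image_path: str, task: str = "doc2md",
                      temperature: float = 0) -> List[str]:
    """Build the argument list for one OCR call, mirroring the card's example."""
    return [
        "mere.run", "vision", "ocr", image_path,
        "--backend", "infinity",
        "--infinity-model", MODEL_ID,
        "--infinity-task", task,
        "--temperature", str(temperature),
    ]


def run_ocr(image_path: str) -> str:
    """Run OCR on one image and return whatever the CLI prints.

    Assumption: the result is written to stdout; adjust if the CLI writes files.
    """
    result = subprocess.run(build_ocr_command(image_path),
                            check=True, capture_output=True, text=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces, and `check=True` raises an error if the CLI exits with a non-zero status.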
Local Evaluation Notes
On local samples, this int8 model improved over Infinity-Parser2-Flash on some metadata-heavy article layouts, while LightOnOCR remained stronger on the tested default-OCR mix. Treat this as an eval target rather than a universal default.
License
The base model is licensed under Apache-2.0. This quantized derivative is
distributed under Apache-2.0 as well. See LICENSE and NOTICE.
Model provider: Sawfwair
Model tree
Base: infly/Infinity-Parser2-Pro
Quantized: this model
Modalities
Input: Video, Text, Image
Output: Text