Sawfwair

Infinity-Parser2-Pro-Int8

License: apache-2.0

Changes From The Base Model

  • Quantized eligible language-model linear weights to int8.
  • Used group size 64 with MLX affine quantization metadata.
  • Split Infinity-Parser2-Pro fused expert tensors into the native switch_mlp layout expected by mere.run.
  • Preserved vision tower and tokenizer sidecar files required by the native Qwen-family OCR runtime.
  • Added mererun_model.json metadata for managed local installation.
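To make the quantization settings above concrete, here is a minimal pure-Python sketch of group-wise affine quantization with group size 64 and 8-bit codes. It illustrates the general scheme (one scale and one bias per group of 64 weights) and is not the MLX kernel or the conversion script used for this model; the function names are hypothetical.

```python
from typing import List, Tuple

GROUP_SIZE = 64  # group size stated in the model card
BITS = 8         # int8 weights


def quantize_group(weights: List[float], bits: int = BITS) -> Tuple[List[int], float, float]:
    """Affine-quantize one group of weights; returns (codes, scale, bias)."""
    levels = (1 << bits) - 1              # 255 distinct steps for 8 bits
    lo, hi = min(weights), max(weights)
    # A constant group has no range; any non-zero scale reproduces it exactly.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = []
    for w in weights:
        q = round((w - lo) / scale)
        codes.append(min(levels, max(0, q)))  # clamp against float round-off
    return codes, scale, lo


def dequantize_group(codes: List[int], scale: float, bias: float) -> List[float]:
    """Reconstruct approximate weights from codes plus per-group metadata."""
    return [c * scale + bias for c in codes]


def quantize_row(row: List[float], group_size: int = GROUP_SIZE):
    """Split one weight row into fixed-size groups and quantize each group."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of the group size")
    return [
        quantize_group(row[i:i + group_size])
        for i in range(0, len(row), group_size)
    ]
```

Each group stores its own scale and bias alongside the integer codes, so the reconstruction error for any weight is bounded by half of that group's scale. Smaller groups track local weight ranges more closely at the cost of more metadata.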

Intended Use

Use this model as the quality-focused native Infinity-Parser2 Pro OCR option in mere.run:

bash

mere.run model pull vision-ocr-infinity-pro-int8
mere.run vision ocr ./page.png \
--backend infinity \
--infinity-model vision-ocr-infinity-pro-int8 \
--infinity-task doc2md \
--temperature 0
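For scripted use, the command above can be wrapped in a small Python helper. This is a sketch only: it uses the subcommand and flags shown above and nothing else, the helper names are hypothetical, and it assumes (without confirmation from the model card) that mere.run writes the OCR result to standard output.

```python
import subprocess
from typing import List

DEFAULT_MODEL = "vision-ocr-infinity-pro-int8"  # model name from the card


def build_ocr_command(
    image_path: str,
    model: str = DEFAULT_MODEL,
    task: str = "doc2md",
    temperature: float = 0,
) -> List[str]:
    """Build the argv list for the OCR invocation shown in the model card."""
    return [
        "mere.run", "vision", "ocr", image_path,
        "--backend", "infinity",
        "--infinity-model", model,
        "--infinity-task", task,
        "--temperature", str(temperature),
    ]


def run_ocr(image_path: str) -> str:
    """Run the command and return its stdout.

    Assumption: the result is printed to stdout. Raises
    subprocess.CalledProcessError on a non-zero exit status.
    """
    completed = subprocess.run(
        build_ocr_command(image_path),
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces. Calling `run_ocr("./page.png")` requires that the model has already been pulled.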

mere.run keeps LightOnOCR as the default OCR backend because it is smaller and more predictable across the local smoke set. This quantized Pro model is for document types where Pro's layout and parsing quality justify higher latency and memory use.

Local Evaluation Notes

On local samples, this int8 model improved over Infinity-Parser2-Flash on some metadata-heavy article layouts, while LightOnOCR remained stronger on the tested default-OCR mix. Treat this as an eval target rather than a universal default.
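One way to run this kind of comparison on your own documents is to score each backend's output against a hand-checked reference transcription. The sketch below uses a character-level similarity ratio from Python's standard library. The metric and the function names are assumptions made for illustration; the model card does not state how the local evaluation was scored.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Collapse runs of whitespace so layout differences do not dominate."""
    return " ".join(text.split())


def similarity(reference: str, hypothesis: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means identical after normalization."""
    matcher = SequenceMatcher(
        None, normalize(reference), normalize(hypothesis), autojunk=False
    )
    return matcher.ratio()


def rank_backends(reference: str, outputs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Score each backend's output against the reference, best first."""
    scores = [(name, similarity(reference, text)) for name, text in outputs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

A character-level ratio is a coarse signal. It does not separately reward correct table structure or reading order, so it is best paired with a manual review of the document types you care about.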

License

The base model is licensed under Apache-2.0. This quantized derivative is distributed under Apache-2.0 as well. See LICENSE and NOTICE.

Model provider

Sawfwair

Model tree

  • Base: infly/Infinity-Parser2-Pro
  • Quantized: this model

Modalities

  • Input: Video, Text, Image
  • Output: Text
