sugartai

Qwen3.5-2B-MathParser-pro

License: apache-2.0

Model Summary

Qwen3.5-2B-MathParser-pro is a compact vision-language model for handwritten mathematical formula OCR. It is optimized to transcribe single-line and multi-line handwritten mathematical expressions into LaTeX, with a focus on local deployment.

This 2B release is intended for lower-memory local deployment. The companion release is Qwen3.5-4B-MathParser-pro.

Intended Use

Handwritten mathematical formula recognition
Multi-line LaTeX transcription
OCR for mathematical expressions and derivations
Research and application prototyping around handwritten math parsing

This model is not intended to be a general mathematical reasoning model. It should be used as an OCR/transcription model.

Training Recipe

The model follows a two-stage MathParser training recipe:

Stage 1 SFT builds a stable handwritten mathematical OCR base and teaches direct LaTeX transcription.
Stage 2 DPO (v34) trains the model to prefer concise, stable, line-count-faithful transcriptions, reducing malformed outputs, repetition, max-token runaway, and very low-similarity failures.
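
For readers unfamiliar with Stage 2, the following is a minimal sketch of the standard DPO objective for a single preference pair. It is not the training code used for this release: the model card does not state the preference data, the reference model, or the hyperparameters, so `beta=0.1` and all log-probability inputs are hypothetical placeholders.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probs.

    beta is a hypothetical value; the release does not document it.
    """
    # Implicit rewards: how much more the policy likes each answer than the reference does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)) computed as softplus(-margin) in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Example: the policy already favors the concise (chosen) transcription.
print(dpo_loss(-5.0, -20.0, -10.0, -10.0))
```

In this setting the "chosen" output would be a concise transcription with the correct line count and the "rejected" output a repetitive or malformed one; the loss falls as the policy separates the two more strongly than the reference model does.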

The released weights are fully merged model weights, not LoRA adapters.

Evaluation

Evaluation set: 756 multi-line handwritten mathematical formula samples.

Metrics:

Avg Sim / Median Sim: normalized edit similarity, higher is better.
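Line Match: number of samples whose predicted line count exactly matches the ground truth, higher is better.
Within +/-1: number of samples whose predicted line count differs from the ground truth by at most one, higher is better.
Runaway: number of generations that hit the max-token limit or are obviously overlong or repetitive, lower is better.
Bad <0.50: number of samples with similarity below 0.50, lower is better.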
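
The metrics above can be sketched as follows. This is an illustration under stated assumptions, not the evaluation script behind the reported numbers: it assumes similarity is `1 - Levenshtein distance / length of the longer string` and that transcription lines are separated by newlines. The actual normalization and line-splitting rule are not specified in this card, and the Runaway metric is omitted because its exact criterion is not given.

```python
import statistics


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(pred: str, truth: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(pred), len(truth))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(pred, truth) / longest


def line_count(latex: str) -> int:
    """Assumption: one transcription line per non-empty newline-separated line."""
    return sum(1 for line in latex.splitlines() if line.strip())


def summarize(pairs):
    """pairs: non-empty list of (prediction, ground_truth) strings."""
    sims = [similarity(p, t) for p, t in pairs]
    diffs = [abs(line_count(p) - line_count(t)) for p, t in pairs]
    return {
        "avg_sim": sum(sims) / len(sims),
        "median_sim": statistics.median(sims),
        "line_match": sum(d == 0 for d in diffs),
        "within_1": sum(d <= 1 for d in diffs),
        "bad_lt_050": sum(s < 0.50 for s in sims),
    }
```

Calling `summarize` on a list of `(prediction, ground_truth)` pairs returns the two similarity averages plus the three counts in the same units as the tables below.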

Model	Samples	Avg Sim	Median Sim	Line Match	Within +/-1	Runaway	Bad <0.50
Qwen3.5-0.8B Base	756	0.544843	0.580742	149	235	108	262
Qwen3.5-2B Base	756	0.599258	0.651649

For this release, the main result is:

Release	Avg Sim	Median Sim	Line Match	Within +/-1	Runaway	Bad <0.50
Qwen3.5-2B-MathParser-pro	0.916060	0.951464	569	714	3	15
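
The count columns are easier to compare as rates. The sketch below converts them, assuming each count is out of the 756 evaluation samples (the second table has no Samples column, so this is inferred from the stated evaluation set size). Only the two fully reported rows are included.

```python
TOTAL_SAMPLES = 756  # assumed denominator for every count column

# Counts copied from the two tables above.
counts = {
    "Qwen3.5-0.8B Base": {
        "line_match": 149, "within_1": 235, "runaway": 108, "bad_lt_050": 262,
    },
    "Qwen3.5-2B-MathParser-pro": {
        "line_match": 569, "within_1": 714, "runaway": 3, "bad_lt_050": 15,
    },
}


def as_rates(row, total=TOTAL_SAMPLES):
    """Convert per-metric sample counts into fractions of the evaluation set."""
    return {name: value / total for name, value in row.items()}


rates = {model: as_rates(row) for model, row in counts.items()}
for model, row in rates.items():
    print(model, {name: f"{value:.1%}" for name, value in row.items()})
```

Under that assumption, this release matches the line count exactly on about 75% of samples and within one line on about 94%, compared with about 20% and 31% for the 0.8B base model.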

Figures

[Figures: overall average similarity; error reduction; bucket average similarity; model size quality tradeoff]

Usage

python
from PIL import Image
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "sugartai/Qwen3.5-2B-MathParser-pro"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=torch.bfloat16,
    device_map="auto",
).eval()

image = Image.open("formula.png").convert("RGB")
messages = [
    {
        "role": "system",
        "content": "You are a handwritten mathematical OCR model. Return only the LaTeX transcription.",
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Transcribe the handwritten mathematical formula into LaTeX only."},
        ],
    },
]

text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Stop on EOS, and also on the pad token when it is a distinct token.
eos_ids = [processor.tokenizer.eos_token_id]
pad_id = processor.tokenizer.pad_token_id
if pad_id is not None and pad_id not in eos_ids:
    eos_ids.append(pad_id)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=1536,
        do_sample=False,
        num_beams=1,
        eos_token_id=eos_ids,
        pad_token_id=pad_id if pad_id is not None else eos_ids[0],
    )

# Drop the prompt tokens so that only the newly generated LaTeX is decoded.
new_ids = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.decode(new_ids[0], skip_special_tokens=True))
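
Because runaway generations are one of the failure modes this release targets, it can be useful to flag suspicious outputs after decoding. The helper below is a hypothetical heuristic, not the criterion used in the evaluation: it flags an output when generation reached the token cap or when a single line repeats at least `max_repeats` times (an arbitrary threshold).

```python
from collections import Counter


def looks_like_runaway(num_new_tokens, max_new_tokens, latex, max_repeats=5):
    """Heuristic check: token cap reached, or one line repeated many times."""
    if num_new_tokens >= max_new_tokens:
        return True
    lines = [line.strip() for line in latex.splitlines() if line.strip()]
    if not lines:
        return False
    # Count of the most frequent non-empty line in the transcription.
    most_common_count = Counter(lines).most_common(1)[0][1]
    return most_common_count >= max_repeats
```

With the usage snippet above, `num_new_tokens` would be `new_ids.shape[1]`, `max_new_tokens` would be 1536, and `latex` would be the decoded string.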

Limitations

The model is specialized for handwritten mathematical OCR and LaTeX transcription.
It is not a general reasoning or theorem-proving model.
Very noisy images, unusual notation, extreme layout variation, or out-of-distribution handwriting may degrade quality.
The reported metrics are from an internal 756-sample multi-line handwritten formula evaluation set.

License

This model is released under Apache 2.0, following the base model license of Qwen/Qwen3.5-2B.

Citation

If you use this model, please cite or link this model page and the Qwen3.5 base model.

Model Details

Model Provider

sugartai

Model Tree

Base

Qwen/Qwen3.5-2B

Fine-tuned

this model

Input Modalities

Text

Image

Video

Output Modalities