License: apache-2.0

Model Summary

Qwen3.5-2B-MathParser-pro is a compact vision-language model for handwritten mathematical formula OCR. It is optimized to transcribe single-line and multi-line handwritten mathematical expressions into LaTeX, with a focus on local deployment.

This 2B release is intended for lower-memory local deployment. The companion release is Qwen3.5-4B-MathParser-pro.

Intended Use

  • Handwritten mathematical formula recognition
  • Multi-line LaTeX transcription
  • OCR for mathematical expressions and derivations
  • Research and application prototyping around handwritten math parsing

This model is not intended to be a general mathematical reasoning model. It should be used as an OCR/transcription model.

Training Recipe

The model follows a two-stage MathParser training recipe:

  1. Stage 1 SFT builds a stable handwritten mathematical OCR base and teaches direct LaTeX transcription.
  2. Stage 2 DPO (v34) trains the model to prefer concise, stable, line-count-faithful transcriptions, reducing malformed outputs, repetition, max-token runaway, and very low-similarity failures.
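To make the Stage 2 objective concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It assumes the usual formulation, with the Stage 1 SFT model acting as the frozen reference; the model card does not confirm the exact variant, the reference model, or the value of `beta`, so treat the function name, its parameters, and the default as illustrative.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed sequence log-probabilities. `beta` is a hypothetical
    default, not a value taken from the model card.
    """
    # How much more the policy favors the chosen transcription over the
    # rejected one, relative to the frozen reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)), written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

In this setting the chosen sample would be the concise, line-count-faithful transcription and the rejected one a malformed or runaway output. The loss falls below log 2 once the policy prefers the chosen sample more strongly than the reference does.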

The released weights are fully merged model weights, not LoRA adapters.

Evaluation

Evaluation set: 756 multi-line handwritten mathematical formula samples.

Metrics:

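  • Avg Sim / Median Sim: mean and median normalized edit similarity between prediction and ground truth; higher is better.
  • Line Match: number of samples whose predicted line count exactly matches the ground truth; higher is better.
  • Within +/-1: number of samples whose predicted line count differs from the ground truth by at most one (exact matches included); higher is better.
  • Runaway: number of generations that hit the max-token limit or are obviously overlong or repetitive; lower is better.
  • Bad <0.50: number of samples with similarity below 0.50; lower is better.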

Model               Samples  Avg Sim   Median Sim  Line Match  Within +/-1  Runaway  Bad <0.50
Qwen3.5-0.8B Base   756      0.544843  0.580742    149         235          108      262
Qwen3.5-2B Base     756      0.599258  0.651649    252         392          19       236
Qwen3.5-4B Base     756      0.534456  0.541674    264         368          5        295
Qwen3.5-2B SFT      756      0.906516  0.952732    550         706          13       25
Qwen3.5-2B SFT+DPO  756      0.916060  0.951464    569         714          3        15
Qwen3.5-4B SFT      756      0.942045  0.966546    612         730          0        2
Qwen3.5-4B SFT+DPO  756      0.942878  0.968560    611         730          0        1
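The similarity and line-count metrics can be sketched as follows. This is an illustrative reimplementation, not the authors' evaluation script. It assumes similarity is one minus the character-level Levenshtein distance divided by the longer string's length, and that lines are separated by the LaTeX row break `\\`. The exact normalization and line-splitting rule are not stated in the model card, and all function names here are hypothetical.

```python
from statistics import mean, median


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete from a
                            curr[j - 1] + 1,            # insert into a
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def similarity(pred: str, truth: str) -> float:
    """Normalized edit similarity in [0, 1] (assumed normalization)."""
    longest = max(len(pred), len(truth))
    return 1.0 if longest == 0 else 1.0 - edit_distance(pred, truth) / longest


def count_lines(latex: str) -> int:
    """Count non-empty rows split on the LaTeX row break (assumed rule)."""
    return len([row for row in latex.split(r"\\") if row.strip()])


def summarize(pairs):
    """Aggregate metrics over (prediction, ground_truth) string pairs."""
    sims = [similarity(p, t) for p, t in pairs]
    gaps = [abs(count_lines(p) - count_lines(t)) for p, t in pairs]
    return {
        "avg_sim": mean(sims),
        "median_sim": median(sims),
        "line_match": sum(g == 0 for g in gaps),
        "within_1": sum(g <= 1 for g in gaps),
        "bad_lt_0_50": sum(s < 0.50 for s in sims),
    }
```

Calling `summarize` on a list of `(prediction, ground_truth)` pairs returns the two similarity averages plus sample counts, which matches how the count columns in the table are reported against 756 samples.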

For this release, the main result is:

Release                    Avg Sim   Median Sim  Line Match  Within +/-1  Runaway  Bad <0.50
Qwen3.5-2B-MathParser-pro  0.916060  0.951464    569         714          3        15

Figures

[Figure: Overall average similarity]

[Figure: Error reduction]

[Figure: Bucket average similarity]

[Figure: Model size quality tradeoff]

Usage

```python
from PIL import Image
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "sugartai/Qwen3.5-2B-MathParser-pro"

# Load the processor and the fully merged weights in bfloat16.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=torch.bfloat16,
    device_map="auto",
).eval()

image = Image.open("formula.png").convert("RGB")

# One system instruction plus a user turn carrying the image and the request.
messages = [
    {
        "role": "system",
        "content": "You are a handwritten mathematical OCR model. Return only the LaTeX transcription.",
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Transcribe the handwritten mathematical formula into LaTeX only."},
        ],
    },
]

# Render the chat prompt as text, with thinking mode disabled.
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Stop on EOS, and also on the pad token when it is a distinct id.
eos_ids = [processor.tokenizer.eos_token_id]
pad_id = processor.tokenizer.pad_token_id
if pad_id is not None and pad_id not in eos_ids:
    eos_ids.append(pad_id)

# Greedy decoding.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=1536,
        do_sample=False,
        num_beams=1,
        eos_token_id=eos_ids,
        pad_token_id=pad_id if pad_id is not None else eos_ids[0],
    )

# Drop the prompt tokens and decode only the newly generated ones.
new_ids = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.decode(new_ids[0], skip_special_tokens=True))
```
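Because the evaluation tracks runaway generations, callers may want a cheap check on each decoded output before trusting it. The sketch below is one possible heuristic, not the criterion used for the reported Runaway metric. It flags an output that used the whole token budget or whose tail is a short unit repeated back to back. The function name and all thresholds are hypothetical.

```python
def looks_like_runaway(text: str, num_new_tokens: int,
                       max_new_tokens: int = 1536,
                       min_unit: int = 6, max_unit: int = 200,
                       min_repeats: int = 4) -> bool:
    """Heuristic runaway flag; thresholds are illustrative assumptions."""
    # Generation stopped only because it ran out of token budget.
    if num_new_tokens >= max_new_tokens:
        return True
    # The output ends in a loop: some unit of `unit` characters repeated
    # at least `min_repeats` times at the very end of the text.
    longest_unit = min(max_unit, len(text) // min_repeats)
    for unit in range(min_unit, longest_unit + 1):
        tail = text[-unit * min_repeats:]
        if tail == tail[-unit:] * min_repeats:
            return True
    return False
```

With the usage example above, `text` would be the decoded string and `num_new_tokens` would be `new_ids.shape[1]`. A flagged output could be retried on a cropped or cleaned-up image, or sent for manual review.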

Limitations

  • The model is specialized for handwritten mathematical OCR and LaTeX transcription.
  • It is not a general reasoning or theorem-proving model.
  • Very noisy images, unusual notation, extreme layout variation, or out-of-distribution handwriting may degrade quality.
  • The reported metrics are from an internal 756-sample multi-line handwritten formula evaluation set.

License

This model is released under Apache 2.0, following the base model license of Qwen/Qwen3.5-2B.

Citation

If you use this model, please cite or link this model page and the Qwen3.5 base model.

Model provider: sugartai

Base model: Qwen/Qwen3.5-2B (this model is a fine-tune of it)

Modalities: Video, Text, Image (input); Text (output)
