License: apache-2.0
Model Summary
Qwen3.5-2B-MathParser-pro is a compact vision-language model for handwritten mathematical formula OCR. It is optimized to transcribe single-line and multi-line handwritten mathematical expressions into LaTeX, with a focus on local deployment.
This 2B release is intended for lower-memory local deployment. The companion release is Qwen3.5-4B-MathParser-pro.
Intended Use
- Handwritten mathematical formula recognition
- Multi-line LaTeX transcription
- OCR for mathematical expressions and derivations
- Research and application prototyping around handwritten math parsing
This model is not intended to be a general mathematical reasoning model. It should be used as an OCR/transcription model.
Training Recipe
The model follows a two-stage MathParser training recipe:
- Stage 1 SFT builds a stable handwritten mathematical OCR base and teaches direct LaTeX transcription.
- Stage 2 DPO v34 steers the model toward concise, stable, line-count-faithful transcriptions and reduces malformed outputs, repetition, max-token runaway, and very low-similarity failures.
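To make the Stage 2 objective concrete, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. The model card does not state which DPO variant, `beta` value, or preference data were used, so the function name, arguments, and the `beta=0.1` default are illustrative assumptions rather than the actual training configuration.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, not taken from the model card
) -> float:
    """Standard DPO loss for one (chosen, rejected) transcription pair.

    Each argument is the summed log-probability of a full transcription
    under either the policy being trained or the frozen reference (SFT) model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))
```

The loss falls as the policy assigns relatively more probability to the preferred transcription (for example, a concise one with the correct line count) than to the rejected one (for example, a repetitive runaway), measured against the reference model.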
The released weights are fully merged model weights, not LoRA adapters.
Evaluation
Evaluation set: 756 multi-line handwritten mathematical formula samples.
Metrics:
- Avg Sim / Median Sim: normalized edit similarity, higher is better.
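- Line Match: number of samples (out of 756) whose predicted line count exactly matches the ground truth, higher is better.
- Within +/-1: number of samples whose predicted line count differs from the ground truth by at most one, higher is better.
- Runaway: number of samples with max-token or obviously overlong/repetitive generations, lower is better.
- Bad <0.50: number of samples with similarity below 0.50, lower is better.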
| Model | Samples | Avg Sim | Median Sim | Line Match | Within +/-1 | Runaway | Bad <0.50 |
|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B Base | 756 | 0.544843 | 0.580742 | 149 | 235 | 108 | 262 |
| Qwen3.5-2B Base | 756 | 0.599258 | 0.651649 | 252 | 392 | 19 | 236 |
| Qwen3.5-4B Base | 756 | 0.534456 | 0.541674 | 264 | 368 | 5 | 295 |
| Qwen3.5-2B SFT | 756 | 0.906516 | 0.952732 | 550 | 706 | 13 | 25 |
| Qwen3.5-2B SFT+DPO | 756 | 0.916060 | 0.951464 | 569 | 714 | 3 | 15 |
| Qwen3.5-4B SFT | 756 | 0.942045 | 0.966546 | 612 | 730 | 0 | 2 |
| Qwen3.5-4B SFT+DPO | 756 | 0.942878 | 0.968560 | 611 | 730 | 0 | 1 |
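The similarity and line-count metrics in the table can be approximated with a short script. This is a sketch under stated assumptions: the model card does not specify how edit similarity is normalized or how lines are counted, so the code assumes character-level Levenshtein distance divided by the longer string's length, and counts non-empty newline-separated lines. The helper names are hypothetical, and the Runaway metric is omitted because its exact criterion is not given.

```python
import statistics


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance using a rolling row."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(pred: str, truth: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(pred), len(truth))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(pred, truth) / longest


def count_lines(latex: str) -> int:
    """Assumed line definition: non-empty newline-separated lines."""
    return sum(1 for line in latex.splitlines() if line.strip())


def summarize(pairs, bad_threshold: float = 0.50) -> dict:
    """Aggregate metrics over (prediction, ground_truth) string pairs."""
    sims = [similarity(p, t) for p, t in pairs]
    diffs = [abs(count_lines(p) - count_lines(t)) for p, t in pairs]
    return {
        "avg_sim": statistics.mean(sims),
        "median_sim": statistics.median(sims),
        "line_match": sum(d == 0 for d in diffs),
        "within_1": sum(d <= 1 for d in diffs),
        "bad": sum(s < bad_threshold for s in sims),
    }
```

Calling `summarize` on a list of `(prediction, ground_truth)` pairs returns counts for the line-based metrics, matching how the table reports them as integers rather than rates.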
For this release, the main result is:
| Release | Avg Sim | Median Sim | Line Match | Within +/-1 | Runaway | Bad <0.50 |
|---|---|---|---|---|---|---|
| Qwen3.5-2B-MathParser-pro | 0.916060 | 0.951464 | 569 | 714 | 3 | 15 |
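For readers who prefer rates to raw counts, the integer columns can be converted into fractions of the 756-sample evaluation set. The snippet below is a small arithmetic sketch that assumes those columns are per-sample counts, which is consistent with the values but is an interpretation; the dictionary keys are illustrative names.

```python
TOTAL_SAMPLES = 756

# Counts for Qwen3.5-2B-MathParser-pro, copied from the table above.
counts = {
    "line_match": 569,
    "within_1": 714,
    "runaway": 3,
    "bad_below_0_50": 15,
}

rates = {name: value / TOTAL_SAMPLES for name, value in counts.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
# line_match: 75.3%
# within_1: 94.4%
# runaway: 0.4%
# bad_below_0_50: 2.0%
```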
Usage
```python
from PIL import Image
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "sugartai/Qwen3.5-2B-MathParser-pro"

# Load the processor and the merged model weights.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=torch.bfloat16,
    device_map="auto",
).eval()

image = Image.open("formula.png").convert("RGB")

messages = [
    {
        "role": "system",
        "content": "You are a handwritten mathematical OCR model. Return only the LaTeX transcription.",
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Transcribe the handwritten mathematical formula into LaTeX only."},
        ],
    },
]

# Build the prompt text with thinking mode disabled.
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Stop on EOS, and also on the pad token if it is a distinct token.
eos_ids = [processor.tokenizer.eos_token_id]
pad_id = processor.tokenizer.pad_token_id
if pad_id is not None and pad_id not in eos_ids:
    eos_ids.append(pad_id)

# Greedy decoding.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=1536,
        do_sample=False,
        num_beams=1,
        eos_token_id=eos_ids,
        pad_token_id=pad_id if pad_id is not None else eos_ids[0],
    )

# Drop the prompt tokens and decode only the generated continuation.
new_ids = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.decode(new_ids[0], skip_special_tokens=True))
```
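Downstream code may want to tidy the decoded string and flag generations that ran into the token limit, since those are candidates for the runaway failures described in the evaluation. The helpers below are a hypothetical post-processing sketch: the model card does not say whether the model ever wraps its output in a markdown code fence, so the fence stripping is a defensive assumption, and both function names are illustrative.

```python
FENCE = "`" * 3  # markdown code fence marker


def clean_latex(raw: str) -> str:
    """Trim whitespace and remove a wrapping markdown code fence, if any."""
    text = raw.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines).strip()
    return text


def hit_token_limit(num_new_tokens: int, max_new_tokens: int = 1536) -> bool:
    """True when generation used the whole budget, a possible runaway."""
    return num_new_tokens >= max_new_tokens
```

With the usage snippet above, `num_new_tokens` would be `new_ids.shape[1]` for a single-image batch, and `clean_latex` would be applied to the decoded string before rendering or storing it.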
Limitations
- The model is specialized for handwritten mathematical OCR and LaTeX transcription.
- It is not a general reasoning or theorem-proving model.
- Very noisy images, unusual notation, extreme layout variation, or out-of-distribution handwriting may degrade quality.
- The reported metrics are from an internal 756-sample multi-line handwritten formula evaluation set.
License
This model is released under Apache 2.0, following the base model license of Qwen/Qwen3.5-2B.
Citation
If you use this model, please cite or link this model page and the Qwen3.5 base model.
Model Details
- Model provider: sugartai
- Base model: Qwen/Qwen3.5-2B (this model is fine-tuned from it)
- Input modalities: video, text, image
- Output modality: text