Model Summary
Qwen3.5-2B-MathParser-pro is a compact vision-language model for handwritten mathematical formula OCR. It is optimized to transcribe single-line and multi-line handwritten mathematical expressions into LaTeX, with a focus on local deployment.
This 2B release is intended for lower-memory local deployment. The companion release is Qwen3.5-4B-MathParser-pro.
Intended Use
- Handwritten mathematical formula recognition
- Multi-line LaTeX transcription
- OCR for mathematical expressions and derivations
- Research and application prototyping around handwritten math parsing
This model is not intended to be a general mathematical reasoning model. It should be used as an OCR/transcription model.
Training Recipe
The model follows a two-stage MathParser training recipe:
- Stage 1 SFT builds a stable handwritten mathematical OCR base and teaches direct LaTeX transcription.
- Stage 2 DPO (v34) trains the model to prefer concise, stable, line-count-faithful transcriptions and reduces malformed outputs, repetition, max-token runaway, and very low-similarity failures.
The released weights are fully merged model weights, not LoRA adapters.
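For readers unfamiliar with Stage 2, the following is a minimal sketch of the standard DPO objective for a single preference pair. It is not the training code for this release: the specifics of "v34" are not described here, and the function name `dpo_loss` and the value `beta=0.1` are illustrative assumptions. In this setting the "chosen" output would be a concise, line-count-faithful transcription and the "rejected" output a malformed or runaway one.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumption: illustrative value, not taken from this release
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # How much more the policy favors each output than the frozen reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(z)) == log(1 + exp(-z)), computed in a numerically stable way.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# The loss falls when the policy raises the chosen transcription's likelihood
# relative to the rejected one.
print(dpo_loss(-2.0, -9.0, -5.0, -5.0) < dpo_loss(-9.0, -2.0, -5.0, -5.0))  # True
```

When the policy and the reference agree on both outputs, the loss equals log 2, which is the starting point before any preference has been learned.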
Evaluation
Evaluation set: 756 multi-line handwritten mathematical formula samples.
Metrics:
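- Avg Sim / Median Sim: mean and median normalized edit similarity between the predicted and ground-truth LaTeX; higher is better.
- Line Match: number of samples whose predicted line count exactly matches the ground truth; higher is better.
- Within +/-1: number of samples whose predicted line count differs from the ground truth by at most one; higher is better.
- Runaway: number of generations that hit the max-token limit or are obviously overlong or repetitive; lower is better.
- Bad <0.50: number of samples with similarity below 0.50; lower is better.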
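The metrics above can be sketched roughly as follows. This is not the evaluation script used for the reported numbers. It assumes that normalized edit similarity means `1 - levenshtein / max(len)` on raw characters and that lines are separated by the LaTeX row break `\\`; the helper names (`levenshtein`, `similarity`, `count_lines`, `summarize`) are hypothetical.

```python
from statistics import mean, median


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def similarity(pred: str, truth: str) -> float:
    """Assumed definition: 1 - edit distance / length of the longer string."""
    longest = max(len(pred), len(truth))
    return 1.0 if longest == 0 else 1.0 - levenshtein(pred, truth) / longest


def count_lines(latex: str) -> int:
    """Assumed definition: non-empty segments separated by the LaTeX row break."""
    return sum(1 for part in latex.split(r"\\") if part.strip())


def summarize(pairs: list[tuple[str, str]]) -> dict:
    """Aggregate (prediction, ground truth) pairs into the reported metrics."""
    sims = [similarity(p, t) for p, t in pairs]
    diffs = [abs(count_lines(p) - count_lines(t)) for p, t in pairs]
    return {
        "avg_sim": mean(sims),
        "median_sim": median(sims),
        "line_match": sum(d == 0 for d in diffs),
        "within_one": sum(d <= 1 for d in diffs),
        "bad_below_050": sum(s < 0.50 for s in sims),
    }


pairs = [("x", "x"), (r"a \\ b", r"a \\ b \\ c \\ d")]
print(summarize(pairs))
```

The Runaway count is left out because it depends on generation-time information (token counts) rather than on the decoded strings alone.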
| Model | Samples | Avg Sim | Median Sim | Line Match | Within +/-1 | Runaway | Bad <0.50 |
|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B Base | 756 | 0.544843 | 0.580742 | 149 | 235 | 108 | 262 |
| Qwen3.5-2B Base | 756 | 0.599258 | 0.651649 |
For this release, the main result is:
| Release | Avg Sim | Median Sim | Line Match | Within +/-1 | Runaway | Bad <0.50 |
|---|---|---|---|---|---|---|
| Qwen3.5-2B-MathParser-pro | 0.916060 | 0.951464 | 569 | 714 | 3 | 15 |
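Because Line Match, Within +/-1, Runaway, and Bad <0.50 are reported as counts, it can help to read them as fractions of the 756-sample evaluation set. The short sketch below does that arithmetic for this release's row; the variable names are illustrative.

```python
TOTAL_SAMPLES = 756  # size of the evaluation set stated above

# Counts copied from the Qwen3.5-2B-MathParser-pro row.
counts = {"Line Match": 569, "Within +/-1": 714, "Runaway": 3, "Bad <0.50": 15}

rates = {name: count / TOTAL_SAMPLES for name, count in counts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
# Line Match: 75.3%
# Within +/-1: 94.4%
# Runaway: 0.4%
# Bad <0.50: 2.0%
```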

Usage
from PIL import Image
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "sugartai/Qwen3.5-2B-MathParser-pro"

# Load the processor and the merged model weights in bfloat16.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=torch.bfloat16,
    device_map="auto",
).eval()

image = Image.open("formula.png").convert("RGB")

# One system instruction plus one user turn carrying the image and the request.
messages = [
    {
        "role": "system",
        "content": "You are a handwritten mathematical OCR model. Return only the LaTeX transcription.",
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Transcribe the handwritten mathematical formula into LaTeX only."},
        ],
    },
]

# Render the chat prompt as text with thinking mode disabled.
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Stop on EOS, and also on the pad token when it is a distinct id.
eos_ids = [processor.tokenizer.eos_token_id]
pad_id = processor.tokenizer.pad_token_id
if pad_id is not None and pad_id not in eos_ids:
    eos_ids.append(pad_id)

# Greedy decoding for deterministic transcriptions.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=1536,
        do_sample=False,
        num_beams=1,
        eos_token_id=eos_ids,
        pad_token_id=pad_id if pad_id is not None else eos_ids[0],
    )

# Drop the prompt tokens and decode only the newly generated ones.
new_ids = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.decode(new_ids[0], skip_special_tokens=True))
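In an application it may be useful to flag suspicious generations before accepting the transcription. The sketch below is one possible heuristic, not the Runaway detector used in the evaluation: the function `looks_like_runaway` and its thresholds (`min_repeat_len`, `min_repeats`) are hypothetical. It flags an output that used the full token budget or that ends in the same chunk repeated back to back.

```python
def looks_like_runaway(
    num_new_tokens: int,
    max_new_tokens: int,
    text: str,
    min_repeat_len: int = 8,  # assumption: shortest repeated chunk worth flagging
    min_repeats: int = 4,     # assumption: consecutive repeats needed to flag
) -> bool:
    """Heuristic check for max-token or repetitive generations."""
    # Hitting the token budget usually means the model never emitted a stop token.
    if num_new_tokens >= max_new_tokens:
        return True
    # Look for a chunk of text that repeats consecutively at the end of the output.
    for period in range(min_repeat_len, len(text) // min_repeats + 1):
        unit = text[-period:]
        if text.endswith(unit * min_repeats):
            return True
    return False


print(looks_like_runaway(42, 1536, "x^2 + y^2 = z^2"))                   # False
print(looks_like_runaway(200, 1536, "a = b " + r"\frac{1}{2} + " * 6))   # True
```

With the usage code above, the call would look like `looks_like_runaway(new_ids.shape[1], 1536, decoded)`, where `decoded` is the string returned by `processor.decode`.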
Limitations
- The model is specialized for handwritten mathematical OCR and LaTeX transcription.
- It is not a general reasoning or theorem-proving model.
- Very noisy images, unusual notation, extreme layout variation, or out-of-distribution handwriting may degrade quality.
- The reported metrics are from an internal 756-sample multi-line handwritten formula evaluation set.
License
This model is released under Apache 2.0, following the base model license of Qwen/Qwen3.5-2B.
Citation
If you use this model, please cite or link this model page and the Qwen3.5 base model.