
License: apache-2.0

Intended use

Single-shot OCR of a tightly cropped image containing one handwritten value, either numeric or otherwise short. The model outputs only the value, with no surrounding prose.

Prompt (used at training and inference):

Read the handwritten value. Output only the value.
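
Because the model is expected to return the bare value, a caller may want to reject anything that looks like prose before storing the result. The following is a minimal sketch of such a check. The `parse_value` helper and the numeric pattern are assumptions for illustration (up to three digits with an optional decimal part) and are not part of this model card; adjust the pattern to the values you actually expect.

```python
import re

# Assumption: valid values are short numerics such as "7", "12" or "8.5".
# This pattern is illustrative and is not taken from the model card.
VALUE_PATTERN = re.compile(r"\d{1,3}(?:\.\d{1,2})?")


def parse_value(raw: str):
    """Return the cleaned value, or None if the output is not a short numeric."""
    cleaned = raw.strip()
    if VALUE_PATTERN.fullmatch(cleaned):
        return cleaned
    return None


if __name__ == "__main__":
    for sample in ["12", " 8.5\n", "The value is 12"]:
        print(repr(sample), "->", parse_value(sample))
```

Returning `None` instead of raising lets a pipeline route rejected crops to manual review.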

Training

| Setting | Value |
| --- | --- |
| Base model | Qwen/Qwen2-VL-2B-Instruct |
| Method | PEFT LoRA |
| Rank (r) | 16 |
| lora_alpha | 32 |
| lora_dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Task type | CAUSAL_LM |
| Data | OMR Part-C cell crops, JSONL splits (80/20 train/eval) |
| Starting adapter | Earlier Part-D LoRA (continued fine-tune) |
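
To make the rank and alpha values concrete, the sketch below computes the scaling applied to the low-rank update and the number of parameters LoRA adds to one projection. It assumes standard (not rank-stabilized) LoRA scaling, `lora_alpha / r`. The 1536-wide square projection in the example is an illustrative shape, not a value read from the base model's config.

```python
# Hyperparameters copied from the Training table.
R = 16
LORA_ALPHA = 32


def lora_scaling(r: int, alpha: int) -> float:
    """Standard LoRA multiplies the low-rank update B @ A by alpha / r."""
    return alpha / r


def lora_added_params(r: int, in_features: int, out_features: int) -> int:
    """A has shape (r, in_features) and B has shape (out_features, r)."""
    return r * (in_features + out_features)


if __name__ == "__main__":
    print("scaling:", lora_scaling(R, LORA_ALPHA))
    # Hypothetical square projection; 1536 is an assumed width for illustration.
    print("added params:", lora_added_params(R, 1536, 1536))
```

With r=16 and lora_alpha=32 the update is scaled by 2.0, so the adapter's contribution is doubled relative to the raw product of the two low-rank matrices.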

Usage

python

import torch
from peft import PeftModel
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

BASE = "Qwen/Qwen2-VL-2B-Instruct"
ADAPTER = "kshitizjangra/qwen2vl-omr-lora-partc"

# Load the base model, then attach the LoRA adapter on top of it.
processor = AutoProcessor.from_pretrained(BASE)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

# The input must be a tight crop around a single handwritten value.
image = Image.open("crop.jpg").convert("RGB")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Read the handwritten value. Output only the value."},
    ],
}]

# Build the chat-formatted prompt and the model inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Greedy decoding; 16 new tokens is enough for a short value.
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=16, do_sample=False)

# Drop the prompt tokens and decode only the newly generated ones.
generated = out[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(generated, skip_special_tokens=True)[0].strip())

Files

| File | Purpose |
| --- | --- |
| adapter_model.safetensors | LoRA weights |
| adapter_config.json | PEFT config |
| tokenizer.json, tokenizer_config.json, chat_template.jinja | Tokenizer + chat template |
| processor_config.json | Image/text processor |

Limitations

  • Trained only on Part-C marks_obtained cells. Other handwriting domains (full-page free-form, non-English script, very long sequences) are out of scope.
  • Inference expects a tight crop. Loose crops or rotated images degrade accuracy.
  • Same biases and limitations as the base Qwen2-VL-2B-Instruct model.
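
Since accuracy depends on a tight crop, it can help to compute the crop box explicitly and keep any padding small. The helper below is a hypothetical sketch, not code from the author's pipeline. It assumes you already have a cell bounding box as `(left, top, right, bottom)` in pixels, and the default padding of 2 pixels is an arbitrary illustrative choice.

```python
def clamp_crop_box(box, image_size, pad=2):
    """Expand (left, top, right, bottom) by `pad` pixels, clamped to the image.

    `image_size` is (width, height), the same order as PIL's `Image.size`.
    """
    left, top, right, bottom = box
    width, height = image_size
    return (
        max(0, left - pad),
        max(0, top - pad),
        min(width, right + pad),
        min(height, bottom + pad),
    )


if __name__ == "__main__":
    # Hypothetical cell box on a 640x480 page image.
    print(clamp_crop_box((10, 20, 110, 60), (640, 480), pad=4))
```

The returned tuple can be passed directly to PIL, for example `image.crop(clamp_crop_box(box, image.size))`.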

Pipeline

Source code for cropping, dataset building, training, and inference lives at: https://github.com/kshitizjangra/omr_validator
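
For readers who want to build a similar dataset without consulting the repository, here is a minimal sketch of an 80/20 JSONL split like the one listed under Training. It is not the repository's code: the function names, the seed, and the record fields (`image`, `label`) are assumptions made for illustration.

```python
import json
import random


def split_records(records, eval_fraction=0.2, seed=0):
    """Shuffle deterministically and return (train, eval) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_eval = int(round(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # Hypothetical record schema; the real pipeline may use different fields.
    records = [{"image": f"crops/{i:04d}.jpg", "label": str(i % 10)} for i in range(10)]
    train, eval_set = split_records(records)
    write_jsonl("train.jsonl", train)
    write_jsonl("eval.jsonl", eval_set)
    print(len(train), len(eval_set))
```

A fixed seed keeps the split reproducible, so evaluation numbers stay comparable between training runs.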

Model provider

kshitizjangra

Model tree

Base

Qwen/Qwen2-VL-2B-Instruct

Adapter

this model

Modalities

Input

Text, Image

Output

Text
