README

License: apache-2.0

Training

base model: Qwen/Qwen2-VL-2B-Instruct
method: LoRA (r=16, α=32, dropout=0.05) on q_proj, k_proj, v_proj, o_proj
starting weights: continued from prior adapter (v1), not from scratch
dataset: 1500 OMR sheets × 3 fields, human-corrected via Django labeling tool
split: 80/20 by sheet (no leakage) → 3414 train / 859 eval rows
epochs: 3
batch size: 1 (per-device) × 8 grad-accum = effective 8
learning rate: 5e-5 (lower than v1 since starting from a trained adapter)
warmup: 0.03
precision: fp16 on Apple Silicon MPS
gradient checkpointing: on
total steps: 1281
final train loss: 0.014 (running avg)
wall-clock: ~6 hours on M-series MPS
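
The split and step-count rows above can be checked with a little arithmetic. The following is a minimal sketch, not the author's training script: the `sheet_id` key, the `split_by_sheet` helper, and the toy rows are hypothetical, and it assumes that a final partial gradient-accumulation window still counts as one optimizer step.

```python
import math
import random


def split_by_sheet(rows, eval_fraction=0.2, seed=0):
    """Split rows so every row from one sheet lands on the same side.

    `rows` is a list of dicts with a hypothetical "sheet_id" key.
    Splitting on sheet ids (not on rows) is what prevents leakage.
    """
    sheet_ids = sorted({row["sheet_id"] for row in rows})
    random.Random(seed).shuffle(sheet_ids)
    n_eval = round(len(sheet_ids) * eval_fraction)
    eval_ids = set(sheet_ids[:n_eval])
    train_rows = [r for r in rows if r["sheet_id"] not in eval_ids]
    eval_rows = [r for r in rows if r["sheet_id"] in eval_ids]
    return train_rows, eval_rows


# Toy data: 10 sheets with 3 field crops each (illustrative only).
fields = ("registration_no", "roll_no", "course_code")
toy_rows = [{"sheet_id": s, "field": f} for s in range(10) for f in fields]
toy_train, toy_eval = split_by_sheet(toy_rows)

# Step count, using the values from the table above.
train_row_count = 3414
effective_batch = 1 * 8          # per-device batch × grad-accum
epochs = 3
steps_per_epoch = math.ceil(train_row_count / effective_batch)
total_steps = steps_per_epoch * epochs

# Standard LoRA scales the low-rank update by alpha / r.
lora_scale = 32 / 16

print(steps_per_epoch, total_steps, lora_scale)  # 427 1281 2.0
```

Under these assumptions the computed total matches the 1281 steps reported in the table.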

Inference

python

import torch
from peft import PeftModel
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

BASE = "Qwen/Qwen2-VL-2B-Instruct"
ADAPTER = "kshitizjangra/qwen2vl-omr-lora-v2"

# Load the base model in fp16, then attach the LoRA adapter on top of it.
processor = AutoProcessor.from_pretrained(BASE, trust_remote_code=True)
base = Qwen2VLForConditionalGeneration.from_pretrained(BASE, dtype=torch.float16, trust_remote_code=True)
model = PeftModel.from_pretrained(base, ADAPTER).to("mps").eval()

# A single field crop (here, the roll number section).
img = Image.open("path/to/roll_no.jpg").convert("RGB")
messages = [{"role": "user", "content": [
    {"type": "image", "image": img},
    {"type": "text", "text": "Read the handwritten value. Output only the value."},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[[img]], return_tensors="pt").to("mps")

# Greedy decoding; decode only the newly generated tokens, not the prompt.
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(processor.tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

Intended use

Internal tool for digitizing university OMR sheets that follow a fixed template (Part-D). The model expects a single cropped section (registration_no, roll_no, or course_code) and returns the handwritten value as a string.
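
Because the template is fixed, each section crop can in principle be derived from page-relative coordinates. The sketch below is illustrative only: the `FIELD_BOXES` values and the `pixel_box` helper are hypothetical and are not the actual Part-D layout.

```python
# Hypothetical field positions as fractions of page width/height:
# (left, top, right, bottom). These are NOT the real Part-D coordinates.
FIELD_BOXES = {
    "registration_no": (0.05, 0.10, 0.50, 0.20),
    "roll_no": (0.05, 0.22, 0.50, 0.32),
    "course_code": (0.55, 0.10, 0.95, 0.20),
}


def pixel_box(field, width, height):
    """Convert a fractional box into integer pixel coordinates."""
    left, top, right, bottom = FIELD_BOXES[field]
    return (
        round(left * width),
        round(top * height),
        round(right * width),
        round(bottom * height),
    )


box = pixel_box("roll_no", width=1000, height=2000)
print(box)  # (50, 440, 500, 640)
```

The tuple order matches what Pillow's `Image.crop` expects (left, upper, right, lower), so a scanned sheet loaded with `Image.open` could be cropped with `sheet.crop(box)` before being passed to the model.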

Limitations

  • Trained only on this specific Part-D template; will not generalize to arbitrary forms.
  • Some labeler errors are present in the training data (e.g. occasional field mix-ups where a roll number was entered in the registration field).
  • Eval accuracy not yet measured against the 859-row held-out split.
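
One way to score the held-out split, once predictions exist, is per-field exact match. This is a minimal sketch under assumptions: the source does not name a metric, and the `per_field_exact_match` helper and the sample records are hypothetical.

```python
from collections import defaultdict


def per_field_exact_match(records):
    """Return {field: accuracy} using whitespace-stripped exact match.

    `records` is an iterable of (field, prediction, label) tuples.
    Exact match is an assumed metric, not one stated in the model card.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for field, prediction, label in records:
        totals[field] += 1
        if prediction.strip() == label.strip():
            hits[field] += 1
    return {field: hits[field] / totals[field] for field in totals}


# Made-up records for illustration only.
sample = [
    ("roll_no", "12345 ", "12345"),
    ("roll_no", "12346", "12345"),
    ("course_code", "CS101", "CS101"),
]
scores = per_field_exact_match(sample)
print(scores)  # {'roll_no': 0.5, 'course_code': 1.0}
```

Reporting accuracy per field rather than as a single number would make it easier to see whether the labeler mix-ups noted above affect one field more than the others.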

Model provider

kshitizjangra

Model tree

Base

Qwen/Qwen2-VL-2B-Instruct

Adapter

this model

Modalities

Input

Text, Image

Output

Text
