
README

License: other

Supported fields

| field | content | example |
|---|---|---|
| registration_no | digits and letters | 2572U00739 |
| roll_no | digits only | 7200739 |
| course_code | a letter followed by digits | U12028 |
| marks_obtained | a number | 45 |
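The "content" column suggests a cheap format check on each transcription. The sketch below is not part of the released adapter. It assumes the formats are exactly as the table describes, that letters may be upper- or lower-case, and that `marks_obtained` is a whole number; `FIELD_PATTERNS` and `matches_format` are hypothetical names.

```python
import re

# Hypothetical per-field patterns derived from the "content" column above.
# Assumptions: letters may be upper- or lower-case; marks are whole numbers.
FIELD_PATTERNS = {
    "registration_no": re.compile(r"[0-9A-Za-z]+"),  # digits and letters
    "roll_no": re.compile(r"[0-9]+"),                # digits only
    "course_code": re.compile(r"[A-Za-z][0-9]+"),    # a letter followed by digits
    "marks_obtained": re.compile(r"[0-9]+"),         # a (whole) number
}


def matches_format(field: str, value: str) -> bool:
    """Return True if the whole of `value` fits the expected format for `field`."""
    return FIELD_PATTERNS[field].fullmatch(value) is not None


if __name__ == "__main__":
    # Examples from the table above, plus one malformed roll number.
    print(matches_format("registration_no", "2572U00739"))
    print(matches_format("course_code", "U12028"))
    print(matches_format("roll_no", "72OO739"))  # letter O instead of zero
```

A value that fails this check could be routed to the same manual-review path as the category tokens; whether that is worthwhile depends on how often real values deviate from these formats.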

Bad-field categories

Instead of guessing a value for an unreadable field, the model emits one of these category tokens:

  • <blank> — the field is empty (nothing written)
  • <strikethrough> — a value was written and then crossed out

(<unclear> is described in the prompt convention but had no training examples in this release, so the model is not expected to emit it.)

Treat any category token as "flag for manual review".
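A minimal sketch of that rule, assuming raw outputs arrive as strings (as returned by `read_field` in the Usage section). `route` and `Reading` are hypothetical names, and flagging an empty output for review is an added assumption the card does not cover.

```python
from typing import NamedTuple

# Category tokens from the list above; "<unclear>" is included defensively
# even though this release is not expected to emit it.
CATEGORIES = {"<blank>", "<strikethrough>", "<unclear>"}


class Reading(NamedTuple):
    value: str          # the transcribed value, or the category token
    needs_review: bool  # True when a human should check the field


def route(raw_output: str) -> Reading:
    """Classify one model output as an accepted value or a manual-review flag."""
    text = raw_output.strip()
    # Assumption: an empty output is also treated as "needs review".
    return Reading(value=text, needs_review=(text in CATEGORIES or not text))


if __name__ == "__main__":
    for out in ["7200739", "<blank>", " <strikethrough>\n"]:
        print(route(out))
```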

Evaluation

Exact-match accuracy on a held-out validation set (702 examples, 10% stratified split of the combined data):

| field | accuracy |
|---|---|
| overall | 0.979 |
| course_code | 0.995 |
| marks_obtained | 0.980 |
| roll_no | 0.980 |
| registration_no | 0.960 |

Bad-field detection: precision 1.00, recall 0.80 (bad-field examples are rare, so recall is measured on a small sample). The remaining value errors are single-digit confusions between similar-looking digits (e.g. 0↔9, 7↔1, 6↔8).
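The card does not publish its evaluation script. The sketch below shows one plausible way to compute these numbers, under stated assumptions: accuracy is an exact string match, "overall" is a micro-average over all examples (the card does not say which averaging it uses), and a bad-field prediction counts as correct for precision/recall if any category token is emitted for a field whose label is a category token. `evaluate` and the toy records are hypothetical.

```python
from collections import defaultdict

CATEGORIES = {"<blank>", "<strikethrough>", "<unclear>"}


def evaluate(records):
    """records: iterable of (field, label, prediction) string triples.

    Returns (per-field exact-match accuracy, overall micro-averaged accuracy,
    bad-field precision, bad-field recall).
    """
    correct, total = defaultdict(int), defaultdict(int)
    tp = fp = fn = 0
    for field, label, pred in records:
        total[field] += 1
        correct[field] += int(pred == label)  # exact string match
        is_bad, flagged = label in CATEGORIES, pred in CATEGORIES
        tp += int(is_bad and flagged)       # bad field correctly flagged
        fp += int(flagged and not is_bad)   # good field wrongly flagged
        fn += int(is_bad and not flagged)   # bad field missed
    per_field = {f: correct[f] / total[f] for f in total}
    overall = sum(correct.values()) / sum(total.values())
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return per_field, overall, precision, recall


if __name__ == "__main__":
    toy = [
        ("roll_no", "7200739", "7200739"),
        ("roll_no", "7200739", "7200789"),            # one-digit confusion
        ("course_code", "U12028", "U12028"),
        ("marks_obtained", "<blank>", "<blank>"),
        ("marks_obtained", "<strikethrough>", "45"),  # missed bad field
    ]
    print(evaluate(toy))
```

With few bad-field labels, a single miss moves recall by a large step, which is why the card flags its recall figure as coming from a small sample.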

Usage

```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor

BASE = "Qwen/Qwen3.5-2B"
ADAPTER = "jhanwarsid/qwen3.5-2b-omr-lora"

# Category tokens: any of these outputs means "send to manual review".
CATEGORIES = {"<blank>", "<strikethrough>", "<unclear>"}

# One prompt per field; keep these strings unchanged.
PROMPTS = {
    "registration_no": "This image is a single field cropped from a student OMR sheet containing a handwritten registration number (digits and letters, e.g. 2572U00739). Transcribe it exactly as written, outputting only the value with no spaces, quotes, labels, or explanation. If the field is empty with nothing written, output <blank>. If a value was written and then struck through or crossed out, output <strikethrough>.",
    "roll_no": "This image is a single field cropped from a student OMR sheet containing a handwritten roll number (digits only, e.g. 7200739). Transcribe it exactly as written, outputting only the value with no spaces, quotes, labels, or explanation. If the field is empty with nothing written, output <blank>. If a value was written and then struck through or crossed out, output <strikethrough>.",
    "course_code": "This image is a single field cropped from a student OMR sheet containing a handwritten course code (a letter followed by digits, e.g. U12028). Transcribe it exactly as written, outputting only the value with no spaces, quotes, labels, or explanation. If the field is empty with nothing written, output <blank>. If a value was written and then struck through or crossed out, output <strikethrough>.",
    "marks_obtained": "This image is a single field cropped from a student OMR sheet containing the handwritten marks obtained (a number, e.g. 45). Transcribe it exactly as written, outputting only the value with no spaces, quotes, labels, or explanation. If the field is empty with nothing written, output <blank>. If a value was written and then struck through or crossed out, output <strikethrough>.",
}

# Load the base vision-language model, then attach the LoRA adapter.
processor = AutoProcessor.from_pretrained(BASE, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    BASE, dtype=torch.bfloat16, trust_remote_code=True
)
model = PeftModel.from_pretrained(model, ADAPTER).eval()
# model = model.to("cuda")  # optional: move to a GPU ("cuda") or Apple silicon ("mps")


def read_field(image_path: str, field: str, max_new_tokens: int = 24) -> str:
    """Transcribe one pre-cropped field image; `field` must be a key of PROMPTS."""
    img = Image.open(image_path).convert("RGB")
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": img},
            {"type": "text", "text": PROMPTS[field]},
        ],
    }]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[text], images=[[img]], return_tensors="pt").to(model.device)
    # Deterministic decoding (no sampling); no gradients are needed at inference.
    with torch.inference_mode():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    value = processor.tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    ).strip()
    return value  # a value, or a category token in CATEGORIES (-> manual review)


print(read_field("roll_no.jpg", "roll_no"))
```

Training details

  • Base: Qwen/Qwen3.5-2B (vision-language)
  • Method: LoRA (r=16, alpha=32, dropout 0.05, target_modules="all-linear")
  • Objective: supervised fine-tuning; loss computed only on the answer tokens
  • Data: ~7k labeled cropped OMR field images across the four fields, including bad-field images labeled <blank> / <strikethrough>
  • Precision: bf16, 2 epochs
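The card does not include its training code. A common way to compute the loss "only on the answer tokens" in a Hugging Face-style setup is to copy the input ids as labels and overwrite every prompt position with -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The minimal sketch below assumes that setup; `mask_prompt_labels`, the token ids, and `answer_start` are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_prompt_labels(input_ids: list[int], answer_start: int) -> list[int]:
    """Copy input_ids as labels, masking every position before the answer.

    `answer_start` is the index of the first answer token, i.e. the length of
    the prompt (image placeholders, instruction, and assistant header).
    """
    if not 0 <= answer_start <= len(input_ids):
        raise ValueError("answer_start out of range")
    return [IGNORE_INDEX] * answer_start + input_ids[answer_start:]


if __name__ == "__main__":
    # Hypothetical ids: three prompt tokens followed by two answer tokens.
    print(mask_prompt_labels([11, 12, 13, 21, 22], 3))
```

The labels stay aligned with `input_ids`; Hugging Face causal-LM models shift them internally when computing the next-token loss.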

Intended use & limitations

Built for transcribing the four OMR fields above from pre-cropped field images. It is not a general OCR model and expects a single field per image. The training data comes from one institution's OMR layout and handwriting, so accuracy may drop on other forms. Bad-field examples were rare in training, so always keep the category tokens as a manual-review path. The adapter inherits the license and usage terms of the base model (Qwen/Qwen3.5-2B).

Model provider: jhanwarsid

Model tree: base Qwen/Qwen3.5-2B; adapter: this model

Modalities: input Video, Text, Image; output Text
