License: other
Supported fields
| field | content | example |
|---|---|---|
| registration_no | digits and letters | 2572U00739 |
| roll_no | digits only | 7200739 |
| course_code | a letter followed by digits | U12028 |
| marks_obtained | a number | 45 |
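The content column above can double as a cheap sanity check on model output before it enters a database. The following is a minimal sketch, not part of the released model: the regular expressions are assumptions derived from the table (the card does not specify lengths, letter case, or whether marks can be fractional), and `matches_field_format` is a hypothetical helper name.

```python
import re

# Assumed patterns, inferred from the "content" column of the table.
# Lengths and allowed letters are NOT specified by the model card.
FIELD_PATTERNS = {
    "registration_no": re.compile(r"[0-9A-Za-z]+"),      # digits and letters
    "roll_no": re.compile(r"[0-9]+"),                    # digits only
    "course_code": re.compile(r"[A-Za-z][0-9]+"),        # a letter followed by digits
    "marks_obtained": re.compile(r"[0-9]+(\.[0-9]+)?"),  # a number (fraction allowed by assumption)
}


def matches_field_format(field: str, value: str) -> bool:
    """Return True if value has the shape listed for the given field."""
    return FIELD_PATTERNS[field].fullmatch(value) is not None
```

A value that fails this check (for example a roll number containing a letter) is a reasonable candidate for manual review, in the same way as a category token.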
Bad-field categories
Instead of guessing on unreadable fields, the model emits a category token:
- <blank>: the field is empty (nothing written)
- <strikethrough>: a value was written and then crossed out
(<unclear> is described in the prompt convention but had no training examples in
this release, so the model is not expected to emit it.)
Treat any category token as "flag for manual review".
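One way to apply this rule to a whole sheet is to split the per-field outputs into accepted values and fields that need a human. This is a sketch under stated assumptions: `split_for_review` is a hypothetical helper, and the input dictionary stands in for whatever structure holds the model's outputs.

```python
# The three category tokens named in this card; <unclear> is included
# even though the model is not expected to emit it in this release.
CATEGORY_TOKENS = {"<blank>", "<strikethrough>", "<unclear>"}


def split_for_review(results: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Split per-field model outputs into accepted values and flagged fields."""
    accepted: dict[str, str] = {}
    flagged: dict[str, str] = {}
    for field, value in results.items():
        value = value.strip()
        # Any category token means the model declined to transcribe the field.
        if value in CATEGORY_TOKENS:
            flagged[field] = value
        else:
            accepted[field] = value
    return accepted, flagged


# Hypothetical outputs for two fields of one sheet.
accepted, flagged = split_for_review({"roll_no": "7200739", "marks_obtained": "<blank>"})
```

Keeping the flagged token alongside the field name lets a reviewer see why the field was routed to them.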
Evaluation
Exact-match accuracy on a held-out validation set (702 examples, 10% stratified split of the combined data):
| field | accuracy |
|---|---|
| overall | 0.979 |
| course_code | 0.995 |
| marks_obtained | 0.980 |
| roll_no | 0.980 |
| registration_no | 0.960 |
Bad-field detection: precision 1.00, recall 0.80 (bad-field examples are rare, so recall is measured on a small sample). Remaining value errors are single ambiguous-digit confusions (e.g. 0↔9, 7↔1, 6↔8).
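For readers who want to reproduce these metrics on their own labeled crops, the sketch below shows one plausible way to compute them. It is not the author's evaluation script: the function names are hypothetical, and it assumes bad-field detection counts any category token as a positive, regardless of which token it is.

```python
CATEGORY_TOKENS = {"<blank>", "<strikethrough>", "<unclear>"}


def exact_match_accuracy(preds: list[str], golds: list[str]) -> float:
    """Fraction of predictions that are string-identical to the gold label."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and the same length")
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def bad_field_precision_recall(preds: list[str], golds: list[str]) -> tuple[float, float]:
    """Precision and recall for the binary question 'is this a bad field?'.

    Assumption: any category token counts as positive, whichever token it is.
    """
    pred_bad = [p in CATEGORY_TOKENS for p in preds]
    gold_bad = [g in CATEGORY_TOKENS for g in golds]
    tp = sum(p and g for p, g in zip(pred_bad, gold_bad))
    fp = sum(p and not g for p, g in zip(pred_bad, gold_bad))
    fn = sum(g and not p for p, g in zip(pred_bad, gold_bad))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

With few bad-field examples in a validation set, a single missed case moves recall substantially, which is why the reported recall should be read as a rough estimate.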
Usage
```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor

BASE = "Qwen/Qwen3.5-2B"
ADAPTER = "jhanwarsid/qwen3.5-2b-omr-lora"
CATEGORIES = {"<blank>", "<strikethrough>", "<unclear>"}  # outputs meaning "review"

PROMPTS = {
    "registration_no": "This image is a single field cropped from a student OMR sheet containing a handwritten registration number (digits and letters, e.g. 2572U00739). Transcribe it exactly as written, outputting only the value with no spaces, quotes, labels, or explanation. If the field is empty with nothing written, output <blank>. If a value was written and then struck through or crossed out, output <strikethrough>.",
    "roll_no": "This image is a single field cropped from a student OMR sheet containing a handwritten roll number (digits only, e.g. 7200739). Transcribe it exactly as written, outputting only the value with no spaces, quotes, labels, or explanation. If the field is empty with nothing written, output <blank>. If a value was written and then struck through or crossed out, output <strikethrough>.",
    "course_code": "This image is a single field cropped from a student OMR sheet containing a handwritten course code (a letter followed by digits, e.g. U12028). Transcribe it exactly as written, outputting only the value with no spaces, quotes, labels, or explanation. If the field is empty with nothing written, output <blank>. If a value was written and then struck through or crossed out, output <strikethrough>.",
    "marks_obtained": "This image is a single field cropped from a student OMR sheet containing the handwritten marks obtained (a number, e.g. 45). Transcribe it exactly as written, outputting only the value with no spaces, quotes, labels, or explanation. If the field is empty with nothing written, output <blank>. If a value was written and then struck through or crossed out, output <strikethrough>.",
}

processor = AutoProcessor.from_pretrained(BASE, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(BASE, dtype=torch.bfloat16, trust_remote_code=True)
model = PeftModel.from_pretrained(model, ADAPTER).eval()
# model = model.to("cuda")  # or "mps"


def read_field(image_path: str, field: str, max_new_tokens: int = 24) -> str:
    img = Image.open(image_path).convert("RGB")
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": img},
            {"type": "text", "text": PROMPTS[field]},
        ],
    }]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[text], images=[[img]], return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, skipping the prompt.
    value = processor.tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()
    return value  # a value, or a category token in CATEGORIES (-> manual review)


print(read_field("roll_no.jpg", "roll_no"))
```
Training details
- Base: Qwen/Qwen3.5-2B (vision-language)
- Method: LoRA (r=16, alpha=32, dropout 0.05, target_modules="all-linear")
- Objective: supervised fine-tuning; loss computed only on the answer tokens
- Data: ~7k labeled cropped OMR field images across the four fields, including bad images labeled <blank>/<strikethrough>
- Precision: bf16, 2 epochs
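"Loss computed only on the answer tokens" is commonly implemented by masking the prompt positions in the label sequence. The sketch below illustrates that general technique and is not taken from the author's training code: `build_labels` is a hypothetical helper and the token IDs in the example are made up. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss, so positions carrying it contribute nothing to the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(prompt_ids: list[int], answer_ids: list[int]) -> tuple[list[int], list[int]]:
    """Concatenate prompt and answer tokens and mask the prompt out of the loss.

    Returns (input_ids, labels). Labels equal the answer tokens at answer
    positions and IGNORE_INDEX at every prompt position.
    """
    input_ids = list(prompt_ids) + list(answer_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids)
    return input_ids, labels


# Made-up token IDs: three prompt tokens followed by two answer tokens.
input_ids, labels = build_labels([11, 12, 13], [21, 22])
```

With this masking, the model is only penalized for the transcription itself (or the category token), not for reproducing the instruction text.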
Intended use & limitations
Built for transcribing the four OMR fields above from pre-cropped field images. It is not a general OCR model and expects a single field per image. Trained on one institution's OMR layout/handwriting, so accuracy may drop on different forms. Bad-field examples were rare in training, so always keep the category tokens as a manual-review path. Inherits the license and usage terms of the base model (Qwen/Qwen3.5-2B).
Model provider
jhanwarsid
Model tree
- Base: Qwen/Qwen3.5-2B
- Adapter: this model
Modalities
- Input: Video, Text, Image
- Output: Text