jhanwarsid/qwen3.5-2b-omr-lora API & Inference Endpoint

Supported fields

field	content	example
`registration_no`	digits and letters	`2572U00739`
`roll_no`	digits only	`7200739`
`course_code`	a letter followed by digits	`U12028`
`marks_obtained`	a number	`45`
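
The formats in the table above can double as a cheap sanity check on model output. The following is a minimal sketch and not part of the released model: `FIELD_PATTERNS` and `looks_valid` are hypothetical names, the patterns are inferred only from the "content" column, and no length limits are enforced because the card does not state any. `marks_obtained` is assumed to be a non-negative integer or decimal.

```python
import re

# Hypothetical patterns inferred from the "content" column; lengths are not
# constrained because the model card does not specify them.
FIELD_PATTERNS = {
    "registration_no": re.compile(r"[0-9A-Za-z]+"),      # digits and letters
    "roll_no": re.compile(r"[0-9]+"),                    # digits only
    "course_code": re.compile(r"[A-Za-z][0-9]+"),        # a letter followed by digits
    "marks_obtained": re.compile(r"[0-9]+(\.[0-9]+)?"),  # a number (assumed non-negative)
}

def looks_valid(field: str, value: str) -> bool:
    """Return True if the whole value matches the expected shape for the field."""
    return FIELD_PATTERNS[field].fullmatch(value) is not None

print(looks_valid("roll_no", "7200739"))     # True
print(looks_valid("course_code", "12028U"))  # False
```

A value that fails this check is not necessarily wrong, but it may be a reasonable candidate for the same manual-review path as the category tokens.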

Bad-field categories

Instead of guessing on unreadable fields, the model emits a category token:

<blank> — the field is empty (nothing written)
<strikethrough> — a value was written and then crossed out

(<unclear> is described in the prompt convention but had no training examples in this release, so the model is not expected to emit it.)

Treat any category token as "flag for manual review".
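
One way to apply this rule in downstream code is to separate values from review flags as soon as the model output arrives. The sketch below is illustrative only: `Triage`, `triage`, and `CATEGORY_TOKENS` are hypothetical names, and treating an empty output as "needs review" is an added assumption rather than documented model behavior.

```python
from typing import NamedTuple, Optional

# Tokens named in this section. "<unclear>" is kept defensively even though
# the model is not expected to emit it in this release.
CATEGORY_TOKENS = {"<blank>", "<strikethrough>", "<unclear>"}

class Triage(NamedTuple):
    value: Optional[str]   # transcribed value, or None when review is needed
    needs_review: bool
    reason: Optional[str]  # category token or note explaining the flag

def triage(raw: str) -> Triage:
    """Split raw model output into a usable value or a manual-review flag."""
    text = raw.strip()
    if text in CATEGORY_TOKENS:
        return Triage(None, True, text)
    if not text:
        # Assumption: an empty generation is also sent to review.
        return Triage(None, True, "empty output")
    return Triage(text, False, None)

print(triage("<strikethrough>"))
print(triage("7200739"))
```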

Evaluation

Exact-match accuracy on a held-out validation set (702 examples, 10% stratified split of the combined data):

field	accuracy
overall	0.979
course_code	0.995
marks_obtained	0.980
roll_no	0.980
registration_no	0.960

Bad-field detection: precision 1.00, recall 0.80 (bad-field examples are rare, so recall is measured on a small sample). Remaining value errors are single ambiguous-digit confusions (e.g. 0↔9, 7↔1, 6↔8).
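
The evaluation script is not published with this card, so the following is only a sketch of how metrics of this kind could be computed. It assumes exact string equality for accuracy and treats bad-field detection as a binary question (is the label or prediction one of the two category tokens). The `evaluate` helper and the toy rows are hypothetical and are not drawn from the validation set.

```python
from collections import defaultdict

BAD_TOKENS = {"<blank>", "<strikethrough>"}

def evaluate(rows):
    """rows: iterable of (field, label, prediction) string triples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    tp = fp = fn = 0
    for field, label, pred in rows:
        hit = int(pred == label)  # exact match, no normalization
        for key in (field, "overall"):
            total[key] += 1
            correct[key] += hit
        label_bad = label in BAD_TOKENS
        pred_bad = pred in BAD_TOKENS
        tp += int(label_bad and pred_bad)
        fp += int(pred_bad and not label_bad)
        fn += int(label_bad and not pred_bad)
    accuracy = {key: correct[key] / total[key] for key in total}
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, precision, recall

# Toy rows for illustration only.
toy_rows = [
    ("roll_no", "7200739", "7200739"),
    ("roll_no", "7200139", "7200739"),          # a 1 read as 7
    ("course_code", "U12028", "U12028"),
    ("marks_obtained", "<blank>", "<blank>"),
    ("marks_obtained", "<strikethrough>", "45"),  # missed bad field
]
accuracy, precision, recall = evaluate(toy_rows)
print(accuracy["overall"], precision, recall)  # 0.6 1.0 0.5
```

With few bad-field examples, a single missed case moves recall substantially, which is consistent with the small-sample caveat above.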

Usage

python
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor

BASE = "Qwen/Qwen3.5-2B"
ADAPTER = "jhanwarsid/qwen3.5-2b-omr-lora"

CATEGORIES = {"<blank>", "<strikethrough>", "<unclear>"}  # outputs meaning "review"
PROMPTS = {
    "registration_no": "This image is a single field cropped from a student OMR sheet containing a handwritten registration number (digits and letters, e.g. 2572U00739). Transcribe it exactly as written, outputting only the value with no spaces, quotes, labels, or explanation. If the field is empty with nothing written, output <blank>. If a value was written and then struck through or crossed out, output <strikethrough>.",
    "roll_no": "This image is a single field cropped from a student OMR sheet containing a handwritten roll number (digits only, e.g. 7200739). Transcribe it exactly as written, outputting only the value with no spaces, quotes, labels, or explanation. If the field is empty with nothing written, output <blank>. If a value was written and then struck through or crossed out, output <strikethrough>.",
    "course_code": "This image is a single field cropped from a student OMR sheet containing a handwritten course code (a letter followed by digits, e.g. U12028). Transcribe it exactly as written, outputting only the value with no spaces, quotes, labels, or explanation. If the field is empty with nothing written, output <blank>. If a value was written and then struck through or crossed out, output <strikethrough>.",
    "marks_obtained": "This image is a single field cropped from a student OMR sheet containing the handwritten marks obtained (a number, e.g. 45). Transcribe it exactly as written, outputting only the value with no spaces, quotes, labels, or explanation. If the field is empty with nothing written, output <blank>. If a value was written and then struck through or crossed out, output <strikethrough>.",
}

processor = AutoProcessor.from_pretrained(BASE, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    BASE, dtype=torch.bfloat16, trust_remote_code=True
)
model = PeftModel.from_pretrained(model, ADAPTER).eval()
# model = model.to("cuda")  # or "mps"

def read_field(image_path: str, field: str, max_new_tokens: int = 24) -> str:
    img = Image.open(image_path).convert("RGB")
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": img},
            {"type": "text", "text": PROMPTS[field]},
        ],
    }]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[text], images=[[img]], return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    value = processor.tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    ).strip()
    return value  # a value, or a category token in CATEGORIES (-> manual review)

print(read_field("roll_no.jpg", "roll_no"))
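
Since the model expects one field per image, a full sheet means one call per cropped field. The sketch below shows one possible wrapper under stated assumptions: `read_sheet`, `REVIEW_TOKENS`, and `fake_reader` are hypothetical names, and the reader is passed in as a callable so the example runs without loading the model. In real use, the `read_field` function from the usage code would be passed instead of `fake_reader`.

```python
from typing import Callable, Dict

REVIEW_TOKENS = {"<blank>", "<strikethrough>", "<unclear>"}

def read_sheet(crops: Dict[str, str], reader: Callable[[str, str], str]) -> Dict[str, dict]:
    """crops maps a field name to the path of that field's cropped image."""
    results = {}
    for field, path in crops.items():
        value = reader(path, field)
        results[field] = {
            "value": value,
            "needs_review": value in REVIEW_TOKENS,  # category token -> manual review
        }
    return results

# Stand-in reader with canned answers so the sketch is runnable on its own.
def fake_reader(path: str, field: str) -> str:
    return {"roll_no": "7200739", "marks_obtained": "<blank>"}[field]

sheet = read_sheet(
    {"roll_no": "roll_no.jpg", "marks_obtained": "marks_obtained.jpg"},
    fake_reader,
)
for field, result in sheet.items():
    print(field, result)
```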

Training details

Base: Qwen/Qwen3.5-2B (vision-language)
Method: LoRA (r=16, alpha=32, dropout 0.05, target_modules="all-linear")
Objective: supervised fine-tuning; loss computed only on the answer tokens
Data: ~7k labeled cropped OMR field images across the four fields, including bad-field images labeled <blank> / <strikethrough>
Precision: bf16, 2 epochs
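
Computing the loss "only on the answer tokens" is commonly done by masking the prompt positions in the label sequence. The card does not say how this was implemented for this adapter, so the following is a plain-Python sketch of the usual convention, in which prompt positions are set to -100, the default `ignore_index` of PyTorch's cross-entropy loss. The `mask_prompt_labels` helper and the token IDs are made up for illustration.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def mask_prompt_labels(prompt_ids, answer_ids):
    """Build input_ids and labels so only answer tokens contribute to the loss."""
    input_ids = list(prompt_ids) + list(answer_ids)
    # Prompt positions are ignored; answer positions keep their token IDs.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids)
    return input_ids, labels

# Made-up token IDs: three prompt tokens followed by three answer tokens.
input_ids, labels = mask_prompt_labels([101, 102, 103], [7, 2, 9])
print(input_ids)  # [101, 102, 103, 7, 2, 9]
print(labels)     # [-100, -100, -100, 7, 2, 9]
```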

Intended use & limitations

Built for transcribing the four OMR fields above from pre-cropped field images. It is not a general OCR model and expects a single field per image. Trained on one institution's OMR layout/handwriting, so accuracy may drop on different forms. Bad-field examples were rare in training, so always keep the category tokens as a manual-review path. Inherits the license and usage terms of the base model (Qwen/Qwen3.5-2B).
