License: apache-2.0
Task
Input: raw OCR text from a math exam question
Output:
```json
{
  "stem": "cleaned problem statement",
  "answer_raw": "raw answer if clearly visible, otherwise empty",
  "solution_raw": "",
  "ocr_notes": ["risk tag 1", "risk tag 2"]
}
```
Intended boundary
This adapter is designed to run as a separate stage in the pipeline:
OCR -> OCR rebuilder -> existing GPT teaching chain
It should:
- improve `stem`
- improve `answer_raw`
- reduce hallucinated answers
- add conservative OCR risk notes
It should not:
- replace your main GPT explanation model
- solve the math problem
- generate a polished `solution_raw`
At the current stage, `solution_raw` is intentionally kept empty.
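A minimal sketch of how the rebuilder could slot between OCR and the teaching chain while respecting the boundary above. `rebuild_then_teach`, `rebuilder`, and `teacher` are hypothetical names standing in for the adapter call and your existing GPT chain; they are not part of this repository.

```python
from typing import Any, Callable, Dict


def rebuild_then_teach(
    raw_ocr: str,
    rebuilder: Callable[[str], Dict[str, Any]],
    teacher: Callable[[str, str], str],
) -> Dict[str, Any]:
    """Hypothetical glue: OCR text -> OCR rebuilder -> existing teaching chain."""
    # Copy so the dict returned by the rebuilder is not mutated.
    rebuilt = dict(rebuilder(raw_ocr))
    # The rebuilder must not solve the problem, so drop any solution text it emitted.
    rebuilt["solution_raw"] = ""
    # Only the cleaned stem and the visible raw answer go downstream;
    # explaining and solving stay with the teaching model.
    explanation = teacher(rebuilt["stem"], rebuilt["answer_raw"])
    # ocr_notes pass through unchanged so the caller can surface OCR risks.
    return {**rebuilt, "explanation": explanation}
```

Keeping the teacher behind a plain callable means the rebuilder can be swapped or disabled without touching the explanation model.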
Why this adapter exists
The base model can often emit valid JSON, but it tends to:
- hallucinate answers when the gold answer is empty
- drift away from the intended field semantics
- produce extra text beyond the strict OCR rebuild task
This adapter is trained toward more conservative behavior.
Main test comparison
Evaluation setting:
- base model: Qwen/Qwen2.5-3B-Instruct
- adapter: current best stage-1 protocol-only LoRA
- prompt: conservative non-solver prompt
- generation: max_new_tokens=192
- test set: 30 held-out samples
Metric table
| Metric | Base model | Stage-1 adapter |
|---|---|---|
| JSON parse rate | 80.00% | 76.67% |
| stem exact match | 0.00% | 16.67% |
| answer_raw exact match | 16.67% | 60.00% |
| empty-answer hallucination (lower is better) | 23.33% | 0.00% |
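The card does not publish its scoring code. The sketch below shows one plausible way to compute these four metrics; the function names are hypothetical, and two choices are assumptions: unparseable outputs count as misses for exact match, and every rate is divided by the total number of samples (the card does not say whether the hallucination rate uses all 30 samples or only the gold-empty ones).

```python
import json
from typing import Dict, List, Optional


def parse_prediction(text: str) -> Optional[dict]:
    """Return the output as a dict if it parses as a single JSON object, else None."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def score(predictions: List[str], golds: List[dict]) -> Dict[str, float]:
    n = len(golds)
    parsed = [parse_prediction(p) for p in predictions]
    pairs = list(zip(parsed, golds))
    parse_ok = sum(p is not None for p in parsed)
    # Unparseable outputs (None) count as misses for both exact-match metrics.
    stem_em = sum(p is not None and p.get("stem") == g["stem"] for p, g in pairs)
    answer_em = sum(p is not None and p.get("answer_raw") == g["answer_raw"] for p, g in pairs)
    # Hallucination: the gold answer is empty, but the model filled answer_raw anyway.
    hallucinated = sum(
        p is not None and g["answer_raw"] == "" and bool(p.get("answer_raw")) for p, g in pairs
    )
    # Assumption: every rate is divided by the full number of samples.
    return {
        "json_parse_rate": parse_ok / n,
        "stem_exact_match": stem_em / n,
        "answer_raw_exact_match": answer_em / n,
        "empty_answer_hallucination": hallucinated / n,
    }
```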
Visual comparison
JSON parse rate
```text
Base model       80.00% ████████████████
Stage-1 adapter  76.67% ███████████████
```
answer_raw exact match
```text
Base model       16.67% ███
Stage-1 adapter  60.00% ████████████
```
empty-answer hallucination (lower is better)
```text
Base model       23.33% █████
Stage-1 adapter   0.00%
```
Semantic fidelity
We also measured average character-level similarity against gold labels on the same held-out test set.
| Field | Base model | Stage-1 adapter |
|---|---|---|
| stem avg similarity | 0.4898 | 0.7217 |
| answer_raw avg similarity | 0.4058 | 0.6667 |
| ocr_notes avg similarity | 0.1597 | 0.2391 |
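The exact similarity function is not specified. As an assumption, this sketch uses Python's `difflib.SequenceMatcher.ratio()` as the character-level score, joins list fields such as `ocr_notes` with newlines, and scores an unparseable prediction (`None`) as 0.0. `char_similarity`, `field_text`, and `avg_similarity` are illustrative names, not the project's code.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (assumed metric: SequenceMatcher ratio)."""
    return SequenceMatcher(None, a, b).ratio()


def field_text(record: dict, field: str) -> str:
    """Flatten a field to a string; list fields such as ocr_notes are joined line by line."""
    value = record.get(field, "")
    if isinstance(value, list):
        return "\n".join(str(v) for v in value)
    return str(value)


def avg_similarity(preds: List[Optional[dict]], golds: List[dict], field: str) -> float:
    scores = []
    for pred, gold in zip(preds, golds):
        if pred is None:
            # Assumption: an unparseable prediction gets no credit.
            scores.append(0.0)
        else:
            scores.append(char_similarity(field_text(pred, field), field_text(gold, field)))
    return sum(scores) / len(scores)
```

Scoring `None` as 0.0 matters because `SequenceMatcher` rates two empty strings as 1.0, which would otherwise reward a broken output whenever the gold field is empty.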
Visual comparison
stem average similarity
```text
Base model       0.4898 ██████████
Stage-1 adapter  0.7217 ██████████████
```
answer_raw average similarity
```text
Base model       0.4058 ████████
Stage-1 adapter  0.6667 █████████████
```
What this means
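The adapter gives up a small amount of parse rate (80.00% -> 76.67%, one fewer parseable output out of 30), but gains on the behaviors that matter most for this task:
- much better `answer_raw` (exact match 16.67% -> 60.00%)
- much better `stem` (exact match 0.00% -> 16.67%, average similarity 0.4898 -> 0.7217)
- zero hallucinated answers on gold-empty cases (23.33% -> 0.00%)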
For an OCR rebuilding module that feeds a larger teaching system, this tradeoff is usually worth it.
Dataset summary
The project used two task buckets during development:
- `single_problem_rebuild`: 204 synthetic/curated samples
- `multi_problem_fragment_rebuild`: 102 synthetic/curated samples
The released adapter comes from a stage-1 protocol-only training setup that focused on:
- one JSON object only
- fixed field schema
- conservative extraction
- no `solution_raw` generation
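The stage-1 protocol constraints above can be expressed as a strict check on a raw model output. This is a hypothetical validator (`follows_protocol` and `EXPECTED_KEYS` are illustrative names, not taken from the training code) that might be used to filter training targets or audit generations.

```python
import json

# The fixed field schema from the Task section.
EXPECTED_KEYS = {"stem", "answer_raw", "solution_raw", "ocr_notes"}


def follows_protocol(text: str) -> bool:
    """Return True only if `text` is exactly one JSON object obeying the stage-1 rules."""
    try:
        obj = json.loads(text.strip())
    except json.JSONDecodeError:
        # Rejects prose before/after the object and multiple concatenated objects.
        return False
    # One object with exactly the fixed field schema, no extra or missing keys.
    if not isinstance(obj, dict) or set(obj) != EXPECTED_KEYS:
        return False
    if not (isinstance(obj["stem"], str) and isinstance(obj["answer_raw"], str)):
        return False
    notes = obj["ocr_notes"]
    if not isinstance(notes, list) or not all(isinstance(n, str) for n in notes):
        return False
    # Stage 1 never generates a solution.
    return obj["solution_raw"] == ""
```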
Stage-1 smoke subset:
- train: 32
- dev: 8
Known limitations
- `solution_raw` is intentionally weak and currently fixed to empty.
- `ocr_notes` is helpful but not yet fully normalized.
- Multi-problem mixed fragments are harder than single-problem OCR cleanup.
- This is a task adapter, not a general OCR foundation model.
Deployment
This repository includes a handler.py for Hugging Face Inference Endpoints custom deployment.
Recommended input:
```json
{"inputs": "raw OCR text"}
```
Recommended output:
```json
{
  "stem": "...",
  "answer_raw": "...",
  "solution_raw": "",
  "ocr_notes": ["..."],
  "meta": {"raw_ocr_notes": ["model raw notes"]}
}
```
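The repository's `handler.py` is not reproduced here. The following hypothetical post-processing step illustrates how a raw generation might be mapped onto the recommended output shape: it decodes the first JSON object in the text, pins `solution_raw` to empty, and keeps the untouched notes under `meta.raw_ocr_notes`. The note normalization (trim, drop empties, de-duplicate) is an assumption, since the card only says `ocr_notes` is not yet fully normalized.

```python
import json
from typing import Any, Dict, List


def _as_str(value: Any) -> str:
    """Keep string fields as-is; replace anything else (None, numbers, lists) with ""."""
    return value if isinstance(value, str) else ""


def extract_first_object(text: str) -> Dict[str, Any]:
    """Decode the first JSON object embedded in `text`, or return {} if none decodes."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        start = text.find("{", start + 1)
    return {}


def to_endpoint_response(raw_generation: str) -> Dict[str, Any]:
    """Hypothetical mapping from a raw generation to the recommended output shape."""
    parsed = extract_first_object(raw_generation)
    raw_notes = parsed.get("ocr_notes", [])
    if not isinstance(raw_notes, list):
        raw_notes = [raw_notes]
    # Assumed normalization: trim whitespace, drop empty notes, de-duplicate in order.
    notes: List[str] = []
    for note in raw_notes:
        cleaned = str(note).strip()
        if cleaned and cleaned not in notes:
            notes.append(cleaned)
    return {
        "stem": _as_str(parsed.get("stem")),
        "answer_raw": _as_str(parsed.get("answer_raw")),
        "solution_raw": "",  # fixed to empty at the current stage
        "ocr_notes": notes,
        "meta": {"raw_ocr_notes": raw_notes},
    }
```

Falling back to empty fields when nothing decodes keeps the response schema stable for the downstream teaching chain, which matters given the adapter's slightly lower parse rate.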
Local usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model = "Qwen/Qwen2.5-3B-Instruct"
adapter_model = "maru979/qwen2.5-3b-teacher-ocr-rebuilder"

# Load the tokenizer and the fp16 base model, placed automatically on available devices.
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the LoRA adapter on top of the base model and switch to inference mode.
model = PeftModel.from_pretrained(model, adapter_model)
model.eval()
```
If you deploy this adapter as an endpoint, prefer the included handler.py instead of directly exposing raw generation.
Model provider: maru979
Model tree: base Qwen/Qwen2.5-3B-Instruct, with this model as an adapter
Modalities: text input, text output