Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Task

Input: raw OCR text from a math exam question

Output:

json

{
"stem": "cleaned problem statement",
"answer_raw": "raw answer if clearly visible, otherwise empty",
"solution_raw": "",
"ocr_notes": ["risk tag 1", "risk tag 2"]
}

Intended boundary

This adapter is designed to sit on a separate line:

OCR -> OCR rebuilder -> existing GPT teaching chain

It should:

  • improve stem
  • improve answer_raw
  • reduce hallucinated answers
  • add conservative OCR risk notes

It should not:

  • replace your main GPT explanation model
  • solve the math problem
  • generate a polished solution_raw

At the current stage, solution_raw is intentionally kept empty.

Why this adapter exists

The base model can often emit valid JSON, but it tends to:

  • hallucinate answers when gold should be empty
  • drift away from the intended field semantics
  • over-talk beyond the strict OCR rebuild task

This adapter is optimized for a more conservative behavior.

Main test comparison

Evaluation setting:

  • base model: Qwen/Qwen2.5-3B-Instruct
  • adapter: current best stage-1 protocol-only LoRA
  • prompt: conservative non-solver prompt
  • generation: max_new_tokens=192
  • test set: 30 held-out samples

Metric table

MetricBase modelStage-1 adapter
JSON parse rate80.00%76.67%
stem exact match0.00%16.67%
answer_raw exact match16.67%60.00%
empty-answer hallucination23.33%0.00%

Visual comparison

JSON parse rate

text

Base model 80.00% ████████████████
Stage-1 adapter 76.67% ███████████████

answer_raw exact match

text

Base model 16.67% ███
Stage-1 adapter 60.00% ████████████

empty-answer hallucination (lower is better)

text

Base model 23.33% █████
Stage-1 adapter 0.00%

Semantic fidelity

We also measured average character-level similarity against gold labels on the same held-out test set.

FieldBase modelStage-1 adapter
stem avg similarity0.48980.7217
answer_raw avg similarity0.40580.6667
ocr_notes avg similarity0.15970.2391

Visual comparison

stem average similarity

text

Base model 0.4898 ██████████
Stage-1 adapter 0.7217 ██████████████

answer_raw average similarity

text

Base model 0.4058 ████████
Stage-1 adapter 0.6667 █████████████

What this means

The adapter gives up a small amount of parse rate, but buys back the behaviors that matter most for this task:

  • much better answer_raw
  • much better stem
  • zero hallucinated answers on gold-empty cases

For an OCR rebuilding module that feeds a larger teaching system, this tradeoff is usually worth it.

Dataset summary

The project used two task buckets during development:

  • single_problem_rebuild: 204 synthetic/curated samples
  • multi_problem_fragment_rebuild: 102 synthetic/curated samples

The released adapter comes from a stage-1 protocol-only training setup that focused on:

  • one JSON object only
  • fixed field schema
  • conservative extraction
  • no solution_raw generation

Stage-1 smoke subset:

  • train: 32
  • dev: 8

Known limitations

  1. solution_raw is intentionally weak and currently fixed to empty.
  2. ocr_notes is helpful but not yet fully normalized.
  3. Multi-problem mixed fragments are harder than single-problem OCR cleanup.
  4. This is a task adapter, not a general OCR foundation model.

Deployment

This repository includes a handler.py for Hugging Face Inference Endpoints custom deployment.

Recommended input:

json

{
"inputs": "raw OCR text"
}

Recommended output:

json

{
"stem": "...",
"answer_raw": "...",
"solution_raw": "",
"ocr_notes": ["..."],
"meta": {
"raw_ocr_notes": ["model raw notes"]
}
}

Local usage

python

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
base_model = "Qwen/Qwen2.5-3B-Instruct"
adapter_model = "maru979/qwen2.5-3b-teacher-ocr-rebuilder"
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
base_model,
torch_dtype=torch.float16,
device_map="auto",
trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_model)
model.eval()

If you deploy this adapter as an endpoint, prefer the included handler.py instead of directly exposing raw generation.

Model provider

maru979

Model tree

Base

Qwen/Qwen2.5-3B-Instruct

Adapter

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today