michaelrhs/slide-examiner-8b-qlora API & Inference Endpoint

What it does

A pointwise + pairwise slide examiner: detects semantic slide defects (title/body mismatch, density, narrative order, missing section) and is deliberately trained to abstain on pixel-level geometry (overflow / overlap / alignment / font / color / margin) — those are handled by a symbolic linter, not the VLM. Output is strict contract JSON (PageExamResult / DeckExamResult / PairwiseResult).

Headline results (in-domain held-out, balanced accuracy, modality A = image-only)

Table with columns: S-group semantic, this adapter (8B), zero-shot 8B, zero-shot 30B
S-group semantic	this adapter (8B)	zero-shot 8B	zero-shot 30B
balanced accuracy	1.0	0.639	0.785

The finetuned 8B examiner surpasses the zero-shot 30B model on the S-group while keeping ~0 false-positive rate on geometry (it abstains rather than hallucinating geometry from pixels). eval_loss trajectory: None.

Training

Base: Qwen/Qwen3-VL-8B-Instruct; QLoRA 4-bit (bitsandbytes), LoRA rank 16, alpha 32, 2 epochs, cosine LR 1e-4.
Data: ~5.3K synthetic slides (paired clean/defective), architecture-correct routing (S-group pointwise; geometry restate-from-structure + abstain-under-image; G1/S6 pairwise; S3→linter).
Framework: LLaMA-Factory, template qwen3_vl_nothink.

Usage

python
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor
base = "Qwen/Qwen3-VL-8B-Instruct"
model = AutoModelForImageTextToText.from_pretrained(base, torch_dtype="bfloat16", device_map="auto")
model = PeftModel.from_pretrained(model, "michaelrhs/slide-examiner-8b-qlora")
proc = AutoProcessor.from_pretrained(base)

Adapter files: adapter_config.json, adapter_model.safetensors.

What it does

Headline results (in-domain held-out, balanced accuracy, modality A = image-only)

Table with columns: S-group semantic, this adapter (8B), zero-shot 8B, zero-shot 30B
S-group semantic	this adapter (8B)	zero-shot 8B	zero-shot 30B
balanced accuracy	1.0	0.639	0.785

Training

Base: Qwen/Qwen3-VL-8B-Instruct; QLoRA 4-bit (bitsandbytes), LoRA rank 16, alpha 32, 2 epochs, cosine LR 1e-4.
Data: ~5.3K synthetic slides (paired clean/defective), architecture-correct routing (S-group pointwise; geometry restate-from-structure + abstain-under-image; G1/S6 pairwise; S3→linter).
Framework: LLaMA-Factory, template qwen3_vl_nothink.

Usage

python
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor
base = "Qwen/Qwen3-VL-8B-Instruct"
model = AutoModelForImageTextToText.from_pretrained(base, torch_dtype="bfloat16", device_map="auto")
model = PeftModel.from_pretrained(model, "michaelrhs/slide-examiner-8b-qlora")
proc = AutoProcessor.from_pretrained(base)

Adapter files: adapter_config.json, adapter_model.safetensors.

slide-examiner-8b-qlora

Get help setting up a custom Dedicated Endpoints.

README

What it does

Headline results (in-domain held-out, balanced accuracy, modality A = image-only)

Training

Usage

Explore FriendliAI today

README

What it does

Headline results (in-domain held-out, balanced accuracy, modality A = image-only)

Training

Usage