
License: apache-2.0

TL;DR

  • Base: Qwen/Qwen3-VL-8B-Instruct (QLoRA, then merged → BF16).
  • Task: resume page image(s) → structured JSON (23 fields: identity, contact, skills, experiences, educations, languages, certificates, projects, preferences).
  • Why fine-tune: the 23-field schema and the project's formatting rules are baked into the weights, so a short two-sentence prompt replaces the ~280-line schema prompt that the 32B base model needed.
  • Measured (full 51-sample held-out split, A100, BF16, greedy): 83.9% weighted score, 88.2% unweighted, 88.2% JSON-valid. See Evaluation for the honest caveats.
  • Footprint: ~23 GB VRAM in BF16 at 16K context (vs. ~50 GB for the 32B it replaces).

Intended use

Extracting structured data from resume/CV documents rendered to images (PDF → PNG per page). The model is tuned for a specific downstream schema (below) used by a recruiting/ATS pipeline, including its enum vocabularies (PascalCase country names, a fixed list of roles/technologies/industries). It is most useful when you want one model call to turn a resume into a database-ready record.

It is not a general document-VQA model and should not be used to make automated decisions about candidates — see Out-of-scope.

Input / output schema

Input: one or more page images of a single resume, plus the short instruction the model was trained with (see How to use).

Output: a single JSON object with 23 top-level fields. Scalars are null when absent; list fields default to []; address defaults to {country_name, region_name}.

| Field | Type | Notes |
|---|---|---|
| first_name, last_name | string | |
| email, phone | string | |
| date_of_birth | string | YYYY-MM-DD |
| desired_position | string | mapped to a fixed role vocabulary |
| about | string | free-text summary |
| job_experience | number | total years |
| job_expectations, min_salary, max_salary | string / number | |
| ready_to_relocation | bool | |
| work_modes, employment_types, employment_durations | string[] | enum values |
| hobbies | string | |
| address | object | {country_name, region_name} |
| skills | object[] | {skill_name, level} |
| experiences | object[] | {company_name, job, date_from, date_to, description, country_name} |
| educations | object[] | {name, degree, location, programme, date_from, date_to, country_name} |
| languages | object[] | {language_name, level} (level is an int) |
| certificates | object[] | {certificate_name, certificate_programme, issuing_date, expiring_date} |
| projects | object[] | {title, summary, used_technologies[], role, industries[]} |
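
The defaulting rule above (null scalars, empty lists, an address object) can be enforced on the consumer side before a record is stored. The sketch below is illustrative and not taken from the project's repo: `fill_defaults` and the scalar/list grouping (derived from the Type column of the schema table) are our own, and we assume the two address keys default to null.

```python
import copy

# Grouping derived from the schema table: string[] and object[] fields are lists,
# everything except address is a scalar.
SCALAR_FIELDS = [
    "first_name", "last_name", "email", "phone", "date_of_birth",
    "desired_position", "about", "job_experience", "job_expectations",
    "min_salary", "max_salary", "ready_to_relocation", "hobbies",
]
LIST_FIELDS = [
    "work_modes", "employment_types", "employment_durations", "skills",
    "experiences", "educations", "languages", "certificates", "projects",
]
ALL_FIELDS = SCALAR_FIELDS + LIST_FIELDS + ["address"]  # 13 + 9 + 1 = 23


def fill_defaults(record: dict) -> dict:
    """Return a copy of `record` containing exactly the 23 schema fields.

    Missing scalars become None, missing or non-list list fields become [],
    and address is coerced to {country_name, region_name} (assumed null
    when absent). Unknown top-level keys are dropped.
    """
    out = {}
    for name in SCALAR_FIELDS:
        out[name] = record.get(name)
    for name in LIST_FIELDS:
        value = record.get(name)
        out[name] = copy.deepcopy(value) if isinstance(value, list) else []
    address = record.get("address")
    if not isinstance(address, dict):
        address = {}
    out["address"] = {
        "country_name": address.get("country_name"),
        "region_name": address.get("region_name"),
    }
    return out
```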

Dates are normalized to YYYY-MM-DD (year-only ranges expand to Jan 1 / Dec 31; ongoing roles set date_to: null). Classification fields (desired_position, project role / used_technologies / industries, and all country_name fields) are mapped to predefined option lists, falling back to "Other" when nothing matches.
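
A minimal sketch of the date rule for ranges, assuming the source strings are a bare year, an ISO date, or an ongoing marker. `normalize_range` and the `ONGOING` markers are hypothetical. It expands a year-only start to Jan 1 and a year-only end to Dec 31; the card does not document how a lone year is handled (the example below shows `2019-01-01` for an education end date with no start), so treat that case as unspecified.

```python
import re
from typing import Optional, Tuple

YEAR_ONLY = re.compile(r"^\d{4}$")
FULL_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
# Hypothetical markers for ongoing roles; the real pipeline's list is not published.
ONGOING = {"present", "current", "now"}


def _normalize(value: Optional[str], is_end: bool) -> Optional[str]:
    if value is None:
        return None
    value = value.strip()
    if value.lower() in ONGOING:
        return None  # ongoing role: date_to stays null
    if YEAR_ONLY.match(value):
        return f"{value}-12-31" if is_end else f"{value}-01-01"
    if FULL_DATE.match(value):
        return value
    raise ValueError(f"unsupported date format: {value!r}")


def normalize_range(start: Optional[str], end: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Normalize a (date_from, date_to) pair to YYYY-MM-DD strings or None."""
    return _normalize(start, is_end=False), _normalize(end, is_end=True)
```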

Real (anonymized) output example:

```json
{
  "first_name": "Jane",
  "last_name": "Doe",
  "date_of_birth": null,
  "email": "jane@example.com",
  "phone": "+1-555-0100",
  "desired_position": "Android Developer",
  "about": null,
  "job_experience": null,
  "job_expectations": null,
  "min_salary": null,
  "max_salary": null,
  "ready_to_relocation": false,
  "work_modes": [],
  "employment_types": [],
  "employment_durations": [],
  "hobbies": null,
  "address": { "country_name": "Uzbekistan", "region_name": "Tashkent" },
  "skills": [
    { "skill_name": "Android Development", "level": null },
    { "skill_name": "Kotlin", "level": null },
    { "skill_name": "Firebase", "level": null }
  ],
  "experiences": [
    {
      "company_name": "Android Development Course",
      "job": "Student / Trainee (Android Development)",
      "date_from": "2021-01-01",
      "date_to": null,
      "description": "Android development course focused on Java/Kotlin/Android.",
      "country_name": null
    }
  ],
  "languages": [
    { "language_name": "Uzbek", "level": 6 },
    { "language_name": "English", "level": 2 },
    { "language_name": "Russian", "level": 0 }
  ],
  "educations": [
    {
      "name": "Tashkent University of Information Technologies",
      "degree": "Bachelor",
      "location": "Tashkent",
      "programme": "E-Commerce",
      "date_from": null,
      "date_to": "2019-01-01",
      "country_name": "Uzbekistan"
    }
  ],
  "certificates": [],
  "projects": [
    {
      "title": "Wallpaper App",
      "summary": "Wallpaper app based on MVVM, Coin, Flow, Retrofit.",
      "used_technologies": ["Kotlin", "Other"],
      "role": "Mobile Developer(IOS/Android)",
      "industries": ["Other"]
    }
  ]
}
```

Training data

  • 513 human-verified resume samples (private internal dataset). Each sample is a PDF rendered to one or more page PNGs plus a verified ground-truth JSON record.
  • Split: 462 train / 51 held-out eval, 90/10, fixed seed 42. Samples whose estimated token length exceeded ~15.2K (1K below the 16,384 context budget) were dropped from training, so the effective training count is ≤462.
  • Page distribution: 276 single-page, 136 two-page, 101 three-or-more-page (up to 8).
  • Language: predominantly English; some records contain non-English values (e.g. Russian/Uzbek company or language names).
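
The split and length filter described above could look roughly like this. It is a hypothetical reconstruction, not `src/data_prep.py`: the token estimator is passed in because the card does not say how lengths were estimated, and flooring the eval count is an assumption chosen because it reproduces the stated 462/51 split of 513 samples.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical threshold: the card says samples above ~15.2K estimated tokens
# (about 1K below the 16,384-token context budget) were dropped from training.
MAX_TRAIN_TOKENS = 15_200


def split_and_filter(
    samples: Sequence[Any],
    estimate_tokens: Callable[[Any], int],
    eval_frac: float = 0.1,
    seed: int = 42,
) -> Tuple[List[Any], List[Any]]:
    """Seeded shuffle, 90/10 split, then drop over-long samples from train only."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_eval = int(len(items) * eval_frac)  # floor: 513 * 0.1 = 51.3 -> 51
    eval_set = items[:n_eval]
    train_set = [s for s in items[n_eval:] if estimate_tokens(s) <= MAX_TRAIN_TOKENS]
    return train_set, eval_set
```

Filtering only the training side keeps the held-out split at all 51 samples, which matches how the evaluation below is described.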

The dataset is not released. Code to rebuild splits and bundles is in the repo (src/data_prep.py, src/export_eval_bundle.py).

Training procedure

QLoRA via Unsloth (FastVisionModel) + TRL SFTTrainer. The 4-bit base (unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit, nf4) was adapted with LoRA on both the vision and language towers (attention + MLP modules), then the adapter was merged back into the full model and published.
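
For intuition about what merging does: in standard LoRA (no rsLoRA, as listed in the hyperparameters), each adapted weight matrix receives a low-rank update scaled by α/r. With r = 16 and α = 16 from the Hyperparameters table, the scale is exactly 1, so merging adds the learned product BA to the base weight once and the separate adapter is no longer needed. This is the textbook formulation; we assume Unsloth's merge into 16-bit weights follows it.

$$
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},
\qquad \frac{\alpha}{r} = \frac{16}{16} = 1
$$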

Each training example is a single user turn — the page images followed by the combined system+user instruction — with the ground-truth JSON as the assistant target. There is no separate system role; this is why inference uses the same short prompt.
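
A sketch of one training example in the chat-messages layout commonly used for vision SFT with TRL and Unsloth, ordered as this paragraph describes: page images first, then the combined instruction, with the JSON target as the assistant turn. `build_training_example` is hypothetical, file paths stand in for the loaded page images, and the blank-line separator between the two prompt sentences is an assumption.

```python
import json
from typing import Dict, List

# Prompts copied from the How to use section.
SYSTEM_PROMPT = "You are a resume parser. Extract information from resume images into structured JSON."
USER_PROMPT = "Parse this resume and return the structured JSON."


def build_training_example(page_paths: List[str], target: Dict) -> Dict:
    """One SFT example: a single user turn (images + instruction) and the JSON answer."""
    instruction = SYSTEM_PROMPT + "\n\n" + USER_PROMPT  # separator is an assumption
    user_content = [{"type": "image", "image": p} for p in page_paths]
    user_content.append({"type": "text", "text": instruction})
    return {
        "messages": [
            # No separate system role: everything lives in the user turn.
            {"role": "user", "content": user_content},
            {
                "role": "assistant",
                "content": [{"type": "text", "text": json.dumps(target, ensure_ascii=False)}],
            },
        ]
    }
```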

Dtype note: the merge used Unsloth's merged_16bit, and the original upload was labeled "float16", but the published config.json and stored tensors are bfloat16. Treat this model as BF16.
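
To check the dtype claim without loading the model, you can read the headers of the downloaded `.safetensors` shards directly. The format stores an 8-byte little-endian header length followed by a JSON header that lists each tensor's dtype. The helpers below are our own sketch; point `shard_dtypes` at a local snapshot directory of your choosing.

```python
import json
import struct
from collections import Counter
from pathlib import Path


def safetensors_dtypes(path) -> Counter:
    """Count tensor dtypes recorded in one .safetensors file header.

    Layout: 8-byte little-endian unsigned header length, then a JSON header
    mapping tensor names to {"dtype", "shape", "data_offsets"}; the optional
    "__metadata__" entry is skipped.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return Counter(v["dtype"] for k, v in header.items() if k != "__metadata__")


def shard_dtypes(model_dir) -> Counter:
    """Sum dtype counts over every shard in a local model directory."""
    total = Counter()
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        total += safetensors_dtypes(shard)
    return total
```

If the dtype note is accurate, the returned counter should contain only `BF16` entries.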

Hyperparameters

| Hyperparameter | Value |
|---|---|
| Method | QLoRA (4-bit nf4 base + LoRA, merged after training) |
| LoRA rank / alpha / dropout | 16 / 16 / 0 |
| Target modules | vision + language layers, attention + MLP (bias="none", no rslora) |
| Learning rate | 2e-4 |
| LR scheduler / warmup | cosine / 10 steps |
| Optimizer | adamw_8bit |
| Weight decay | 0.01 |
| Per-device batch / grad-accum | 1 / 4 (effective batch 4) |
| Epochs | 1 |
| Max sequence length | 16,384 |
| Precision | bf16 (fp16 fallback if unsupported) |
| Seed | 3407 |
| Hardware | Google Colab L4 (24 GB) |
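
The schedule rows combine as follows: per-device batch 1 with 4 accumulation steps gives an effective batch of 4, so one epoch over at most 462 samples is at most about 116 optimizer steps, and the 10 warmup steps cover less than a tenth of training. The function below is a sketch of a linear-warmup cosine schedule written to mirror the shape of Hugging Face's `get_cosine_schedule_with_warmup`; the exact step count in the real run depends on how many samples the length filter dropped and how the trainer rounds.

```python
import math

BASE_LR = 2e-4
WARMUP_STEPS = 10
# Upper bound for one epoch: at most 462 samples, batch 1 x grad-accum 4.
TOTAL_STEPS = math.ceil(462 / (1 * 4))  # 116


def cosine_with_warmup(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Linear warmup to BASE_LR, then half-cosine decay toward 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

Under these assumptions the learning rate peaks at 2e-4 on step 10, is back to about half the peak near step 63, and reaches zero at the end of the single epoch.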

Training time and final loss were not captured from the run.

Evaluation

Measured on 2026-06-05 with notebooks/eval_finetuned.ipynb against the held-out split, using the project's field-weighted scorer (src/evaluation.py). Setup: the published BF16 weights on a single A100, greedy decoding (do_sample=False, max_new_tokens=4096), on the full 51-sample held-out split.

| Metric | Result |
|---|---|
| Overall weighted score | 83.9% |
| Overall unweighted score | 88.2% |
| JSON validity | 88.2% (45/51 parsed; 6 failures) |
| Avg. inference | ~92.0 s/resume |
| Peak VRAM | 23.4 GB |

Per-field accuracy (worst → best):

| Field | Acc | Field | Acc |
|---|---|---|---|
| skills | 67.5% | ready_to_relocation | 88.2% |
| phone | 74.5% | certificates | 90.8% |
| desired_position | 79.2% | projects | 91.0% |
| address | 81.2% | job_expectations | 92.7% |
| experiences | 81.7% | hobbies | 96.1% |
| first_name | 82.3% | date_of_birth | 98.0% |
| last_name | 82.3% | work_modes | 98.0% |
| email | 84.3% | employment_types | 98.0% |
| job_experience | 84.3% | employment_durations | 98.0% |
| educations | 84.5% | min_salary | 100.0% |
| languages | 87.2% | max_salary | 100.0% |
| about | 88.2% | | |

Read these numbers with the following caveats:

  • Full held-out split, single run. These are all 51 held-out samples with greedy decoding — a real measurement, but one run on a modest test set, not a large benchmark.
  • Partial-credit metric. The scorer uses fuzzy string ratios, date/numeric tolerances, and greedy best-match over object arrays, with fields weighted by importance (work experience is weighted highest). It is not strict exact-match and is not comparable to other parsers' published numbers — it is an internal quality signal. The weighted score (83.9%) is below the unweighted (88.2%) because the highest-weighted fields — experiences, skills, identity/contact — are also the hardest ones.
  • The top-scoring fields are mostly "correctly empty." min_salary/max_salary (100%) and date_of_birth, work_modes, employment_types, employment_durations (~98%) are almost always absent in this data, so high scores largely reflect correctly returning empty — not hard extraction.
  • 6/51 invalid JSON (~12%). Most likely 4096-token truncation on long multi-page resumes; downstream code must handle un-parseable output (retry, repair, or shorter prompts).
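
To make the partial-credit caveat concrete, here is a toy scorer in the same spirit: fuzzy string ratios via `difflib`, greedy best-match over object arrays, and a weighted versus unweighted average. It is not `src/evaluation.py`; it omits the date and numeric tolerances, and any weights you pass in are your own. It does reproduce the effect noted above: when a heavily weighted field scores lower, the weighted mean falls below the unweighted one.

```python
from difflib import SequenceMatcher
from typing import Any, Dict, List


def _empty(value: Any) -> bool:
    return value is None or value == "" or value == []


def text_score(pred: Any, gold: Any) -> float:
    """Fuzzy similarity in [0, 1]; two empty values count as a perfect match."""
    if _empty(pred) and _empty(gold):
        return 1.0
    if _empty(pred) or _empty(gold):
        return 0.0
    return SequenceMatcher(None, str(pred).lower(), str(gold).lower()).ratio()


def object_score(pred: Dict[str, Any], gold: Dict[str, Any]) -> float:
    """Mean fuzzy score over the union of keys in two objects."""
    keys = set(pred) | set(gold)
    if not keys:
        return 1.0
    return sum(text_score(pred.get(k), gold.get(k)) for k in keys) / len(keys)


def array_score(pred: List[dict], gold: List[dict]) -> float:
    """Greedy best-match: each gold object takes its best still-unused prediction."""
    if not pred and not gold:
        return 1.0
    unused = list(range(len(pred)))
    total = 0.0
    for g in gold:
        if not unused:
            break
        best = max(unused, key=lambda i: object_score(pred[i], g))
        total += object_score(pred[best], g)
        unused.remove(best)
    # Missing and extra items both count against the score.
    return total / max(len(pred), len(gold))


def overall(field_scores: Dict[str, float], weights: Dict[str, float]) -> Dict[str, float]:
    """Unweighted mean vs. importance-weighted mean (unlisted fields weigh 1)."""
    unweighted = sum(field_scores.values()) / len(field_scores)
    total_weight = sum(weights.get(f, 1.0) for f in field_scores)
    weighted = sum(s * weights.get(f, 1.0) for f, s in field_scores.items()) / total_weight
    return {"weighted": weighted, "unweighted": unweighted}
```

For example, `overall({"experiences": 0.5, "hobbies": 1.0}, {"experiences": 3.0})` gives a weighted score of 0.625 against an unweighted 0.75.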

For context, the model-selection benchmark that led to Qwen3-VL-8B (base models, ~10 samples, not reproducible from committed outputs) is noted in the repo's SESSION_LOG.md; it is not a fine-tuned result and is excluded here.

How to use

Requires a recent transformers (≥4.57 for Qwen3-VL; latest recommended). The published processor carries the correct chat template, so the modern image-in-messages path works without extra utilities.

```python
# pip install -U transformers accelerate
import json

from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

model_id = "sukhrobnurali/qwen3vl-resume-parser"
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_id, dtype="auto", device_map="auto", attn_implementation="sdpa",
)
processor = AutoProcessor.from_pretrained(model_id)

# The 23-field schema is baked into the weights, so the short training prompt is all it needs.
SYSTEM_PROMPT = "You are a resume parser. Extract information from resume images into structured JSON."
USER_PROMPT = "Parse this resume and return the structured JSON."

# One entry per page, top to bottom. "url" accepts a local file path or an http(s) URL.
pages = ["resume_page_1.png", "resume_page_2.png"]
messages = [{
    "role": "user",
    "content": (
        [{"type": "text", "text": SYSTEM_PROMPT}]
        + [{"type": "image", "url": p} for p in pages]
        + [{"type": "text", "text": USER_PROMPT}]
    ),
}]

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
inputs.pop("token_type_ids", None)

# Greedy decoding, matching the evaluation setup.
generated = model.generate(**inputs, max_new_tokens=4096, do_sample=False)
trimmed = generated[:, inputs["input_ids"].shape[1]:]  # keep only the newly generated tokens
text = processor.batch_decode(trimmed, skip_special_tokens=True)[0]
resume = json.loads(text)  # the 23-field record; raises if the output is not valid JSON
print(json.dumps(resume, indent=2, ensure_ascii=False))
```

Use greedy decoding (do_sample=False) for stable structured output. For long multi-page resumes, raise max_new_tokens if you see truncated JSON.
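
Because some outputs will not parse, a defensive wrapper around `json.loads` is worth having. This sketch strips optional Markdown code fences (the card does not say the model emits them; this is purely defensive), falls back to the outermost `{...}` span, and returns `None` instead of raising. `parse_resume_json` is a hypothetical helper name.

```python
import json
import re
from typing import Optional

# Optional leading ```json / ``` fence and trailing ``` fence.
_FENCE = re.compile(r"^```(?:json)?\s*|\s*```$")


def parse_resume_json(text: str) -> Optional[dict]:
    """Best-effort parse of model output; returns None unless it is a JSON object."""
    cleaned = _FENCE.sub("", text.strip())
    try:
        obj = json.loads(cleaned)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span, e.g. if stray text surrounds the JSON.
        start, end = cleaned.find("{"), cleaned.rfind("}")
        if start == -1 or end <= start:
            return None  # typically truncated output: no closing brace at all
        try:
            obj = json.loads(cleaned[start : end + 1])
        except json.JSONDecodeError:
            return None
    return obj if isinstance(obj, dict) else None
```

A `None` result is a reasonable trigger for a retry with a larger `max_new_tokens`, or for routing the resume to manual review.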

vLLM serving (the original deployment target):

```bash
vllm serve sukhrobnurali/qwen3vl-resume-parser \
  --dtype bfloat16 --max-model-len 16384 --trust-remote-code
```

When calling through the OpenAI-compatible API with the Python client, pass extra_body={"chat_template_kwargs": {"enable_thinking": False}} to keep the model in non-thinking (direct-JSON) mode.
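
A sketch of the same request sent to the vLLM server over HTTP using only the standard library. It assumes the server from the command above is listening on vLLM's default port 8000 under the model name it was launched with. It sends pages as base64 PNG data URLs in the same text/images/text order as the transformers example, and places `chat_template_kwargs` at the top level of the request body, which is where the OpenAI client's `extra_body` puts it. `build_chat_request` and `post_chat_request` are hypothetical helper names.

```python
import base64
import json
import urllib.request
from typing import List

# Same prompts as the transformers example above.
SYSTEM_PROMPT = "You are a resume parser. Extract information from resume images into structured JSON."
USER_PROMPT = "Parse this resume and return the structured JSON."
MODEL_ID = "sukhrobnurali/qwen3vl-resume-parser"


def build_chat_request(page_pngs: List[bytes], max_tokens: int = 4096) -> dict:
    """OpenAI-compatible /v1/chat/completions body with pages as base64 data URLs."""
    images = [
        {"type": "image_url",
         "image_url": {"url": "data:image/png;base64," + base64.b64encode(png).decode("ascii")}}
        for png in page_pngs
    ]
    content = [{"type": "text", "text": SYSTEM_PROMPT}] + images + [{"type": "text", "text": USER_PROMPT}]
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": content}],
        "temperature": 0,  # greedy, matching do_sample=False
        "max_tokens": max_tokens,
        # Equivalent of extra_body={"chat_template_kwargs": ...} in the OpenAI client.
        "chat_template_kwargs": {"enable_thinking": False},
    }


def post_chat_request(body: dict, base_url: str = "http://localhost:8000") -> str:
    """POST the body and return the assistant's text (assumes a running server)."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Typical use: `post_chat_request(build_chat_request([Path(p).read_bytes() for p in pages]))` with `from pathlib import Path`, then parse the returned text defensively.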

Limitations

  • Domain skew. Training resumes skew toward IT/software roles, and the enum vocabularies (roles, technologies, industries) are IT-centric. Expect degradation on non-technical resumes, unusual layouts, scans/photos, or handwriting.
  • Language. English-dominant; non-English resumes are under-represented.
  • Schema lock-in. The model is tuned to one specific 23-field schema and its enum lists. It will coerce values toward those vocabularies (including "Other"), which may not match a different downstream schema.
  • Invalid JSON happens (~12% on the held-out split). Always parse defensively.
  • Latency. ~90 s/resume on an A100 at 16K context — batch/offline, not real-time.
  • Quantization. BF16 peaks at ~23 GB VRAM; it runs in 4-bit on a 16 GB GPU, but accuracy was only measured in BF16.

Out-of-scope and responsible use

  • No automated candidate decisions. Resume parsing for screening/ranking carries fairness and bias risk. Keep a human in the loop; do not use this model to make or materially influence hiring decisions without review.
  • Not a general VQA / OCR model. It is specialized for this resume schema.
  • PII. Resumes contain personal data. Handle outputs under the applicable privacy law (e.g. GDPR) — secure storage, access control, retention limits, and a lawful basis for processing.
  • Verify before trusting. Outputs are model predictions, not ground truth; validate critical fields (contact info, dates) downstream.
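
As a starting point for downstream checks on contact fields and dates, here is a small validator sketch. The patterns and limits are assumptions: a loose email shape, 7 to 15 phone digits (the upper bound is E.164's maximum length), and ISO dates with `date_to` not before `date_from`. `check_critical_fields` is a hypothetical helper; it only flags records for human review and does not correct them.

```python
import re
from datetime import date
from typing import List

EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
# Loose shape check: optional leading +, then digits and common separators.
PHONE_CHARS = re.compile(r"^\+?[\d\s().-]+$")


def check_critical_fields(record: dict) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    email = record.get("email")
    if email is not None and not EMAIL.match(email):
        problems.append(f"email looks malformed: {email!r}")
    phone = record.get("phone")
    if phone is not None:
        digits = re.sub(r"\D", "", phone)
        if not PHONE_CHARS.match(phone) or not 7 <= len(digits) <= 15:
            problems.append(f"phone looks malformed: {phone!r}")
    for i, exp in enumerate(record.get("experiences") or []):
        start, end = exp.get("date_from"), exp.get("date_to")
        try:
            d0 = date.fromisoformat(start) if start else None
            d1 = date.fromisoformat(end) if end else None
        except ValueError:
            problems.append(f"experiences[{i}] has a non-ISO date")
            continue
        if d0 and d1 and d1 < d0:
            problems.append(f"experiences[{i}] ends before it starts")
    return problems
```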

License

Released under Apache-2.0, inherited from the Qwen/Qwen3-VL-8B-Instruct base model.

Citation

```bibtex
@misc{nurali2026qwen3vlresumeparser,
  title        = {qwen3vl-resume-parser: a Qwen3-VL-8B fine-tune for resume-to-JSON extraction},
  author       = {Nurali, Sukhrob},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/sukhrobnurali/qwen3vl-resume-parser}}
}
```

Built on Qwen3-VL by the Qwen team; see the Qwen3-VL model card and Unsloth for the training stack.

Author

Sukhrob Nurali · sukhrobnurali@gmail.com · Hugging Face: @sukhrobnurali · GitHub: @sukhrobnurali

