License: apache-2.0
Results (held-out, location-disjoint test set, n = 3,447)
| Model / setting | Risk QWK [bootstrap CI] | Exact-risk accuracy |
|---|---|---|
| Zero-shot base (Qwen3-VL-8B-Instruct) | 0.077 [0.044, 0.108] | 0.544 |
| EG-ARSA (this adapter) | 0.482 [0.454, 0.510] | 0.717 |
Fine-tuning lifts ordinal risk agreement by +0.40 QWK (non-overlapping bootstrap CIs).
In a blind human-expert evaluation, EG-ARSA assigns the correct risk level 81% of the time,
versus 58% for Gemini-2.5-flash and 42% for the 31B teacher model run leakage-free; fully
automated risk accuracy reproduces the same ranking (0.74 / 0.59 / 0.36). Per-class F1 at
the raw operating point is Low 0.04 / Medium 0.67 / High 0.77, and residual errors are
predominantly between adjacent risk levels. See the reports/ folder and the paper for the
full evaluation, including the multi-model comparison and the human-eval rubric.
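The Risk QWK column is Cohen's kappa with quadratic disagreement weights over the ordinal labels Low < Medium < High. Below is a minimal pure-Python sketch of the metric and of a percentile bootstrap confidence interval. It assumes labels encoded as 0/1/2, 1,000 resamples and a 95% percentile interval; the card does not state the resampling details, so this is not necessarily how the reported intervals were computed.

```python
import math
import random
from typing import List, Sequence, Tuple


def quadratic_weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int = 3) -> float:
    """Cohen's kappa with quadratic weights; labels are ints 0..n_classes-1 (assumed Low=0, Medium=1, High=2)."""
    n = len(y_true)
    # Observed confusion matrix: observed[i][j] counts records with true=i, pred=j.
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    # Marginals used to build the chance-agreement (expected) matrix.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2  # quadratic disagreement weight
            num += w * observed[i][j]
            den += w * hist_true[i] * hist_pred[j] / n
    # Undefined when chance disagreement is zero (e.g. a single class everywhere).
    return 1.0 - num / den if den > 0 else float("nan")


def bootstrap_qwk_ci(y_true: List[int], y_pred: List[int], n_boot: int = 1000,
                     alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for QWK, resampling records with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        k = quadratic_weighted_kappa([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if not math.isnan(k):  # skip degenerate resamples
            stats.append(k)
    if not stats:
        raise ValueError("every bootstrap resample was degenerate")
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi
```

Quadratic weights penalize a Low-vs-High confusion four times as heavily as an adjacent-level confusion, so a model whose mistakes stay between adjacent risk levels keeps a comparatively high QWK.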
Usage
```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

BASE = "Qwen/Qwen3-VL-8B-Instruct"
LORA = "Thamed-Chowdhury/eg-arsa-qwen3vl-8b-lora"

# Load the base processor and model, then attach the EG-ARSA LoRA adapter.
proc = AutoProcessor.from_pretrained(BASE, trust_remote_code=True)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    BASE, dtype=torch.bfloat16, device_map="cuda"
)
model = PeftModel.from_pretrained(model, LORA).eval()

SYSTEM = (
    "You are a road-safety auditor applying the LGED (Local Government Engineering "
    "Department) 12-category visual audit methodology to road imagery in Bangladesh."
)
# The canonical leakage-free instruction is FULL_AUDIT_INSTRUCTION in
# prompts/finetune_prompts.py (shipped with the dataset and the code repo).
INSTRUCTION = "Audit this road image for safety hazards. Return the structured JSON audit."

img = Image.open("road.jpg").convert("RGB")
msgs = [
    {"role": "system", "content": [{"type": "text", "text": SYSTEM}]},
    {"role": "user", "content": [
        {"type": "image", "image": img},
        {"type": "text", "text": INSTRUCTION},
    ]},
]

# Render the chat template to a prompt string, then encode text and image together.
text = proc.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
inputs = proc(text=[text], images=[img], return_tensors="pt").to("cuda")

# Greedy decoding; drop the prompt tokens so only the generated audit is printed.
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
print(proc.tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
For exact parity with the paper, use the canonical `FULL_AUDIT_INSTRUCTION` (full road scene) or `SINGLE_HAZARD_INSTRUCTION` prompt from `prompts/finetune_prompts.py`, with images at their 1024-px native resolution (maximum dimension). See the code repo for the wrapped inference helper (`apps/streetview_infer.py`) and the evaluation pipeline.
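The instruction asks for a structured JSON audit, so downstream code usually has to parse the decoded text printed by the snippet above. Below is a small hypothetical helper, `parse_audit_json`, which assumes the reply contains one JSON object, possibly wrapped in a Markdown code fence or surrounded by extra text. The real field names are fixed by the canonical prompts in `prompts/finetune_prompts.py`; the keys in the example are illustrative only.

```python
import json


def parse_audit_json(generated: str) -> dict:
    """Hypothetical helper: return the first JSON object found in the model's reply."""
    start = generated.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    # raw_decode parses one complete JSON value starting at `start` and ignores
    # whatever follows it (e.g. a closing code fence or trailing prose).
    obj, _end = json.JSONDecoder().raw_decode(generated, start)
    if not isinstance(obj, dict):
        raise ValueError("top-level JSON value is not an object")
    return obj


# Illustrative reply; "risk" and "hazards" are placeholder keys, not the real schema.
reply = '{"risk": "High", "hazards": []}'
print(parse_audit_json(reply)["risk"])
```

Starting at the first `{` and letting `raw_decode` find the end of the object avoids having to strip fences or trailing text with regular expressions.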
Training
- Base: Qwen3-VL-8B-Instruct, vision encoder frozen.
- LoRA: r=16, α=32, dropout 0.05, targets = LM attention q/k/v/o_proj.
- Precision/memory: bf16, gradient checkpointing, effective batch 16 (micro 2 × accum 8).
- Resolution: 1024 px (native max dimension, selected by a zero-shot resolution probe).
- Objective: per-task normalized cross-entropy, weights hazard 1.0 / risk 1.0 / recommendation 0.5, with per-record loss masking by `tasks_available`.
- Imbalance: train-only logit adjustment (τ=1) on the risk-token logits, with class priors from the train counts (Low 225 / Medium 6,340 / High 9,517); raw logits at inference, with an optional post-hoc operating-point offset fit on validation.
- Schedule: LR 1e-4 cosine, 3% warmup, 2 epochs, early-stop on validation QWK.
- Data: BD-ARSA (train 16,082 records).
- Compute: a single NVIDIA A100 40 GB, ≈ 6.3 h.
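The objective and imbalance settings above can be sketched for a single record as follows. This assumes the standard logit-adjusted cross-entropy (adding τ·log π_k to each class logit during training only), a softmax restricted to the three risk-token logits, per-task losses already normalized by token count, and a plain weighted sum over the tasks in `tasks_available`. All function names are hypothetical; this is not the repository's training code.

```python
import math
from typing import Dict, Iterable, Optional

# Train-split risk counts from the card, used to estimate class priors.
RISK_COUNTS = {"Low": 225, "Medium": 6340, "High": 9517}
# Task weights from the card; how they are combined below is an assumption.
TASK_WEIGHTS = {"hazard": 1.0, "risk": 1.0, "recommendation": 0.5}


def log_priors(counts: Dict[str, int]) -> Dict[str, float]:
    """log(pi_k) for each class, estimated from training counts."""
    total = sum(counts.values())
    return {k: math.log(c / total) for k, c in counts.items()}


def logit_adjusted_risk_ce(risk_logits: Dict[str, float], target: str, tau: float = 1.0) -> float:
    """Cross-entropy after adding tau * log(pi_k) to each risk logit (training only).

    Simplification (assumption): the softmax runs over the three risk tokens only,
    not over the full vocabulary.
    """
    lp = log_priors(RISK_COUNTS)
    adjusted = {k: z + tau * lp[k] for k, z in risk_logits.items()}
    m = max(adjusted.values())  # log-sum-exp shift for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in adjusted.values()))
    return log_z - adjusted[target]


def combine_task_losses(task_losses: Dict[str, float], tasks_available: Iterable[str]) -> float:
    """Weighted sum of per-task losses, masking tasks the record has no labels for."""
    available = set(tasks_available)
    return sum(w * task_losses[t] for t, w in TASK_WEIGHTS.items() if t in available)


def predict_risk(risk_logits: Dict[str, float], offset: Optional[Dict[str, float]] = None) -> str:
    """Inference on raw logits (no prior term); an optional offset shifts the operating point."""
    offset = offset or {}
    return max(risk_logits, key=lambda k: risk_logits[k] + offset.get(k, 0.0))
```

With the card's counts, log π_Low ≈ -4.27 while log π_High ≈ -0.52, so during training the rare Low class starts from a large handicap that the model can only overcome by raising its raw Low logit. Dropping the prior term at inference therefore leaves predictions shifted toward the rare class, which is the intended effect of train-only logit adjustment.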
Intended use & limitations
Intended use. Proactive, low-cost screening of rural/suburban (LGED-class) road imagery to surface visible safety hazards and an interpretable risk rating where formal Road Safety Audits are unaffordable. It is a decision-support tool, not a replacement for a formal multidisciplinary RSA.
Limitations / out of scope.
- Audits a single street-view image: full road geometry and the non-visual extremes of `skid_resistance` (friction) and `drainage` (wet-weather behaviour) are recoverable only in obvious cases. All 12 categories are reported; these two score lowest.
- Targets the rural/suburban LGED road class. National highways (RHD) and dense city streets (City Corporations) fall under other jurisdictions and are future work.
License
This LoRA adapter is released under Apache-2.0. The base model
Qwen/Qwen3-VL-8B-Instruct is governed by its own license; only the adapter is
redistributed here. The BD-ARSA dataset is released under CC BY 4.0.
Citation
```bibtex
@article{chowdhury_egarsa,
  title  = {EG-ARSA: An Expert-Grounded Open Dataset and Model for Visual Road Safety Auditing in Low-Resource Settings},
  author = {Chowdhury, Md Thamed Bin Zaman and Hossain, Moazzem},
  year   = {2026},
  note   = {Preprint. Code: https://github.com/Thamed-Chowdhury/EG-ARSA}
}
```
Acknowledgements
The expert ground truth derives from on-site Road Safety Audits conducted by faculty of the Accident Research Institute (ARI), Bangladesh University of Engineering and Technology (BUET), commissioned by the Local Government Engineering Department (LGED) under the World Bank–financed Second Rural Transport Improvement Project (RTIP-II, Additional Financing; P166295).
Model provider: Thamed-Chowdhury
Model tree: base Qwen/Qwen3-VL-8B-Instruct; adapter: this model
Modalities: input Text, Image; output Text