License: apache-2.0

Results (held-out, location-disjoint test set, n = 3,447)

Model / setting | Risk QWK [bootstrap CI] | Exact-risk accuracy
Zero-shot base (Qwen3-VL-8B-Instruct) | 0.077 [0.044, 0.108] | 0.544
EG-ARSA (this adapter) | 0.482 [0.454, 0.510] | 0.717

Fine-tuning lifts ordinal risk agreement by about +0.40 QWK (quadratic weighted kappa, 0.077 to 0.482), and the bootstrap CIs do not overlap. Under a blind human-expert evaluation, EG-ARSA is risk-correct 81% of the time, versus 58% for Gemini-2.5-flash and 42% for the 31B teacher run leakage-free; fully automated risk accuracy reproduces the same ranking (0.74 / 0.59 / 0.36). Per-class F1 at the raw operating point is Low 0.04 / Medium 0.67 / High 0.77; residual errors are predominantly between adjacent risk levels. See the reports/ folder and the paper for the full evaluation, including the multi-model comparison and the human-eval rubric.
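For readers unfamiliar with the headline metric, the following is a minimal sketch of quadratic weighted kappa with a percentile bootstrap interval. It is not the evaluation code from the repo. The function names, the integer label encoding (0 = Low, 1 = Medium, 2 = High), the 1,000 resamples, and the 95% percentile interval are all assumptions made for illustration; the card only states that the CIs are bootstrap CIs.

```python
import random


def quadratic_weighted_kappa(y_true, y_pred, n_classes=3):
    """Cohen's kappa with quadratic weights for integer labels in [0, n_classes)."""
    n = len(y_true)
    # Observed confusion matrix and its two marginal histograms.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            num += w * observed[i][j]
            den += w * hist_true[i] * hist_pred[j] / n
    # Kappa is undefined when every label falls in one class; 0.0 is returned
    # here as a convention of this sketch.
    return 1.0 - num / den if den > 0 else 0.0


def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for QWK (assumed procedure, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample records with replacement
        stats.append(quadratic_weighted_kappa([y_true[i] for i in idx],
                                              [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy labels only; these are not the paper's predictions.
toy_true = [0, 1, 2, 1, 2, 2, 1, 2]
toy_pred = [1, 1, 2, 2, 2, 1, 1, 2]
print(quadratic_weighted_kappa(toy_true, toy_pred), bootstrap_ci(toy_true, toy_pred))
```

Because the weights grow with the squared distance between levels, a Low/High confusion costs four times as much as a confusion between adjacent levels, which is why QWK is reported alongside exact-risk accuracy.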

Usage

python

import torch
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
from PIL import Image

BASE = "Qwen/Qwen3-VL-8B-Instruct"
LORA = "Thamed-Chowdhury/eg-arsa-qwen3vl-8b-lora"

# Load the base model in bf16, then attach the LoRA adapter.
proc = AutoProcessor.from_pretrained(BASE, trust_remote_code=True)
model = Qwen3VLForConditionalGeneration.from_pretrained(BASE, dtype=torch.bfloat16, device_map="cuda")
model = PeftModel.from_pretrained(model, LORA).eval()

SYSTEM = ("You are a road-safety auditor applying the LGED (Local Government Engineering "
          "Department) 12-category visual audit methodology to road imagery in Bangladesh.")
# The canonical leakage-free instruction is FULL_AUDIT_INSTRUCTION in
# prompts/finetune_prompts.py (shipped with the dataset and the code repo).
INSTRUCTION = "Audit this road image for safety hazards. Return the structured JSON audit."

img = Image.open("road.jpg").convert("RGB")
msgs = [
    {"role": "system", "content": [{"type": "text", "text": SYSTEM}]},
    {"role": "user", "content": [{"type": "image", "image": img},
                                 {"type": "text", "text": INSTRUCTION}]},
]
text = proc.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
inputs = proc(text=[text], images=[img], return_tensors="pt").to("cuda")

# Greedy decoding; decode only the newly generated tokens, not the prompt.
out = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
print(proc.tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))

For exact parity with the paper, use the canonical FULL_AUDIT_INSTRUCTION (full road scene) or SINGLE_HAZARD_INSTRUCTION from prompts/finetune_prompts.py and run at 1024-px native resolution. See the code repo for the wrapped inference helper (apps/streetview_infer.py) and the evaluation pipeline.
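The model is asked to return a structured JSON audit, but a decoded generation can carry extra text around the JSON. The helper below is a small sketch for pulling the first parseable JSON object out of such a string. The function name `extract_first_json` and the field names in the example reply are hypothetical; the actual output schema is defined in the code repo, not here.

```python
import json


def extract_first_json(text):
    """Return the first JSON object that parses inside `text`, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever follows it.
            obj, _ = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        start = text.find("{", start + 1)
    return None


# Hypothetical reply; "risk_level" and "hazards" are illustrative field names.
reply = 'Audit result: {"risk_level": "High", "hazards": []} (end of audit)'
audit = extract_first_json(reply)
print(audit)
```

Returning None instead of raising keeps batch runs going when a single generation is truncated at max_new_tokens; the caller can then log and skip that record.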

Training

  • Base: Qwen3-VL-8B-Instruct, vision encoder frozen.
  • LoRA: r=16, α=32, dropout 0.05, targets = LM attention q/k/v/o_proj.
  • Precision/memory: bf16, gradient checkpointing, effective batch 16 (micro 2 × accum 8).
  • Resolution: 1024 px (native max dimension, selected by a zero-shot resolution probe).
  • Objective: per-task normalized cross-entropy, weights hazard 1.0 / risk 1.0 / recommendation 0.5, with per-record loss masking by tasks_available.
  • Imbalance: train-only logit adjustment (τ=1) on the risk-token logits (priors from train class counts Low 225 / Medium 6,340 / High 9,517); raw logits at inference, with an optional post-hoc operating-point offset fit on validation.
  • Schedule: LR 1e-4 cosine, 3% warmup, 2 epochs, early-stop on validation QWK.
  • Data: BD-ARSA (train 16,082 records).
  • Compute: a single NVIDIA A100 40 GB, ≈ 6.3 h.
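To make the imbalance handling in the list above concrete, here is a minimal sketch of a logit-adjusted cross-entropy over the three risk levels. It assumes the common additive form, in which τ · log(prior) is added to each class logit before the softmax during training only; the card does not spell out the exact formulation, so treat this as one plausible reading. The function names and the dictionary-based logits are illustrative, and the counts are the Low / Medium / High figures from the list.

```python
import math

# Train-set counts for the risk label, as given in the Training list.
RISK_COUNTS = {"Low": 225, "Medium": 6340, "High": 9517}
TAU = 1.0


def log_priors(counts):
    """Log of the empirical class frequencies."""
    total = sum(counts.values())
    return {name: math.log(count / total) for name, count in counts.items()}


def adjusted_cross_entropy(logits, target, counts=RISK_COUNTS, tau=TAU):
    """Cross-entropy on logits shifted by tau * log(prior). Training-time only."""
    priors = log_priors(counts)
    shifted = {name: logits[name] + tau * priors[name] for name in logits}
    # Numerically stable log-sum-exp over the shifted logits.
    peak = max(shifted.values())
    log_z = peak + math.log(sum(math.exp(v - peak) for v in shifted.values()))
    return log_z - shifted[target]


# With identical raw logits, the rare Low class incurs the largest loss,
# so training pushes its raw logit up relative to the frequent classes.
flat = {"Low": 0.0, "Medium": 0.0, "High": 0.0}
for level in RISK_COUNTS:
    print(level, round(adjusted_cross_entropy(flat, level), 3))
```

At inference the shift is dropped and the raw logits are used, which matches the "raw logits at inference" note in the list; the optional operating-point offset mentioned there is a separate post-hoc step and is not modelled in this sketch.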

Intended use & limitations

Intended use. Proactive, low-cost screening of rural/suburban (LGED-class) road imagery to surface visible safety hazards and an interpretable risk rating where formal Road Safety Audits (RSAs) are unaffordable. It is a decision-support tool, not a replacement for a formal multidisciplinary RSA.

Limitations / out of scope.

  • The model audits a single street-view image, so full road geometry and the non-visual extremes of skid_resistance (friction) and drainage (wet-weather behaviour) are recoverable only in obvious cases. All 12 categories are reported; these two score lowest.
  • Targets the rural/suburban LGED road class. National highways (RHD) and dense city streets (City Corporations) fall under other jurisdictions and are future work.

License

This LoRA adapter is released under Apache-2.0. The base model Qwen/Qwen3-VL-8B-Instruct is governed by its own license; only the adapter is redistributed here. The BD-ARSA dataset is released under CC BY 4.0.

Citation

bibtex

@article{chowdhury_egarsa,
  title  = {EG-ARSA: An Expert-Grounded Open Dataset and Model for Visual Road Safety
            Auditing in Low-Resource Settings},
  author = {Chowdhury, Md Thamed Bin Zaman and Hossain, Moazzem},
  year   = {2026},
  note   = {Preprint. Code: https://github.com/Thamed-Chowdhury/EG-ARSA}
}

Acknowledgements

The expert ground truth derives from on-site Road Safety Audits conducted by faculty of the Accident Research Institute (ARI), Bangladesh University of Engineering and Technology (BUET), commissioned by the Local Government Engineering Department (LGED) under the World Bank–financed Second Rural Transport Improvement Project (RTIP-II, Additional Financing; P166295).

Model provider

Thamed-Chowdhury

Model tree

Base

Qwen/Qwen3-VL-8B-Instruct

Adapter

this model

Modalities

Input

Text, Image

Output

Text
