License: apache-2.0

Highlights

  • ROUGE-L 0.581 on 3,770 DriveLM samples — 3.7× the zero-shot baseline (0.157).
  • Behavior recovery without a data fix. At lr=2e-4 the behavior category collapsed to ROUGE-L 0.036 (mode collapse into terse answers). At lr=1e-4, on the same natural-distribution data, it scores 0.877, close to the stratified-data run (0.911). The collapse was therefore a learning-rate effect, not a data effect.
  • Adapter is 12.8 MB — same rank/alpha as the other variants.

Eval results (3,770-sample DriveLM front-arc, vLLM)

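| Metric | Baseline | This adapter (lr=1e-4) | Δ |
| --- | --- | --- | --- |
| ROUGE-1 | 0.166 | 0.591 | +0.425 |
| ROUGE-L | 0.157 | 0.581 | +0.424 |
| Token-F1 | 0.117 | 0.544 | +0.427 |
| Exact match | 0.4% | 41.9% | +41.5 pp |
| Mean per-request latency | 1,420 ms | 2,098 ms | +678 ms |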
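
For readers unfamiliar with the metrics in the table above, the following is a minimal sketch of how Token-F1, ROUGE-L, and exact match are commonly computed. It assumes lowercase whitespace tokenization and the balanced F-measure form of ROUGE-L. The evaluation script behind this card may tokenize or normalize differently, so the sketch is not expected to reproduce the reported numbers exactly.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens; the card's evaluator may differ.
    return text.lower().split()


def f1(precision, recall):
    # Harmonic mean of precision and recall, defined as 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def token_f1(prediction, reference):
    # Bag-of-tokens overlap: order is ignored, multiplicity is respected.
    pred, ref = tokenize(prediction), tokenize(reference)
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    return f1(overlap / len(pred), overlap / len(ref))


def lcs_length(a, b):
    # Dynamic-programming longest common subsequence, one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction, reference):
    # F-measure over the longest common subsequence, so token order matters.
    pred, ref = tokenize(prediction), tokenize(reference)
    if not pred or not ref:
        return float(pred == ref)
    lcs = lcs_length(pred, ref)
    return f1(lcs / len(pred), lcs / len(ref))


def exact_match(prediction, reference):
    # Assumption: comparison after trimming whitespace and lowercasing.
    return float(prediction.strip().lower() == reference.strip().lower())
```

With these definitions, `rouge_l("a b c d", "a c b d")` gives 0.75 while `token_f1` gives 1.0 for the same pair, because only ROUGE-L is sensitive to token order.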

Per question category (ROUGE-L)

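| Category | N | Baseline | This adapter | Δ |
| --- | --- | --- | --- | --- |
| perception | 1,738 | 0.217 | 0.533 | +0.316 |
| prediction | 1,181 | 0.097 | 0.696 | +0.599 |
| planning | 813 | 0.107 | 0.503 | +0.396 |
| behavior | 38 | 0.305 | 0.877 | +0.572 |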

The behavior win is the headline differentiator from the lr=2e-4 variant — see "Position in the ablation series" below.
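
As a consistency check, the overall ROUGE-L figures can be recovered from the per-category rows as a sample-weighted mean. The short sketch below uses only the numbers from the category table; the variable and function names are illustrative.

```python
# (N, baseline ROUGE-L, adapter ROUGE-L) per category, copied from the table.
categories = {
    "perception": (1738, 0.217, 0.533),
    "prediction": (1181, 0.097, 0.696),
    "planning": (813, 0.107, 0.503),
    "behavior": (38, 0.305, 0.877),
}

total_n = sum(row[0] for row in categories.values())


def weighted_mean(column):
    # Weight each category score by its sample count.
    return sum(row[0] * row[column] for row in categories.values()) / total_n


baseline_overall = weighted_mean(1)
adapter_overall = weighted_mean(2)

print(total_n, round(baseline_overall, 3), round(adapter_overall, 3))
# prints: 3770 0.157 0.581
```

Behavior accounts for only 38 of 3,770 samples (about 1%), so even the swing from 0.036 to 0.877 moves the overall score by less than 0.01. This is why the per-category breakdown, not the overall number, is needed to see the collapse.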

Training Details

| Setting | Value |
| --- | --- |
| Base model | Qwen/Qwen3.5-0.8B |
| Adapter type | QLoRA (NF4 4-bit base + LoRA r=8) |
| LoRA rank / alpha | 8 / 16 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Vision tower | Frozen |
| Training samples | 1,024 (natural distribution: 492 perception / 311 prediction / 211 planning / 10 behavior) |
| Camera mode | front-arc (3 cameras, ≤448 px long edge) |
| Epochs | 1 |
| Learning rate | 1e-4 (the nat-1024 sibling uses 2e-4) |
| Effective batch size | 2 (per-device batch 1 × gradient accumulation 2) |
| Label masking | Loss only on assistant tokens (prompt labels set to −100) |
| Hardware | Single NVIDIA RTX 2070 SUPER (8 GB) |
| Training wall clock | ~20 minutes |
| Final epoch-average loss | 0.417 |
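
The label-masking row can be illustrated with a minimal sketch. It assumes the prompt and assistant token IDs are already available as separate lists; `build_labels` and the token IDs are hypothetical, not taken from the training script. The value −100 is the default `ignore_index` of PyTorch's cross-entropy loss, so positions carrying it contribute nothing to the loss.

```python
IGNORE_INDEX = -100  # masked positions are skipped by the loss


def build_labels(prompt_ids, assistant_ids):
    """Concatenate prompt and assistant tokens; supervise only the assistant part."""
    input_ids = list(prompt_ids) + list(assistant_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(assistant_ids)
    return input_ids, labels


# Hypothetical token IDs, for illustration only.
input_ids, labels = build_labels([101, 7592, 2129], [2003, 102])
print(labels)  # [-100, -100, -100, 2003, 102]
```

The model still sees the full sequence as input; only the loss targets are masked, so gradients come from the assistant tokens alone.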

Position in the ablation series

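| Config | Sampling | lr | Epochs | Overall ROUGE-L | Behavior ROUGE-L |
| --- | --- | --- | --- | --- | --- |
| nat-1024 (canonical sibling) | natural | 2e-4 | 1 | 0.541 | 0.036 ⚠️ |
| lr1e4 (this adapter) | natural | 1e-4 | 1 | 0.581 | 0.877 |
| lr5e4 | natural | 5e-4 | 1 | 0.540 | 0.022 ⚠️ |
| stratified | uniform stratified | 2e-4 | 1 | 0.518 | 0.911 |
| proportional + lr1e4 | proportional w/ floor | 1e-4 | 1 | (see proportional repo) | (see proportional repo) |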

Limitations

  1. Train/eval overlap. Training set is a subset of the eval set.
  2. No referent-token grounding (<c1,CAM_FRONT,x,y> ignored).
  3. No CAN-bus signal access for behavior ego-velocity attributes.
  4. nuScenes-mini scope — 38 frames, 6 scenes, daylight bias.
  5. Latency — produces longer outputs than the lr=2e-4 sibling (+1 second mean latency).
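
Regarding limitation 1, one way to obtain a cleaner estimate is to score only the eval samples that were not used for training. The sketch below assumes each sample carries a unique `id` field, which is hypothetical; the card does not describe the data schema. The synthetic IDs in the example stand in for real sample identifiers.

```python
def held_out(eval_samples, train_ids):
    """Return the eval samples whose IDs do not appear in the training set."""
    train_ids = set(train_ids)
    return [sample for sample in eval_samples if sample["id"] not in train_ids]


# Synthetic IDs for illustration: 3,770 eval samples, the first 1,024 also used in training.
eval_samples = [{"id": i} for i in range(3770)]
train_ids = range(1024)

remaining = held_out(eval_samples, train_ids)
print(len(remaining))  # 2746
```

If the 1,024 training samples are all contained in the eval set, as limitation 1 states, this leaves 2,746 held-out samples.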

Usage

```python
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor

BASE_ID = "Qwen/Qwen3.5-0.8B"
ADAPTER_ID = "pranavthombare/qwen3.5-0.8b-drivelm-lora-lr1e4"

# Load the base vision-language model and its processor.
base = AutoModelForImageTextToText.from_pretrained(BASE_ID, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(BASE_ID, trust_remote_code=True)

# Attach the LoRA adapter and switch to inference mode.
model = PeftModel.from_pretrained(base, ADAPTER_ID).eval()
```

License

Apache-2.0.

Model provider

pranavthombare

Model tree

Base

Qwen/Qwen3.5-0.8B

Adapter

this model

Modalities

Input

Video, Text, Image

Output

Text
