README

License: apache-2.0

Eval results (3,770-sample DriveLM front-arc, vLLM)

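| Metric | Baseline | This adapter (prop + lr=1e-4) | Δ |
|---|---|---|---|
| ROUGE-1 | 0.166 | 0.627 | +0.461 |
| ROUGE-2 | 0.069 | 0.257 | +0.188 |
| ROUGE-L | 0.157 | 0.621 | +0.464 |
| Token-F1 | 0.117 | 0.602 | +0.485 |
| Exact match | 0.4% | 47.4% | +47.0 pp |
| Mean per-request latency | 1,420 ms | 1,858 ms | +438 ms |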
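
For readers unfamiliar with the Token-F1 and exact-match rows, here is a minimal sketch of how such metrics are commonly computed. The normalization (lowercasing, whitespace split) and the function names are assumptions for illustration; the evaluation script behind the table may normalize differently.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (an assumed normalization)."""
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> bool:
    """True when the normalized token sequences are identical."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall (bag-of-tokens overlap)."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Exact match is all-or-nothing per answer, while Token-F1 gives partial credit, which is why the two rows can move by different amounts.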

Per question category (ROUGE-L)

| Category | N | Baseline | This adapter |
|---|---|---|---|
| perception | 1,738 | 0.217 | 0.625 |
| prediction | 1,181 | 0.097 | 0.682 |
| planning | 813 | 0.107 | 0.543 |
| behavior | 38 | 0.305 | 0.201 |

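Best-of-series overall (0.621) and in perception and planning. Prediction is second to nat 1e-4 (0.682 vs 0.696). Behavior is the trade-off (next section): 0.201 is below the 0.305 baseline.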
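
The per-category rows can be cross-checked against the overall ROUGE-L figures. The sketch below assumes the overall score is a sample-weighted mean of the category scores. That assumption is not stated in the card, but it reproduces both overall values to three decimals. It also shows how little weight the 38 behavior samples carry.

```python
# (N, baseline ROUGE-L, adapter ROUGE-L) copied from the per-category table.
per_category = {
    "perception": (1738, 0.217, 0.625),
    "prediction": (1181, 0.097, 0.682),
    "planning": (813, 0.107, 0.543),
    "behavior": (38, 0.305, 0.201),
}

total_n = sum(n for n, _, _ in per_category.values())


def weighted_mean(column: int) -> float:
    """Sample-weighted mean of one score column (1 = baseline, 2 = adapter)."""
    return sum(row[0] * row[column] for row in per_category.values()) / total_n


baseline_overall = weighted_mean(1)
adapter_overall = weighted_mean(2)
behavior_weight = per_category["behavior"][0] / total_n

print(f"N = {total_n}")
print(f"baseline overall ROUGE-L ~ {baseline_overall:.3f}")
print(f"adapter overall ROUGE-L ~ {adapter_overall:.3f}")
print(f"behavior weight = {behavior_weight:.1%}")
```

Behavior accounts for about 1% of the eval samples, so a large swing in that category barely moves the overall number.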

Position in the ablation series

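| Config | Sampling | lr | Overall RL | Perception | Prediction | Planning | Behavior |
|---|---|---|---|---|---|---|---|
| nat 2e-4 | natural | 2e-4 | 0.541 | 0.489 | 0.659 | 0.502 | 0.036 |
| nat 1e-4 | natural | 1e-4 | 0.581 | 0.533 | 0.696 | 0.503 | 0.877 |
| nat 5e-4 | natural | 5e-4 | 0.540 | 0.513 | 0.617 | 0.509 | 0.022 |
| stratified | uniform | 2e-4 | 0.518 | 0.615 | 0.368 | 0.507 | 0.911 |
| prop 1e-4 (this) | proportional w/ floor | 1e-4 | 0.621 | 0.625 | 0.682 | 0.543 | 0.201 |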

Different configs win different production targets:

  • For behavior-heavy use cases (ego-status, predictability) → use nat 1e-4
  • For overall quality + perception/prediction/planning → use this adapter (prop 1e-4)
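
To make the choice concrete, the ablation numbers can be queried directly. This is an illustrative sketch: the helper names and the idea of weighting categories by an expected traffic mix are assumptions, not part of the original evaluation.

```python
# ROUGE-L per config, copied from the ablation table above.
COLUMNS = ("overall", "perception", "prediction", "planning", "behavior")
ABLATION = {
    "nat 2e-4": (0.541, 0.489, 0.659, 0.502, 0.036),
    "nat 1e-4": (0.581, 0.533, 0.696, 0.503, 0.877),
    "nat 5e-4": (0.540, 0.513, 0.617, 0.509, 0.022),
    "stratified": (0.518, 0.615, 0.368, 0.507, 0.911),
    "prop 1e-4": (0.621, 0.625, 0.682, 0.543, 0.201),
}


def best_config(column: str) -> str:
    """Return the config with the highest ROUGE-L in the given column."""
    idx = COLUMNS.index(column)
    return max(ABLATION, key=lambda name: ABLATION[name][idx])


def pick_for_mix(weights: dict[str, float]) -> str:
    """Pick the config with the best weighted ROUGE-L for a category mix.

    `weights` maps a category name to its expected share of requests
    (a hypothetical input chosen by the caller).
    """
    def score(name: str) -> float:
        return sum(w * ABLATION[name][COLUMNS.index(cat)] for cat, w in weights.items())

    return max(ABLATION, key=score)


for column in COLUMNS:
    print(f"{column:>10}: {best_config(column)}")
```

On behavior alone the stratified config scores highest (0.911), but its prediction score is weak (0.368). For a mix such as half behavior and half prediction, `pick_for_mix` selects nat 1e-4, which is consistent with the recommendation above.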

The trade-off: why behavior is 0.201 here vs 0.877 in nat 1e-4

Proportional sampling injects all 38 behavior samples × 4 upsample = 152 instances into training — identical to the uniform-stratified variant. So the behavior gradient signal is the same.

The difference is in the competing other-category gradients. Proportional sampling preserves the natural answer-pattern distribution within perception/prediction/planning (e.g. prediction stays No-heavy at 85/15/40/110 instead of forced 50/50/50/100). This is harder to fit — the LoRA's r=8 capacity gets pulled toward the dominant patterns of the larger categories. The 152 behavior signals get partially crowded out.

A weighted variant with behavior upsample 8× or 12× would likely close the behavior gap while keeping the overall wins. That's the obvious next experiment.
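
As a rough guide to what the proposed variants would change, the arithmetic below computes the behavior share of the training set at each upsample factor. It assumes the other three categories stay at 250 samples each, as in the current run. The constant and function names are illustrative.

```python
BEHAVIOR_UNIQUE = 38       # unique behavior samples (from the model card)
OTHER_SAMPLES = 3 * 250    # perception + prediction + planning, assumed unchanged


def behavior_share(upsample: int) -> tuple[int, int, float]:
    """Return (behavior instances, total training samples, behavior fraction)."""
    behavior = BEHAVIOR_UNIQUE * upsample
    total = OTHER_SAMPLES + behavior
    return behavior, total, behavior / total


for factor in (4, 8, 12):
    behavior, total, share = behavior_share(factor)
    print(f"{factor:>2}x: {behavior} behavior / {total} total = {share:.1%}")
```

At 4× the share is about 16.9% (152 of 902). At 8× it rises to about 28.8% (304 of 1,054), and at 12× to about 37.8% (456 of 1,206). The extra instances are repeats of the same 38 samples, so memorization of the behavior answers is worth watching for in such a run.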

Training Details

| Setting | Value |
|---|---|
| Base model | Qwen/Qwen3.5-0.8B |
| Adapter type | QLoRA (NF4 4-bit base + LoRA r=8) |
| LoRA rank / alpha | 8 / 16 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Vision tower | Frozen |
| Sampling | Proportional within each category × answer-pattern, min-floor 15 |
| Training samples | 902 (250 perception + 250 prediction + 250 planning + 38 behavior × 4) |
| Camera mode | front-arc (3 cameras, ≤448 px long edge) |
| Epochs | 1 |
| Learning rate | 1e-4 |
| Effective batch size | 1 × grad-accum 2 |
| Label masking | Loss only on assistant tokens (prompt masked to -100) |
| Hardware | Single NVIDIA RTX 2070 SUPER (8 GB) |
| Training wall clock | ~17 minutes |
| Final epoch-average loss | 0.440 |
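
The label-masking setting can be illustrated with a minimal sketch. It assumes the prompt occupies a known number of leading positions in each sequence. The function name and token ids are hypothetical, and the actual training script may locate the assistant span differently.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by PyTorch's cross-entropy loss by default


def mask_prompt_labels(input_ids: list[int], prompt_len: int) -> list[int]:
    """Copy input_ids into labels, replacing the first prompt_len positions with -100."""
    if not 0 <= prompt_len <= len(input_ids):
        raise ValueError("prompt_len must be within the sequence length")
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


# Hypothetical token ids: 5 prompt tokens followed by 3 assistant tokens.
input_ids = [101, 7, 8, 9, 102, 55, 56, 57]
labels = mask_prompt_labels(input_ids, prompt_len=5)
print(labels)  # [-100, -100, -100, -100, -100, 55, 56, 57]
```

Only the three assistant tokens contribute to the loss, so the adapter is trained to produce answers rather than to reproduce the question and image tokens.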

Reproducing this adapter

```bash
DRIVELM_TRAIN__SAMPLING=proportional \
DRIVELM_TRAIN__LR=1e-4 \
DRIVELM_TRAIN__OUTPUT_DIR=models/qwen-lora-prop-lr1e4 \
.venv/bin/python src/train/finetune.py
```

The proportional sampler is in src/data/pipeline.py::proportional_samples.
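
The repository code is not reproduced here. As a rough illustration of what "proportional with a minimum floor" can mean, the sketch below splits a per-category budget across answer patterns. The function name `proportional_quota`, the pinning strategy, and the example counts are all assumptions for illustration and are not taken from `proportional_samples`.

```python
def proportional_quota(counts: dict[str, int], budget: int, floor: int) -> dict[str, int]:
    """Split `budget` across patterns in proportion to `counts`, with a minimum per pattern.

    Patterns whose proportional share falls below `floor` are pinned to `floor`.
    The remaining budget is shared proportionally among the other patterns.
    """
    if any(c <= 0 for c in counts.values()):
        raise ValueError("all counts must be positive")
    if floor * len(counts) > budget:
        raise ValueError("budget too small to give every pattern the floor")

    pinned: set[str] = set()
    while True:
        free = [k for k in counts if k not in pinned]
        remaining = budget - floor * len(pinned)
        free_total = sum(counts[k] for k in free)
        below = {k for k in free if remaining * counts[k] / free_total < floor}
        if not below:
            break
        pinned |= below

    # Largest-remainder rounding so the quotas sum exactly to the budget.
    shares = {k: remaining * counts[k] / free_total for k in free}
    quota = {k: floor if k in pinned else int(shares[k]) for k in counts}
    leftover = budget - sum(quota.values())
    by_fraction = sorted(free, key=lambda k: shares[k] - int(shares[k]), reverse=True)
    for k in by_fraction[:leftover]:
        quota[k] += 1
    return quota


# Hypothetical answer-pattern counts for one category.
example = proportional_quota({"yes": 120, "no": 840, "other": 40}, budget=250, floor=15)
print(example)
```

With these made-up counts the rare "other" pattern is lifted to the floor of 15, and the remaining 235 slots keep the natural yes/no ratio.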

Usage

```python
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor

# Load the base vision-language model and its processor.
base = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen3.5-0.8B", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("Qwen/Qwen3.5-0.8B", trust_remote_code=True)
# Attach the LoRA adapter and switch to eval mode for inference.
model = PeftModel.from_pretrained(base, "pranavthombare/qwen3.5-0.8b-drivelm-lora-proportional").eval()
```

Limitations

  1. Train/eval overlap. The training set is a subset of the eval set, so the reported scores include samples seen during training and are not held-out estimates.
  2. Behavior trade-off. This adapter scores 0.201 on behavior vs 0.877 for the lr=1e-4 natural sibling. Choose the right adapter for your use case.
  3. No referent-token grounding (<c1,CAM_FRONT,x,y> ignored).
  4. No CAN-bus signal access for behavior ego-velocity attributes.
  5. nuScenes-mini scope — 38 frames, 6 scenes, daylight bias.

License

Apache-2.0.

Framework versions

  • PEFT 0.19.1
  • transformers (HuggingFace main as of training date)
  • bitsandbytes 0.49.2

Model provider

pranavthombare


Model tree

Base

Qwen/Qwen3.5-0.8B

Adapter

this model

Modalities

Input

Video, Text, Image

Output

Text
