License: apache-2.0
Highlights
- ROUGE-L 0.581 on 3,770 DriveLM samples — 3.7× the zero-shot baseline (0.157).
- Behavior recovery without a data fix. At lr=2e-4 the behavior category collapsed to ROUGE-L 0.036 (terse mode collapse); at lr=1e-4 on the same natural-distribution data it scores 0.877 — almost matching the stratified-data run (0.911). The behavior collapse turned out to be an LR effect, not a data effect.
- Adapter is 12.8 MB — same rank/alpha as the other variants.
Eval results (3,770-sample DriveLM front-arc, vLLM)
| Metric | Baseline | This adapter (lr=1e-4) | Δ |
|---|---|---|---|
| ROUGE-1 | 0.166 | 0.591 | +0.425 |
| ROUGE-L | 0.157 | 0.581 | +0.424 |
| Token-F1 | 0.117 | 0.544 | +0.427 |
| Exact match | 0.4% | 41.9% | +41.5 pp |
| Mean per-request latency | 1,420 ms | 2,098 ms | +678 ms |
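The card does not say which implementation produced these metrics. As a rough illustration of what the three text metrics measure, here is a minimal sketch that assumes whitespace tokenization and lowercasing (both assumptions, not confirmed by the card). The function names are hypothetical, and a real evaluation would more likely use an established scoring package.

```python
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def _f1(match, n_pred, n_ref):
    """Harmonic mean of precision and recall; 0.0 when nothing matches."""
    if match == 0 or n_pred == 0 or n_ref == 0:
        return 0.0
    precision, recall = match / n_pred, match / n_ref
    return 2 * precision * recall / (precision + recall)


def rouge_l_f1(pred, ref):
    """ROUGE-L as an F1 over the LCS (order-sensitive)."""
    p, r = pred.lower().split(), ref.lower().split()
    return _f1(lcs_length(p, r), len(p), len(r))


def token_f1(pred, ref):
    """Bag-of-tokens F1 (order-insensitive)."""
    p, r = pred.lower().split(), ref.lower().split()
    overlap = sum((Counter(p) & Counter(r)).values())
    return _f1(overlap, len(p), len(r))


def exact_match(pred, ref):
    """Assumed normalization: strip surrounding whitespace and lowercase."""
    return pred.strip().lower() == ref.strip().lower()
```

ROUGE-L rewards tokens that appear in the same order as the reference, while Token-F1 ignores order, which is why the two can diverge on reordered answers.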
Per question category (ROUGE-L)
| Category | N | Baseline | This adapter | Δ |
|---|---|---|---|---|
| perception | 1,738 | 0.217 | 0.533 | +0.316 |
| prediction | 1,181 | 0.097 | 0.696 | +0.599 |
| planning | 813 | 0.107 | 0.503 | +0.396 |
| behavior | 38 | 0.305 | 0.877 | +0.572 |
The behavior win is the headline differentiator from the lr=2e-4 variant — see "Position in the ablation series" below.
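As a consistency check, the overall ROUGE-L values in the eval table can be reproduced as the sample-weighted mean of the per-category scores. The sketch below uses only numbers from the two tables; the variable names are illustrative.

```python
# (N, baseline ROUGE-L, adapter ROUGE-L) per category, copied from the table above.
rows = {
    "perception": (1738, 0.217, 0.533),
    "prediction": (1181, 0.097, 0.696),
    "planning": (813, 0.107, 0.503),
    "behavior": (38, 0.305, 0.877),
}

total_n = sum(n for n, _, _ in rows.values())
baseline_overall = sum(n * base for n, base, _ in rows.values()) / total_n
adapter_overall = sum(n * adapter for n, _, adapter in rows.values()) / total_n

print(total_n, round(baseline_overall, 3), round(adapter_overall, 3))
# prints: 3770 0.157 0.581
```

Because behavior contributes only 38 of 3,770 samples, even its large per-category gain moves the overall score by under 0.01.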
Training Details
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen3.5-0.8B |
| Adapter type | QLoRA (NF4 4-bit base + LoRA r=8) |
| LoRA rank / alpha | 8 / 16 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Vision tower | Frozen |
| Training samples | 1,024 (natural distribution: 492 perception / 311 prediction / 211 planning / 10 behavior) |
| Camera mode | front-arc (3 cameras, ≤448 px long edge) |
| Epochs | 1 |
| Learning rate | 1e-4 (half the 2e-4 used by the nat-1024 sibling) |
| Effective batch size | 2 (per-device batch 1 × gradient accumulation 2) |
| Label masking | Loss only on assistant tokens (prompt masked to −100) |
| Hardware | Single NVIDIA RTX 2070 SUPER (8 GB) |
| Training wall clock | ~20 minutes |
| Final epoch-average loss | 0.417 |
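Two rows of the table can be made concrete with a short sketch: the label masking and the number of optimizer steps implied by the batch settings. This is an illustration under assumptions, not the training script. `mask_prompt_labels` and `optimizer_steps` are hypothetical helpers, and the step count assumes no samples are dropped or padded.

```python
import math

# -100 is the default ignore_index of PyTorch's cross-entropy loss,
# so positions labeled -100 contribute nothing to the loss.
IGNORE_INDEX = -100


def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids as labels, masking the first prompt_len (prompt) tokens."""
    if not 0 <= prompt_len <= len(input_ids):
        raise ValueError("prompt_len must lie within the sequence")
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


def optimizer_steps(num_samples, epochs, per_device_batch, grad_accum):
    """Optimizer updates for a single-GPU run (assumes nothing is dropped)."""
    return math.ceil(num_samples * epochs / (per_device_batch * grad_accum))


# Token ids here are made up; the last two stand in for the assistant answer.
labels = mask_prompt_labels([11, 12, 13, 14, 15], prompt_len=3)
print(labels)                          # [-100, -100, -100, 14, 15]
print(optimizer_steps(1024, 1, 1, 2))  # 512
```

With 1,024 samples, one epoch, and two samples per update, the run performs about 512 optimizer updates, which is worth keeping in mind when comparing learning rates.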
Position in the ablation series
| Config | Sampling | lr | Epochs | Overall ROUGE-L | Behavior ROUGE-L |
|---|---|---|---|---|---|
| nat-1024 (canonical sibling) | natural | 2e-4 | 1 | 0.541 | 0.036 ⚠️ |
| lr1e4 (this adapter) | natural | 1e-4 | 1 | 0.581 ⭐ | 0.877 ⭐ |
| lr5e4 | natural | 5e-4 | 1 | 0.540 | 0.022 ⚠️ |
| stratified | uniform stratified | 2e-4 | 1 | 0.518 | 0.911 |
| proportional + lr1e4 | proportional w/ floor | 1e-4 | 1 | (see proportional repo) | (see proportional repo) |
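The card names three sampling schemes without defining them. The sketch below shows one plausible reading of each as a per-category quota for a fixed training budget. All function names, the largest-remainder rounding, and the `floor=32` value are assumptions for illustration; the actual runs may have drawn samples differently (for example, the natural run's 492/311/211/10 split does not match an exact proportional quota).

```python
def proportional_allocation(pool_counts, budget):
    """Quota proportional to pool size, rounded by largest remainder."""
    total = sum(pool_counts.values())
    exact = {k: budget * n / total for k, n in pool_counts.items()}
    alloc = {k: int(v) for k, v in exact.items()}
    leftover = budget - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


def uniform_stratified_allocation(pool_counts, budget):
    """Equal quota per category; a quota above the pool size implies resampling."""
    keys = list(pool_counts)
    share, extra = divmod(budget, len(keys))
    return {k: share + (1 if i < extra else 0) for i, k in enumerate(keys)}


def proportional_with_floor(pool_counts, budget, floor):
    """Reserve `floor` samples per category, split the rest proportionally."""
    reserved = floor * len(pool_counts)
    if reserved > budget:
        raise ValueError("floor too large for budget")
    rest = proportional_allocation(pool_counts, budget - reserved)
    return {k: floor + rest[k] for k in pool_counts}


# Category sizes taken from the per-category eval table.
pool = {"perception": 1738, "prediction": 1181, "planning": 813, "behavior": 38}
print(proportional_allocation(pool, 1024))
print(uniform_stratified_allocation(pool, 1024))
print(proportional_with_floor(pool, 1024, floor=32))
```

Under the uniform scheme each category gets 256 slots, far more than the 38 behavior samples available, so that reading would require repeating behavior examples.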
Limitations
- Train/eval overlap. The training set is a subset of the eval set, so the scores above are not held-out estimates.
- No referent-token grounding (`<c1,CAM_FRONT,x,y>` ignored).
- No CAN-bus signal access for behavior ego-velocity attributes.
- nuScenes-mini scope — 38 frames, 6 scenes, daylight bias.
- Latency — produces longer outputs than the lr=2e-4 sibling (+1 second mean latency).
Usage
```python
from peft import PeftModel
from transformers import AutoProcessor, AutoModelForImageTextToText

# Load the base vision-language model and its processor.
base = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen3.5-0.8B", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("Qwen/Qwen3.5-0.8B", trust_remote_code=True)

# Attach the LoRA adapter and switch to inference mode.
model = PeftModel.from_pretrained(base, "pranavthombare/qwen3.5-0.8b-drivelm-lora-lr1e4").eval()
```
License
Apache-2.0.
Model provider
pranavthombare
Model tree
Base
Qwen/Qwen3.5-0.8B
Adapter
this model
Modalities
Input
Video, Text, Image
Output
Text