NIyueeE
Qwen3.5-0.8B-cocreator
License: apache-2.0
Model Details
- Base model: Qwen/Qwen3.5-0.8B
- Dataset: NIyueeE/cocreator-driving-scene - 1,227 driving scene samples, each with multi-frame video and causal text descriptions
- Fine-tuning method: QLoRA (4-bit) via Unsloth
- Vision: Native multimodal (image+text)
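As a rough illustration of what the LoRA part of QLoRA adds to a single linear layer, the sketch below counts adapter parameters and computes the adapter scaling factor, using the rank and alpha from the Hyperparameters table (r = 16, alpha = 16). The 1024×1024 layer size is a hypothetical example, not a dimension taken from this model.

```python
def lora_added_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def lora_scaling(alpha: float, r: int) -> float:
    """Factor applied to the low-rank update: W' = W + (alpha / r) * B @ A."""
    return alpha / r


# Hypothetical layer size, chosen only for illustration.
D_IN, D_OUT = 1024, 1024
R, ALPHA = 16, 16  # values from the Hyperparameters table

added = lora_added_params(D_IN, D_OUT, R)
full = D_IN * D_OUT
print(f"adapter params: {added:,} vs full weight: {full:,}")
print(f"scaling alpha/r: {lora_scaling(ALPHA, R)}")
```

With alpha equal to r, the scaling factor is 1, so the low-rank update is added to the frozen weight without rescaling.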
Training
Platform
Google Colab (colab.research.google.com) with a single NVIDIA A100-SXM4-40GB GPU.
Training Log
```markdown
Unsloth 2026.5.5: Fast Qwen3_5 patching. Transformers: 5.5.0.
NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.494 GB.
Torch: 2.10.0+cu128. CUDA: 8.0. CUDA Toolkit: 12.8. Triton: 3.6.0
Bfloat16 = TRUE. FA [Xformers = 0.0.35. FA2 = False]
Num examples = 1,227 | Num Epochs = 7 | Total steps = 50
Batch size per device = 128 | Gradient accumulation steps = 1
Total batch size (128 x 1 x 1) = 128
Trainable parameters = 13,181,952 of 866,167,872 (1.52% trained)
```
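The derived figures in the log can be checked with a few lines of arithmetic. This sketch only recomputes values from numbers printed in the log; the variable names are illustrative.

```python
# Values copied from the training log.
per_device_batch = 128
grad_accum_steps = 1
num_gpus = 1
total_steps = 50
trainable_params = 13_181_952
total_params = 866_167_872

# Total batch size = per-device batch x accumulation steps x GPUs.
effective_batch = per_device_batch * grad_accum_steps * num_gpus

# Number of samples processed over the whole run.
samples_processed = effective_batch * total_steps

# Share of parameters updated by the adapters.
trainable_pct = 100 * trainable_params / total_params

print(f"effective batch size: {effective_batch}")
print(f"samples processed:    {samples_processed:,}")
print(f"trainable share:      {trainable_pct:.2f}%")
```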
Loss Curve

Training Script
See finetune_cocreator_coclab.ipynb for the complete fine-tuning notebook.
Hyperparameters
| Parameter | Value |
|---|---|
| LoRA r | 16 |
| LoRA alpha | 16 |
| LoRA dropout | 0 |
| Target modules | all-linear |
| Fine-tuned layers | vision + language + attention + MLP |
| Optimizer | adamw_8bit |
| Learning rate | 5e-5 (cosine schedule) |
| Max steps | 50 |
| Epochs | 7 |
| Gradient checkpointing | unsloth |
| Resolution | 800×450 (resized) |
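To make the learning-rate row concrete, here is a minimal sketch of a cosine decay from 5e-5 over the 50 training steps. It assumes a single half-cosine cycle and no warmup; the warmup length is not stated in this card, so `warmup_steps` is a hypothetical parameter that defaults to 0.

```python
import math

BASE_LR = 5e-5   # from the Hyperparameters table
MAX_STEPS = 50   # from the Hyperparameters table


def cosine_lr(step: int, base_lr: float = BASE_LR,
              total_steps: int = MAX_STEPS, warmup_steps: int = 0) -> float:
    """Learning rate at `step` for linear warmup followed by half-cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr (unused when warmup_steps is 0).
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 10, 25, 40, 50):
    print(f"step {step:2d}: lr = {cosine_lr(step):.3e}")
```

Under these assumptions the rate starts at 5e-5, passes through half that value at step 25, and reaches zero at step 50.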
Usage
```python
from transformers import AutoModel, AutoTokenizer
import torch

model = AutoModel.from_pretrained(
    "NIyueeE/Qwen3.5-0.8B-cocreator",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "NIyueeE/Qwen3.5-0.8B-cocreator",
    trust_remote_code=True,
)
```
Intended Use
This model is fine-tuned for causal understanding of driving scenes. It takes multi-frame driving images as input and generates text descriptions of the causal relationships in the scene. The primary use case is as a feature extractor in the ReCogDrive autonomous driving VLA training pipeline.
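Since the model consumes several frames per scene, a clip usually has to be reduced to a handful of frames first. The helper below is a hypothetical preprocessing sketch that picks evenly spaced frame indices; this card does not describe how frames were selected for training, so the function name and sampling strategy are assumptions.

```python
def sample_frame_indices(num_frames: int, num_samples: int) -> list[int]:
    """Pick `num_samples` evenly spaced frame indices from a clip of `num_frames`."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples >= num_frames:
        # Clip is shorter than requested: use every frame.
        return list(range(num_frames))
    if num_samples == 1:
        # A single frame: take the middle of the clip.
        return [num_frames // 2]
    # Spread samples so the first and last frames are always included.
    step = (num_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


print(sample_frame_indices(10, 4))
print(sample_frame_indices(3, 8))
```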
License
Apache 2.0
Model provider
NIyueeE
Modalities
Input
Video, Text, Image
Output
Text