NIyueeE

Qwen3.5-0.8B-cocreator

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Model Details

  • Base model: Qwen/Qwen3.5-0.8B
  • Dataset: NIyueeE/cocreator-driving-scene - 1,227 driving scene samples, each with multi-frame video and causal text descriptions
  • Fine-tuning method: QLoRA (4-bit) via Unsloth
  • Vision: Native multimodal (image+text)

Training

Platform

Google Colab (colab.research.google.com) with NVIDIA A100-SXM4-40GB.

Training Log

markdown

Unsloth 2026.5.5: Fast Qwen3_5 patching. Transformers: 5.5.0.
NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.494 GB.
Torch: 2.10.0+cu128. CUDA: 8.0. CUDA Toolkit: 12.8. Triton: 3.6.0
Bfloat16 = TRUE. FA [Xformers = 0.0.35. FA2 = False]
Num examples = 1,227 | Num Epochs = 7 | Total steps = 50
Batch size per device = 128 | Gradient accumulation steps = 1
Total batch size (128 x 1 x 1) = 128
Trainable parameters = 13,181,952 of 866,167,872 (1.52% trained)

Loss Curve

Training loss

Training Script

See finetune_cocreator_coclab.ipynb for the complete fine-tuning notebook.

Hyperparameters

Table
ParameterValue
LoRA r16
LoRA alpha16
LoRA dropout0
Target modulesall-linear
Fine-tuned layersvision + language + attention + MLP
Optimizeradamw_8bit
Learning rate5e-5 (cosine schedule)
Max steps50
Epochs7
Gradient checkpointingunsloth
Resolution800×450 (resized)

Usage

python

from transformers import AutoModel, AutoTokenizer
import torch
model = AutoModel.from_pretrained(
"NIyueeE/Qwen3.5-0.8B-cocreator",
torch_dtype=torch.bfloat16,
trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
"NIyueeE/Qwen3.5-0.8B-cocreator",
trust_remote_code=True,
)

Intended Use

This model is fine-tuned for driving scene causal understanding. It takes multi-frame driving images as input and generates causal relationship text descriptions. The primary use case is as a feature extractor in the ReCogDrive autonomous driving VLA training pipeline.

License

Apache 2.0

Model provider

NIyueeE

Model tree

Base

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today