Qwen3.5-0.8B-cocreator API & Inference Endpoint

Model Details

Base model: Qwen/Qwen3.5-0.8B
Dataset: NIyueeE/cocreator-driving-scene - 1,227 driving scene samples, each with multi-frame video and causal text descriptions
Fine-tuning method: QLoRA (4-bit) via Unsloth
Vision: Native multimodal (image+text)

Training

Platform

Google Colab (colab.research.google.com) with NVIDIA A100-SXM4-40GB.

Training Log

markdown
Unsloth 2026.5.5: Fast Qwen3_5 patching. Transformers: 5.5.0.
NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.494 GB.
Torch: 2.10.0+cu128. CUDA: 8.0. CUDA Toolkit: 12.8. Triton: 3.6.0
Bfloat16 = TRUE. FA [Xformers = 0.0.35. FA2 = False]

Num examples = 1,227 | Num Epochs = 7 | Total steps = 50
Batch size per device = 128 | Gradient accumulation steps = 1
Total batch size (128 x 1 x 1) = 128
Trainable parameters = 13,181,952 of 866,167,872 (1.52% trained)

Loss Curve

Training loss

Training Script

See finetune_cocreator_coclab.ipynb for the complete fine-tuning notebook.

Hyperparameters

Table with columns: Parameter, Value
Parameter	Value
LoRA r	16
LoRA alpha	16
LoRA dropout	0
Target modules	all-linear
Fine-tuned layers	vision + language + attention + MLP
Optimizer	adamw_8bit
Learning rate	5e-5 (cosine schedule)
Max steps	50
Epochs	7

Usage

python
from transformers import AutoModel, AutoTokenizer
import torch

model = AutoModel.from_pretrained(
    "NIyueeE/Qwen3.5-0.8B-cocreator",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "NIyueeE/Qwen3.5-0.8B-cocreator",
    trust_remote_code=True,
)

Intended Use

This model is fine-tuned for driving scene causal understanding. It takes multi-frame driving images as input and generates causal relationship text descriptions. The primary use case is as a feature extractor in the ReCogDrive autonomous driving VLA training pipeline.

License

Apache 2.0

Model Details

Base model: Qwen/Qwen3.5-0.8B
Dataset: NIyueeE/cocreator-driving-scene - 1,227 driving scene samples, each with multi-frame video and causal text descriptions
Fine-tuning method: QLoRA (4-bit) via Unsloth
Vision: Native multimodal (image+text)

Training

Platform

Google Colab (colab.research.google.com) with NVIDIA A100-SXM4-40GB.

Training Log

markdown
Unsloth 2026.5.5: Fast Qwen3_5 patching. Transformers: 5.5.0.
NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.494 GB.
Torch: 2.10.0+cu128. CUDA: 8.0. CUDA Toolkit: 12.8. Triton: 3.6.0
Bfloat16 = TRUE. FA [Xformers = 0.0.35. FA2 = False]

Num examples = 1,227 | Num Epochs = 7 | Total steps = 50
Batch size per device = 128 | Gradient accumulation steps = 1
Total batch size (128 x 1 x 1) = 128
Trainable parameters = 13,181,952 of 866,167,872 (1.52% trained)

Loss Curve

Training loss

Training Script

See finetune_cocreator_coclab.ipynb for the complete fine-tuning notebook.

Hyperparameters

Table with columns: Parameter, Value
Parameter	Value
LoRA r	16
LoRA alpha	16
LoRA dropout	0
Target modules	all-linear
Fine-tuned layers	vision + language + attention + MLP
Optimizer	adamw_8bit
Learning rate	5e-5 (cosine schedule)
Max steps	50
Epochs	7

Usage

python
from transformers import AutoModel, AutoTokenizer
import torch

model = AutoModel.from_pretrained(
    "NIyueeE/Qwen3.5-0.8B-cocreator",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "NIyueeE/Qwen3.5-0.8B-cocreator",
    trust_remote_code=True,
)

Intended Use

License

Apache 2.0

Qwen3.5-0.8B-cocreator

README

Model Details

Training

Platform

Training Log

Loss Curve

Training Script

Hyperparameters

Usage

Intended Use

License

Explore FriendliAI today

README

Model Details

Training

Platform

Training Log

Loss Curve

Training Script

Hyperparameters

Usage

Intended Use

License