License: apache-2.0

What was changed

  • Quantized with bitsandbytes NF4 double-quant (bnb_4bit_quant_type=nf4, bnb_4bit_compute_dtype=bfloat16)
  • Visual tower layers kept at bf16 (llm_int8_skip_modules) — required for correct image inference
  • lm_head.weight kept at bf16 for output quality
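The settings listed above can be sketched as a `transformers.BitsAndBytesConfig`. This is a minimal illustration, not the exact conversion script: the skip patterns `"visual"` and `"lm_head"` are assumed module names, and the helper names `nf4_config_kwargs` and `build_config` are hypothetical.

```python
def nf4_config_kwargs(skip_modules):
    """Return keyword arguments mirroring the settings listed above.

    The compute dtype is kept as a string here so this helper has no
    third-party dependencies; build_config() converts it to a torch dtype.
    """
    return {
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_use_double_quant": True,
        "bnb_4bit_compute_dtype": "bfloat16",
        # Modules matching these names are left unquantized (bf16).
        "llm_int8_skip_modules": list(skip_modules),
    }


# Assumed name patterns for the visual tower and the output head.
SKIP_MODULES = ["visual", "lm_head"]


def build_config():
    """Build the actual config object (requires torch and transformers)."""
    import torch
    from transformers import BitsAndBytesConfig

    kwargs = nf4_config_kwargs(SKIP_MODULES)
    kwargs["bnb_4bit_compute_dtype"] = torch.bfloat16
    return BitsAndBytesConfig(**kwargs)
```

Passing the result of `build_config()` as `quantization_config=` to `from_pretrained` would quantize an unquantized checkpoint at load time. The published BNB checkpoints already carry their quantization settings, so this step is not needed just to load them.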

Model family

| Model | Type | Base model |
| --- | --- | --- |
| Qwen/Qwen3.5-4B | f16 · VLM · source | |
| techwithsergiu/Qwen3.5-4B-bnb-4bit | BNB NF4 · VLM | Qwen/Qwen3.5-4B |
| techwithsergiu/Qwen3.5-text-4B | bf16 · text-only | Qwen/Qwen3.5-4B |
| techwithsergiu/Qwen3.5-text-4B-bnb-4bit | BNB NF4 · text-only | Qwen3.5-text-4B |
| techwithsergiu/Qwen3.5-text-4B-GGUF | GGUF quants | Qwen3.5-text-4B |

The visual tower stays in bf16, so it adds an unquantized overhead that grows with model size (~0.19 GB for 0.8B, ~0.62 GB for 2B/4B, ~0.85 GB for 9B). Overall, BNB-quantized models are roughly 40% of the original f16 size; the exact ratio varies by model size.
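As a rough worked example of the figures above, the sketch below applies the ~40% ratio to an f16 checkpoint size and reports how much of the result the bf16 visual tower would account for. The 10 GB input is a hypothetical placeholder, not a measured size of any model in this family, and the function names are illustrative.

```python
# Visual tower sizes in GB, copied from the paragraph above.
VISUAL_TOWER_GB = {"0.8B": 0.19, "2B": 0.62, "4B": 0.62, "9B": 0.85}


def estimate_bnb_size_gb(f16_size_gb, ratio=0.40):
    """Estimate the BNB checkpoint size from the f16 size (ratio is approximate)."""
    return f16_size_gb * ratio


def visual_tower_share(model_size, bnb_size_gb):
    """Fraction of the quantized checkpoint taken up by the bf16 visual tower."""
    return VISUAL_TOWER_GB[model_size] / bnb_size_gb


f16_size_gb = 10.0  # hypothetical f16 checkpoint size, for illustration only
bnb_size_gb = estimate_bnb_size_gb(f16_size_gb)
share = visual_tower_share("4B", bnb_size_gb)
print(f"estimated BNB size: {bnb_size_gb:.2f} GB, visual tower share: {share:.1%}")
```

Because the visual tower is not quantized, its share of the checkpoint is larger in the BNB variant than in the f16 original, which is one reason the overall ratio varies by model size.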

Fine-tuning

Text-only LoRA fine-tuning — use the text-only BNB variant as training base: techwithsergiu/Qwen3.5-text-4B-bnb-4bit

Training pipeline (QLoRA · Unsloth · TRL): github.com/techwithsergiu/qwen-qlora-train
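For orientation, here is a minimal sketch of attaching LoRA adapters to the text-only BNB base with `transformers` and `peft`. It is not the code of the linked pipeline, which uses Unsloth and TRL. The hyperparameters are illustrative assumptions, `load_lora_model` is a hypothetical helper name, and it assumes the installed `transformers` version supports this architecture through `AutoModelForCausalLM`.

```python
TEXT_BASE = "techwithsergiu/Qwen3.5-text-4B-bnb-4bit"

# Illustrative LoRA settings (assumptions, not the pipeline's values).
# "all-linear" lets peft pick the linear layers, so no layer names are guessed.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.0,
    "target_modules": "all-linear",
    "task_type": "CAUSAL_LM",
}


def load_lora_model(repo_id=TEXT_BASE):
    """Load the pre-quantized base and wrap it with trainable LoRA adapters.

    Requires transformers, peft, bitsandbytes and a CUDA GPU.
    """
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # The checkpoint is already 4-bit, so no quantization_config is passed.
    model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
    model = prepare_model_for_kbit_training(model)
    model = get_peft_model(model, LoraConfig(**LORA_KWARGS))
    return model, tokenizer
```

The returned model trains only the adapter weights while the 4-bit base stays frozen, which is the QLoRA setup the pipeline name refers to.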

VLM (image + text) fine-tuning — refer to the official Unsloth guide: unsloth.ai/docs/models/qwen3.5/fine-tune

[Figure: pipeline diagram]

Conversion

Converted using qwen35-toolkit — a Python toolkit for BNB quantization, visual tower removal, verification and HF Hub publishing of Qwen3.5 models.


Acknowledgements

Based on Qwen/Qwen3.5-4B by the Qwen Team. If you use this model in research, please cite the original:

@misc{qwen3.5,
  title  = {{Qwen3.5}: Towards Native Multimodal Agents},
  author = {{Qwen Team}},
  month  = {February},
  year   = {2026},
  url    = {https://qwen.ai/blog?id=qwen3.5}
}

Model provider: sitatech

Base model: Qwen/Qwen3.5-4B (this model is the quantized variant)

Input modalities: Video, Text, Image

Output modalities: Text
