Qwen3.5-0.8B-Base API & Inference Endpoint

Qwen3.5 Highlights

Qwen3.5 features the following enhancement:

Unified Vision-Language Foundation: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.
Efficient Hybrid Architecture: Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead.
Scalable RL Generalization: Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability.
Global Linguistic Coverage: Expanded support to 201 languages and dialects, enabling inclusive, worldwide deployment with nuanced cultural and regional understanding.
Next-Generation Training Infrastructure: Near-100% multimodal training efficiency compared to text-only training and asynchronous RL frameworks supporting massive-scale agent scaffolds and environment orchestration.

For more details, please refer to our blog post Qwen3.5.

Model Overview

Type: Causal Language Model with Vision Encoder
Training Stage: Pre-training & Post-training
Language Model
- Number of Parameters: 0.8B
- Hidden Dimension: 1024
- Token Embedding: 248320 (Padded)
- Number of Layers: 24
- Hidden Layout: 6 × (3 × (Gated DeltaNet → FFN) → 1 × (Gated Attention → FFN))
- Gated DeltaNet:
  - Number of Linear Attention Heads: 16 for V and 16 for QK
  - Head Dimension: 128
- Gated Attention:
  - Number of Attention Heads: 8 for Q and 2 for KV
  - Head Dimension: 256
  - Rotary Position Embedding Dimension: 64

Citation

If you find our work helpful, feel free to give us a cite.

bibtex
@misc{qwen3.5,
    title  = {{Qwen3.5}: Towards Native Multimodal Agents},
    author = {{Qwen Team}},
    month  = {February},
    year   = {2026},
    url    = {https://qwen.ai/blog?id=qwen3.5}
}

Qwen3.5 Highlights

Qwen3.5 features the following enhancement:

Unified Vision-Language Foundation: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.
Efficient Hybrid Architecture: Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead.
Scalable RL Generalization: Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability.
Global Linguistic Coverage: Expanded support to 201 languages and dialects, enabling inclusive, worldwide deployment with nuanced cultural and regional understanding.
Next-Generation Training Infrastructure: Near-100% multimodal training efficiency compared to text-only training and asynchronous RL frameworks supporting massive-scale agent scaffolds and environment orchestration.

For more details, please refer to our blog post Qwen3.5.

Model Overview

Type: Causal Language Model with Vision Encoder
Training Stage: Pre-training & Post-training
Language Model
- Number of Parameters: 0.8B
- Hidden Dimension: 1024
- Token Embedding: 248320 (Padded)
- Number of Layers: 24
- Hidden Layout: 6 × (3 × (Gated DeltaNet → FFN) → 1 × (Gated Attention → FFN))
- Gated DeltaNet:
  - Number of Linear Attention Heads: 16 for V and 16 for QK
  - Head Dimension: 128
- Gated Attention:
  - Number of Attention Heads: 8 for Q and 2 for KV
  - Head Dimension: 256
  - Rotary Position Embedding Dimension: 64

Citation

If you find our work helpful, feel free to give us a cite.

bibtex
@misc{qwen3.5,
    title  = {{Qwen3.5}: Towards Native Multimodal Agents},
    author = {{Qwen Team}},
    month  = {February},
    year   = {2026},
    url    = {https://qwen.ai/blog?id=qwen3.5}
}

Qwen3.5-0.8B-Base

README

Qwen3.5 Highlights

Model Overview

Citation

Explore FriendliAI today

README

Qwen3.5 Highlights

Model Overview

Citation