eewer

eewer

Qwen3.5-4B-Thinking-Preservation

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Qwen3.5-4B-Thinking-Preservation

Derived from Qwen/Qwen3.5-4B with a single change to the chat template:

  • Thinking is always preserved across multi-turn history (append-only). Every assistant turn keeps its <think>...</think> reasoning, not just the latest one.
  • No enable/disable toggle. The generation prompt always opens <think>; passing enable_thinking=False has no effect.

This makes multi-turn agent training match evaluation (the model always sees its own prior reasoning). Model weights are identical to Qwen3.5-4B; only the chat template differs. Vision weights are unchanged (not trained in the terminal-agent recipes).

Model provider

eewer

eewer

Model tree

Base

Qwen/Qwen3.5-4B

Fine-tuned

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today