eewer

Qwen3.5-4B-Thinking-Preservation

Deploy Dedicated

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Qwen3.5-4B-Thinking-Preservation

Derived from Qwen/Qwen3.5-4B with a single change to the chat template:

Thinking is always preserved across multi-turn history (append-only). Every assistant turn keeps its <think>...</think> reasoning, not just the latest one.
No enable/disable toggle. The generation prompt always opens <think>; passing enable_thinking=False has no effect.

This makes multi-turn agent training match evaluation (the model always sees its own prior reasoning). Model weights are identical to Qwen3.5-4B; only the chat template differs. Vision weights are unchanged (not trained in the terminal-agent recipes).

Model provider

eewer

Model tree

Base

Qwen/Qwen3.5-4B

Fine-tuned

this model

Modalities

Input

Video, Text, Image

Output

Text