eewer
Qwen3.5-4B-Thinking-Preservation
Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0Qwen3.5-4B-Thinking-Preservation
Derived from Qwen/Qwen3.5-4B with a single change to the chat template:
- Thinking is always preserved across multi-turn history (append-only). Every
assistant turn keeps its
<think>...</think>reasoning, not just the latest one. - No enable/disable toggle. The generation prompt always opens
<think>; passingenable_thinking=Falsehas no effect.
This makes multi-turn agent training match evaluation (the model always sees its own prior reasoning). Model weights are identical to Qwen3.5-4B; only the chat template differs. Vision weights are unchanged (not trained in the terminal-agent recipes).
Model provider
eewer
Model tree
Base
Qwen/Qwen3.5-4B
Fine-tuned
this model
Modalities
Input
Video, Text, Image
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information