README
License: apache-2.0

Purpose
Single-model naive baseline ("merged-soup"): a uniform weight average of the Think and NoThink parent checkpoints, used in the four-way comparison of the PL-MoE rebuttal (no_think-only / think-only / merged-soup / PL-MoE).
Merge details
- Per-tensor arithmetic mean, weights 0.5 / 0.5
- Accumulation in float32, then cast back to the original bf16 dtype
- Config and tokenizer copied verbatim from the no_think parent (the two parents' configs and tokenizers are byte-identical)
- 399 tensors averaged; key sets / shapes / dtypes verified identical before merge
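The merge recipe above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the script used to produce this checkpoint. Each checkpoint is assumed to be a plain dict that maps tensor names to hypothetical `(dtype, shape, flat_values)` tuples. Float32 arithmetic and the final bf16 cast (round to nearest, ties to even) are emulated with `struct`. A real merge would operate on framework tensors, for example tensors loaded from safetensors files.

```python
import struct

# Hypothetical checkpoint layout for illustration only:
# {tensor_name: (dtype, shape, flat_list_of_values)}
Tensor = tuple[str, tuple[int, ...], list[float]]


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round a float32 value to bfloat16 (nearest, ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return x  # NaN: pass through unchanged
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits &= 0xFFFF0000                   # keep sign, exponent, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def merge_state_dicts(sd_a: dict[str, Tensor], sd_b: dict[str, Tensor],
                      weights: tuple[float, float] = (0.5, 0.5)) -> dict[str, Tensor]:
    """Per-tensor weighted mean with float32 accumulation, cast back to the input dtype."""
    # Verify key sets, shapes, and dtypes match before touching any values.
    if sd_a.keys() != sd_b.keys():
        raise ValueError("checkpoints have different key sets")
    wa, wb = to_f32(weights[0]), to_f32(weights[1])
    merged: dict[str, Tensor] = {}
    for name, (dtype, shape, vals_a) in sd_a.items():
        dtype_b, shape_b, vals_b = sd_b[name]
        if (dtype, shape) != (dtype_b, shape_b):
            raise ValueError(f"dtype/shape mismatch for {name}")
        cast = to_bf16 if dtype == "bf16" else to_f32
        out = []
        for a, b in zip(vals_a, vals_b):
            # Each product and the sum are rounded to float32 to emulate fp32 math.
            acc = to_f32(to_f32(wa * a) + to_f32(wb * b))
            out.append(cast(acc))
        merged[name] = (dtype, shape, out)
    return merged
```

For example, merging `{"w": ("bf16", (2,), [1.0, 2.0])}` with `{"w": ("bf16", (2,), [3.0, 2.5])}` yields `[2.0, 2.25]` for `w`. Accumulating in float32 and casting once at the end means the weighted terms are never individually rounded to bf16.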
Model provider: ShourenWSR
Model tree
- Base: ShourenWSR/Qwen3-4B-Dense-Think-30k
- Base: ShourenWSR/Qwen3-4B-Dense-NoThink-30k
- Merged: this model
Modalities
- Input: Text
- Output: Text