License: other
Source Models
- GRPO target model: Daniel031203/qwen-4b-thinking-stage3-grpo-lora
- MTP/nextn tensor source model: unsloth/Qwen3.5-4B
What Was Changed
MTP/nextn tensors were extracted from unsloth/Qwen3.5-4B and injected into a prepared HF copy of the GRPO target model.
The original GRPO target model and MTP source model were not modified.
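The extraction step can be pictured as a name filter over the source checkpoint. The sketch below is an illustration only, not the script used for this release: it assumes that MTP/nextn tensors can be recognised by an `mtp.` substring in their names, and the tensor and shard names in the toy map are hypothetical.

```python
def select_mtp_names(weight_map, markers=("mtp.",)):
    """Return the tensor names that contain any marker, sorted for stable output.

    The default marker is an assumption; inspect the real source checkpoint
    to confirm how its MTP/nextn tensors are actually named.
    """
    return sorted(
        name for name in weight_map
        if any(marker in name for marker in markers)
    )


# Toy map shaped like the "weight_map" entry of a safetensors index file.
# All tensor and shard names here are hypothetical.
source_weight_map = {
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "mtp.fc.weight": "model-00002-of-00002.safetensors",
    "mtp.norm.weight": "model-00002-of-00002.safetensors",
}

mtp_names = select_mtp_names(source_weight_map)
print(mtp_names)
```

Only the selected names would then be copied into a separate tensor file, which leaves both source repositories untouched.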
Injected tensor file:
mtp_heads.safetensors
The active tensor index is:
model.safetensors.index.json
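For a loader to find the injected tensors, the index has to map each injected tensor name to mtp_heads.safetensors. A minimal sketch of that routing step follows, assuming the usual Hugging Face sharded-checkpoint layout with a top-level `weight_map` dictionary. The helper `route_tensors` and the tensor names are hypothetical, and the sketch works on an in-memory dictionary rather than the real file.

```python
import copy


def route_tensors(index, tensor_names, shard_file):
    """Return a copy of `index` whose weight_map sends each name to `shard_file`.

    Raises ValueError instead of silently re-pointing a tensor that already
    lives in a different shard. Note: metadata such as total_size is left
    unchanged here and would need separate handling.
    """
    updated = copy.deepcopy(index)
    weight_map = updated.setdefault("weight_map", {})
    for name in tensor_names:
        existing = weight_map.get(name)
        if existing is not None and existing != shard_file:
            raise ValueError(f"{name} is already mapped to {existing}")
        weight_map[name] = shard_file
    return updated


# Toy index; the tensor and shard names are hypothetical.
target_index = {
    "metadata": {},
    "weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    },
}

new_index = route_tensors(
    target_index,
    ["mtp.fc.weight", "mtp.norm.weight"],
    "mtp_heads.safetensors",
)
print(new_index["weight_map"])
```

Working on a copy keeps the original index available for comparison before anything is written back to disk.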
Compatibility Notes
Preflight found these non-structural config differences:
- model_type: target qwen3, MTP source qwen3_5
- architectures: target Qwen3ForCausalLM, MTP source Qwen3_5ForConditionalGeneration
No checked structural mismatch was reported for hidden size, layer count, attention heads, KV heads, intermediate size, vocab size, RoPE theta, or max position embeddings.
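A structural check of this kind might look like the sketch below. It is not the actual preflight tool. It assumes both configs are available as plain dictionaries, that a conditional-generation config may nest its language-model fields under a `text_config` key, and that the key names match the standard Hugging Face spelling. The source values in the toy data are illustrative only and were not read from the real source config.

```python
STRUCTURAL_KEYS = (
    "hidden_size",
    "num_hidden_layers",
    "num_attention_heads",
    "num_key_value_heads",
    "intermediate_size",
    "vocab_size",
    "rope_theta",
    "max_position_embeddings",
)


def text_config(config):
    """Return the nested text_config if present (an assumption), else the config itself."""
    return config.get("text_config", config)


def compare_structure(target, source, keys=STRUCTURAL_KEYS):
    """Compare structural fields; return (mismatches, unchecked).

    mismatches maps key -> (target value, source value).
    unchecked lists keys missing from either side, so a missing field is
    never reported as a match.
    """
    t, s = text_config(target), text_config(source)
    mismatches, unchecked = {}, []
    for key in keys:
        if key not in t or key not in s:
            unchecked.append(key)
        elif t[key] != s[key]:
            mismatches[key] = (t[key], s[key])
    return mismatches, unchecked


# Target values taken from the config snapshot in this README.
target_cfg = {
    "hidden_size": 2560,
    "num_hidden_layers": 36,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "vocab_size": 151936,
}
# Illustrative source config (hypothetical values, nested layout assumed).
source_cfg = {"text_config": dict(target_cfg)}

mismatches, unchecked = compare_structure(target_cfg, source_cfg)
print(mismatches, unchecked)
```

Reporting unchecked keys separately makes it clear which fields were actually compared, which matters when a report says only that no checked mismatch was found.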
Caveat
The MTP tensors were transplanted from the source/base-family model. They are expected to be shape-compatible, but they were not specifically trained on this final merged target. This is an engineering compatibility release, not a guarantee of optimal speculative decoding quality.
Config Snapshot
- model_type: qwen3
- architectures: ['Qwen3ForCausalLM']
- hidden_size: 2560
- num_hidden_layers: 36
- num_attention_heads: 32
- num_key_value_heads: 8
- vocab_size: 151936
Files Included
This repository includes tokenizer/config files, model safetensors shards, the active safetensors index, and mtp_heads.safetensors.
Model provider
Daniel031203
Model tree
- Base: Daniel031203/qwen-4b-thinking-stage3-grpo-lora
- Base: unsloth/Qwen3.5-4B
- Merged: this model
Modalities
- Input: Text
- Output: Text