Daniel031203/qwen-4b-thinking-grpo-mtp API & Inference Endpoint

Source Models

GRPO target model: Daniel031203/qwen-4b-thinking-stage3-grpo-lora
MTP/nextn tensor source model: unsloth/Qwen3.5-4B

What Was Changed

MTP/nextn tensors were extracted from unsloth/Qwen3.5-4B and injected into a prepared Hugging Face (HF) copy of the GRPO target model.

The original GRPO target model and MTP source model were not modified.

Injected tensor file:

mtp_heads.safetensors

The active tensor index is:

model.safetensors.index.json
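
One way to confirm that the injected tensors are reachable through the active index is to list the entries that point at mtp_heads.safetensors. The sketch below assumes the index follows the usual Hugging Face sharded-checkpoint layout, a JSON object whose `weight_map` maps each tensor name to a shard filename. The helper names `tensors_in_shard` and `load_index`, and the sample tensor names, are hypothetical and only for illustration; the real tensor names in this repository may differ.

```python
import json
from pathlib import Path


def tensors_in_shard(index: dict, shard_name: str) -> list[str]:
    """Return the sorted tensor names that the index maps to shard_name."""
    weight_map = index.get("weight_map", {})
    return sorted(name for name, shard in weight_map.items() if shard == shard_name)


def load_index(path: str) -> dict:
    """Read a safetensors index file such as model.safetensors.index.json."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Hypothetical miniature index; tensor and shard names are made up for the example.
example_index = {
    "metadata": {"total_size": 0},
    "weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
        "mtp.fc.weight": "mtp_heads.safetensors",
        "mtp.norm.weight": "mtp_heads.safetensors",
    },
}

mtp_names = tensors_in_shard(example_index, "mtp_heads.safetensors")
print(mtp_names)
```

Against a local download, calling `tensors_in_shard(load_index("model.safetensors.index.json"), "mtp_heads.safetensors")` should return a non-empty list if the index references the injected file.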

Compatibility Notes

A preflight check found these non-structural config differences:

model_type: target qwen3, MTP source qwen3_5
architectures: target Qwen3ForCausalLM, MTP source Qwen3_5ForConditionalGeneration

The preflight check reported no structural mismatch in any of the fields it compared: hidden size, layer count, attention heads, KV heads, intermediate size, vocab size, RoPE theta, and max position embeddings.
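
A comparison of this kind can be sketched as a field-by-field diff of the two config dictionaries. This is not the preflight script used for this release. It is a minimal illustration that assumes both configs are flat dictionaries keyed by the usual Hugging Face field names; a real multimodal config may nest its text fields under a sub-config, which this sketch does not handle. The function name `diff_configs` is hypothetical, and the source config in the example simply copies the target's structural values, as the notes above describe.

```python
STRUCTURAL_KEYS = (
    "hidden_size",
    "num_hidden_layers",
    "num_attention_heads",
    "num_key_value_heads",
    "intermediate_size",
    "vocab_size",
    "rope_theta",
    "max_position_embeddings",
)
NON_STRUCTURAL_KEYS = ("model_type", "architectures")


def diff_configs(target: dict, source: dict, keys: tuple) -> dict:
    """Map each differing key to a (target_value, source_value) pair.

    A key missing from both configs compares equal (None == None), so this
    sketch cannot tell "both absent" apart from "both present and equal".
    """
    return {
        key: (target.get(key), source.get(key))
        for key in keys
        if target.get(key) != source.get(key)
    }


# Target values taken from the config snapshot in this README.
target_cfg = {
    "model_type": "qwen3",
    "architectures": ["Qwen3ForCausalLM"],
    "hidden_size": 2560,
    "num_hidden_layers": 36,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
}
# Assumed source config: same structural values, different type and architecture.
source_cfg = dict(
    target_cfg,
    model_type="qwen3_5",
    architectures=["Qwen3_5ForConditionalGeneration"],
)

structural_diff = diff_configs(target_cfg, source_cfg, STRUCTURAL_KEYS)
non_structural_diff = diff_configs(target_cfg, source_cfg, NON_STRUCTURAL_KEYS)
print(structural_diff)
print(non_structural_diff)
```

An empty structural diff together with a non-empty non-structural diff corresponds to the outcome reported here.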

Caveat

The MTP tensors were transplanted from the source/base-family model. They are expected to be shape-compatible, but they were not specifically trained on this final merged target. This is an engineering compatibility release, not a guarantee of optimal speculative decoding quality.

Config Snapshot

model_type: qwen3
architectures: ['Qwen3ForCausalLM']
hidden_size: 2560
num_hidden_layers: 36
num_attention_heads: 32
num_key_value_heads: 8
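
The head counts in the snapshot imply grouped-query attention, in which several query heads share one key/value head. The arithmetic can be sketched as follows, using only the two values from the snapshot; the helper name `query_heads_per_kv_head` is hypothetical.

```python
def query_heads_per_kv_head(num_attention_heads: int, num_key_value_heads: int) -> int:
    """Return how many query heads share each key/value head."""
    if num_key_value_heads <= 0 or num_attention_heads % num_key_value_heads != 0:
        raise ValueError("attention heads must be a positive multiple of KV heads")
    return num_attention_heads // num_key_value_heads


# num_attention_heads = 32 and num_key_value_heads = 8, from the config snapshot.
group_size = query_heads_per_kv_head(32, 8)
print(group_size)  # 4
```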

Files Included

This repository includes tokenizer/config files, model safetensors shards, the active safetensors index, and mtp_heads.safetensors.
