
README

License: other

Source Models

  • GRPO target model: Daniel031203/qwen-4b-thinking-stage3-grpo-lora
  • MTP/nextn tensor source model: unsloth/Qwen3.5-4B

What Was Changed

MTP/nextn tensors were extracted from unsloth/Qwen3.5-4B and injected into a prepared Hugging Face-format copy of the GRPO target model.

The original GRPO target model and MTP source model were not modified.

Injected tensor file:

  • mtp_heads.safetensors

The active tensor index is:

  • model.safetensors.index.json
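As an illustration of how an extra tensor file can be made visible through the index, the sketch below adds entries to the `weight_map` of a safetensors index so that the new tensor names point at `mtp_heads.safetensors`. It is not the script used to build this repository. The function name `register_tensors` and the tensor name `mtp.fc.weight` are hypothetical placeholders; the actual tensor names in `mtp_heads.safetensors` are not listed here.

```python
def register_tensors(index: dict, tensor_names: list[str], shard_file: str) -> dict:
    """Return a copy of a safetensors index with extra tensors mapped to shard_file.

    `index` follows the usual layout of model.safetensors.index.json:
    {"metadata": {...}, "weight_map": {tensor_name: shard_filename}}.
    The metadata block (including total_size) is left untouched in this sketch.
    """
    weight_map = dict(index.get("weight_map", {}))
    for name in tensor_names:
        # Refuse to silently overwrite a tensor that is already indexed.
        if name in weight_map:
            raise ValueError(f"tensor already present in index: {name}")
        weight_map[name] = shard_file
    return {**index, "weight_map": weight_map}


# Usage with an in-memory index; all names below are placeholders.
example_index = {
    "metadata": {"total_size": 0},
    "weight_map": {"model.embed_tokens.weight": "model-00001-of-00002.safetensors"},
}
updated = register_tensors(example_index, ["mtp.fc.weight"], "mtp_heads.safetensors")
```

Returning a new dictionary instead of mutating the input keeps the original index intact, in line with the statement that the source models were not modified.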

Compatibility Notes

Preflight found these non-structural config differences:

  • model_type: target qwen3, MTP source qwen3_5
  • architectures: target Qwen3ForCausalLM, MTP source Qwen3_5ForConditionalGeneration

The preflight check reported no structural mismatch in any of the fields it compared: hidden size, layer count, attention heads, KV heads, intermediate size, vocab size, RoPE theta, and max position embeddings.
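A comparison of this kind can be sketched as follows. This is a minimal illustration, not the preflight tool that was actually run. It assumes both configs are available as flat dictionaries using the standard Hugging Face key names; a multimodal config that nests its text settings under a sub-key would need to be flattened first. The function name `preflight` is hypothetical.

```python
STRUCTURAL_KEYS = (
    "hidden_size",
    "num_hidden_layers",
    "num_attention_heads",
    "num_key_value_heads",
    "intermediate_size",
    "vocab_size",
    "rope_theta",
    "max_position_embeddings",
)
NON_STRUCTURAL_KEYS = ("model_type", "architectures")


def preflight(target: dict, source: dict) -> dict:
    """Report config differences, grouped into structural and non-structural."""
    report = {"structural": {}, "non_structural": {}}
    groups = (("structural", STRUCTURAL_KEYS), ("non_structural", NON_STRUCTURAL_KEYS))
    for group, keys in groups:
        for key in keys:
            t, s = target.get(key), source.get(key)
            # A key missing from both configs compares equal (None == None).
            if t != s:
                report[group][key] = (t, s)
    return report


# Usage with values taken from this card; keys not listed on the card are omitted.
target_cfg = {
    "model_type": "qwen3",
    "architectures": ["Qwen3ForCausalLM"],
    "hidden_size": 2560,
    "num_hidden_layers": 36,
}
source_cfg = {
    "model_type": "qwen3_5",
    "architectures": ["Qwen3_5ForConditionalGeneration"],
    "hidden_size": 2560,
    "num_hidden_layers": 36,
}
report = preflight(target_cfg, source_cfg)
```

An empty `report["structural"]` corresponds to the outcome described above; the two entries in `report["non_structural"]` correspond to the listed `model_type` and `architectures` differences.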

Caveat

The MTP tensors were transplanted from the source/base-family model. They are expected to be shape-compatible, but they were not specifically trained on this final merged target. This is an engineering compatibility release, not a guarantee of optimal speculative decoding quality.

Config Snapshot

  • model_type: qwen3
  • architectures: ['Qwen3ForCausalLM']
  • hidden_size: 2560
  • num_hidden_layers: 36
  • num_attention_heads: 32
  • num_key_value_heads: 8
  • vocab_size: 151936

Files Included

This repository includes tokenizer/config files, model safetensors shards, the active safetensors index, and mtp_heads.safetensors.
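After downloading, one quick sanity check is to confirm that every file referenced by the index is actually present. The sketch below assumes the index has already been parsed into a dictionary with the usual `weight_map` layout; the helper name `missing_shards` and the shard file names in the usage example are hypothetical.

```python
def missing_shards(index: dict, present_files: set[str]) -> list[str]:
    """Return the sorted file names referenced by the index but absent from present_files."""
    referenced = set(index.get("weight_map", {}).values())
    return sorted(referenced - present_files)


# Usage with placeholder names; in practice present_files could come from
# listing the local repository directory.
example_index = {
    "weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
        "lm_head.weight": "model-00002-of-00002.safetensors",
        "mtp.fc.weight": "mtp_heads.safetensors",
    }
}
present = {"model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"}
missing = missing_shards(example_index, present)
```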

Model provider

Daniel031203

Model tree

  • Base: Daniel031203/qwen-4b-thinking-stage3-grpo-lora
  • Base: unsloth/Qwen3.5-4B
  • Merged: this model

Modalities

  • Input: Text
  • Output: Text
