Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: other

Base model

ItemValue
Base modelJoaoZaokk/Qwen3-4B-Thinking-2507-MiniMax-M2.1-Distill-heretic
Architecture familyQwen3
Parameter count4B
FormatHugging Face Transformers / safetensors
Tensor typeF16
Fine-tuning methodQLoRA / LoRA
Final stateMerged model

Training datasets

DatasetSamples usedNotes
iamtarun/python_code_instructions_18k_alpaca5,000Python instruction/code examples
m-a-p/CodeFeedback-Filtered-Instruction5,000Code instruction and feedback examples

A SWE-smith trajectory experiment was tested separately, but it was not used in this final merged version.

LoRA configuration

ParameterValue
LoRA rank16
LoRA alpha32
LoRA dropout0.05
Sequence length2048
Epochs per stage1
Quantized loading4-bit NF4
Trainable parameters~33M
Trainable percentage~0.81%

Target modules:

  • q_proj
  • k_proj
  • v_proj
  • o_proj
  • gate_proj
  • up_proj
  • down_proj

Training stages

StageInput adapterDatasetOutput adapter
1Base modelPython instructions 5kheretic_F_lora_python_5000
2heretic_F_lora_python_5000CodeFeedback 5kheretic_F_lora_python5000_codefeedback5000
FinalBase model + final adapterMergeFull safetensors model

Training environment

ComponentVersion
Python3.11
PyTorch2.11.0+cu128
CUDA12.8
Transformers5.10.2
Datasets5.0.0
Accelerate1.13.0
PEFT0.19.1
bitsandbytes0.49.2
sentencepiece0.2.1
tiktoken0.13.0
protobuf7.35.0
pandas3.0.3
pyarrow24.0.0

Training GPU:

  • NVIDIA GeForce RTX 3080 Ti 12 GB

Intended use

This model is intended for local experimentation with:

  • Python code generation
  • code explanation
  • simple debugging
  • instruction-following tests
  • downstream conversion to GGUF, AWQ, GPTQ, or OpenVINO formats

Notes

This is an experimental model. It may produce incorrect code, unsafe suggestions, or hallucinated explanations. Outputs should be reviewed before use in production or security-sensitive environments.

Hardware compatibility estimate

This table is an approximate guide for the current merged F16 safetensors version.

Hardware / VRAMStatusNotes
6 GB VRAM🔴 UnlikelyF16 weights are too large without heavy offload or quantization.
8 GB VRAM🔴 Very tightMay fail or require CPU offload. Use GGUF/AWQ/INT4 instead.
10 GB VRAM🟡 PossibleMay run with low context and careful memory settings.
12 GB VRAM🟢 LikelyTested training/inference workflow on RTX 3080 Ti 12 GB with 4-bit loading.
16 GB VRAM🟢 GoodComfortable for normal local inference.
24 GB VRAM🟢 Very goodRecommended for larger context, conversion, quantization, and experiments.
32 GB+ RAM CPU-only🟡 PossibleSlow. Better with GGUF quantized versions.

Quantized versions

Planned/recommended export formats:

FormatStatusExpected use
F16 safetensors🟢 CurrentFull merged model, best source for conversion.
AWQ 4-bit🟡 PlannedBetter for GPU/server inference, mainly CUDA/Linux or compatible runtimes.
OpenVINO INT4 / AWQ-style compression🟢 Planned for Intel ArcRecommended path for Intel Arc/OpenVINO.
GGUF Q5_K_M / Q6_K / Q8_0🟡 PlannedRecommended for LM Studio, llama.cpp, Ollama, CPU/GPU mixed inference.

Practical recommendation

For this repository, use the current F16 safetensors model as the master model.

For actual local use:

  • RTX 3080 Ti 12 GB or better: F16 may work, but quantized versions are preferred.
  • RTX 3090 24 GB: F16 and quantization workflows are much more comfortable.
  • Intel Arc: convert this model to OpenVINO INT4 instead of using CUDA-focused AWQ.
  • Low VRAM systems: wait for GGUF or INT4 builds.

Model provider

JoaoZaokk

Model tree

Base

JoaoZaokk/Qwen3-4B-Thinking-2507-MiniMax-M2.1-Distill-heretic

Fine-tuned

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today