Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Run this model inference with full control and performance in your environment.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: otherBase model
| Item | Value |
|---|---|
| Base model | JoaoZaokk/Qwen3-4B-Thinking-2507-MiniMax-M2.1-Distill-heretic |
| Architecture family | Qwen3 |
| Parameter count | 4B |
| Format | Hugging Face Transformers / safetensors |
| Tensor type | F16 |
| Fine-tuning method | QLoRA / LoRA |
| Final state | Merged model |
Training datasets
| Dataset | Samples used | Notes |
|---|---|---|
iamtarun/python_code_instructions_18k_alpaca | 5,000 | Python instruction/code examples |
m-a-p/CodeFeedback-Filtered-Instruction | 5,000 | Code instruction and feedback examples |
A SWE-smith trajectory experiment was tested separately, but it was not used in this final merged version.
LoRA configuration
| Parameter | Value |
|---|---|
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Sequence length | 2048 |
| Epochs per stage | 1 |
| Quantized loading | 4-bit NF4 |
| Trainable parameters | ~33M |
| Trainable percentage | ~0.81% |
Target modules:
q_projk_projv_projo_projgate_projup_projdown_proj
Training stages
| Stage | Input adapter | Dataset | Output adapter |
|---|---|---|---|
| 1 | Base model | Python instructions 5k | heretic_F_lora_python_5000 |
| 2 | heretic_F_lora_python_5000 | CodeFeedback 5k | heretic_F_lora_python5000_codefeedback5000 |
| Final | Base model + final adapter | Merge | Full safetensors model |
Training environment
| Component | Version |
|---|---|
| Python | 3.11 |
| PyTorch | 2.11.0+cu128 |
| CUDA | 12.8 |
| Transformers | 5.10.2 |
| Datasets | 5.0.0 |
| Accelerate | 1.13.0 |
| PEFT | 0.19.1 |
| bitsandbytes | 0.49.2 |
| sentencepiece | 0.2.1 |
| tiktoken | 0.13.0 |
| protobuf | 7.35.0 |
| pandas | 3.0.3 |
| pyarrow | 24.0.0 |
Training GPU:
- NVIDIA GeForce RTX 3080 Ti 12 GB
Intended use
This model is intended for local experimentation with:
- Python code generation
- code explanation
- simple debugging
- instruction-following tests
- downstream conversion to GGUF, AWQ, GPTQ, or OpenVINO formats
Notes
This is an experimental model. It may produce incorrect code, unsafe suggestions, or hallucinated explanations. Outputs should be reviewed before use in production or security-sensitive environments.
Model provider
JoaoZaokk
Model tree
Base
JoaoZaokk/Qwen3-4B-Thinking-2507-MiniMax-M2.1-Distill-heretic
Fine-tuned
this model
Modalities
Input
Text
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information