JoaoZaokk

Qwen3-4B-Thinking-2507-Heretic-CodeFeedback

README

License: other

Base model

Table with columns: Item, Value
Item	Value
Base model	`JoaoZaokk/Qwen3-4B-Thinking-2507-MiniMax-M2.1-Distill-heretic`
Architecture family	Qwen3
Parameter count	4B
Format	Hugging Face Transformers / safetensors
Tensor type	F16
Fine-tuning method	QLoRA / LoRA
Final state	Merged model

Training datasets

Table with columns: Dataset, Samples used, Notes
Dataset	Samples used	Notes
`iamtarun/python_code_instructions_18k_alpaca`	5,000	Python instruction/code examples
`m-a-p/CodeFeedback-Filtered-Instruction`	5,000	Code instruction and feedback examples

A SWE-smith trajectory experiment was tested separately, but it was not used in this final merged version.

LoRA configuration

Table with columns: Parameter, Value
Parameter	Value
LoRA rank	16
LoRA alpha	32
LoRA dropout	0.05
Sequence length	2048
Epochs per stage	1
Quantized loading	4-bit NF4
Trainable parameters	~33M
Trainable percentage	~0.81%

Target modules:

q_proj
k_proj
v_proj
o_proj
gate_proj
up_proj
down_proj

Training stages

Table with columns: Stage, Input adapter, Dataset, Output adapter
Stage	Input adapter	Dataset	Output adapter
1	Base model	Python instructions 5k	`heretic_F_lora_python_5000`
2	`heretic_F_lora_python_5000`	CodeFeedback 5k	`heretic_F_lora_python5000_codefeedback5000`
Final	Base model + final adapter	Merge	Full safetensors model

Training environment

Table with columns: Component, Version
Component	Version
Python	3.11
PyTorch	2.11.0+cu128
CUDA	12.8
Transformers	5.10.2
Datasets	5.0.0
Accelerate	1.13.0
PEFT	0.19.1
bitsandbytes	0.49.2
sentencepiece	0.2.1

Training GPU:

NVIDIA GeForce RTX 3080 Ti 12 GB

Intended use

This model is intended for local experimentation with:

Python code generation
code explanation
simple debugging
instruction-following tests
downstream conversion to GGUF, AWQ, GPTQ, or OpenVINO formats

Notes

This is an experimental model. It may produce incorrect code, unsafe suggestions, or hallucinated explanations. Outputs should be reviewed before use in production or security-sensitive environments.

Hardware compatibility estimate

This table is an approximate guide for the current merged F16 safetensors version.

Table with columns: Hardware / VRAM, Status, Notes
Hardware / VRAM	Status	Notes
6 GB VRAM	🔴 Unlikely	F16 weights are too large without heavy offload or quantization.
8 GB VRAM	🔴 Very tight	May fail or require CPU offload. Use GGUF/AWQ/INT4 instead.
10 GB VRAM	🟡 Possible	May run with low context and careful memory settings.
12 GB VRAM	🟢 Likely	Tested training/inference workflow on RTX 3080 Ti 12 GB with 4-bit loading.
16 GB VRAM	🟢 Good	Comfortable for normal local inference.

Quantized versions

Planned/recommended export formats:

Table with columns: Format, Status, Expected use
Format	Status	Expected use
F16 safetensors	🟢 Current	Full merged model, best source for conversion.
AWQ 4-bit	🟡 Planned	Better for GPU/server inference, mainly CUDA/Linux or compatible runtimes.
OpenVINO INT4 / AWQ-style compression	🟢 Planned for Intel Arc	Recommended path for Intel Arc/OpenVINO.
GGUF Q5_K_M / Q6_K / Q8_0	🟡 Planned	Recommended for LM Studio, llama.cpp, Ollama, CPU/GPU mixed inference.

Practical recommendation

For this repository, use the current F16 safetensors model as the master model.

For actual local use:

RTX 3080 Ti 12 GB or better: F16 may work, but quantized versions are preferred.
RTX 3090 24 GB: F16 and quantization workflows are much more comfortable.
Intel Arc: convert this model to OpenVINO INT4 instead of using CUDA-focused AWQ.
Low VRAM systems: wait for GGUF or INT4 builds.

Available on FriendliAI

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Model Details

Model Provider

JoaoZaokk

Model Tree

Base

JoaoZaokk/Qwen3-4B-Thinking-2507-MiniMax-M2.1-Distill-heretic

Fine-tuned

this model

Input Modalities

Text

Output Modalities