JoaoZaokk/heretic-f-python-codefeedback API & Inference Endpoint

Base model

Item	Value
Base model	`JoaoZaokk/Qwen3-4B-Thinking-2507-MiniMax-M2.1-Distill-heretic`
Architecture family	Qwen3
Parameter count	4B
Format	Hugging Face Transformers / safetensors
Tensor type	F16
Fine-tuning method	QLoRA / LoRA
Final state	Merged model

Training datasets

Dataset	Samples used	Notes
`iamtarun/python_code_instructions_18k_alpaca`	5,000	Python instruction/code examples
`m-a-p/CodeFeedback-Filtered-Instruction`	5,000	Code instruction and feedback examples

A SWE-smith trajectory experiment was tested separately, but it was not used in this final merged version.

LoRA configuration

Parameter	Value
LoRA rank	16
LoRA alpha	32
LoRA dropout	0.05
Sequence length	2048
Epochs per stage	1
Quantized loading	4-bit NF4
Trainable parameters	~33M
Trainable percentage	~0.81%

Target modules:

q_proj
k_proj
v_proj
o_proj
gate_proj
up_proj
down_proj

Training stages

Stage	Input adapter	Dataset	Output adapter
1	Base model	Python instructions 5k	`heretic_F_lora_python_5000`
2	`heretic_F_lora_python_5000`	CodeFeedback 5k	`heretic_F_lora_python5000_codefeedback5000`
Final	Base model + final adapter	Merge	Full safetensors model

Training environment

Component	Version
Python	3.11
PyTorch	2.11.0+cu128
CUDA	12.8
Transformers	5.10.2
Datasets	5.0.0
Accelerate	1.13.0
PEFT	0.19.1
bitsandbytes	0.49.2
sentencepiece	0.2.1
tiktoken	0.13.0
protobuf	7.35.0
pandas	3.0.3
pyarrow	24.0.0

Training GPU:

NVIDIA GeForce RTX 3080 Ti 12 GB

Intended use

This model is intended for local experimentation with:

Python code generation
code explanation
simple debugging
instruction-following tests
downstream conversion to GGUF, AWQ, GPTQ, or OpenVINO formats

Notes

This is an experimental model. It may produce incorrect code, unsafe suggestions, or hallucinated explanations. Outputs should be reviewed before use in production or security-sensitive environments.

heretic-f-python-codefeedback

Get help setting up a custom Dedicated Endpoints.

README