JoaoZaokk

Qwen3-4B-Thinking-2507-Heretic-CodeFeedback-OpenCodeInstruct-Learning-LoRA

Deploy Dedicated

README

License: other

Intended purpose

This LoRA is kept and published as an experimental branch for:

code explanation
learning-oriented coding assistance
understanding programming problems
step-by-step reasoning around code
comparing OpenCodeInstruct-style behavior against a stricter code-output model

It is not ideal for:

agentic coding
test-driven code generation
benchmark-style exact function output
tools that require the model to return only executable code
coding agents that must avoid prose/explanation unless asked

Why this is not the main version

A small local before/after Python code benchmark showed that this OpenCodeInstruct continuation reduced exact-code benchmark performance.

Table with columns: Model, Adapter, Passed, Pass rate, Avg tokens/s
Model	Adapter	Passed	Pass rate	Avg tokens/s
Before	`heretic_F_lora_python5000_codefeedback5000`	9/10	90.00%	8.38
After	`SAFE_OPENCODE_5000_1024_20260607_153327`	6/10	60.00%	8.41

Delta:

Table with columns: Metric, Value
Metric	Value
Passes	-3
Pass rate	-30.00%
Avg tokens/s	+0.03

The post-training adapter was worse on strict executable-code tasks, especially when the expected output was a compact Python function or class.

However, this does not mean the adapter is useless. It likely shifted behavior toward a more explanatory, learning-oriented style. That can be useful for users who want to understand code, reason through tasks, or receive more guided programming explanations.

Training configuration

Table with columns: Item, Value
Item	Value
Base model	`JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback`
Adapter input	`heretic_F_lora_python5000_codefeedback5000`
Dataset	`nvidia/OpenCodeInstruct`
Samples used	5,000
Sequence length	1024
Epochs	1
Learning rate	5e-6

Training result

Table with columns: Metric, Value
Metric	Value
Train runtime	6258 seconds
Runtime	1h 44m 18s
Samples/second	0.799
Steps/second	0.1
Final train loss	0.3913
First logged loss	0.6957
Last logged loss	0.3623
Minimum logged loss	0.3441

The training run completed successfully after reducing sequence length and using a more conservative GPU configuration.

Benchmark files

The local benchmark artifacts are included in this repository under:

text
benchmark/

Files:

text
benchmark/before_summary.md
benchmark/after_summary.md
benchmark/COMPARISON.md
benchmark/comparison.json
benchmark/before_results.jsonl
benchmark/after_results.jsonl

Recommended usage

Use this adapter when you want a model that may be more comfortable explaining code and reasoning through programming tasks.

For stricter agentic coding or benchmark-style executable output, prefer the original merged CodeFeedback model:

JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback

Loading example

python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model = "JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback"
adapter = "JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback-OpenCodeInstruct-Learning-LoRA"

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

model = PeftModel.from_pretrained(model, adapter)
model.eval()

Important notes

This is an experimental LoRA adapter.

It should not be treated as a universal improvement over the previous CodeFeedback model. It is published for transparency, comparison, and reproducibility.

The benchmark results suggest that it is worse for strict agentic coding, but potentially useful for learning-oriented coding assistance.

Available on FriendliAI

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Model Details

Model Provider