Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Run this model inference with full control and performance in your environment.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: otherIntended purpose
This LoRA is kept and published as an experimental branch for:
- code explanation
- learning-oriented coding assistance
- understanding programming problems
- step-by-step reasoning around code
- comparing OpenCodeInstruct-style behavior against a stricter code-output model
It is not ideal for:
- agentic coding
- test-driven code generation
- benchmark-style exact function output
- tools that require the model to return only executable code
- coding agents that must avoid prose/explanation unless asked
Why this is not the main version
A small local before/after Python code benchmark showed that this OpenCodeInstruct continuation reduced exact-code benchmark performance.
| Model | Adapter | Passed | Pass rate | Avg tokens/s |
|---|---|---|---|---|
| Before | heretic_F_lora_python5000_codefeedback5000 | 9/10 | 90.00% | 8.38 |
| After | SAFE_OPENCODE_5000_1024_20260607_153327 | 6/10 | 60.00% | 8.41 |
Delta:
| Metric | Value |
|---|---|
| Passes | -3 |
| Pass rate | -30.00% |
| Avg tokens/s | +0.03 |
The post-training adapter was worse on strict executable-code tasks, especially when the expected output was a compact Python function or class.
However, this does not mean the adapter is useless. It likely shifted behavior toward a more explanatory, learning-oriented style. That can be useful for users who want to understand code, reason through tasks, or receive more guided programming explanations.
Training configuration
| Item | Value |
|---|---|
| Base model | JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback |
| Adapter input | heretic_F_lora_python5000_codefeedback5000 |
| Dataset | nvidia/OpenCodeInstruct |
| Samples used | 5,000 |
| Sequence length | 1024 |
| Epochs | 1 |
| Learning rate | 5e-6 |
| Training method | QLoRA / LoRA |
| Quantized loading during training | 4-bit NF4 |
Training result
| Metric | Value |
|---|---|
| Train runtime | 6258 seconds |
| Runtime | 1h 44m 18s |
| Samples/second | 0.799 |
| Steps/second | 0.1 |
| Final train loss | 0.3913 |
| First logged loss | 0.6957 |
| Last logged loss | 0.3623 |
| Minimum logged loss | 0.3441 |
The training run completed successfully after reducing sequence length and using a more conservative GPU configuration.
Benchmark files
The local benchmark artifacts are included in this repository under:
text
benchmark/
Files:
text
benchmark/before_summary.mdbenchmark/after_summary.mdbenchmark/COMPARISON.mdbenchmark/comparison.jsonbenchmark/before_results.jsonlbenchmark/after_results.jsonl
Recommended usage
Use this adapter when you want a model that may be more comfortable explaining code and reasoning through programming tasks.
For stricter agentic coding or benchmark-style executable output, prefer the original merged CodeFeedback model:
JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback
Loading example
python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfigfrom peft import PeftModelimport torchbase_model = "JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback"adapter = "JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback-OpenCodeInstruct-Learning-LoRA"tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)bnb_config = BitsAndBytesConfig(load_in_4bit=True,bnb_4bit_quant_type="nf4",bnb_4bit_compute_dtype=torch.float16,bnb_4bit_use_double_quant=True,)model = AutoModelForCausalLM.from_pretrained(base_model,quantization_config=bnb_config,device_map="auto",trust_remote_code=True,)model = PeftModel.from_pretrained(model, adapter)model.eval()
Important notes
This is an experimental LoRA adapter.
It should not be treated as a universal improvement over the previous CodeFeedback model. It is published for transparency, comparison, and reproducibility.
The benchmark results suggest that it is worse for strict agentic coding, but potentially useful for learning-oriented coding assistance.
Model provider
JoaoZaokk
Model tree
Base
JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback
Adapter
this model
Modalities
Input
Text
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information