JoaoZaokk
Qwen3-4B-Thinking-2507-Heretic-CodeFeedback-Agentic-Tessa-1K-LoRA
License: other
Status
This is a candidate / experimental adapter, not a claimed major improvement.
I will be testing more datasets to make the model better at coding. This adapter is at most a tiny improvement, not a game changer, but it did not get worse than the previous adapter.
In a small local Python coding benchmark, this adapter preserved the previous score:
| Model | Adapter | Passed | Pass rate | Avg tokens/s |
|---|---|---|---|---|
| Before | heretic_F_lora_python5000_codefeedback5000 | 9/10 | 90.00% | 7.80 |
| After | heretic_F_lora_tessa_agentic_1000_test | 9/10 | 90.00% | 7.86 |
Delta:
| Metric | Value |
|---|---|
| Passes | 0 |
| Pass rate | 0.00% |
| Avg tokens/s | +0.05 |
Unlike the OpenCodeInstruct continuation experiment, this Tessa-based adapter did not regress on the small strict-code benchmark.
Training configuration
| Item | Value |
|---|---|
| Base model | JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback |
| Input adapter | heretic_F_lora_python5000_codefeedback5000 |
| Dataset | smirki/Agentic-Coding-Tessa |
| Samples used | 1,000 |
| Sequence length | 1024 |
| Epochs | 1 |
| Learning rate | 1e-6 |
| Training method | QLoRA / LoRA |
| Quantized loading during training | 4-bit NF4 |
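To put the scale of this run in perspective, the table can be written down as a small config object with two derived numbers. This is only a sketch: the `TrainingRun` class and its method names are hypothetical, and the effective batch size is not stated in this card, so it is left as a parameter.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    """Values copied from the training configuration table."""
    base_model: str = "JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback"
    input_adapter: str = "heretic_F_lora_python5000_codefeedback5000"
    dataset: str = "smirki/Agentic-Coding-Tessa"
    samples: int = 1_000
    max_seq_len: int = 1024
    epochs: int = 1
    learning_rate: float = 1e-6

    def max_training_tokens(self) -> int:
        # Upper bound only: assumes every sample fills the full sequence length.
        return self.samples * self.max_seq_len * self.epochs

    def optimizer_steps(self, effective_batch_size: int) -> int:
        # The batch size is NOT given in the card; pass your own assumption.
        return math.ceil(self.samples / effective_batch_size) * self.epochs


RUN = TrainingRun()
print(RUN.max_training_tokens())  # at most 1,024,000 tokens in one epoch
print(RUN.optimizer_steps(effective_batch_size=8))  # 8 is an illustrative guess
```

With roughly one million tokens at most and a learning rate of 1e-6, a small effect on the benchmark is what one would expect from this run.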
Benchmark files
Benchmark artifacts are included under:
```text
benchmark/
```
Files:
```text
benchmark/before_summary.md
benchmark/after_summary.md
benchmark/COMPARISON.md
benchmark/before_results.jsonl
benchmark/after_results.jsonl
```
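The two `*_results.jsonl` files can be compared task by task, which is more informative than comparing totals: an identical 9/10 score could hide one regression offset by one fix. The sketch below assumes each JSONL line is an object with a `task_id` and a boolean `passed` field. That schema is a guess, not something documented in this card, so adjust the field names to match the actual files.

```python
import json
from typing import Iterable


def load_results(lines: Iterable[str]) -> dict[str, bool]:
    """Map task id -> passed. Field names are assumed, not confirmed."""
    results: dict[str, bool] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        results[str(record["task_id"])] = bool(record["passed"])
    return results


def compare_runs(before: dict[str, bool], after: dict[str, bool]) -> dict:
    """Summarize two runs over the tasks they have in common."""
    shared = sorted(before.keys() & after.keys())
    return {
        "total": len(shared),
        "before_passed": sum(before[t] for t in shared),
        "after_passed": sum(after[t] for t in shared),
        # passed before, fails now
        "regressions": [t for t in shared if before[t] and not after[t]],
        # failed before, passes now
        "fixes": [t for t in shared if not before[t] and after[t]],
    }


# Usage against the files listed above:
# with open("benchmark/before_results.jsonl") as f:
#     before = load_results(f)
# with open("benchmark/after_results.jsonl") as f:
#     after = load_results(f)
# print(compare_runs(before, after))
```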
Intended use
This adapter is intended for testing:
- agentic coding behavior
- coding assistance
- code generation
- code explanation
- tool-use style coding responses
- continued fine-tuning experiments
It should be compared against the main CodeFeedback model before use in any serious coding workflow.
Loading example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model = "JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback"
adapter = "JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback-Agentic-Tessa-1K-LoRA"

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

# Load the base model in 4-bit NF4, the same quantization used during training
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the LoRA adapter on top of the quantized base model
model = PeftModel.from_pretrained(model, adapter)
model.eval()
```
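Because the base is a Qwen3 Thinking model, decoded output normally contains a reasoning trace that ends with a `</think>` tag, followed by the final answer. Assuming this adapter keeps that behavior (the card does not say), a small helper can separate the two parts after decoding. The function name `split_thinking` is hypothetical, and it works on the decoded string rather than on token ids.

```python
def split_thinking(decoded: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded model output into (reasoning, answer).

    Splits on the LAST occurrence of end_tag. If the tag is absent,
    the whole text is treated as the answer.
    """
    head, sep, tail = decoded.rpartition(end_tag)
    if not sep:
        return "", decoded.strip()
    # The opening <think> tag may or may not be present in the decoded text.
    reasoning = head.strip().removeprefix("<think>").strip()
    return reasoning, tail.strip()


reasoning, answer = split_thinking("<think>\nUse a loop.\n</think>\n\nprint(sum(range(5)))")
print(answer)
```

For benchmarking code generation, scoring only the `answer` part avoids counting code fragments that appear inside the reasoning trace.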
Important notes
This is an experimental LoRA adapter.
The benchmark used here is small and should not be treated as a formal coding leaderboard. It is mainly useful for local before/after regression testing.
This adapter preserved the current local benchmark score, but further testing is needed before treating it as a better general-purpose coding model.
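To illustrate how little a 10-task benchmark can resolve, a confidence interval can be computed for the 9/10 pass rate. The sketch below uses the Wilson score interval, which is a standard choice for small samples. It is an illustration added here, not part of the original benchmark tooling.

```python
import math


def wilson_interval(passed: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 gives about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = passed / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


low, high = wilson_interval(9, 10)
print(f"9/10 pass rate, 95% interval: {low:.1%} to {high:.1%}")
```

For 9 passes out of 10, the interval spans roughly 60% to 98%, so two adapters with the same score on this benchmark could still differ substantially on a larger test set.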