JoaoZaokk

Qwen3-4B-Thinking-2507-Heretic-CodeFeedback-Agentic-Tessa-1K-LoRA


README

License: other

Status

This is a candidate / experimental adapter, not a claimed major improvement.

I am testing additional datasets to make the model better at coding. This is a tiny improvement at best, not a game changer, but compared to the previous adapter this one did not get worse.

In a small local Python coding benchmark, this adapter preserved the previous score:

| Model | Adapter | Passed | Pass rate | Avg tokens/s |
|---|---|---|---|---|
| Before | heretic_F_lora_python5000_codefeedback5000 | 9/10 | 90.00% | 7.80 |
| After | heretic_F_lora_tessa_agentic_1000_test | 9/10 | 90.00% | 7.86 |

Delta:

| Metric | Value |
|---|---|
| Passes | 0 |
| Pass rate | 0.00% |
| Avg tokens/s | +0.05 |

Unlike the OpenCodeInstruct continuation experiment, this Tessa-based adapter did not regress on the small strict-code benchmark.

Training configuration

| Item | Value |
|---|---|
| Base model | JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback |
| Input adapter | heretic_F_lora_python5000_codefeedback5000 |
| Dataset | smirki/Agentic-Coding-Tessa |
| Samples used | 1,000 |
| Sequence length | 1024 |
| Epochs | 1 |
| Learning rate | 1e-6 |
| Training method | QLoRA / LoRA |
| Quantized loading during training | 4-bit NF4 |

Benchmark files

Benchmark artifacts are included under:

```text
benchmark/
```

Files:

```text
benchmark/before_summary.md
benchmark/after_summary.md
benchmark/COMPARISON.md
benchmark/before_results.jsonl
benchmark/after_results.jsonl
```
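The before/after comparison can be reproduced from the two JSONL files with a few lines of Python. The sketch below is only illustrative: the schema of the result files is not documented here, so the boolean `passed` field and the helper names `summarize` and `compare` are assumptions and may need adjusting to the actual files.

```python
import json


def summarize(jsonl_text: str) -> tuple[int, int]:
    """Return (passed, total) for JSONL text with one result object per line."""
    passed = 0
    total = 0
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        total += 1
        # ASSUMPTION: each record carries a boolean "passed" field.
        if record.get("passed", False):
            passed += 1
    return passed, total


def compare(before_text: str, after_text: str) -> dict:
    """Compare two result sets and report the change in passed tasks."""
    before_passed, before_total = summarize(before_text)
    after_passed, after_total = summarize(after_text)
    return {
        "before": f"{before_passed}/{before_total}",
        "after": f"{after_passed}/{after_total}",
        "delta_passes": after_passed - before_passed,
    }
```

To use it, read `benchmark/before_results.jsonl` and `benchmark/after_results.jsonl` as text (for example with `pathlib.Path(...).read_text()`) and pass both strings to `compare`.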

Intended use

This adapter is intended for testing:

  • agentic coding behavior
  • coding assistance
  • code generation
  • code explanation
  • tool-use style coding responses
  • continued fine-tuning experiments

It should be compared against the main CodeFeedback model before use in any serious coding workflow.

Loading example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model = "JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback"
adapter = "JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback-Agentic-Tessa-1K-LoRA"

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

# Load the base model in 4-bit NF4, matching the quantization used during training.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the LoRA adapter on top of the quantized base model.
model = PeftModel.from_pretrained(model, adapter)
model.eval()
```
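Because the base model is a Qwen3 Thinking variant, decoded output will usually contain a reasoning section that ends with a closing `</think>` tag, followed by the final answer. The helper below is a minimal sketch for separating the two in plain decoded text. The function name `split_thinking` is hypothetical, and the tag layout is an assumption about this adapter's output that should be checked against real generations.

```python
def split_thinking(text: str, marker: str = "</think>") -> tuple[str, str]:
    """Split decoded model output into (reasoning, answer).

    ASSUMPTION: the reasoning part ends at the last occurrence of `marker`.
    If the marker is absent, the whole text is treated as the answer.
    """
    reasoning, found, answer = text.rpartition(marker)
    if not found:
        return "", text.strip()
    # Some chat templates also emit an opening tag; drop it if present.
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()
```

When benchmarking generated code, passing only the answer part to the checker avoids scoring code fragments that appear inside the reasoning.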

Important notes

This is an experimental LoRA adapter.

The benchmark used here is small and should not be treated as a formal coding leaderboard. It is mainly useful for local before/after regression testing.
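To see why a 10-task benchmark cannot rank models, it helps to put an uncertainty range on the 9/10 score. The sketch below computes a 95% Wilson score interval for a pass rate. It assumes the tasks behave like independent pass/fail trials, which is a simplification, and the function name `wilson_interval` is chosen only for illustration.

```python
import math


def wilson_interval(passed: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 gives about 95% coverage)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = passed / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half_width, center + half_width


low, high = wilson_interval(9, 10)
print(f"{low:.2f} to {high:.2f}")  # roughly 0.60 to 0.98
```

An interval this wide means that two adapters both scoring 9/10 could still differ noticeably on a larger test set.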

This adapter preserved the current local benchmark score, but further testing is needed before treating it as a better general-purpose coding model.

Model provider

JoaoZaokk

Model tree

Base

JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback

Adapter

this model

Modalities

Input

Text

Output

Text
