JoaoZaokk

Qwen3-4B-Thinking-2507-Heretic-CodeFeedback-Agentic-Tessa-8K-2048-Experimental-LoRA

License: other

Status

This is an experimental adapter kept for transparency, comparison, and future analysis.

In a small local Python coding benchmark of 10 tasks, this adapter regressed compared with the previous CodeFeedback checkpoint, dropping from 9 to 7 passes.

Model	Adapter	Passed	Pass rate	Avg tokens/s
Before	`heretic_F_lora_python5000_codefeedback5000`	9/10	90.00%	9.54
After	`NOITE_3090_TESSA_8000_2048`	7/10	70.00%	9.28

Delta:

Metric	Value
Passes	-2
Pass rate	-20.00 percentage points
Avg tokens/s	-0.26
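
The delta figures follow directly from the two benchmark rows. A minimal sketch of the arithmetic, using only the numbers from the Before/After table (the function name `benchmark_delta` is illustrative and not part of the benchmark artifacts). Note that the pass-rate change is a difference in percentage points, not a relative change:

```python
def benchmark_delta(before, after):
    """Return after-minus-before differences for two benchmark rows.

    Each row is a (passed, total, avg_tokens_per_s) tuple.
    """
    b_passed, b_total, b_tps = before
    a_passed, a_total, a_tps = after
    b_rate = 100.0 * b_passed / b_total
    a_rate = 100.0 * a_passed / a_total
    return {
        "passes": a_passed - b_passed,
        "pass_rate_points": round(a_rate - b_rate, 2),
        "tokens_per_s": round(a_tps - b_tps, 2),
    }


# Values copied from the Before/After table.
delta = benchmark_delta(before=(9, 10, 9.54), after=(7, 10, 9.28))
print(delta)
```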

Observed behavior

The adapter did not fail completely, but it became worse at strict executable-code output.

Observed regressions:

`flatten` failed with a type error.
`valid_parentheses` failed to output executable code.
`lru_cache` remained incomplete.
The model showed more explanatory / agentic behavior instead of always returning compact executable code.

This suggests that the larger Agentic Tessa continuation pushed the model toward a more verbose agentic-assistant style, which may be useful for some workflows but is worse for strict code-output benchmarks.
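
One way to make "strict executable-code output" concrete is to test whether a response, after removing an optional Markdown fence, parses as Python. The sketch below is an assumption about how such a check could look, not the harness used for this benchmark; `extract_code` and `is_executable_python` are hypothetical helper names.

```python
import ast
import re

FENCE = "`" * 3  # Markdown code-fence marker, built here to keep the example readable

# Matches the first fenced block, with or without a "python" language tag.
_FENCE_RE = re.compile(
    re.escape(FENCE) + r"(?:python|py)?[ \t]*\n(.*?)" + re.escape(FENCE),
    re.DOTALL | re.IGNORECASE,
)


def extract_code(response: str) -> str:
    """Return the first fenced block if present, else the whole response."""
    match = _FENCE_RE.search(response)
    return match.group(1) if match else response


def is_executable_python(response: str) -> bool:
    """True if the extracted code is non-empty and parses as Python."""
    code = extract_code(response).strip()
    if not code:
        return False
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True


compact = "def add(a, b):\n    return a + b\n"
verbose = "Sure! First I will plan the steps, then write the function."
fenced = f"Here is the code:\n{FENCE}python\ndef add(a, b):\n    return a + b\n{FENCE}"

print(is_executable_python(compact))
print(is_executable_python(verbose))
print(is_executable_python(fenced))
```

Parsing is a necessary condition only: a syntactically valid response can still fail at runtime, as the type error in `flatten` illustrates, so a check like this would sit in front of the actual test cases rather than replace them.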

Training configuration

Item	Value
Base model	`JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback`
Input adapter	`heretic_F_lora_tessa_agentic_1000_test`
Dataset	`smirki/Agentic-Coding-Tessa`
Samples used	8,000
Sequence length	2048
Epochs	1
Learning rate	7e-7

Training result

Metric	Value
Train runtime	9421 seconds
Runtime	2h 37m 01s
Samples/second	0.849
Steps/second	0.106
Final train loss	1.178
First logged loss	1.509
Last logged loss	1.072
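
The throughput figures can be cross-checked against the configuration. A small sketch, assuming the 8,000 samples were each seen once (1 epoch) and that the reported rates are totals divided by the train runtime. The implied step count and effective batch size are inferences from these numbers, not values reported by the training run:

```python
SAMPLES = 8_000      # samples used, from the training configuration
RUNTIME_S = 9_421    # train runtime in seconds
STEPS_PER_S = 0.106  # reported steps/second

# Throughput implied by the sample count and the runtime.
samples_per_s = SAMPLES / RUNTIME_S

# Step count and effective batch size implied by the reported step rate.
implied_steps = STEPS_PER_S * RUNTIME_S
implied_batch = SAMPLES / implied_steps

print(f"samples/s ~ {samples_per_s:.3f}")
print(f"implied optimizer steps ~ {implied_steps:.0f}")
print(f"implied effective batch size ~ {implied_batch:.1f}")
```

Because steps/second is only reported to three decimals, the implied step count is approximate.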

Benchmark files

Benchmark artifacts are included under:

text
benchmark/

Files:

text
benchmark/before_summary.md
benchmark/after_summary.md
benchmark/COMPARISON.md
benchmark/comparison.json
benchmark/before_results.jsonl
benchmark/after_results.jsonl
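
The `.jsonl` result files can be summarized with a few lines of Python. The sketch below assumes one JSON object per line with a boolean field named `passed`; that field name is hypothetical, since the schema of the result files is not described here, so adjust `passed_key` to match the actual files.

```python
import io
import json


def pass_rate(lines, passed_key="passed"):
    """Return (passed, total, rate_percent) for an iterable of JSONL lines.

    `passed_key` is a hypothetical field name; adjust it to the actual
    schema of the result files.
    """
    records = [json.loads(line) for line in lines if line.strip()]
    passed = sum(1 for record in records if record.get(passed_key))
    total = len(records)
    rate = 100.0 * passed / total if total else 0.0
    return passed, total, rate


# Stand-in data with an invented schema, so the sketch runs without the files.
sample = io.StringIO(
    '{"task": "example_a", "passed": true}\n'
    '{"task": "example_b", "passed": false}\n'
)
print(pass_rate(sample))

# With the real artifacts, usage would look like:
# with open("benchmark/after_results.jsonl", encoding="utf-8") as fh:
#     print(pass_rate(fh))
```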

Intended use

This adapter is intended for:

comparison against the previous CodeFeedback checkpoint
studying regression from larger agentic fine-tuning
analyzing output-style drift
future experiments with smaller learning rates or filtered datasets

It is not recommended for strict agentic coding workflows that require compact executable code output.

For the stronger current baseline, prefer:

JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback

Loading example

python
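from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# Base checkpoint and the experimental LoRA adapter applied on top of it.
base_model = "JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback"
adapter = "JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback-Agentic-Tessa-8K-2048-Experimental-LoRA"

# The tokenizer comes from the base model; the adapter only adds LoRA weights.
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

# 4-bit NF4 quantization with double quantization to reduce GPU memory use.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load the quantized base model and let device_map place it on available devices.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the adapter and switch to inference mode.
model = PeftModel.from_pretrained(model, adapter)
model.eval()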
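
After loading, decoded output from a Thinking-style Qwen3 checkpoint typically contains a reasoning segment closed by a `</think>` marker before the final answer. As a sketch under that assumption (the adapter's actual output format is not documented here, and `split_thinking` is a hypothetical helper), the final answer can be separated from the reasoning in plain Python before checking it as code:

```python
THINK_END = "</think>"  # assumed end-of-reasoning marker


def split_thinking(text: str) -> tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    If the marker is absent, the whole text is treated as the answer.
    """
    reasoning, marker, answer = text.rpartition(THINK_END)
    if not marker:
        return "", text.strip()
    # Drop an opening tag if the decoded text happens to include one.
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()


# Illustrative decoded string, not real model output.
decoded = "The task needs a sum.</think>\ndef add(a, b):\n    return a + b"
reasoning, answer = split_thinking(decoded)
print(answer)
```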

Important notes

This is an experimental LoRA adapter.

It should not be treated as a universal improvement over the previous CodeFeedback model.

The benchmark used here is small and should not be treated as a formal coding leaderboard. It is mainly useful for local before/after regression testing.
