josephmayo

Qwen2.5-agentic-7B-SLM-LoRA

License: apache-2.0

Current Proof Gate

Kaggle proof kernel: holykeys/qwen25-coder-agentic-slm-v5-rescue

Evaluation set: 50 HumanEval/MBPP-style coding tasks used for fast iteration.

Phase	Greedy pass@1	Coverage@K	Selected@K	Repair	Final
Qwen2.5-Coder-7B reference harness	37/50	40/50	40/50	2/50	42/50
v5 7B adapter primary	37/50	42/50	42/50	2/50	44/50
14B rescue on primary misses	1/6	3/6	3/6	1/6	4/6
v5 combined rescue system	38/50	45/50	45/50	3/50	48/50
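
The columns in this table appear to be related by simple arithmetic: in every row, Final equals Selected@K plus Repair, and the combined row equals the primary row plus the rescue row. That reading is inferred from the numbers and is not stated by the source. A minimal Python check, with the row labels shortened for illustration:

```python
# Rows copied from the table above: (greedy, coverage, selected, repair, final).
rows = {
    "reference": (37, 40, 40, 2, 42),
    "primary": (37, 42, 42, 2, 44),
    "rescue": (1, 3, 3, 1, 4),
    "combined": (38, 45, 45, 3, 48),
}


def final_matches(row):
    """Assumed relation (inferred, not documented): Final = Selected@K + Repair."""
    _greedy, _coverage, selected, repair, final = row
    return selected + repair == final


def add_rows(a, b):
    """Column-wise sum of two rows."""
    return tuple(x + y for x, y in zip(a, b))


final_consistent = all(final_matches(r) for r in rows.values())
combined_consistent = add_rows(rows["primary"], rows["rescue"]) == rows["combined"]
rescue_pool = 50 - rows["primary"][4]  # tasks the primary system missed

print(final_consistent, combined_consistent, rescue_pool)  # True True 6
```

The rescue row's denominator of 6 matches the 6 tasks the primary system missed (50 - 44).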

Lift

Against the Qwen2.5-Coder-7B reference harness result of 42/50:

LoRA-only primary system: 44/50, a +2/50 absolute improvement.
LoRA-only percentage-point lift: +4 points.
LoRA-only relative lift: +4.76%.
Full v5 rescue system: 48/50, a +6/50 absolute improvement.
Full system percentage-point lift: +12 points.
Full system relative lift: +14.29%.
Failure reduction: from 8 misses to 2 misses, a 75% reduction in failures on this gate.
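
The lift figures above can be reproduced from the pass counts alone. The sketch below is illustrative; the function name `lift_report` and its dictionary keys are hypothetical and not part of the release files.

```python
def lift_report(baseline_pass, candidate_pass, total):
    """Compute lift metrics from raw pass counts.

    Assumes baseline_pass > 0 and baseline_pass < total, so neither
    division below hits zero.
    """
    gain = candidate_pass - baseline_pass
    baseline_fail = total - baseline_pass
    candidate_fail = total - candidate_pass
    return {
        "absolute_gain": gain,
        # Difference of the two pass rates, in percentage points.
        "percentage_point_lift": round(100.0 * gain / total, 2),
        # Gain relative to the baseline pass count.
        "relative_lift_pct": round(100.0 * gain / baseline_pass, 2),
        # Share of baseline failures that the candidate eliminated.
        "failure_reduction_pct": round(
            100.0 * (baseline_fail - candidate_fail) / baseline_fail, 2
        ),
    }


lora_only = lift_report(baseline_pass=42, candidate_pass=44, total=50)
full_system = lift_report(baseline_pass=42, candidate_pass=48, total=50)

print(lora_only["percentage_point_lift"], lora_only["relative_lift_pct"])      # 4.0 4.76
print(full_system["percentage_point_lift"], full_system["relative_lift_pct"])  # 12.0 14.29
print(full_system["failure_reduction_pct"])                                    # 75.0
```

Note that the percentage-point lift divides by the task count (50), while the relative lift divides by the baseline pass count (42), which is why the two figures differ.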

The honest conclusion: the LoRA adapter alone yields a small gain (+2/50). Most of the progress (a further +4/50) comes from the deterministic verifier/rescue system.

What This Is Not

This is not claimed to be a replacement for Claude Sonnet 4.5.

This is not a broad SWE-bench win.

This is not a proof that the raw 7B weights beat frontier models.

The release is a reproducible intermediate artifact: a compact coding model plus a verifier-oriented harness that shows a measurable improvement on a fast gate.

Required Next Benchmarks

The current gate is intentionally small. It is useful for fast iteration only. Before making larger claims, the next evaluation batch must include:

LiveCodeBench: fresh contest-style coding problems, preferably recent slices only.
BigCodeBench: broader function-level and library-use coding tasks.
SWE-bench Lite or Verified subset: repository patching with real tests.
Agentic edit tasks: file editing, test execution, patch generation, and repair loops.
Cost and latency: wall-clock time, tokens generated, GPU class, and estimated dollar cost.
Abstention rate: how often the system refuses to answer or returns no valid patch.
Invalid-output rate: markdown leakage, missing entrypoint, syntax errors, test leakage, and prose leakage.
Selector diagnostics: coverage@K, selected@K, selector gap, repair@1, and false-positive verifier selections.
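
The selector diagnostics in the last item can be computed from per-task sampling records. The sketch below assumes these definitions, which the source does not spell out: coverage@K counts tasks where at least one of the K candidates passes, selected@K counts tasks where the candidate picked by the selector passes, and the selector gap is the difference between the two. The `TaskResult` structure is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskResult:
    """Hypothetical per-task record; field names are illustrative only."""
    task_id: str
    candidate_passed: List[bool]  # test verdict for each of the K samples
    selected_index: int           # index of the candidate the selector chose


def selector_diagnostics(results: List[TaskResult]) -> Dict[str, int]:
    # Coverage: any candidate passes. Selected: the chosen candidate passes.
    coverage = sum(any(r.candidate_passed) for r in results)
    selected = sum(r.candidate_passed[r.selected_index] for r in results)
    return {
        "total": len(results),
        "coverage_at_k": coverage,
        "selected_at_k": selected,
        "selector_gap": coverage - selected,
    }


demo = [
    TaskResult("t1", [False, True, False], 1),   # covered, selector picked the pass
    TaskResult("t2", [True, False, False], 2),   # covered, selector picked a failure
    TaskResult("t3", [False, False, False], 0),  # not covered at all
]
diag = selector_diagnostics(demo)
print(diag)
```

Under these definitions, the proof-gate table reports equal Coverage@K and Selected@K in every row, so the selector gap on that gate would be zero.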

Recommended Evaluation Policy

Do not run all training, evaluation, and release work inside a single notebook.

Use deterministic batches:

Baseline batch: run the base model first, no training.
Candidate batch: run the candidate model/harness on the exact same tasks.
Failure batch: collect failed tasks, failed code, verifier output, and minimal repair.
Repair batch: train or prompt only on verified repair data.
Proof batch: rerun held-out tests immediately.
Release batch: publish only if the proof gate beats the previous best.

Every batch should emit JSON summaries, task-level CSV, rollouts, error signatures, and environment metadata.
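
One way a batch could emit its summary, task-level table, and environment metadata is sketched below using only the standard library. The file names, the three CSV columns, and both function names are assumptions for illustration; they are not the schema of the released `v5_rescue_*` files.

```python
import csv
import json
import platform
import sys
from pathlib import Path

# Hypothetical task-level columns; the real release CSV may differ.
FIELDS = ["task_id", "passed", "error_signature"]


def write_batch_artifacts(out_dir, batch_name, task_rows):
    """Write <batch>_summary.json and <batch>_tasks.csv; return both paths.

    Each row in task_rows is a dict with exactly the keys in FIELDS.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    summary = {
        "batch": batch_name,
        "total": len(task_rows),
        "passed": sum(1 for row in task_rows if row["passed"]),
        "environment": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
        },
    }
    summary_path = out / f"{batch_name}_summary.json"
    summary_path.write_text(json.dumps(summary, indent=2, sort_keys=True))

    csv_path = out / f"{batch_name}_tasks.csv"
    with csv_path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(task_rows)
    return summary_path, csv_path


def should_release(candidate_final, previous_best_final):
    """Release-batch rule from the list above: publish only on a strict improvement."""
    return candidate_final > previous_best_final
```

Calling `write_batch_artifacts("runs", "baseline", rows)` once per batch keeps each stage's outputs separate, and `should_release(48, 44)` would return `True` for the pass counts reported on the current gate.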

Files

adapter_model.safetensors: LoRA adapter.
adapter_config.json: PEFT configuration.
v5_rescue_release_summary.json: exact proof-run summary.
v5_rescue_eval_before_after_full_code.csv: task-level proof-run table.

Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-Coder-7B-Instruct"
# Adapter repo id as written in this README; verify it against the published repo name.
adapter_id = "josephmayo/Qwen2.5-Coder-7B-agentic-SLM-LoRA"

# Load the base tokenizer and model, then attach the LoRA adapter on top.
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", trust_remote_code=True)
model = PeftModel.from_pretrained(base, adapter_id)
```

For best results, use the model inside a strict code-only verifier harness. Do not evaluate it with casual chat prompts alone.
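
As a rough illustration of what a code-only check might look like, the sketch below rejects outputs that contain markdown fences, fail to parse as Python, or lack the expected top-level function. It is not the harness used for the proof run; the function name `check_code_only` and the problem labels are hypothetical.

```python
import ast
from typing import List


def check_code_only(output: str, entrypoint: str) -> List[str]:
    """Return a list of problem labels; an empty list means these checks passed."""
    problems = []

    # Markdown leakage: a fence anywhere in the output. This also flags a
    # fence inside a string literal, which is a known false positive.
    if "```" in output:
        problems.append("markdown_fence")

    # Prose leakage and broken code both surface as parse failures.
    try:
        tree = ast.parse(output)
    except (SyntaxError, ValueError):
        problems.append("syntax_error")
        return problems

    # The task's entrypoint must be defined as a top-level function.
    defined = {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    if entrypoint not in defined:
        problems.append("missing_entrypoint")
    return problems


clean = check_code_only("def add(a, b):\n    return a + b\n", "add")
chatty = check_code_only("Here is the code:\ndef add(a, b): return a + b", "add")
print(clean, chatty)  # [] ['syntax_error']
```

A check like this only filters malformed outputs. Correctness still has to come from executing the task's tests.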

Model Details

Model Provider

josephmayo

Model Tree

Base

Qwen/Qwen2.5-Coder-7B-Instruct

Adapter

this model

Input Modalities

Text

Output Modalities