
README

License: apache-2.0

Current Proof Gate

Kaggle proof kernel: holykeys/qwen25-coder-agentic-slm-v5-rescue

Evaluation set: 50 HumanEval/MBPP-style coding tasks used for fast iteration.

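| Phase                              | Greedy pass@1 | Coverage@K | Selected@K | Repair | Final |
|------------------------------------|---------------|------------|------------|--------|-------|
| Qwen2.5-Coder-7B reference harness | 37/50         | 40/50      | 40/50      | 2/50   | 42/50 |
| v5 7B adapter primary              | 37/50         | 42/50      | 42/50      | 2/50   | 44/50 |
| 14B rescue on primary misses       | 1/6           | 3/6        | 3/6        | 1/6    | 4/6   |
| v5 combined rescue system          | 38/50         | 45/50      | 45/50      | 3/50   | 48/50 |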
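In every column of the table, the combined row is the sum of the primary row and the 14B rescue row. This is what you would expect if the 14B stage runs only on the 6 tasks the primary system missed (50 - 44), so the two sets of solved tasks cannot overlap. The sketch below checks that arithmetic. The counts come from the table; the dictionary keys and the `combine` helper are illustrative names, not part of the release.

```python
# Counts copied from the proof-gate table. Primary counts are out of 50 tasks;
# rescue counts are out of the 6 tasks the primary system missed.
PRIMARY = {"greedy": 37, "coverage": 42, "selected": 42, "repair": 2, "final": 44}
RESCUE_ON_MISSES = {"greedy": 1, "coverage": 3, "selected": 3, "repair": 1, "final": 4}
TOTAL = 50


def combine(primary: dict, rescue: dict) -> dict:
    """Add rescue-stage solves to primary-stage solves, column by column.

    Assumes the rescue stage only sees tasks the primary system missed,
    so no task is counted twice.
    """
    return {col: primary[col] + rescue[col] for col in primary}


combined = combine(PRIMARY, RESCUE_ON_MISSES)
misses_handed_to_rescue = TOTAL - PRIMARY["final"]

for col, solved in combined.items():
    print(f"{col:>9}: {solved}/{TOTAL}")
print(f"rescue stage saw {misses_handed_to_rescue} tasks")
```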

Lift

Against the Qwen2.5-Coder-7B reference harness result of 42/50:

  • LoRA-only primary system: 44/50, a +2/50 absolute improvement.
  • LoRA-only percentage-point lift: +4 points.
  • LoRA-only relative lift: +4.76%.
  • Full v5 rescue system: 48/50, a +6/50 absolute improvement.
  • Full system percentage-point lift: +12 points.
  • Full system relative lift: +14.29%.
  • Failure reduction: from 8 misses to 2 misses, a 75% reduction in failures on this gate.
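The lift figures above use three different denominators: the 50-task gate (percentage points), the reference harness's 42 solves (relative lift), and its 8 misses (failure reduction). The sketch below makes each denominator explicit. The `lift` helper and the `Lift` type are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Lift(NamedTuple):
    absolute: int                 # extra tasks solved vs. the baseline
    points: float                 # percentage-point lift over the whole gate
    relative_pct: float           # lift relative to the baseline's solved count
    failure_reduction_pct: float  # share of baseline misses eliminated


def lift(baseline_solved: int, system_solved: int, total: int) -> Lift:
    absolute = system_solved - baseline_solved
    baseline_misses = total - baseline_solved
    system_misses = total - system_solved
    return Lift(
        absolute=absolute,
        points=100 * absolute / total,
        relative_pct=100 * absolute / baseline_solved,
        failure_reduction_pct=100 * (baseline_misses - system_misses) / baseline_misses,
    )


REFERENCE, TOTAL = 42, 50  # reference harness final score on the gate
lora_only = lift(REFERENCE, 44, TOTAL)
full_system = lift(REFERENCE, 48, TOTAL)

for name, result in [("LoRA-only", lora_only), ("full v5", full_system)]:
    print(f"{name}: +{result.absolute}/{TOTAL}, +{result.points:.0f} points, "
          f"+{result.relative_pct:.2f}% relative, "
          f"{result.failure_reduction_pct:.0f}% fewer failures")
```

The same formula also gives a failure reduction for the LoRA-only system: 8 misses down to 6, or 25%.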

The honest conclusion: the LoRA alone is a small gain (+2/50). The meaningful progress comes from the deterministic verifier/rescue system, which accounts for the remaining +4/50.

What This Is Not

This is not claimed to be a Claude Sonnet 4.5 replacement.

This is not a broad SWE-bench win.

This is not a proof that the raw 7B weights beat frontier models.

The release is a reproducible intermediate artifact: a compact coding model plus a verifier-oriented harness that shows a measurable improvement on a fast gate.

Required Next Benchmarks

The current gate is intentionally small. It is useful for fast iteration only. Before making larger claims, the next evaluation batch must include:

  • LiveCodeBench: fresh contest-style coding problems, preferably recent slices only.
  • BigCodeBench: broader function-level and library-use coding tasks.
  • SWE-bench Lite or Verified subset: repository patching with real tests.
  • Agentic edit tasks: file editing, test execution, patch generation, and repair loops.
  • Cost and latency: wall-clock time, tokens generated, GPU class, and estimated dollar cost.
  • Abstention rate: how often the system refuses to answer or returns no valid patch.
  • Invalid-output rate: markdown leakage, missing entrypoint, syntax errors, test leakage, and prose leakage.
  • Selector diagnostics: coverage@K, selected@K, selector gap, repair@1, and false-positive verifier selections.
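The selector diagnostics and abstention rate can all be computed from one per-task record. The sketch below rests on assumed definitions:

  • coverage@K: at least one of the K candidates passes the held-out tests.
  • selected@K: the candidate the selector picked passes.
  • Selector gap: coverage@K minus selected@K.
  • repair@1: a single repair attempt fixes a task whose selected candidate failed.
  • False-positive verifier selection: the verifier accepted a candidate that then failed the held-out tests.

`TaskResult` and its fields are hypothetical, not the schema of `v5_rescue_eval_before_after_full_code.csv`.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class TaskResult:
    # Hypothetical per-task record; field names are illustrative only.
    candidate_passed: Sequence[bool]      # held-out test outcome for each of the K candidates
    selected: Optional[int]               # index picked by the selector; None means abstained
    verifier_accepted: bool               # did the verifier accept the selected candidate?
    repair_passed: Optional[bool] = None  # outcome of one repair attempt; None if not attempted


def _selected_ok(r: TaskResult) -> bool:
    return r.selected is not None and bool(r.candidate_passed[r.selected])


def selector_diagnostics(results: List[TaskResult]) -> dict:
    n = len(results)
    covered = sum(any(r.candidate_passed) for r in results)
    selected_ok = sum(_selected_ok(r) for r in results)
    # repair@1: tasks whose selection failed but a single repair attempt passed.
    repaired = sum(1 for r in results if not _selected_ok(r) and r.repair_passed)
    # Verifier said "accept" but the held-out tests disagreed.
    false_positives = sum(
        1 for r in results
        if r.selected is not None and r.verifier_accepted and not r.candidate_passed[r.selected]
    )
    return {
        "coverage@K": covered / n,
        "selected@K": selected_ok / n,
        "selector_gap": (covered - selected_ok) / n,
        "repair@1": repaired / n,
        "abstention_rate": sum(r.selected is None for r in results) / n,
        "verifier_false_positives": false_positives,
    }


results = [
    TaskResult([False, True, False], selected=1, verifier_accepted=True),
    TaskResult([False, True], selected=0, verifier_accepted=True, repair_passed=True),
    TaskResult([False, False], selected=0, verifier_accepted=False, repair_passed=False),
    TaskResult([True], selected=None, verifier_accepted=False),
]
diag = selector_diagnostics(results)
print(diag)
```

A large selector gap means the right answer was usually among the K candidates but the selector did not pick it. That points at the verifier or selector, not at the model.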

Recommended Evaluation Policy

Do not put all training, evaluation, and release work in a single notebook.

Use deterministic batches:

  1. Baseline batch: run the base model first, no training.
  2. Candidate batch: run the candidate model/harness on the exact same tasks.
  3. Failure batch: collect failed tasks, failed code, verifier output, and minimal repair.
  4. Repair batch: train or prompt only on verified repair data.
  5. Proof batch: rerun held-out tests immediately.
  6. Release batch: publish only if the proof gate beats the previous best.
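Two of these steps lend themselves to small mechanical checks:

  • Steps 1, 2, and 5 must run on the exact same tasks. This can be enforced with an order-independent fingerprint of the task IDs.
  • Step 6's release gate reduces to a strict comparison against the previous best.

The sketch below covers these checks plus the failure and repair filters. The helper names and the row keys (`passed`, `repair_verified`) are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def task_set_fingerprint(task_ids: List[str]) -> str:
    """Order-independent SHA-256 of a batch's task IDs (steps 1, 2, 5)."""
    blob = json.dumps(sorted(task_ids)).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def failure_batch(rows: List[Dict]) -> List[Dict]:
    """Step 3: keep failed tasks (with their code and verifier output)."""
    return [r for r in rows if not r["passed"]]


def repair_batch(failures: List[Dict]) -> List[Dict]:
    """Step 4: keep only repairs that the verifier confirmed."""
    return [f for f in failures if f.get("repair_verified", False)]


def should_release(proof_solved: int, best_solved: int,
                   proof_tasks: List[str], best_tasks: List[str]) -> bool:
    """Step 6: release only on the same task set and a strictly better score."""
    if task_set_fingerprint(proof_tasks) != task_set_fingerprint(best_tasks):
        raise ValueError("proof batch and previous best used different task sets")
    return proof_solved > best_solved


tasks = ["task-001", "task-002", "task-003"]
rows = [
    {"task_id": "task-001", "passed": True},
    {"task_id": "task-002", "passed": False, "repair_verified": True},
    {"task_id": "task-003", "passed": False},
]
failures = failure_batch(rows)
repairs = repair_batch(failures)
print(len(failures), len(repairs))
print(should_release(48, 44, tasks, list(reversed(tasks))))
```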

Every batch should emit JSON summaries, task-level CSV, rollouts, error signatures, and environment metadata.
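A minimal sketch of per-batch artifact emission, using only the standard library. It writes a JSON summary (solved count, error-signature counts, and environment metadata) and a task-level CSV. Rollouts and GPU details are left out for brevity. The file-naming scheme and row keys are assumptions, not the layout of the released `v5_rescue_*` files.

```python
import csv
import json
import platform
import sys
import tempfile
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Tuple


def environment_metadata() -> Dict[str, str]:
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }


def emit_batch_artifacts(out_dir: str, batch_name: str, rows: List[Dict]) -> Tuple[Path, Path]:
    """Write <batch>_summary.json and <batch>_tasks.csv (hypothetical names)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    errors = Counter(r["error_signature"] for r in rows if not r["passed"])
    summary = {
        "batch": batch_name,
        "solved": sum(1 for r in rows if r["passed"]),
        "total": len(rows),
        "error_signatures": dict(errors),
        "environment": environment_metadata(),
    }
    summary_path = out / f"{batch_name}_summary.json"
    summary_path.write_text(json.dumps(summary, indent=2), encoding="utf-8")

    csv_path = out / f"{batch_name}_tasks.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(
            fh, fieldnames=["task_id", "passed", "error_signature"], extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)
    return summary_path, csv_path


rows = [
    {"task_id": "task-001", "passed": True, "error_signature": ""},
    {"task_id": "task-002", "passed": False, "error_signature": "syntax_error"},
]
with tempfile.TemporaryDirectory() as tmp:
    summary_path, csv_path = emit_batch_artifacts(tmp, "proof", rows)
    summary = json.loads(summary_path.read_text(encoding="utf-8"))
    with csv_path.open(newline="", encoding="utf-8") as fh:
        csv_rows = list(csv.DictReader(fh))
print(summary["solved"], summary["total"], summary["error_signatures"])
```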

Files

  • adapter_model.safetensors: LoRA adapter.
  • adapter_config.json: PEFT configuration.
  • v5_rescue_release_summary.json: exact proof-run summary.
  • v5_rescue_eval_before_after_full_code.csv: task-level proof-run table.

Usage

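```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter_id = "josephmayo/Qwen2.5-Coder-7B-agentic-SLM-LoRA"

# Load the base tokenizer and model; torch_dtype="auto" keeps the checkpoint's dtype.
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

# Attach the LoRA adapter on top of the base weights and switch to inference mode.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```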

For best results, use the model inside a strict code-only verifier harness. Do not judge it by casual chat prompts alone.
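The release does not define a "strict code-only verifier harness" in detail. The sketch below is one possible reading, built only from the standard library:

  • It rejects outputs that show the invalid-output signatures listed under Required Next Benchmarks (markdown leakage, syntax errors, missing entrypoint, test leakage).
  • It then runs the held-out tests in a fresh interpreter.

Prose leakage is assumed to surface as a syntax error. The signature strings and function names are hypothetical. The subprocess runner is not a sandbox, so untrusted model output should run in an isolated environment.

```python
import ast
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def classify_output(text: str, entrypoint: str) -> Optional[str]:
    """Return an error signature if the output is not clean code, else None."""
    if "```" in text:
        return "markdown_leakage"
    try:
        tree = ast.parse(text)
    except SyntaxError:
        # Prose mixed into the answer usually fails to parse and lands here.
        return "syntax_error"
    top_level_funcs = {
        node.name for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    if entrypoint not in top_level_funcs:
        return "missing_entrypoint"
    # Model-written tests: top-level asserts or test_* functions.
    if any(isinstance(node, ast.Assert) for node in tree.body) or any(
        name.startswith("test_") for name in top_level_funcs
    ):
        return "test_leakage"
    return None


def run_tests(code: str, test_code: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Run candidate code plus held-out tests in a fresh interpreter (NOT a sandbox)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(code + "\n\n" + test_code + "\n", encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(path)],
                capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, "timeout"
        return proc.returncode == 0, proc.stderr[-2000:]


def verify(text: str, entrypoint: str, test_code: str) -> Tuple[bool, str]:
    signature = classify_output(text, entrypoint)
    if signature is not None:
        return False, signature
    passed, log = run_tests(text, test_code)
    if passed:
        return True, "ok"
    return False, "timeout" if log == "timeout" else "test_failure"


good = "def add(a, b):\n    return a + b\n"
print(verify(good, "add", "assert add(2, 3) == 5"))
print(verify("```python\n" + good + "```", "add", "assert add(2, 3) == 5"))
print(verify("Here is the code: def add", "add", ""))
```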

Model provider: josephmayo

Model tree: base model Qwen/Qwen2.5-Coder-7B-Instruct; this model is an adapter on top of it.

Modalities: text input, text output.
