License: apache-2.0
Current Proof Gate
Kaggle proof kernel: holykeys/qwen25-coder-agentic-slm-v5-rescue
Evaluation set: 50 HumanEval/MBPP-style coding tasks used for fast iteration.
| Phase | Greedy pass@1 | Coverage@K | Selected@K | Repair | Final |
|---|---|---|---|---|---|
| Qwen2.5-Coder-7B reference harness | 37/50 | 40/50 | 40/50 | 2/50 | 42/50 |
| v5 7B adapter primary | 37/50 | 42/50 | 42/50 | 2/50 | 44/50 |
| 14B rescue on primary misses | 1/6 | 3/6 | 3/6 | 1/6 | 4/6 |
| v5 combined rescue system | 38/50 | 45/50 | 45/50 | 3/50 | 48/50 |
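The last two rows describe a two-stage cascade. The v5 7B adapter answers every task, and only its 6 misses go to the 14B rescue model. The combined Final (48/50) is the primary's 44 plus the 4 rescued tasks. The sketch below illustrates that control flow under stated assumptions: `rescue_cascade` is a hypothetical helper, and `primary` and `rescue` are stand-in callables that return whether the verifier accepted a task. It does not reproduce the actual harness or verifier.

```python
from typing import Callable, Iterable

# Hypothetical stand-in: a solver returns True when the verifier accepts its final answer.
Solver = Callable[[str], bool]


def rescue_cascade(tasks: Iterable[str], primary: Solver, rescue: Solver) -> dict:
    """Run the primary system on every task; send only its misses to the rescue model."""
    tasks = list(tasks)
    primary_passed = {t for t in tasks if primary(t)}
    misses = [t for t in tasks if t not in primary_passed]
    # The larger rescue model is only spent on tasks the primary system missed.
    rescued = {t for t in misses if rescue(t)}
    return {
        "primary": len(primary_passed),
        "misses": len(misses),
        "rescued": len(rescued),
        "final": len(primary_passed) + len(rescued),
    }


# Toy reproduction of the table's counts (44 primary passes, 4 of 6 misses rescued).
tasks = [f"task_{i}" for i in range(50)]
primary_ok, rescue_ok = set(tasks[:44]), set(tasks[44:48])
print(rescue_cascade(tasks, lambda t: t in primary_ok, lambda t: t in rescue_ok))
# {'primary': 44, 'misses': 6, 'rescued': 4, 'final': 48}
```

Routing only misses to the 14B model keeps its cost proportional to the primary system's failure rate rather than to the size of the task set.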
Lift
Against the Qwen2.5-Coder-7B reference harness result of 42/50:
- LoRA-only primary system: 44/50, a +2/50 absolute improvement.
- LoRA-only percentage-point lift: +4 points.
- LoRA-only relative lift: +4.76%.
- Full v5 rescue system: 48/50, a +6/50 absolute improvement.
- Full system percentage-point lift: +12 points.
- Full system relative lift: +14.29%.
- Failure reduction: from 8 misses to 2 misses, a 75% reduction in failures on this gate.
The honest conclusion: the LoRA alone is a small gain (+2/50). Most of the progress comes from the deterministic verifier/rescue system.
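The lift figures in this section can be recomputed from the Final column of the proof-gate table. The sketch below does that arithmetic: percentage-point lift divides the absolute gain by the 50 tasks, relative lift divides it by the 42/50 reference result, and failure reduction compares misses (50 minus passes). The function names are illustrative only.

```python
TOTAL = 50
reference_final = 42  # Qwen2.5-Coder-7B reference harness
lora_final = 44       # v5 7B adapter primary
full_final = 48       # v5 combined rescue system


def lift(candidate: int, baseline: int, total: int = TOTAL) -> dict:
    """Absolute, percentage-point, and relative lift of candidate over baseline."""
    absolute = candidate - baseline
    return {
        "absolute": absolute,
        "pp": 100 * absolute / total,                        # share of all tasks
        "relative_pct": round(100 * absolute / baseline, 2),  # share of baseline passes
    }


def failure_reduction(candidate: int, baseline: int, total: int = TOTAL) -> float:
    """Percentage reduction in misses, where misses = total - passes."""
    base_miss, cand_miss = total - baseline, total - candidate
    return 100 * (base_miss - cand_miss) / base_miss


print(lift(lora_final, reference_final))  # {'absolute': 2, 'pp': 4.0, 'relative_pct': 4.76}
print(lift(full_final, reference_final))  # {'absolute': 6, 'pp': 12.0, 'relative_pct': 14.29}
print(failure_reduction(full_final, reference_final))  # 75.0
```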
What This Is Not
This is not a claimed Claude Sonnet 4.5 replacement.
This is not a broad SWE-bench win.
This is not a proof that the raw 7B weights beat frontier models.
The release is a reproducible intermediate artifact: a compact coding model plus a verifier-oriented harness that shows a measurable improvement on a fast gate.
Required Next Benchmarks
The current gate is intentionally small. It is useful for fast iteration only. Before making larger claims, the next evaluation batch must include:
- LiveCodeBench: fresh contest-style coding problems, preferably recent slices only.
- BigCodeBench: broader function-level and library-use coding tasks.
- SWE-bench Lite or Verified subset: repository patching with real tests.
- Agentic edit tasks: file editing, test execution, patch generation, and repair loops.
- Cost and latency: wall-clock time, tokens generated, GPU class, and estimated dollar cost.
- Abstention rate: how often the system refuses to answer or returns no valid patch.
- Invalid-output rate: markdown leakage, missing entrypoint, syntax errors, test leakage, and prose leakage.
- Selector diagnostics: coverage@K, selected@K, selector gap, repair@1, and false-positive verifier selections.
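The selector diagnostics can be computed from per-task rollouts. The sketch below assumes definitions that are consistent with the proof-gate table (where Final = Selected + Repair) but are not taken from the released code: coverage@K is the share of tasks where at least one of K samples passes the held-out tests, selected@K is the share where the selected sample passes, the selector gap is their difference, repair@1 counts tasks where the selected sample fails but a single repair attempt passes, and a false-positive selection is a sample the verifier accepted that still fails the held-out tests. `TaskRollout` and `selector_diagnostics` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRollout:
    candidate_passes: list[bool]          # held-out test result for each of the K samples
    selected: int                         # index chosen by the verifier/selector
    verifier_accepted: bool               # did the verifier approve the selected sample?
    repair_passes: Optional[bool] = None  # result of one repair attempt, if any


def selector_diagnostics(rollouts: list[TaskRollout]) -> dict:
    n = len(rollouts)
    covered = sum(any(r.candidate_passes) for r in rollouts)
    selected_ok = sum(r.candidate_passes[r.selected] for r in rollouts)
    # Repair only counts when the selected sample failed and one repair attempt passed.
    repaired = sum(
        1 for r in rollouts if not r.candidate_passes[r.selected] and r.repair_passes
    )
    # Verifier said "accept" but the held-out tests disagree.
    false_pos = sum(
        1 for r in rollouts if r.verifier_accepted and not r.candidate_passes[r.selected]
    )
    return {
        "coverage@K": covered / n,
        "selected@K": selected_ok / n,
        "selector_gap": (covered - selected_ok) / n,
        "repair@1": repaired / n,
        "false_positive_selections": false_pos,
    }


rollouts = [
    TaskRollout([True, False], selected=0, verifier_accepted=True),
    TaskRollout([False, True], selected=0, verifier_accepted=True, repair_passes=True),
    TaskRollout([False, False], selected=1, verifier_accepted=False, repair_passes=False),
    TaskRollout([False, False], selected=0, verifier_accepted=True),
]
print(selector_diagnostics(rollouts))
```

A nonzero selector gap means a passing sample existed but was not chosen, which points at the selector rather than the generator.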
Recommended Evaluation Policy
Do not put all training, evaluation, and release work inside one notebook.
Use deterministic batches:
- Baseline batch: run the base model first, no training.
- Candidate batch: run the candidate model/harness on the exact same tasks.
- Failure batch: collect failed tasks, failed code, verifier output, and minimal repair.
- Repair batch: train or prompt only on verified repair data.
- Proof batch: rerun held-out tests immediately.
- Release batch: publish only if the proof gate beats the previous best.
Every batch should emit JSON summaries, task-level CSV, rollouts, error signatures, and environment metadata.
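A minimal sketch of the batch outputs and the release rule, assuming a task-level CSV with hypothetical columns `task_id`, `passed`, and `error_signature`, and a JSON summary that records the task IDs so two batches can be checked for running on the exact same tasks. `write_batch` and `should_release` are illustrative names, not part of the release; rollouts are omitted for brevity.

```python
import csv
import json
import platform
import sys
from pathlib import Path


def write_batch(out_dir: Path, batch_name: str, rows: list[dict]) -> dict:
    """Write a task-level CSV and a JSON summary for one deterministic batch."""
    out_dir.mkdir(parents=True, exist_ok=True)
    fields = ["task_id", "passed", "error_signature"]  # assumed column names
    with open(out_dir / f"{batch_name}_tasks.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    summary = {
        "batch": batch_name,
        "n_tasks": len(rows),
        "passed": sum(bool(r["passed"]) for r in rows),
        "task_ids": sorted(r["task_id"] for r in rows),  # lets batches be compared task-for-task
        "environment": {"python": sys.version.split()[0], "platform": platform.platform()},
    }
    (out_dir / f"{batch_name}_summary.json").write_text(json.dumps(summary, indent=2))
    return summary


def should_release(candidate: dict, previous_best: dict) -> bool:
    """Release gate: same task set and strictly more passes than the previous best."""
    return (candidate["task_ids"] == previous_best["task_ids"]
            and candidate["passed"] > previous_best["passed"])
```

Requiring identical task IDs keeps the gate from "improving" simply because a batch ran on an easier subset.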
Files
- adapter_model.safetensors: LoRA adapter.
- adapter_config.json: PEFT configuration.
- v5_rescue_release_summary.json: exact proof-run summary.
- v5_rescue_eval_before_after_full_code.csv: task-level proof-run table.
Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter_id = "josephmayo/Qwen2.5-Coder-7B-agentic-SLM-LoRA"

# Load the base model and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", trust_remote_code=True)

# Attach the LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(base, adapter_id)
```
For best results, use the model inside a strict code-only verifier harness. Do not evaluate it by casual chat prompts alone.
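A strict code-only verifier can be sketched as a sequence of checks that map onto the invalid-output categories listed earlier, followed by running the tests. This is an illustration under assumptions, not the harness used for the proof gate: `verify_candidate` is a hypothetical name, prose leakage is assumed to surface as a `SyntaxError`, and test leakage is approximated as module-level `assert` statements. It runs generated code in a plain subprocess, which is not a sandbox; isolate it (for example in a container) before using it on untrusted output.

```python
import ast
import subprocess
import sys


def verify_candidate(code: str, entrypoint: str, tests: str, timeout: float = 10.0) -> str:
    """Return 'pass' or the first invalid-output / failure category found."""
    if "```" in code:
        return "markdown_leakage"
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return "syntax_error"  # prose such as "Here is the code:" usually lands here
    defined = {
        node.name for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }
    if entrypoint not in defined:
        return "missing_entrypoint"
    if any(isinstance(node, ast.Assert) for node in tree.body):
        return "test_leakage"  # candidate shipped its own top-level asserts
    program = code + "\n\n" + tests
    try:
        proc = subprocess.run([sys.executable, "-c", program],
                              capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "pass" if proc.returncode == 0 else "test_failure"


good = "def add(a, b):\n    return a + b\n"
print(verify_candidate(good, "add", "assert add(2, 3) == 5"))
```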
Model provider: josephmayo
Model tree: base Qwen/Qwen2.5-Coder-7B-Instruct; adapter: this model
Modalities: text input, text output