josephmayo/qwen2.5-coder-adapter

License: apache-2.0

What Changed

Base model: Qwen/Qwen2.5-Coder-1.5B-Instruct
Training method: QLoRA/LoRA adapter
Hardware: Kaggle 2x Tesla T4
Training budget: 140 steps, 1721 train rows after filtering
Data description: manually curated coding data mixed with publicly available coding instruction data. Dataset names and training rows are intentionally not included in this repo.

Same-Size Proof

The comparison keeps the base model and parameter count fixed: Qwen/Qwen2.5-Coder-1.5B-Instruct before training versus the same base with this adapter applied.

Evaluation: 50 HumanEval tasks + 50 MBPP tasks.

| Metric | Base Greedy | Forge SLM Adapter + Sampling/Repair |
|---|---|---|
| Total pass | 45 / 100 | 53 / 100 |
| HumanEval | 41 / 50 | 45 / 50 |
| MBPP | 4 / 50 | 8 / 50 |
| Absolute lift | - | +8.0 percentage points |
| Relative pass-count lift | - | +17.78% |

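This is not yet a claim of beating frontier models. It is a same-size proof that the SLM adapter, combined with execution-selected sampling/repair, moved the 1.5B coding base from 45 to 53 passes on subsets of two standard coding benchmarks. Because the baseline uses plain greedy decoding, the lift reflects both the adapter and the decoding strategy; it cannot be attributed to the adapter alone.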
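The two lift rows follow directly from the pass counts. A minimal sketch of that arithmetic (the helper name `pass_lifts` is illustrative, not part of this repo):

```python
def pass_lifts(before: int, after: int, total: int) -> tuple[float, float]:
    """Return (absolute lift in percentage points, relative lift in percent)."""
    delta = after - before
    absolute_pp = delta * 100 / total    # change in pass rate, in percentage points
    relative_pct = delta * 100 / before  # change relative to the baseline pass count
    return absolute_pp, relative_pct


# Totals from the table: 45/100 (base greedy) vs 53/100 (adapter + sampling/repair).
abs_pp, rel_pct = pass_lifts(before=45, after=53, total=100)
print(f"Absolute lift: +{abs_pp:.1f} pp")  # Absolute lift: +8.0 pp
print(f"Relative lift: +{rel_pct:.2f}%")   # Relative lift: +17.78%
```

The same helper applied per benchmark gives +8.0 points on both HumanEval (41 to 45 of 50) and MBPP (4 to 8 of 50); MBPP's relative lift is 100% only because its baseline count is so small.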

Proof Files

See proofs/:

eval_before_after_full_code.csv: raw generations, extracted code, pass/fail, and errors.
before_greedy_full_code.csv: baseline greedy generations.
release_summary_sanitized.json: run metrics and config with dataset names redacted.
trainer_log_history.json: training logs.
nvidia_smi.txt: Kaggle GPU proof.
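To re-derive pass counts from a raw proof CSV, a sketch along these lines should work. The column names `benchmark` and `passed`, and the accepted truthy values, are hypothetical placeholders; check the actual header of eval_before_after_full_code.csv or before_greedy_full_code.csv and adjust.

```python
import csv
from collections import Counter
from typing import Iterable

# ASSUMPTION: these values mark a passing row; adjust to the real CSV encoding.
TRUTHY = {"true", "1", "pass", "passed", "yes"}


def tally_passes(rows: Iterable[dict[str, str]]) -> Counter:
    """Count passing rows per benchmark.

    ASSUMPTION: each row has a hypothetical `benchmark` column (e.g. "HumanEval"
    or "MBPP") and a hypothetical `passed` column with a boolean-like string.
    """
    counts: Counter = Counter()
    for row in rows:
        if row["passed"].strip().lower() in TRUTHY:
            counts[row["benchmark"]] += 1
    return counts


def tally_file(path: str) -> Counter:
    # csv.DictReader yields one {column_name: value} dict per data row.
    with open(path, newline="", encoding="utf-8") as f:
        return tally_passes(csv.DictReader(f))
```

For example, `tally_file("proofs/before_greedy_full_code.csv")` would return per-benchmark pass counts that you would expect to line up with the Base Greedy column, provided the assumed columns exist.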

Usage

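```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-Coder-1.5B-Instruct"
adapter_id = "josephmayo/Qwen2.5-Coder-1.5B-Forge-SLM"

# Tokenizer files are loaded from the adapter repo.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
# Load the base weights first; device_map="auto" requires the `accelerate` package.
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", torch_dtype="auto")
# Wrap the base model with the LoRA adapter (loaded for inference by default).
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```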

For benchmark-style tasks, use strict code-only prompting and run the generated code against tests. The reported "after" score uses sampling/repair, not a single greedy decode.
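This card does not publish the exact sampling/repair loop. The sketch below shows one common shape of execution-selected sampling with a repair step, under stated assumptions: `generate` is a hypothetical callable wrapping the model (prompt in, completion out), tests are plain Python `assert` statements, and each candidate runs in a fresh subprocess with a timeout. It is an illustration, not the author's implementation.

```python
import re
import subprocess
import sys
from typing import Callable, Optional

# Matches the first fenced code block (optionally tagged "python") in a completion.
FENCE = re.compile(r"`{3}(?:python)?\s*\n(.*?)`{3}", re.DOTALL)


def extract_code(completion: str) -> str:
    """Code-only extraction: take the first fenced block, else the raw text."""
    match = FENCE.search(completion)
    return (match.group(1) if match else completion).strip()


def run_tests(code: str, tests: str, timeout: float = 10.0) -> Optional[str]:
    """Run candidate code plus asserts in a subprocess; None on success, else error text."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code + "\n\n" + tests],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return None if proc.returncode == 0 else (proc.stderr.strip() or "non-zero exit")


def solve(generate: Callable[[str], str], prompt: str, tests: str,
          n_samples: int = 4, max_repairs: int = 1) -> Optional[str]:
    """Sample candidates and return the first that passes, repairing failures with the error."""
    for _ in range(n_samples):
        code = extract_code(generate(prompt))
        error = run_tests(code, tests)
        for _ in range(max_repairs):
            if error is None:
                break
            # Hypothetical repair prompt: feed the failing code and its error back to the model.
            repair_prompt = (f"{prompt}\n\nPrevious attempt:\n{code}\n\n"
                             f"It failed with:\n{error}\n\nReturn only corrected code.")
            code = extract_code(generate(repair_prompt))
            error = run_tests(code, tests)
        if error is None:
            return code
    return None  # no candidate passed within the sampling/repair budget
```

With the adapter loaded as above, `generate` would typically wrap `model.generate(..., do_sample=True)` so that repeated calls return different candidates. The subprocess runner is not a security sandbox; run untrusted generations only in an isolated environment.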

Limitations

This is an adapter release, not a merged full-weight model.
The eval is a 100-task subset: 50 HumanEval + 50 MBPP.
The "after" score combines the adapter with sampling/repair, so it reflects agentic, execution-in-the-loop usage and is not a like-for-like comparison with pure greedy decoding.
Training data is described but not published in this repo.

Evidence Files

Run evidence for this release is stored in the repository under evidence/:

evidence/hf_release_qwen25_coder_agentic_slm_v5_lora_v5_rescue_release_summary.json

Files under evidence/ are compact local and Kaggle run artifacts that document training, evaluation, merge, or quantization evidence for this model family.

Model Details

Model Provider

josephmayo

Model Tree

Base

Qwen/Qwen2.5-Coder-1.5B-Instruct

Adapter

this model

Input Modalities

Text

Output Modalities