chiuratto-AIgourakis/sounio-qwen25-coder-7b-lora API & Inference Endpoint

Why it exists

The base Qwen2.5-Coder, asked for Sounio, writes Rust (println!, let x =, no effects). This adapter makes it write idiomatic Sounio (fn main() with IO, print_int, the effect system).

Evaluation (held-out, functional)

Measured with a held-out functional harness — compile-rate (souc check) + run-pass (souc run → expected stdout) on a 5% validation split never seen in training. Same checker for every model (fair ranking).

Table with columns: model, base, compile-rate, run-pass (gold)
model	base	compile-rate	run-pass (gold)
base (no adapter)	Qwen2.5-Coder-7B-Instruct	6/45	0/6
this adapter	7B-Instruct + LoRA	19/45	1/6
prior 1.5B LoRA	Qwen2.5-Coder-1.5B + LoRA	4/45	0/6

→ ~3.2× the base and ~4.75× the prior 1.5B LoRA on compile-rate; the only variant producing a fully-correct running program (run-pass).

Caveat (honest): the checker was the integration-branch souc while the held-out files are from main (stdlib drift), so the absolute ceiling was ~27/45 — the relative ranking is the reliable signal.

Training

QLoRA (4-bit base), rank 32 / alpha 64, seq_len 2048, 3 epochs over ~3,357 Sounio source files (~4.5M tokens), final train loss 0.34 / ppl ~1.2. Trained on a single NVIDIA RTX A5000 (axolotl).

Usage (vLLM hot-adapter)

bash
vllm serve Qwen/Qwen2.5-Coder-7B-Instruct \
  --enable-lora --max-lora-rank 32 \
  --lora-modules sounio=chiuratto-AIgourakis/sounio-qwen25-coder-7b-lora
# then request model="sounio"

Or with PEFT:

python
from peft import PeftModel
from transformers import AutoModelForCausalLM
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
model = PeftModel.from_pretrained(base, "chiuratto-AIgourakis/sounio-qwen25-coder-7b-lora")

Why it exists

The base Qwen2.5-Coder, asked for Sounio, writes Rust (println!, let x =, no effects). This adapter makes it write idiomatic Sounio (fn main() with IO, print_int, the effect system).

Evaluation (held-out, functional)

Table with columns: model, base, compile-rate, run-pass (gold)
model	base	compile-rate	run-pass (gold)
base (no adapter)	Qwen2.5-Coder-7B-Instruct	6/45	0/6
this adapter	7B-Instruct + LoRA	19/45	1/6
prior 1.5B LoRA	Qwen2.5-Coder-1.5B + LoRA	4/45	0/6

→ ~3.2× the base and ~4.75× the prior 1.5B LoRA on compile-rate; the only variant producing a fully-correct running program (run-pass).

Training

QLoRA (4-bit base), rank 32 / alpha 64, seq_len 2048, 3 epochs over ~3,357 Sounio source files (~4.5M tokens), final train loss 0.34 / ppl ~1.2. Trained on a single NVIDIA RTX A5000 (axolotl).

Usage (vLLM hot-adapter)

bash
vllm serve Qwen/Qwen2.5-Coder-7B-Instruct \
  --enable-lora --max-lora-rank 32 \
  --lora-modules sounio=chiuratto-AIgourakis/sounio-qwen25-coder-7b-lora
# then request model="sounio"

Or with PEFT:

python
from peft import PeftModel
from transformers import AutoModelForCausalLM
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
model = PeftModel.from_pretrained(base, "chiuratto-AIgourakis/sounio-qwen25-coder-7b-lora")

sounio-qwen25-coder-7b-lora

Get help setting up a custom Dedicated Endpoints.

README

Why it exists

Evaluation (held-out, functional)

Training

Usage (vLLM hot-adapter)

Explore FriendliAI today

README

Why it exists

Evaluation (held-out, functional)

Training

Usage (vLLM hot-adapter)