Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Run this model inference with full control and performance in your environment.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0Why it exists
The base Qwen2.5-Coder, asked for Sounio, writes Rust (println!, let x =, no effects). This adapter
makes it write idiomatic Sounio (fn main() with IO, print_int, the effect system).
Evaluation (held-out, functional)
Measured with a held-out functional harness — compile-rate (souc check) + run-pass (souc run → expected
stdout) on a 5% validation split never seen in training. Same checker for every model (fair ranking).
| model | base | compile-rate | run-pass (gold) |
|---|---|---|---|
| base (no adapter) | Qwen2.5-Coder-7B-Instruct | 6/45 | 0/6 |
| this adapter | 7B-Instruct + LoRA | 19/45 | 1/6 |
| prior 1.5B LoRA | Qwen2.5-Coder-1.5B + LoRA | 4/45 | 0/6 |
→ ~3.2× the base and ~4.75× the prior 1.5B LoRA on compile-rate; the only variant producing a fully-correct running program (run-pass).
Caveat (honest): the checker was the integration-branch souc while the held-out files are from main
(stdlib drift), so the absolute ceiling was ~27/45 — the relative ranking is the reliable signal.
Training
QLoRA (4-bit base), rank 32 / alpha 64, seq_len 2048, 3 epochs over ~3,357 Sounio source files (~4.5M tokens), final train loss 0.34 / ppl ~1.2. Trained on a single NVIDIA RTX A5000 (axolotl).
Usage (vLLM hot-adapter)
bash
vllm serve Qwen/Qwen2.5-Coder-7B-Instruct \--enable-lora --max-lora-rank 32 \--lora-modules sounio=chiuratto-AIgourakis/sounio-qwen25-coder-7b-lora# then request model="sounio"
Or with PEFT:
python
from peft import PeftModelfrom transformers import AutoModelForCausalLMbase = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")model = PeftModel.from_pretrained(base, "chiuratto-AIgourakis/sounio-qwen25-coder-7b-lora")
Model provider
chiuratto-AIgourakis
Model tree
Base
Qwen/Qwen2.5-Coder-7B-Instruct
Adapter
this model
Modalities
Input
Text
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information