skrrt-sh/raif-qwen3-4b-lora
License: apache-2.0

Results (parse = decodes; fidelity = byte-exact round-trip)
| group | parse | fidelity | n |
|---|---|---|---|
| valid (in-training shapes) | 97% | 95% | 64 |
| holdout (withheld shapes) | 98% | 95% | 64 |
- valid = held-out split of in-training shapes; holdout = shapes withheld from training entirely.
- Token cost: ~10% fewer tokens than minified JSON on real function-call data (measured across tokenizers).
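As a rough illustration of how the two columns relate, the sketch below scores a small batch of outputs. It is not the project's eval harness: `score`, `canonical`, and `json_stand_in` are hypothetical names, the stand-in decoder parses JSON rather than RAIF (it only mimics the `{"ok", "value", "repairs"}` result shape), and fidelity is approximated by comparing canonical JSON bytes of the decoded value against the reference.

```python
import json
from typing import Any, Callable, Dict, List, Tuple


def canonical(value: Any) -> bytes:
    """Minified JSON with sorted keys, so equal values produce equal bytes."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def score(samples: List[Tuple[str, Any]], decode: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Return parse and fidelity rates over (model_output, reference_value) pairs."""
    if not samples:
        raise ValueError("need at least one sample")
    parsed = exact = 0
    for output, reference in samples:
        result = decode(output)
        if not result["ok"]:
            continue  # did not decode: counts against both parse and fidelity
        parsed += 1
        if canonical(result["value"]) == canonical(reference):
            exact += 1
    n = len(samples)
    return {"parse": parsed / n, "fidelity": exact / n, "n": n}


def json_stand_in(text: str) -> Dict[str, Any]:
    """Stand-in decoder (assumption): JSON instead of RAIF, same result shape."""
    try:
        return {"ok": True, "value": json.loads(text), "repairs": []}
    except json.JSONDecodeError:
        return {"ok": False, "value": None, "repairs": []}


samples = [
    ('{"a": 1}', {"a": 1}),            # decodes and matches the reference
    ('{"a": 2}', {"a": 1}),            # decodes but differs from the reference
    ('{"a": ', {"a": 1}),              # truncated, does not decode
    ('{"b": [1, 2]}', {"b": [1, 2]}),  # decodes and matches the reference
]
report = score(samples, json_stand_in)
print(report)
```

Fidelity can never exceed parse under this definition, since an output that fails to decode cannot round-trip, which is consistent with the table above.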
Training
| setting | value |
|---|---|
| base | unsloth/Qwen3-4B-Instruct-2507 |
| method | LoRA (PEFT) via unsloth |
| rank / alpha | 32 / 64 |
| lora_dropout | 0.05 |
| learning rate | 0.0001 (constant) |
| seq length | 2048 |
| epochs / examples | 2.56 / 48000 |
| final train / eval loss | 0.10483384132385254 / 0.10513444989919662 |
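For readers who want the hyperparameters in code form, here is a minimal stdlib sketch. `LoraHyperparams` is a hypothetical container for this example, not a class from PEFT or unsloth (the first three fields correspond to PEFT's `r`, `lora_alpha`, and `lora_dropout`). The `scaling` property assumes standard LoRA scaling of alpha / rank; if the run used rank-stabilized scaling the factor would differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraHyperparams:
    """Values copied from the training table; the class itself is illustrative."""
    rank: int = 32
    alpha: int = 64
    dropout: float = 0.05
    learning_rate: float = 1e-4   # constant schedule
    max_seq_length: int = 2048

    @property
    def scaling(self) -> float:
        """Standard LoRA multiplies the low-rank update by alpha / rank (assumption)."""
        return self.alpha / self.rank


hp = LoraHyperparams()

# Final losses from the table; a small gap suggests little overfitting on this split.
train_loss = 0.10483384132385254
eval_loss = 0.10513444989919662
loss_gap = eval_loss - train_loss

print(hp.scaling)          # 2.0
print(round(loss_gap, 4))  # 0.0003
```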
Data: synthetic RAIF examples (with mechanism-carrier shapes) augmented with
real tool-call argument objects from glaiveai/glaive-function-calling-v2
(Apache-2.0), kept only where they round-trip losslessly. Full recipe:
RECIPE.md.
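The "kept only where they round-trip losslessly" filter can be sketched roughly as follows. This is an illustration under assumptions, not the code from RECIPE.md: `keep_lossless` is a hypothetical helper, the `encode` and `decode` parameters stand in for a RAIF codec, and the demo plugs in a JSON-based pair so that it runs without extra packages.

```python
import json
from typing import Any, Callable, Dict, Iterable, List


def keep_lossless(
    objects: Iterable[Any],
    encode: Callable[[Any], str],
    decode: Callable[[str], Dict[str, Any]],
) -> List[Any]:
    """Keep only objects that come back unchanged after encode -> decode."""
    kept = []
    for obj in objects:
        try:
            result = decode(encode(obj))
        except Exception:
            continue  # the codec rejected this object outright
        if result.get("ok") and result.get("value") == obj:
            kept.append(obj)
    return kept


# Stand-in codec (assumption): JSON text instead of RAIF, same result shape.
def json_encode(obj: Any) -> str:
    return json.dumps(obj, separators=(",", ":"))


def json_decode(text: str) -> Dict[str, Any]:
    return {"ok": True, "value": json.loads(text), "repairs": []}


candidates = [
    {"city": "Oslo", "days": 3},   # survives the round trip
    {1: "a"},                      # integer key becomes the string "1": dropped
    {"tags": ["x", "y"]},          # survives the round trip
]
kept = keep_lossless(candidates, json_encode, json_decode)
print(kept)
```

Python's `==` treats `1` and `1.0` as equal, so a stricter filter might compare serialized bytes instead.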
Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3-4B-Instruct-2507")
tok = AutoTokenizer.from_pretrained("skrrt-sh/raif-qwen3-4b-lora")
model = PeftModel.from_pretrained(base, "skrrt-sh/raif-qwen3-4b-lora")
```
This model emits RAIF, not JSON — decode it at the output boundary with the
official codec (pure-stdlib, no bun, nothing to clone):
```sh
pip install raif-format  # or: uv add raif-format
```
```python
from raif import decode  # installs as `raif-format`, imports as `raif`

result = decode(model_output)  # {"ok", "value", "repairs"}
data = result["value"] if result["ok"] else None  # ordinary JSON, ready downstream
```
decode_lenient() recovers the intact leaves of a truncated stream. The codec is
the same one used to score this model, kept byte-identical across Python and
TypeScript by a shared conformance corpus.
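One way to combine the strict and lenient decoders is to try the strict one first and fall back only on failure. The sketch below is hedged throughout: `decode_with_fallback` is a hypothetical helper, it assumes `decode_lenient` returns the same `{"ok", "value", "repairs"}` shape as `decode` (check the codec documentation before relying on that), and the toy `k=v;` decoders exist only so the example runs without the real package.

```python
from typing import Any, Callable, Dict

Result = Dict[str, Any]


def decode_with_fallback(
    text: str,
    decode: Callable[[str], Result],
    decode_lenient: Callable[[str], Result],
) -> Dict[str, Any]:
    """Try strict decoding; on failure, salvage what the lenient decoder recovers."""
    strict = decode(text)
    if strict["ok"]:
        return {"value": strict["value"], "partial": False}
    lenient = decode_lenient(text)  # assumption: same result shape as decode()
    return {"value": lenient.get("value"), "partial": True}


# Toy stand-ins (not RAIF): records look like "key=value;" and must end with ";".
def toy_decode(text: str) -> Result:
    if not text.endswith(";"):
        return {"ok": False, "value": None, "repairs": []}
    pairs = [part.split("=", 1) for part in text.split(";") if part]
    return {"ok": True, "value": dict(pairs), "repairs": []}


def toy_decode_lenient(text: str) -> Result:
    complete = text[: text.rfind(";") + 1]  # drop the truncated tail
    pairs = [part.split("=", 1) for part in complete.split(";") if part]
    return {"ok": True, "value": dict(pairs), "repairs": ["dropped truncated tail"]}


full = decode_with_fallback("a=1;b=2;", toy_decode, toy_decode_lenient)
cut = decode_with_fallback("a=1;b=", toy_decode, toy_decode_lenient)
print(full)
print(cut)
```

Carrying a `partial` flag alongside the value lets downstream code decide whether an incomplete object is acceptable or the generation should be retried.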
Links
- Format spec & reference codec: https://github.com/skrrt-sh/raif-standard
- Decoder: raif-format on PyPI · raif-format on npm
- Training recipe, eval & the other RAIF models: https://github.com/skrrt-sh/raif-lora
License & attribution
Derivative of Qwen3-4B-Instruct-2507 (Apache-2.0), loaded via the unsloth/Qwen3-4B-Instruct-2507 base.
Trained in part on glaiveai/glaive-function-calling-v2 (Apache-2.0) — attribute Glaive AI.
Model provider: skrrt-sh
Model tree: base unsloth/Qwen3-4B-Instruct-2507; adapter: this model
Modalities: text input, text output