skrrt-sh

raif-qwen3-4b-lora

README

License: apache-2.0

Results (parse = decodes; fidelity = byte-exact round-trip)

Table with columns: group, parse, fidelity, n
group	parse	fidelity	n
valid (in-training shapes)	97%	95%	64
holdout (withheld shapes)	98%	95%	64

valid = held-out split of in-training shapes; holdout = shapes withheld from training entirely.
Token cost: ~10% fewer than minified JSON on real function-call data (cross-tokenizer).

Training

Table

base	`unsloth/Qwen3-4B-Instruct-2507`
method	LoRA (PEFT) via unsloth
rank / alpha	32 / 64
lora_dropout	0.05
learning rate	0.0001 (constant)
seq length	2048
epochs / examples	2.56 / 48000
final train / eval loss	0.10483384132385254 / 0.10513444989919662

Data: synthetic RAIF examples (with mechanism-carrier shapes) augmented with real tool-call argument objects from glaiveai/glaive-function-calling-v2 (Apache-2.0), kept only where they round-trip losslessly. Full recipe: RECIPE.md.

Usage

python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3-4B-Instruct-2507")
tok = AutoTokenizer.from_pretrained("skrrt-sh/raif-qwen3-4b-lora")
model = PeftModel.from_pretrained(base, "skrrt-sh/raif-qwen3-4b-lora")

This model emits RAIF, not JSON — decode it at the output boundary with the official codec (pure-stdlib, no bun, nothing to clone):

sh
pip install raif-format        # or: uv add raif-format

python
from raif import decode        # installs as `raif-format`, imports as `raif`

result = decode(model_output)  # {"ok", "value", "repairs"}
data = result["value"] if result["ok"] else None   # ordinary JSON, ready downstream

decode_lenient() recovers the intact leaves of a truncated stream. The codec is the same one used to score this model, kept byte-identical across Python and TypeScript by a shared conformance corpus.

Serve with vLLM

For server deployments, raif-vllm is a single vLLM plugin that makes a stock OpenAI endpoint speak RAIF transparently: existing clients get RAIF on tools and response_format with no proxy and no client changes — the model emits compact RAIF-G and the plugin decodes it to JSON at the boundary, so the decode() step above moves server-side. Verified end-to-end on vLLM 0.19.

sh
pip install raif-vllm
VLLM_PLUGINS=raif vllm serve Qwen/Qwen3-4B-Instruct-2507 \
  --enable-lora --lora-modules raif=skrrt-sh/raif-qwen3-4b-lora \
  --max-lora-rank 32 --max-model-len 8192 \
  --chat-template "$(raif-vllm-chat-template qwen-4b)" \
  --reasoning-parser raif --enable-auto-tool-choice --tool-call-parser raif

The tools-ignoring chat template ships inside the wheel; raif-vllm-chat-template qwen-4b resolves its path. The plugin strips Qwen3's leading <think> block at the decode boundary automatically — no client change.

License & attribution

Derivative of Qwen2.5 — Apache-2.0 (the Qwen2.5 small bases are Apache-2.0 licensed). Trained in part on glaiveai/glaive-function-calling-v2 (Apache-2.0) — attribute Glaive AI.

Available on FriendliAI

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Model Details

Model Provider

skrrt-sh

Model Tree

Base

unsloth/Qwen3-4B-Instruct-2507

Adapter

this model

Input Modalities

Text

Output Modalities

Text

Supported Functionality

Dedicated Endpoints

Container

Explore FriendliAI today

Get started Talk to an engineer

README

License: apache-2.0

Results (parse = decodes; fidelity = byte-exact round-trip)

Table with columns: group, parse, fidelity, n
group	parse	fidelity	n
valid (in-training shapes)	97%	95%	64
holdout (withheld shapes)	98%	95%	64

valid = held-out split of in-training shapes; holdout = shapes withheld from training entirely.
Token cost: ~10% fewer than minified JSON on real function-call data (cross-tokenizer).

Training

Table

base	`unsloth/Qwen3-4B-Instruct-2507`
method	LoRA (PEFT) via unsloth
rank / alpha	32 / 64
lora_dropout	0.05
learning rate	0.0001 (constant)
seq length	2048
epochs / examples	2.56 / 48000
final train / eval loss	0.10483384132385254 / 0.10513444989919662

Usage

python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3-4B-Instruct-2507")
tok = AutoTokenizer.from_pretrained("skrrt-sh/raif-qwen3-4b-lora")
model = PeftModel.from_pretrained(base, "skrrt-sh/raif-qwen3-4b-lora")

This model emits RAIF, not JSON — decode it at the output boundary with the official codec (pure-stdlib, no bun, nothing to clone):

sh
pip install raif-format        # or: uv add raif-format

python
from raif import decode        # installs as `raif-format`, imports as `raif`

result = decode(model_output)  # {"ok", "value", "repairs"}
data = result["value"] if result["ok"] else None   # ordinary JSON, ready downstream

Serve with vLLM

sh
pip install raif-vllm
VLLM_PLUGINS=raif vllm serve Qwen/Qwen3-4B-Instruct-2507 \
  --enable-lora --lora-modules raif=skrrt-sh/raif-qwen3-4b-lora \
  --max-lora-rank 32 --max-model-len 8192 \
  --chat-template "$(raif-vllm-chat-template qwen-4b)" \
  --reasoning-parser raif --enable-auto-tool-choice --tool-call-parser raif

License & attribution

Derivative of Qwen2.5 — Apache-2.0 (the Qwen2.5 small bases are Apache-2.0 licensed). Trained in part on glaiveai/glaive-function-calling-v2 (Apache-2.0) — attribute Glaive AI.

raif-qwen3-4b-lora

README

Results (parse = decodes; fidelity = byte-exact round-trip)

Training

Usage

Serve with vLLM

Links

License & attribution

Explore FriendliAI today

README

Results (parse = decodes; fidelity = byte-exact round-trip)

Training

Usage

Serve with vLLM

Links

License & attribution