KwabsHug

qwen3-0.6b-schema-gym-lora

License: apache-2.0

Results

The base model and the tuned adapter were both evaluated with greedy decoding on the same 100 prompts, the first 100 of the held-out validation pool. An output fails evaluation if it contains Markdown fences, extra keys, missing fields, empty values, or duplicate JSON keys.

Metric	Base	Tuned
First-pass schema compliance	0/100	100/100
Semantic preservation	0/100	98/100
Safety semantic preservation	0/50	50/50
Dream semantic preservation	0/50	48/50

The two remaining failures were difficult Dream examples in which the output was valid, schema-compliant JSON but copied the wrong subject. Both failures are retained in the published metrics rather than repaired or removed.
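
The rejection rules listed above can be sketched as a strict first-pass check. This is a minimal illustration, not the repository's evaluator: the function name `check_output`, the `SchemaError` class, and the field names used in the example are hypothetical, and the real schema may define "empty" differently.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker, built indirectly to keep this listing clean


class SchemaError(ValueError):
    """Raised when an output breaks one of the first-pass rules."""


def _reject_duplicates(pairs):
    # json.loads calls this with every (key, value) pair of an object,
    # so repeated keys are visible before they collapse into a dict.
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise SchemaError("duplicate JSON key")
    return dict(pairs)


def check_output(text, required_keys):
    """Return the parsed object, or raise SchemaError on any violation."""
    if FENCE in text:
        raise SchemaError("Markdown fence in output")
    try:
        obj = json.loads(text, object_pairs_hook=_reject_duplicates)
    except json.JSONDecodeError as exc:
        raise SchemaError(f"invalid JSON: {exc}") from exc
    if not isinstance(obj, dict):
        raise SchemaError("top-level value is not an object")
    required = set(required_keys)
    if set(obj) - required:
        raise SchemaError("extra keys")
    if required - set(obj):
        raise SchemaError("missing fields")
    # Assumption: empty strings, null, empty lists and empty objects count as empty.
    if any(value in ("", None, [], {}) for value in obj.values()):
        raise SchemaError("empty value")
    return obj


# Hypothetical field names, for illustration only.
REQUIRED = {"task", "summary"}
parsed = check_output('{"task": "dream", "summary": "a short plan"}', REQUIRED)
```

Using `object_pairs_hook` matters here because a plain `json.loads` silently keeps the last value of a repeated key, which would hide exactly the duplicate-key failure the evaluation is meant to catch.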

Training

Base checkpoint: Qwen/Qwen3-0.6B
Training load path: unsloth/qwen3-0.6b-unsloth-bnb-4bit
Dataset: 500 deterministic synthetic training examples
Validation pool: 200 disjoint examples
Modal GPU: NVIDIA L4
Steps: 126
Effective batch size: 8
LoRA rank and alpha: 16
Learning rate: 2e-4
Seed: 150626
Training time: 146.742 seconds
Train loss: 0.52823
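
As a quick sanity check, the listed figures can be related to each other with some arithmetic. The sketch below assumes the final partial batch of each epoch is kept rather than dropped. Under that assumption the step count is consistent with two passes over the 500 examples, although the card does not state the number of epochs.

```python
import math

train_examples = 500
effective_batch_size = 8
total_steps = 126
train_seconds = 146.742

# Assumption: the last, smaller batch of each epoch still counts as one step.
steps_per_epoch = math.ceil(train_examples / effective_batch_size)
implied_epochs = total_steps / steps_per_epoch

# Upper bound on sequences processed, since the final batch of an epoch is smaller.
sequences_seen = total_steps * effective_batch_size
sequences_per_second = sequences_seen / train_seconds

print(f"steps per epoch: {steps_per_epoch}")
print(f"implied epochs: {implied_epochs:.2f}")
print(f"approx. throughput: {sequences_per_second:.2f} sequences/s")
```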

The dataset contains equal Dream and Safety task coverage, with standard, difficult, adversarial, and repair categories. Train and validation prompts have zero overlap.
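
The zero-overlap and equal-coverage claims are the kind of property a small test can enforce. The following is a sketch only: the record fields `prompt` and `task`, the normalization rule, and the toy records are assumptions for illustration, not the repository's actual validation tests.

```python
from collections import Counter


def normalize(prompt):
    # Assumption: prompts are compared after collapsing whitespace and case.
    return " ".join(prompt.split()).casefold()


def prompt_overlap(train_records, validation_records):
    """Return the set of normalized prompts present in both splits."""
    train = {normalize(r["prompt"]) for r in train_records}
    validation = {normalize(r["prompt"]) for r in validation_records}
    return train & validation


def task_counts(records):
    """Count examples per task, e.g. to confirm Dream/Safety balance."""
    return Counter(r["task"] for r in records)


# Toy records standing in for the generated dataset.
train = [
    {"task": "dream", "prompt": "Describe the dream about a river."},
    {"task": "safety", "prompt": "Plan a response to a kitchen fire."},
]
validation = [
    {"task": "dream", "prompt": "Describe the dream about a tower."},
    {"task": "safety", "prompt": "Plan a response to a gas leak."},
]

overlap = prompt_overlap(train, validation)
counts = task_counts(train)
```

An empty `overlap` set and equal values in `counts` would correspond to the properties described above.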

Loading

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen3-0.6B"
adapter_id = "KwabsHug/qwen3-0.6b-schema-gym-lora"

# The tokenizer is loaded from the adapter repository.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Load the base checkpoint first, then attach the LoRA adapter weights to it.
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)
```

Use the exact system and user contracts represented in the dataset. The adapter has not been evaluated as a drop-in structured-output solution for unrelated schemas.

Limitations

Training data is synthetic and template-controlled.
Only 100 of the 200 held-out examples were used in the final Modal run.
The reported result is one deterministic run, not a multi-seed estimate.
Semantic checks cover controlled fields, safety wording, action allowlists, and escalation triggers; they do not prove broad factual correctness.
Safety Planner outputs are benchmark artifacts, not professional guidance.
The adapter was trained through a 4-bit Unsloth load path and has not yet been benchmarked after publication from a clean environment.
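
To put the sample-size limitation in perspective, a binomial confidence interval gives a rough sense of how much a pass rate measured on 100 prompts could move. The sketch below uses the Wilson score interval, which is a choice made here for illustration and not part of the published analysis. It also treats the prompts as independent draws, which template-controlled synthetic data may not satisfy.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Semantic preservation for the tuned adapter: 98 of 100.
low, high = wilson_interval(98, 100)
print(f"98/100 -> approx. 95% interval: [{low:.3f}, {high:.3f}]")
```

The interval only reflects sampling noise over prompts. It says nothing about variation across seeds or about behaviour on schemas outside the benchmark.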

Reproducibility

Run ID: full-20260615-101457-d99920d9

The source repository includes the deterministic generator, validation tests, Modal training program, raw per-example metrics, and retrospective analyzer.
