Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Run this model inference with full control and performance in your environment.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0Results
The base model and tuned adapter were evaluated greedily on the same first 100 held-out prompts. Evaluation rejects Markdown fences, extra keys, missing fields, empty values, and duplicate JSON keys.
| Metric | Base | Tuned |
|---|---|---|
| First-pass schema compliance | 0/100 | 100/100 |
| Semantic preservation | 0/100 | 98/100 |
| Safety semantic preservation | 0/50 | 50/50 |
| Dream semantic preservation | 0/50 | 48/50 |
The two remaining failures were difficult Dream examples where valid,
schema-compliant JSON copied the wrong subject. The failures are retained in
the published metrics rather than repaired or removed.
Training
- Base checkpoint:
Qwen/Qwen3-0.6B - Training load path:
unsloth/qwen3-0.6b-unsloth-bnb-4bit - Dataset: 500 deterministic synthetic training examples
- Validation pool: 200 disjoint examples
- Modal GPU: NVIDIA L4
- Steps: 126
- Effective batch size: 8
- LoRA rank and alpha: 16
- Learning rate:
2e-4 - Seed:
150626 - Training time: 146.742 seconds
- Train loss: 0.52823
The dataset contains equal Dream and Safety task coverage, with standard, difficult, adversarial, and repair categories. Train and validation prompts have zero overlap.
Loading
python
from peft import PeftModelfrom transformers import AutoModelForCausalLM, AutoTokenizerbase_id = "Qwen/Qwen3-0.6B"adapter_id = "KwabsHug/qwen3-0.6b-schema-gym-lora"tokenizer = AutoTokenizer.from_pretrained(adapter_id)model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")model = PeftModel.from_pretrained(model, adapter_id)
Use the exact system and user contracts represented in the dataset. The adapter has not been evaluated as a drop-in structured-output solution for unrelated schemas.
Limitations
- Training data is synthetic and template-controlled.
- Only 100 of the 200 held-out examples were used in the final Modal run.
- The reported result is one deterministic run, not a multi-seed estimate.
- Semantic checks cover controlled fields, safety wording, action allowlists, and escalation triggers; they do not prove broad factual correctness.
- Safety Planner outputs are benchmark artifacts, not professional guidance.
- The adapter was trained through a 4-bit Unsloth load path and has not yet been benchmarked after publication from a clean environment.
Reproducibility
Run ID: full-20260615-101457-d99920d9
The source repository includes the deterministic generator, validation tests, Modal training program, raw per-example metrics, and retrospective analyzer.
Model provider
KwabsHug
Model tree
Base
Qwen/Qwen3-0.6B
Adapter
this model
Modalities
Input
Text
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information