Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Run this model inference with full control and performance in your environment.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0Recipe
- LoRA r=32 α=64, all-linear, completion-only loss (TRL SFTTrainer), 2 epochs
- Data: a deterministic integer-only TypeScript engine (byte-identical between the browser game and the Node data generator — golden-trace proven) ran ~600 duels against a humanized bot gauntlet; counterfactual rows render the SAME state under all 6 plans only where the expert's actions disagree (3× weighted) — conditioning is only learnable where plans disagree
- Expert teacher: plan-conditioned scripted policy with rhythm-reading (matched-plan win rate 72% vs the gauntlet; beats the button-masher 95%)
Honest evals (the interesting part)
Four model generations, every gate pre-registered and reported:
| gen | held-out action agreement | plan-conditioning story |
|---|---|---|
| v1 | 77.1% | plan-DEAF (ΔP 0.07; no-plan ablation cost just 1.8pts) |
| v2 | 69.5% | conditioning 2-3× base; rush↔turtle style separation 0.23 vs teacher 0.60 |
| v3 | 73.5% | teacher-identical style separation (0.602 vs 0.600) — strike-suppression learned |
| v4 (this) | 73.9% | strongest fighter; plan-conditioning regressed (data dilution — see Field Notes) |
The pre-registered ΔP gate FAILED on every generation as registered — then we measured the gate's ceiling and found the teacher itself couldn't pass the direction threshold. Full post-mortems (yardstick calibration, the counterfactual-data fix, the engine design bug a playtest exposed) live in the project Field Notes.
Reproduce
bash
# data (Node, deterministic): npx tsx training/datagen/run.tsmodal run train_bc.py --data-dir /vol/data/duels_v4 --run-name bc_v4 --epochs 2modal run intervention.py::intervention_entrypoint --run-name bc_v4 --data-dir /vol/data/duels_v4
Trained on Modal (A100); served for dev via vLLM with structured_outputs.choice
over the action alphabet. The shipped Space runs 100% in-browser per organizer
guidance — this model is the published artifact + the dev/video opponent.
(LoRA adapter repo — see the merged repo for standalone weights.)
Model provider
Jainamshahhh
Model tree
Base
Qwen/Qwen2.5-1.5B-Instruct
Adapter
this model
Modalities
Input
Text
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information