Jainamshahhh/parry-tactician-1.5b-lora API & Inference Endpoint

Recipe

LoRA r=32 α=64, all-linear, completion-only loss (TRL SFTTrainer), 2 epochs
Data: a deterministic integer-only TypeScript engine (byte-identical between the browser game and the Node data generator — golden-trace proven) ran ~600 duels against a humanized bot gauntlet; counterfactual rows render the SAME state under all 6 plans only where the expert's actions disagree (3× weighted) — conditioning is only learnable where plans disagree
Expert teacher: plan-conditioned scripted policy with rhythm-reading (matched-plan win rate 72% vs the gauntlet; beats the button-masher 95%)

Honest evals (the interesting part)

Four model generations, every gate pre-registered and reported:

gen	held-out action agreement	plan-conditioning story
v1	77.1%	plan-DEAF (ΔP 0.07; no-plan ablation cost just 1.8pts)
v2	69.5%	conditioning 2-3× base; rush↔turtle style separation 0.23 vs teacher 0.60
v3	73.5%	teacher-identical style separation (0.602 vs 0.600) — strike-suppression learned
v4 (this)	73.9%	strongest fighter; plan-conditioning regressed (data dilution — see Field Notes)

The pre-registered ΔP gate FAILED on every generation as registered — then we measured the gate's ceiling and found the teacher itself couldn't pass the direction threshold. Full post-mortems (yardstick calibration, the counterfactual-data fix, the engine design bug a playtest exposed) live in the project Field Notes.

Reproduce

bash
# data (Node, deterministic): npx tsx training/datagen/run.ts
modal run train_bc.py --data-dir /vol/data/duels_v4 --run-name bc_v4 --epochs 2
modal run intervention.py::intervention_entrypoint --run-name bc_v4 --data-dir /vol/data/duels_v4

Trained on Modal (A100); served for dev via vLLM with structured_outputs.choice over the action alphabet. The shipped Space runs 100% in-browser per organizer guidance — this model is the published artifact + the dev/video opponent.

(LoRA adapter repo — see the merged repo for standalone weights.)

parry-tactician-1.5b-lora

Get help setting up a custom Dedicated Endpoints.

README

Recipe

Honest evals (the interesting part)

Reproduce

Explore FriendliAI today

parry-tactician-1.5b-lora