Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Recipe

  • LoRA r=32 α=64, all-linear, completion-only loss (TRL SFTTrainer), 2 epochs
  • Data: a deterministic integer-only TypeScript engine (byte-identical between the browser game and the Node data generator — golden-trace proven) ran ~600 duels against a humanized bot gauntlet; counterfactual rows render the SAME state under all 6 plans only where the expert's actions disagree (3× weighted) — conditioning is only learnable where plans disagree
  • Expert teacher: plan-conditioned scripted policy with rhythm-reading (matched-plan win rate 72% vs the gauntlet; beats the button-masher 95%)

Honest evals (the interesting part)

Four model generations, every gate pre-registered and reported:

genheld-out action agreementplan-conditioning story
v177.1%plan-DEAF (ΔP 0.07; no-plan ablation cost just 1.8pts)
v269.5%conditioning 2-3× base; rush↔turtle style separation 0.23 vs teacher 0.60
v373.5%teacher-identical style separation (0.602 vs 0.600) — strike-suppression learned
v4 (this)73.9%strongest fighter; plan-conditioning regressed (data dilution — see Field Notes)

The pre-registered ΔP gate FAILED on every generation as registered — then we measured the gate's ceiling and found the teacher itself couldn't pass the direction threshold. Full post-mortems (yardstick calibration, the counterfactual-data fix, the engine design bug a playtest exposed) live in the project Field Notes.

Reproduce

bash

# data (Node, deterministic): npx tsx training/datagen/run.ts
modal run train_bc.py --data-dir /vol/data/duels_v4 --run-name bc_v4 --epochs 2
modal run intervention.py::intervention_entrypoint --run-name bc_v4 --data-dir /vol/data/duels_v4

Trained on Modal (A100); served for dev via vLLM with structured_outputs.choice over the action alphabet. The shipped Space runs 100% in-browser per organizer guidance — this model is the published artifact + the dev/video opponent.

(LoRA adapter repo — see the merged repo for standalone weights.)

Model provider

Jainamshahhh

Model tree

Base

Qwen/Qwen2.5-1.5B-Instruct

Adapter

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today