build-small-hackathon

mind-of-tashi-micro-sft

README

License: apache-2.0

Output contract

The model emits a <think>…</think> block followed by one JSON line, {"move": …, "taunt": …}. The <think> block is part of the product (it is rendered to the player), not debug output. The host parses defensively and falls back to a legal move if the generation is malformed.
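A minimal sketch of what defensive parsing on the host side could look like. This is not the host's actual code: the function name `parse_turn`, the move names, and the choice of "first legal move" as the fallback are all assumptions made for illustration.

```python
import json
import re

# Matches the reasoning block; DOTALL lets it span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def parse_turn(text, legal_moves, fallback_taunt=""):
    """Return (think, move, taunt), falling back to a legal move on bad output.

    legal_moves is a non-empty list; its first entry is the (assumed) fallback.
    """
    m = THINK_RE.search(text)
    think = m.group(1).strip() if m else ""
    tail = text[m.end():] if m else text

    move, taunt = None, fallback_taunt
    for line in tail.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed JSON: keep scanning later lines
        if isinstance(obj, dict):
            move = obj.get("move")
            if isinstance(obj.get("taunt"), str):
                taunt = obj["taunt"]
            break

    if move not in legal_moves:
        move = legal_moves[0]  # hypothetical fallback policy
    return think, move, taunt
```

With this sketch, a well-formed generation passes through unchanged, while a missing <think> block, broken JSON, or an illegal move all degrade to a playable turn instead of raising.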

Training

  • Method: TRL SFTTrainer with completion-only loss (masked to the assistant turn), on a Modal L4 GPU, bf16, sequence length 4096, 3 epochs, batch size 1 with gradient accumulation 4, 10% warmup, learning rate 2e-4.
  • Data: build-small-hackathon/mind-of-tashi-selfplay, configs sft + sft_multiturn: self-play traces vs a frontier-API teacher pool, plus real-player matches.
  • Recipe: the full hyperparameters are listed under Method; the training scripts are run off-Space and kept private.
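To illustrate what "completion-only loss (masked to the assistant turn)" means, here is a small sketch of label masking. It is not TRL's implementation or the private training script; the helper name `mask_to_assistant` and the span format are assumptions. The only fixed convention it relies on is that -100 is the default ignore index of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # positions with this label contribute nothing to the loss


def mask_to_assistant(input_ids, assistant_spans):
    """Copy token ids into labels only inside assistant spans.

    assistant_spans is a list of (start, end) index pairs, end exclusive
    (a hypothetical format chosen for this sketch).
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    for start, end in assistant_spans:
        labels[start:end] = input_ids[start:end]
    return labels


# Example: tokens 0-2 are the system/user prompt, tokens 3-4 the assistant turn.
labels = mask_to_assistant([10, 11, 12, 13, 14], [(3, 5)])
```

Here `labels` keeps the ids 13 and 14 and masks everything before them, so gradients come only from the tokens the model is expected to produce.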

⚠️ norm_topk_prob — required for llama.cpp

The base model ships with norm_topk_prob=false (raw top-k expert routing), but llama.cpp's qwen3moe graph hardcodes norm_w=true and ignores the GGUF expert_weights_norm key. A checkpoint trained with norm_topk_prob=false therefore produces garbage on every llama.cpp runtime. This model is trained with norm_topk_prob=true so that the weights match llama.cpp's renormalised routing, which is what makes the GGUF coherent.
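The difference between the two settings can be shown with a toy router. This is a sketch under the usual top-k mixture-of-experts formulation (softmax over all experts, keep the top k, optionally renormalise the kept weights to sum to 1); the function name `route` and the example logits are made up for illustration and are not taken from the model or from llama.cpp.

```python
import math


def route(logits, k, norm_topk_prob):
    """Return {expert_index: weight} for the k highest-probability experts."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]

    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = {i: probs[i] for i in top}
    if norm_topk_prob:
        kept = sum(weights.values())
        weights = {i: w / kept for i, w in weights.items()}  # renormalise to 1
    return weights


raw = route([2.0, 1.0, 0.0, -1.0], k=2, norm_topk_prob=False)
normed = route([2.0, 1.0, 0.0, -1.0], k=2, norm_topk_prob=True)
```

Both calls pick the same experts, but the raw weights sum to less than 1 while the renormalised ones sum to exactly 1. A runtime that always renormalises therefore scales expert outputs differently from what a checkpoint trained on raw weights expects.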

Eval

  • Format gate (<think> + parseable {move,taunt} + legal move): 20/20 via transformers; via llama.cpp on the GGUF: f16 18/20, Q4_K_M 20/20; ~19–20/20 bilingual across 5 personas.
  • Ladder gauntlet (mirror match across the 10-persona ladder vs tier-matched teachers): baseline 80/100 (8W/2L).
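As a rough illustration of the format gate's three conditions, the sketch below scores a batch of generations strictly, with no fallback. It is an assumption about how such a gate could be written, not the evaluation harness used for the numbers above; `passes_gate`, the move names, and the sample strings are hypothetical.

```python
import json
import re

# A <think> block, then a single JSON object running to the end of the output.
GATE_RE = re.compile(r"<think>.*?</think>\s*(\{.*\})\s*$", re.DOTALL)


def passes_gate(text, legal_moves):
    """True only if <think> is present, the JSON parses, and the move is legal."""
    m = GATE_RE.search(text)
    if not m:
        return False
    try:
        obj = json.loads(m.group(1))
    except json.JSONDecodeError:
        return False
    return (
        isinstance(obj, dict)
        and isinstance(obj.get("taunt"), str)
        and obj.get("move") in legal_moves
    )


samples = [
    '<think>a</think>\n{"move": "strike", "taunt": "x"}',  # well-formed
    '{"move": "strike", "taunt": "x"}',                    # missing <think>
    '<think>a</think>\n{"move": "fly", "taunt": "x"}',     # illegal move
    '<think>a</think>\nnot json',                          # unparseable
]
results = [passes_gate(s, ["guard", "strike"]) for s in samples]
score = f"{sum(results)}/{len(results)}"
```

A score such as 18/20 would then mean that two of twenty generations failed at least one of the three checks.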

Usage

python

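from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "build-small-hackathon/mind-of-tashi-micro-sft"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)
# Fill in both placeholders, then uncomment (max_new_tokens=512 is an assumed budget):
# messages = [{"role": "system", "content": <persona prompt>},
#             {"role": "user", "content": <arena state + history>}]
# inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt", return_dict=True)
# print(tok.decode(model.generate(**inputs, max_new_tokens=512)[0]))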

Part of the bundle

Game Space · self-play dataset · SFT model (this) + GGUF · OpenEnv gym · GRPO model + GGUF — all under build-small-hackathon/mind-of-tashi-*.

Model provider

build-small-hackathon

Model tree

Base: kshitijthakkar/loggenix-moe-0.4B-0.2A-sft-s3.1
Fine-tuned: this model

Modalities

Input: Text
Output: Text
