build-small-hackathon
mind-of-tashi-micro-sft
README
License: apache-2.0
Output contract
Emits a <think>…</think> block, then one JSON line {"move": …, "taunt": …}.
The <think> block is part of the product (it is rendered to the player), not
debug output. The host parses defensively and falls back to a legal move if
generation is malformed.
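The defensive parse described above might look roughly like the following. This is a minimal sketch, not the host's actual code: the function name `parse_turn`, the choice of the first legal move as the fallback, and the example move names are all assumptions made for illustration.

```python
import json
import re

# Non-greedy match so only the first <think>…</think> block is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def parse_turn(raw, legal_moves, fallback_taunt=""):
    """Return (think, move, taunt) from one raw generation.

    Hypothetical helper: falls back to legal_moves[0] when no JSON line
    with a legal "move" can be found.
    """
    match = THINK_RE.search(raw)
    think = match.group(1).strip() if match else ""
    # Only look for the JSON line after the think block, if there is one.
    tail = raw[match.end():] if match else raw

    for line in tail.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed JSON: keep scanning
        if not isinstance(obj, dict):
            continue
        move = obj.get("move")
        if move in legal_moves:
            return think, move, str(obj.get("taunt", fallback_taunt))

    # Nothing usable was generated: play a legal move anyway.
    return think, legal_moves[0], fallback_taunt
```

A host would call this once per turn, for example `parse_turn(text, legal_moves=["rock", "paper", "scissors"])`, where the move names are placeholders for whatever the arena defines as legal.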
Training
- Method: TRL SFTTrainer, completion-only loss (masked to the assistant turn), Modal L4, bf16, seq 4096, 3 epochs, bs=1 / grad_accum=4, warmup 10%, LR 2e-4.
- Data: build-small-hackathon/mind-of-tashi-selfplay, configs sft + sft_multiturn: self-play traces vs a frontier-API teacher pool, plus real-player matches.
- Recipe: TRL SFTTrainer on Modal L4; the full hparams are above (training scripts are run off-Space and kept private).
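Since the training scripts are private, the following is only an illustrative sketch of what "completion-only loss (masked to the assistant turn)" means at the label level; it is not the TRL implementation. The helper name `mask_labels`, the span format, and the token ids are assumptions. The value -100 is the ignore index conventionally used by PyTorch cross-entropy and Hugging Face trainers.

```python
IGNORE_INDEX = -100  # positions with this label contribute no loss


def mask_labels(input_ids, assistant_spans):
    """Copy input_ids into labels only inside assistant spans.

    assistant_spans is a list of (start, end) token offsets, end exclusive.
    Everything outside those spans (system and user turns) is ignored.
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    for start, end in assistant_spans:
        labels[start:end] = input_ids[start:end]
    return labels


def effective_batch_size(per_device_batch, grad_accum, num_devices=1):
    """Number of sequences contributing to each optimizer step."""
    return per_device_batch * grad_accum * num_devices


# Toy sequence: tokens 0-2 are the prompt, tokens 3-5 the assistant reply.
example_labels = mask_labels([11, 12, 13, 14, 15, 16], [(3, 6)])
```

With the hyperparameters listed above (bs=1, grad_accum=4) on a single L4, `effective_batch_size(1, 4)` gives 4 sequences per optimizer step.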
⚠️ norm_topk_prob — required for llama.cpp
The base ships norm_topk_prob=false (raw top-k expert routing), but
llama.cpp's qwen3moe graph hardcodes norm_w=true and ignores the GGUF
expert_weights_norm key. A checkpoint trained with false produces garbage
on every llama.cpp runtime. This model is trained with norm_topk_prob=true
so the weights match llama.cpp's renormalised routing — that is what makes the
GGUF coherent.
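To make the mismatch concrete, here is a small pure-Python sketch of top-k expert routing with and without renormalisation. It assumes the usual softmax-then-top-k formulation; the function name `route` and the toy logits are illustrative and not taken from either codebase.

```python
import math


def route(logits, k, norm_topk_prob):
    """Return {expert_index: weight} for the k highest-probability experts."""
    # Numerically stable softmax over all experts.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the k most probable experts.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = {i: probs[i] for i in top}

    if norm_topk_prob:
        # Renormalise so the selected weights sum to 1.
        kept = sum(weights.values())
        weights = {i: w / kept for i, w in weights.items()}
    return weights


raw_weights = route([2.0, 1.0, 0.0, -1.0], k=2, norm_topk_prob=False)
norm_weights = route([2.0, 1.0, 0.0, -1.0], k=2, norm_topk_prob=True)
```

Both calls select the same experts in the same proportions, but the raw weights sum to less than 1. A runtime that always renormalises therefore scales the expert outputs differently from what a checkpoint trained with raw weights expects.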
Eval
- Format gate (<think> + parseable {move,taunt} + legal move): 20/20 via transformers; via llama.cpp on the GGUF, f16 18/20, Q4_K_M 20/20, ~19–20/20 bilingual across 5 personas.
- Ladder gauntlet (mirror match across the 10-persona ladder vs tier-matched teachers): baseline 80/100 (8W/2L).
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("build-small-hackathon/mind-of-tashi-micro-sft")
model = AutoModelForCausalLM.from_pretrained("build-small-hackathon/mind-of-tashi-micro-sft")
# messages = [{"role":"system","content": <persona prompt>},
#             {"role":"user","content": <arena state + history>}]
```
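Continuing from the loading snippet, one way to turn the two-message layout into a generation call is sketched below. It assumes the tokenizer ships a chat template and that the model and inputs sit on the same device. The helper names `build_messages` and `generate_turn` and the `max_new_tokens` default are illustrative, not part of the released code.

```python
def build_messages(persona_prompt, arena_state):
    """Assemble the system + user layout described in the usage comment."""
    return [
        {"role": "system", "content": persona_prompt},
        {"role": "user", "content": arena_state},
    ]


def generate_turn(model, tok, messages, max_new_tokens=512):
    """Generate one turn and return only the newly produced text.

    `model` and `tok` are the objects returned by from_pretrained.
    """
    inputs = tok.apply_chat_template(
        messages,
        add_generation_prompt=True,  # open the assistant turn
        return_dict=True,
        return_tensors="pt",
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    prompt_len = inputs["input_ids"].shape[1]
    # Assumption: <think> tags are not registered as special tokens; if they
    # are, decode with skip_special_tokens=False to keep them in the text.
    return tok.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

The returned string should contain the <think> block followed by the JSON line, which the host then parses defensively as described under the output contract.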
Part of the bundle
Game Space · self-play dataset · SFT model (this) + GGUF · OpenEnv gym ·
GRPO model + GGUF — all under build-small-hackathon/mind-of-tashi-*.
Model provider
build-small-hackathon
Model tree
Base
kshitijthakkar/loggenix-moe-0.4B-0.2A-sft-s3.1
Fine-tuned
this model
Modalities
Input
Text
Output
Text