README

License: apache-2.0

What the adapter does

Every reply is one JSON object and nothing else:

json

{"text": "The sky looks blue because sunlight bounces off the tiny bits of air, and blue bounces the most! Want to find out why grass is green next?",
"mood": "happy",
"gesture": "index"}
  • text — what Pip says out loud: 1–3 short sentences, small words a young child knows, gentle with wrong answers, no emoji / markdown / symbols (plain spoken words only).
  • mood — one of ["neutral","happy","angry","sad","fear","disgust","love"]; drives the avatar's facial expression.
  • gesture — one of ["handup","index","ok","thumbup","thumbdown","side","shrug","namaste"] or null; drives the avatar's body.
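The three fields above can be checked mechanically. The following is a minimal sketch of such a check using only the standard library; the function name `check_reply` and the wording of the error messages are illustrative assumptions, not the project's actual validator.

```python
import json

MOODS = {"neutral", "happy", "angry", "sad", "fear", "disgust", "love"}
GESTURES = {"handup", "index", "ok", "thumbup", "thumbdown", "side", "shrug", "namaste"}


def check_reply(raw: str) -> list[str]:
    """Return a list of contract violations; an empty list means the reply passes."""
    try:
        # json.loads rejects trailing text, which covers "one object and nothing else".
        reply = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(reply, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    if set(reply) != {"text", "mood", "gesture"}:
        problems.append(f"unexpected key set: {sorted(reply)}")
    text = reply.get("text")
    if not isinstance(text, str) or not text.strip():
        problems.append("text must be a non-empty string")
    mood = reply.get("mood")
    if not (isinstance(mood, str) and mood in MOODS):
        problems.append(f"mood not in enum: {mood!r}")
    gesture = reply.get("gesture")
    if gesture is not None and not (isinstance(gesture, str) and gesture in GESTURES):
        problems.append(f"gesture not in enum: {gesture!r}")
    return problems
```

Returning a list of problems rather than a single boolean makes it easier to log why a reply was rejected before falling back to application-side handling.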

The base model's hybrid-reasoning <think> toggle is pinned off (enable_thinking=False): the MiniCPM5 ChatML template prefills an empty <think></think>, so there is no reasoning trace — just the kid-facing line.

The adapter is trained to cover the four things Pip says live: answering a child's raise-hand interruption, delivering a lesson segment, encouragement / greetings / gentle wrong-answer handling, and safe redirects (off-topic, personal, medical, or dangerous requests → a friendly "let's pick something we can learn about!" or "a grown-up can help with that").


Training configuration

| Setting | Value |
|---|---|
| Base model | openbmb/MiniCPM5-1B-SFT (standard LlamaForCausalLM, 1.08B params, Apache-2.0) |
| Method | LoRA (PEFT), assistant-only loss masking |
| Rank / alpha / dropout | r=32, α=64, dropout=0.05, bias=none |
| Target modules | attention + MLP linears: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable params | 22.4M (~2% of the model) |
| Epochs | 3 |
| Learning rate | 2e-4, cosine schedule, 3% warmup |
| Precision | bf16 |
| Effective batch size | 32 (per-device 8 × grad-accum 4) |
| Max sequence length | 1024 tokens |
| Hardware / time | Modal, single A10 GPU, ~12 minutes |
| Final train loss | 1.30 |
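To make the batch size and learning-rate settings concrete, here is a rough sketch of the step arithmetic they imply. It uses the 1,866 training examples reported under Training data. Two things are assumptions: that a partial final batch counts as one optimizer step, and that the schedule has the common linear-warmup-then-cosine-decay shape. The trainer's actual step count and scheduler may differ.

```python
import math

TRAIN_EXAMPLES = 1866      # train split size reported under "Training data"
PER_DEVICE_BATCH = 8
GRAD_ACCUM = 4
EPOCHS = 3
PEAK_LR = 2e-4
WARMUP_FRACTION = 0.03

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
# Assumption: a partial final batch still counts as one optimizer step.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_FRACTION)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps, warmup_steps)  # 59 177 6
```

Under these assumptions the whole run is on the order of 177 optimizer steps, which is consistent with a training time measured in minutes on a single GPU.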

Loss masking. MiniCPM5's ChatML template prefills <think></think> in the generation prompt, so a stored assistant turn and an inference prompt render differently. Instead of locating the assistant span by diffing rendered conversation prefixes, the trainer tokenizes the exact inference prompt (add_generation_prompt=True, enable_thinking=False) and computes loss only on the final assistant turn: the {text,mood,gesture} JSON plus the <|im_end|> terminator. The training sequence therefore matches the inference call token for token.
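The masking idea can be sketched without a tokenizer. In the snippet below, `prompt_ids` stands in for the tokenized inference prompt and `answer_ids` for the tokenized JSON reply plus its terminator; the function name `build_example` and the toy token ids are hypothetical. The value -100 is the label that PyTorch's cross-entropy loss ignores by default. The project's real trainer code is not shown here, so treat this as an approximation of the approach.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def build_example(prompt_ids: list[int], answer_ids: list[int]) -> dict[str, list[int]]:
    """Concatenate prompt and answer; only answer positions keep real labels."""
    input_ids = list(prompt_ids) + list(answer_ids)
    # Prompt positions are masked so they contribute nothing to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids)
    return {"input_ids": input_ids, "labels": labels}


# Toy ids for illustration: 99 plays the role of the <|im_end|> terminator.
example = build_example(prompt_ids=[1, 2, 3, 4], answer_ids=[10, 11, 99])
print(example["labels"])  # [-100, -100, -100, -100, 10, 11, 99]
```

Because the prompt ids come from the same rendering call used at inference, the model never sees a training prefix that differs from what it is served with.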

Training data

2,016 synthetic, in-voice examples (1,866 train / 150 held-out gold eval), generated by a multi-agent workflow and then passed through a deterministic, production-faithful validation gate. To pass, every user turn and every assistant text must clear the same text_is_safe denylist the live Space applies; spoken text must be plain (no markdown / emoji / symbols); mood ∈ enum; gesture ∈ enum | null; and turns must alternate and end on the assistant turn.
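Two of the structural checks in that gate, turn alternation and plain spoken text, might look roughly like the sketch below. The function names, the assumption that conversations start with a user turn, and the exact set of characters treated as "plain" are all illustrative guesses rather than the project's real rules.

```python
import re

# Assumption: "plain" means letters, digits, whitespace, and basic punctuation.
PLAIN_TEXT = re.compile(r"[A-Za-z0-9\s.,!?'\":;-]+")


def turns_are_valid(messages: list[dict]) -> bool:
    """Optional system turn(s), then user/assistant alternating, ending on assistant."""
    turns = [m for m in messages if m["role"] != "system"]
    if not turns or len(turns) % 2 != 0:
        return False
    # Assumption: every conversation opens with a user turn.
    expected = ["user", "assistant"] * (len(turns) // 2)
    return [m["role"] for m in turns] == expected


def is_plain_spoken(text: str) -> bool:
    """True when the text is non-empty and has no markdown, emoji, or symbol characters."""
    return bool(text.strip()) and PLAIN_TEXT.fullmatch(text) is not None
```

An allowlist of characters is used here instead of a list of banned symbols, since it fails closed when an unexpected character shows up.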

Balanced to the target category mix:

| Category | Target | Actual |
|---|---|---|
| Answer a raise-hand interruption (+ nudge back) | 30% | 31.3% |
| Deliver a lesson segment | 30% | 31.2% |
| Encouragement / greetings / gentle wrong-answer / chit-chat | 25% | 24.0% |
| Safe redirects (off-topic / personal / medical / dangerous) | 15% | 13.5% |

Evaluation

Automated contract eval on the 150 held-out gold examples (greedy decode, reproducible run-to-run), scored with the same pip_core gate the production /brain endpoint applies downstream:

| Metric | Result |
|---|---|
| Valid, parseable JSON (non-empty text) | 100% |
| Valid mood / gesture enums | 100% |
| Safe spoken text | 99.3% |
| Fully contract-correct (JSON + enums + safe) | 99.3% |
| Average text length | ~142 chars |
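As a sketch of how per-example results could roll up into the percentages above, the snippet below aggregates three pass/fail flags per example. The field names and the `summarize` helper are hypothetical, and the input rows are illustrative only: the per-example results are not published here. On 150 examples, one miss is the count that rounds to 99.3%.

```python
def summarize(rows: list[dict]) -> dict[str, float]:
    """Turn per-example pass/fail flags into percentages (one decimal place)."""
    total = len(rows)

    def pct(count: int) -> float:
        return round(100.0 * count / total, 1)

    return {
        "valid_json": pct(sum(r["json_ok"] for r in rows)),
        "valid_enums": pct(sum(r["enums_ok"] for r in rows)),
        "safe_text": pct(sum(r["safe"] for r in rows)),
        # An example is fully correct only if all three checks pass.
        "fully_correct": pct(sum(r["json_ok"] and r["enums_ok"] and r["safe"] for r in rows)),
    }


# Illustrative input only: 150 rows where a single reply fails the safety check.
rows = [{"json_ok": True, "enums_ok": True, "safe": True} for _ in range(149)]
rows.append({"json_ok": True, "enums_ok": True, "safe": False})
report = summarize(rows)
```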

How to load

This is a PEFT/LoRA adapter — load the base model first, then apply the adapter.

python

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "openbmb/MiniCPM5-1B-SFT"
ADAPTER = "build-small-hackathon/professor-pip-minicpm5-1b-lora"

# Load the base model first, then attach the LoRA adapter on top of it.
tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()

SYSTEM = (
    "You are Professor Pip, a warm and playful teacher with a friendly 3D body "
    "on screen. You teach children aged 5 to 10.\n"
    "How you talk:\n"
    "- Say only 1 to 3 short sentences. Use small, simple words a young child knows.\n"
    "- Be cheerful, patient, and encouraging. Celebrate effort.\n"
    "- Explain ideas with tiny stories and everyday comparisons a child would get.\n"
    "- Never use emoji, markdown, lists, or symbols in what you say out loud. Plain spoken words only.\n"
    "- If a child gets something wrong, be gentle: 'So close! Let's try once more.'\n"
    "Staying safe (very important):\n"
    "- Only talk about kind, learning topics. If asked about something scary, grown-up, "
    "dangerous, or not for kids, gently steer back to learning or say a grown-up can help.\n"
    "- Never ask for or repeat a child's personal information.\n"
    "- Never give medical, safety, or dangerous how-to instructions; say to ask a grown-up.\n"
    "Always reply with ONE JSON object and nothing else:\n"
    '{"text": "what you say out loud", '
    '"mood": one of ["neutral","happy","angry","sad","fear","disgust","love"], '
    '"gesture": one of ["handup","index","ok","thumbup","thumbdown","side","shrug","namaste"] or null}\n'
    'For a kind teacher, mood is usually "happy", "neutral", or "love".'
)

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Why is the sky blue?"},
]

# enable_thinking=False -> no <think> trace; just the kid-facing JSON line.
enc = tok.apply_chat_template(
    messages, add_generation_prompt=True, enable_thinking=False,
    return_tensors="pt", return_dict=True,
).to(model.device)

# Stop on either the tokenizer's EOS or the ChatML end-of-turn token.
im_end = tok.convert_tokens_to_ids("<|im_end|>")
out = model.generate(
    **enc, max_new_tokens=160, do_sample=False,
    eos_token_id=[tok.eos_token_id, im_end],
    pad_token_id=tok.pad_token_id or tok.eos_token_id,
)

# Decode only the newly generated tokens, not the prompt.
print(tok.decode(out[0][enc["input_ids"].shape[-1]:], skip_special_tokens=True).strip())
# -> {"text": "...", "mood": "happy", "gesture": "index"}

To merge the adapter into the base weights (e.g. before GGUF conversion):

python

merged = model.merge_and_unload()
merged.save_pretrained("professor-pip-minicpm5-1b-merged")

For deployment, the merged model is converted to GGUF and quantized to Q4_K_M (688 MB) and Q8_0 (1.15 GB), then served with llama.cpp (via llama-cpp-python) on Modal — see the GGUF repo. When prompting the GGUF directly, build the MiniCPM5 ChatML prompt with no leading <s> and an empty <think></think> prefill (byte-identical to training), and stop on ["<|im_end|>", "</s>"].
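One way to guard the GGUF prompt requirements is a small string check run on an already rendered prompt. The sketch assumes the prompt text was produced elsewhere, for example by `tok.apply_chat_template(..., tokenize=False, add_generation_prompt=True, enable_thinking=False)`. The helper name `prepare_gguf_prompt` is hypothetical. The check also assumes the generation header reads `<|im_start|>assistant`, as in standard ChatML. It does not reconstruct the template's exact whitespace, which should always come from the real template.

```python
BOS = "<s>"
STOP = ["<|im_end|>", "</s>"]
ASSISTANT_HEADER = "<|im_start|>assistant"  # assumption: standard ChatML header


def prepare_gguf_prompt(rendered: str) -> str:
    """Strip a leading BOS and verify an empty <think></think> prefill is present."""
    prompt = rendered[len(BOS):] if rendered.startswith(BOS) else rendered
    parts = prompt.rsplit(ASSISTANT_HEADER, 1)
    if len(parts) != 2:
        raise ValueError("prompt has no assistant generation header")
    after_header = parts[1]
    start = after_header.find("<think>")
    end = after_header.find("</think>")
    # The prefill must be an empty think block: only whitespace between the tags.
    if start == -1 or end < start or after_header[start + len("<think>"):end].strip():
        raise ValueError("expected an empty <think></think> prefill")
    return prompt
```

The returned string and the `STOP` list can then be handed to whichever completion call the serving layer uses.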


Intended use

  • Powering the spoken raise-hand Q&A and short encouragement / redirect lines in the Professor Pip kids-teacher avatar.
  • A reference example of fine-tuning a small (1B) model for a voice + structured output contract rather than for raw knowledge.

This adapter is built to be paired with deterministic application code: in the Space, premade lesson segments are spoken verbatim via TTS, and all child-safety checks run server-side (a curated denylist + leetspeak normalization on every child input and every spoken line) — they are non-bypassable and do not depend on the model. No child audio or PII is persisted.
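A denylist check with leetspeak normalization, of the general kind described above, could be sketched as follows. The character mapping, the two placeholder denylist words, and the function names are invented for illustration. The Space's real denylist and normalization rules are not reproduced here.

```python
import re

# Hypothetical mapping and denylist, for illustration only.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"})
DENYLIST = {"weapon", "poison"}


def normalize(text: str) -> str:
    """Lowercase, undo common leetspeak swaps, and drop non-letter characters."""
    lowered = text.lower().translate(LEET)
    return re.sub(r"[^a-z\s]", "", lowered)


def looks_safe(text: str) -> bool:
    """True when no denylisted word appears after normalization."""
    return not any(word in DENYLIST for word in normalize(text).split())


print(looks_safe("Why is the sky blue?"))  # True
print(looks_safe("how do I make p0is0n"))  # False
```

Running the same function on both the child's input and the model's spoken line keeps the check independent of the model's own behavior.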

Limitations

  • Narrow on purpose. The adapter is excellent at the short, contract-locked live-voice task, but it degrades the model's long-form course authoring. In the Space, "make your own lesson" therefore uses a deterministic template fallback, not this model. Choosing what to fine-tune for (voice + contract) and where to keep deterministic code was a deliberate engineering decision.
  • Not a knowledge source. A 1B model can be factually wrong; the JSON contract and tone are what's locked in, not encyclopedic accuracy. Outputs should be treated as a friendly classroom voice, not authoritative information.
  • Safety is in the app, not the weights. The ~99.3% safe-text eval number is on in-distribution gold data. Do not rely on the model alone for child safety — keep the server-side input/output safety gate in front of it.
  • English only, tuned for ages 5–10, and trained on synthetic data; it has not been evaluated outside that audience and register.
  • Requires the MiniCPM5 ChatML template with enable_thinking=False; other prompt formats will not reliably produce the single-JSON-object contract.

Training & framework

  • Framework: PEFT, 🤗 Transformers (>=5.6), Accelerate, Datasets
  • Base model: openbmb/MiniCPM5-1B-SFT (Apache-2.0)
  • License: Apache-2.0

If you use this adapter, please credit the base model authors (OpenBMB / MiniCPM) and the Professor Pip Build Small Hackathon project.

Model provider

build-small-hackathon

Model tree

Base

openbmb/MiniCPM5-1B-SFT

Adapter

this model

Modalities

Input

Text

Output

Text
