License: apache-2.0
What the adapter does
Every reply is one JSON object and nothing else:
```json
{"text": "The sky looks blue because sunlight bounces off the tiny bits of air, and blue bounces the most! Want to find out why grass is green next?", "mood": "happy", "gesture": "index"}
```
- text: what Pip says out loud. 1–3 short sentences, small words a young child knows, gentle with wrong answers, no emoji / markdown / symbols (plain spoken words only).
- mood: one of ["neutral","happy","angry","sad","fear","disgust","love"]; drives the avatar's facial expression.
- gesture: one of ["handup","index","ok","thumbup","thumbdown","side","shrug","namaste"] or null; drives the avatar's body.
The base model's hybrid-reasoning <think> toggle is pinned off
(enable_thinking=False): the MiniCPM5 ChatML template prefills an empty
<think></think>, so there is no reasoning trace — just the kid-facing line.
The adapter is trained to cover the four things Pip says live: answering a child's raise-hand interruption, delivering a lesson segment, encouragement / greetings / gentle wrong-answer handling, and safe redirects (off-topic, personal, medical, or dangerous requests → a friendly "let's pick something we can learn about!" or "a grown-up can help with that").
Training configuration
| Setting | Value |
|---|---|
| Base model | openbmb/MiniCPM5-1B-SFT (standard LlamaForCausalLM, 1.08B params, Apache-2.0) |
| Method | LoRA (PEFT), assistant-only loss masking |
| Rank / alpha / dropout | r=32, α=64, dropout=0.05, bias=none |
| Target modules | attention + MLP linears: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable params | 22.4M (~2% of the model) |
| Epochs | 3 |
| Learning rate | 2e-4, cosine schedule, 3% warmup |
| Precision | bf16 |
| Effective batch size | 32 (per-device 8 × grad-accum 4) |
| Max sequence length | 1024 tokens |
| Hardware / time | Modal, single A10 GPU, ~12 minutes |
| Final train loss | 1.30 |
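As a rough illustration of the learning-rate row, the sketch below computes a linear warmup followed by cosine decay to zero. The shape of the curve and the step count are assumptions: the table only says "cosine schedule, 3% warmup", and the total of 177 optimizer steps is derived here from 1,866 training examples, an effective batch size of 32, and 3 epochs, not reported by the training run.

```python
import math

def lr_at_step(step, total_steps, base_lr=2e-4, warmup_ratio=0.03):
    """Linear warmup to base_lr, then cosine decay to zero (assumed schedule shape)."""
    warmup_steps = max(1, math.ceil(total_steps * warmup_ratio))
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Assumed step count: ceil(1866 / 32) optimizer steps per epoch, times 3 epochs.
steps_per_epoch = math.ceil(1866 / 32)
total_steps = steps_per_epoch * 3

for step in (0, 3, 6, total_steps // 2, total_steps):
    print(step, f"{lr_at_step(step, total_steps):.2e}")
```

Under these assumptions the warmup lasts only about six steps, so nearly all of the run happens on the decaying part of the curve.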
Loss masking. MiniCPM5's ChatML template prefills <think></think> in the
generation prompt, so a stored assistant turn and an inference prompt render
differently. Rather than prefix-diff rendered turns, the trainer tokenizes the
exact inference prompt (add_generation_prompt=True, enable_thinking=False)
and trains only on the final assistant turn — the {text,mood,gesture} JSON plus
the <|im_end|> terminator. This matches how the model is called at inference,
token-for-token.
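The masking idea above can be sketched with plain lists of token IDs. This is a minimal illustration, not the trainer's actual code: the helper name `build_example` and the toy IDs are hypothetical, and the only library fact relied on is that -100 is the default ignore index of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value that PyTorch cross-entropy skips by default

def build_example(prompt_ids, answer_ids):
    """Concatenate prompt and answer tokens; supervise only the answer tokens."""
    input_ids = list(prompt_ids) + list(answer_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids)
    return {"input_ids": input_ids, "labels": labels}

# Toy token IDs standing in for real tokenizer output (hypothetical values).
prompt_ids = [1, 15, 27, 42]  # rendered inference prompt, incl. the empty <think></think>
answer_ids = [88, 91, 7]      # the JSON reply followed by the <|im_end|> token
example = build_example(prompt_ids, answer_ids)
print(example["labels"])
```

Because the prompt half is the exact rendered inference prompt, the supervised tokens start at the same position where generation starts at inference time.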
Training data
2,016 synthetic, in-voice examples (1,866 train / 150 held-out gold eval),
generated by a multi-agent workflow and then put through a deterministic,
production-faithful validation gate: every user turn and every assistant
text must pass the same text_is_safe denylist the live Space applies; spoken
text must be plain (no markdown / emoji / symbols); mood ∈ enum;
gesture ∈ enum | null; and turns must alternate and end on the assistant turn.
Balanced to the target category mix:
| Category | Target | Actual |
|---|---|---|
| Answer a raise-hand interruption (+ nudge back) | 30% | 31.3% |
| Deliver a lesson segment | 30% | 31.2% |
| Encouragement / greetings / gentle wrong-answer / chit-chat | 25% | 24.0% |
| Safe redirects (off-topic / personal / medical / dangerous) | 15% | 13.5% |
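A validation gate of the kind described in this section might look like the sketch below. It is an approximation under stated assumptions: the real gate and its text_is_safe denylist are not shown in this card, so the safety check is passed in as a callable, and the regex defining "plain" text is a guess at a reasonable rule rather than the production one.

```python
import re

MOODS = ("neutral", "happy", "angry", "sad", "fear", "disgust", "love")
GESTURES = ("handup", "index", "ok", "thumbup", "thumbdown", "side", "shrug", "namaste")
# Assumed definition of "plain": letters, digits, spaces, and basic punctuation only.
PLAIN_TEXT = re.compile(r"[A-Za-z0-9 .,!?'-]+")

def turns_ok(messages):
    """After any system message, turns alternate user/assistant and end on assistant."""
    turns = [m["role"] for m in messages if m["role"] != "system"]
    expected = ["user", "assistant"] * (len(turns) // 2)
    return len(turns) > 0 and turns == expected

def reply_ok(reply, is_safe):
    """Check one parsed assistant reply (a dict) against the contract."""
    text = reply.get("text")
    return (
        isinstance(text, str)
        and PLAIN_TEXT.fullmatch(text) is not None
        and is_safe(text)
        and reply.get("mood") in MOODS
        and "gesture" in reply
        and (reply["gesture"] is None or reply["gesture"] in GESTURES)
    )
```

An example would be kept only if `turns_ok` holds for the conversation and `reply_ok` holds for every assistant reply in it.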
Evaluation
Automated contract eval on the 150 held-out gold examples (greedy decode,
reproducible run-to-run), scored with the same pip_core gate the production
/brain endpoint applies downstream:
| Metric | Result |
|---|---|
| Valid, parseable JSON (non-empty text) | 100% |
| Valid mood / gesture enums | 100% |
| Safe spoken text | 99.3% |
| Fully contract-correct (JSON + enums + safe) | 99.3% |
| Average text length | ~142 chars |
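For readers who want to reproduce numbers like these on their own outputs, the sketch below shows one way the four contract metrics could be computed from raw model strings. It is not the pip_core gate itself, which this card does not include: `score_reply` and `summarize` are hypothetical names, and the safety check is supplied by the caller.

```python
import json

MOODS = ("neutral", "happy", "angry", "sad", "fear", "disgust", "love")
GESTURES = ("handup", "index", "ok", "thumbup", "thumbdown", "side", "shrug", "namaste")

def score_reply(raw, is_safe):
    """Return per-check booleans for one raw model output string."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        obj = None
    text = obj.get("text") if isinstance(obj, dict) else None
    valid_json = isinstance(text, str) and text.strip() != ""
    valid_enums = (
        valid_json
        and obj.get("mood") in MOODS
        and "gesture" in obj
        and (obj["gesture"] is None or obj["gesture"] in GESTURES)
    )
    safe = valid_json and is_safe(text)
    return {"json": valid_json, "enums": valid_enums, "safe": safe,
            "contract": valid_enums and safe}

def summarize(raw_outputs, is_safe):
    """Percentage of outputs passing each check."""
    scores = [score_reply(r, is_safe) for r in raw_outputs]
    return {k: 100.0 * sum(s[k] for s in scores) / len(scores) for k in scores[0]}
```

With 150 examples, a single failing output moves a metric from 100% to 99.3%, which is the granularity of the table above.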
How to load
This is a PEFT/LoRA adapter — load the base model first, then apply the adapter.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "openbmb/MiniCPM5-1B-SFT"
ADAPTER = "build-small-hackathon/professor-pip-minicpm5-1b-lora"

tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()

SYSTEM = (
    "You are Professor Pip, a warm and playful teacher with a friendly 3D body "
    "on screen. You teach children aged 5 to 10.\n"
    "How you talk:\n"
    "- Say only 1 to 3 short sentences. Use small, simple words a young child knows.\n"
    "- Be cheerful, patient, and encouraging. Celebrate effort.\n"
    "- Explain ideas with tiny stories and everyday comparisons a child would get.\n"
    "- Never use emoji, markdown, lists, or symbols in what you say out loud. Plain spoken words only.\n"
    "- If a child gets something wrong, be gentle: 'So close! Let's try once more.'\n"
    "Staying safe (very important):\n"
    "- Only talk about kind, learning topics. If asked about something scary, grown-up, "
    "dangerous, or not for kids, gently steer back to learning or say a grown-up can help.\n"
    "- Never ask for or repeat a child's personal information.\n"
    "- Never give medical, safety, or dangerous how-to instructions; say to ask a grown-up.\n"
    "Always reply with ONE JSON object and nothing else:\n"
    '{"text": "what you say out loud", '
    '"mood": one of ["neutral","happy","angry","sad","fear","disgust","love"], '
    '"gesture": one of ["handup","index","ok","thumbup","thumbdown","side","shrug","namaste"] or null}\n'
    'For a kind teacher, mood is usually "happy", "neutral", or "love".'
)

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Why is the sky blue?"},
]

# enable_thinking=False -> no <think> trace; just the kid-facing JSON line.
enc = tok.apply_chat_template(
    messages, add_generation_prompt=True, enable_thinking=False,
    return_tensors="pt", return_dict=True,
).to(model.device)

im_end = tok.convert_tokens_to_ids("<|im_end|>")
out = model.generate(
    **enc, max_new_tokens=160, do_sample=False,
    eos_token_id=[tok.eos_token_id, im_end],
    pad_token_id=tok.pad_token_id or tok.eos_token_id,
)
print(tok.decode(out[0][enc["input_ids"].shape[-1]:], skip_special_tokens=True).strip())
# -> {"text": "...", "mood": "happy", "gesture": "index"}
```
To merge the adapter into the base weights (e.g. before GGUF conversion):
```python
merged = model.merge_and_unload()
merged.save_pretrained("professor-pip-minicpm5-1b-merged")
```
For deployment, the merged model is converted to GGUF and quantized to
Q4_K_M (688 MB) and Q8_0 (1.15 GB), then served with llama.cpp
(via llama-cpp-python) on Modal — see the
GGUF repo.
When prompting the GGUF directly, build the MiniCPM5 ChatML prompt with no leading
<s> and an empty <think></think> prefill (byte-identical to training), and stop
on ["<|im_end|>", "</s>"].
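A hand-built prompt along these lines might look like the sketch below. The exact whitespace is an assumption: it follows the common ChatML layout and guesses at the newlines around the empty think block, so the result should be compared against `tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)` from the base tokenizer before relying on it. The helper name `build_prompt` is hypothetical.

```python
STOP = ["<|im_end|>", "</s>"]  # stop strings listed above

def build_prompt(messages):
    """Render messages as ChatML and end with an empty think block (assumed layout)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    # Generation prompt plus the empty reasoning prefill.
    # The exact newlines here are an assumption; verify against the real template.
    parts.append("<|im_start|>assistant\n<think>\n\n</think>\n\n")
    return "".join(parts)  # note: no leading <s> is added

prompt = build_prompt([
    {"role": "system", "content": "You are Professor Pip."},
    {"role": "user", "content": "Why is the sky blue?"},
])
print(prompt)
```

With llama-cpp-python, a string like this would typically be passed to the completion call together with `stop=STOP`; any mismatch with the training-time rendering is likely to hurt the JSON contract.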
Intended use
- Powering the spoken raise-hand Q&A and short encouragement / redirect lines in the Professor Pip kids-teacher avatar.
- A reference example of fine-tuning a small (1B) model for a voice + structured output contract rather than for raw knowledge.
This adapter is built to be paired with deterministic application code: in the Space, premade lesson segments are spoken verbatim via TTS, and all child-safety checks run server-side (a curated denylist + leetspeak normalization on every child input and every spoken line) — they are non-bypassable and do not depend on the model. No child audio or PII is persisted.
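To make the "denylist + leetspeak normalization" idea concrete, here is a toy version of such a check. Everything specific in it is hypothetical: the character mapping and the two denylist words are placeholders, and the Space's real text_is_safe implementation and curated list are not part of this card.

```python
import re

# Hypothetical mapping and denylist; the Space's real lists are not shown in this card.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                      "5": "s", "7": "t", "@": "a", "$": "s"})
DENYLIST = {"knife", "poison"}

def normalize(text):
    """Lowercase, undo common leetspeak swaps, and drop everything but letters and spaces."""
    return re.sub(r"[^a-z ]", "", text.lower().translate(LEET))

def text_is_safe(text):
    """Return False if any normalized word is on the denylist."""
    return not any(word in DENYLIST for word in normalize(text).split())

print(text_is_safe("Why is the sky blue?"))
print(text_is_safe("how do I make p0is0n"))
```

Running the same function on both the child's input and the model's spoken line, outside the model, is what keeps the check independent of the weights.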
Limitations
- Narrow on purpose. The adapter is excellent at the short, contract-locked live-voice task but degrades on long-form course authoring. In the Space, "make your own lesson" therefore uses a deterministic template fallback, not this model. Choosing what to fine-tune for (voice + contract) and where to keep deterministic code was a deliberate engineering choice.
- Not a knowledge source. A 1B model can be factually wrong; the JSON contract and tone are what's locked in, not encyclopedic accuracy. Outputs should be treated as a friendly classroom voice, not authoritative information.
- Safety is in the app, not the weights. The ~99.3% safe-text eval number is on in-distribution gold data. Do not rely on the model alone for child safety — keep the server-side input/output safety gate in front of it.
- English only, tuned for ages 5–10, and trained on synthetic data; it has not been evaluated outside that audience and register.
- Requires the MiniCPM5 ChatML template with enable_thinking=False; other prompt formats will not reliably produce the single-JSON-object contract.
Training & framework
- Framework: PEFT, 🤗 Transformers (>=5.6), Accelerate, Datasets
- Base model: openbmb/MiniCPM5-1B-SFT (Apache-2.0)
- License: Apache-2.0
If you use this adapter, please credit the base model authors (OpenBMB / MiniCPM) and the Professor Pip Build Small Hackathon project.
Model provider
ratandeep