build-small-hackathon/compliment-forest-minicpm5-1b

Training

Base: openbmb/MiniCPM5-1B (Llama architecture, about 1.08B parameters)
Data: build-small-hackathon/compliment-forest-sft
Method: 4-bit NF4 QLoRA on Modal
LoRA: rank 16, alpha 32, dropout 0.05
Targets: attention and MLP projections
Sequence length: 2,048
Epochs: 2
Learning rate: 2e-4 with cosine decay
Runtime thinking mode: disabled for deterministic JSON generation
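
Two quantities follow directly from the hyperparameters above, and a short sketch makes them concrete. The first is the LoRA scaling factor, which in standard LoRA is alpha divided by rank. The second is the learning rate at a given step under cosine decay. The card does not state a warmup phase, a minimum learning rate, or the total step count, so the sketch assumes no warmup, decay to zero, and a hypothetical total of 1,000 steps. The function names are illustrative and are not taken from the training code.

```python
import math

# Values taken from the training summary above.
LORA_RANK = 16
LORA_ALPHA = 32
PEAK_LR = 2e-4


def lora_scaling(alpha: float, rank: int) -> float:
    """Standard LoRA multiplies the low-rank update by alpha / rank."""
    return alpha / rank


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR, min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr at step 0 to min_lr at total_steps.

    Assumption: no warmup and min_lr = 0; the card specifies neither.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical step count, not from the card
    print("LoRA scaling:", lora_scaling(LORA_ALPHA, LORA_RANK))
    for step in (0, 250, 500, 750, 1000):
        print(step, cosine_lr(step, total))
```

With rank 16 and alpha 32, the adapter update is scaled by 2. Under the stated assumptions, the learning rate reaches half its peak (1e-4) at the midpoint of training.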

The dataset was filtered for JSON validity, concrete situation grounding, non-toxic positivity, and short first-person spells. This model is for whimsical encouragement; it is not a therapist or a substitute for professional support.

Inference

Use the base model's chat template with enable_thinking=False. The app enforces the output with Pydantic and retries malformed generations at most twice.
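
The validate-and-retry behavior described above might look roughly like the following sketch. It is not the app's actual code. The Compliment schema and its spell field are hypothetical, since the card does not list the output fields, and generate stands in for whatever callable runs the model with enable_thinking=False and returns raw text. "At most twice" is read here as one initial attempt plus up to two retries.

```python
import json
from typing import Callable

from pydantic import BaseModel, ValidationError


class Compliment(BaseModel):
    # Hypothetical schema: the real field names are not given in the card.
    spell: str


def generate_validated(generate: Callable[[], str],
                       max_retries: int = 2) -> Compliment:
    """Call generate() until its output parses and validates.

    Makes one initial attempt plus at most max_retries further attempts.
    """
    last_error = None
    for _ in range(1 + max_retries):
        raw = generate()
        try:
            # json.loads rejects non-JSON text; the model constructor
            # rejects JSON that does not match the schema.
            return Compliment(**json.loads(raw))
        except (ValidationError, ValueError, TypeError) as exc:
            last_error = exc
    raise RuntimeError("generation stayed malformed after retries") from last_error
```

A caller would wrap the model invocation in a zero-argument function and pass it in, for example generate_validated(lambda: run_model(prompt)), where run_model is a placeholder for the real inference call.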

The repository also includes a Q4_K_M GGUF build for local llama.cpp inference.