SwarmandBee

DiabeticDaily-9B

License: apache-2.0

Beat-base — proven

Held-out perplexity vs base Qwen3.5-9B (text never trained on):

| Model | Held-out loss | Perplexity |
|---|---|---|
| Base Qwen3.5-9B | 1.3625 | 3.906 |
| DiabeticDaily-9B | 0.8079 | 2.243 |
| Δ | −0.555 (+40.7% better) | |

Verdict: BEAT BASE ✅. Held-out loss is ~41% lower than base, and the resulting perplexity (2.24) is close to the 27B anchor's (2.05): the knowledge survives the shrink. That's the distillation-ladder thesis, proven.
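
For readers who want to check the arithmetic, here is a minimal sketch of how the table's columns relate. It assumes the held-out loss is a mean per-token negative log-likelihood in nats, so perplexity is `exp(loss)`; the function and variable names are illustrative and not taken from the evaluation code.

```python
import math


def perplexity(mean_nll: float) -> float:
    """Perplexity for a mean per-token negative log-likelihood given in nats."""
    return math.exp(mean_nll)


# Held-out losses copied from the table above.
base_loss = 1.3625
tuned_loss = 0.8079

base_ppl = perplexity(base_loss)
tuned_ppl = perplexity(tuned_loss)

# Absolute change in loss, and relative reductions in loss and in perplexity.
loss_delta = tuned_loss - base_loss
rel_loss_drop = -loss_delta / base_loss
rel_ppl_drop = 1.0 - tuned_ppl / base_ppl

print(f"base perplexity:  {base_ppl:.3f}")
print(f"tuned perplexity: {tuned_ppl:.3f}")
print(f"loss delta:       {loss_delta:+.3f}")
print(f"relative loss reduction:       {rel_loss_drop:.1%}")
print(f"relative perplexity reduction: {rel_ppl_drop:.1%}")
```

The 40.7% figure in the table is the relative reduction in loss; the relative reduction in perplexity is a separate number, printed on the last line.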

How it was cooked

Base: Qwen/Qwen3.5-9B (Apache-2.0). Data: the same deeded OpenDiabetic corpus as the 27B anchor.
Recipe: LoRA (rank 64, α 32) on attention and MLP layers, learning rate 1e-5 with cosine decay, early stopping as an overcook guard. Adapter merged into the base weights in bf16.
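
The recipe's schedule and stopping rule can be sketched in plain Python. This is an illustration under stated assumptions, not the training code: the card gives only the peak LR, the cosine shape, and the existence of an early-stop guard. The step count, the absence of warmup, the `patience` value, the `OvercookGuard` class, and the eval losses in the demo are all hypothetical.

```python
import math

LORA_RANK = 64
LORA_ALPHA = 32
# Standard LoRA scales the low-rank update by alpha / r.
LORA_SCALE = LORA_ALPHA / LORA_RANK
PEAK_LR = 1e-5


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR, min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr at step 0 to min_lr at total_steps (no warmup assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


class OvercookGuard:
    """Hypothetical early-stop rule: stop after `patience` evals with no improvement."""

    def __init__(self, patience: int = 2, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_evals = 0

    def should_stop(self, eval_loss: float) -> bool:
        if eval_loss < self.best - self.min_delta:
            # New best held-out loss: reset the counter.
            self.best = eval_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


# Demo with made-up eval losses: improvement stalls after the third eval.
guard = OvercookGuard(patience=2)
eval_losses = [1.20, 0.95, 0.85, 0.86, 0.88, 0.80]
stopped_at = None
for i, loss in enumerate(eval_losses):
    if guard.should_stop(loss):
        stopped_at = i
        break

print(f"LoRA scale: {LORA_SCALE}")
print(f"LR at start / midpoint: {cosine_lr(0, 100):.2e} / {cosine_lr(50, 100):.2e}")
print(f"stopped at eval index {stopped_at}, best loss {guard.best}")
```

With a patience-based guard, the checkpoint to keep is the one that produced `guard.best`, not the last one trained; whether the actual run restored the best checkpoint is not stated in the card.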