License: MIT
Architecture
- Llama (HF LlamaForCausalLM): RoPE, RMSNorm, SwiGLU, no biases, tied embeddings
- 12 layers · 768 hidden · 12 heads · 2048 FFN
- 1024 sequence length
- 110,119,680 parameters
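The listed shapes account exactly for the parameter count. The sketch below assumes full multi-head attention (as many key/value heads as query heads) and the 32,768-entry vocabulary from the tokenizer section; RoPE adds no learned weights. `llama_param_count` is a hypothetical helper, not part of any library.

```python
def llama_param_count(vocab: int, hidden: int, layers: int, ffn: int, tied: bool = True) -> int:
    """Parameter count for a bias-free Llama-style decoder with full multi-head attention."""
    embed = vocab * hidden                   # token embedding matrix
    lm_head = 0 if tied else vocab * hidden  # tied embeddings reuse the input matrix
    attn = 4 * hidden * hidden               # q, k, v, o projections, no biases
    mlp = 3 * hidden * ffn                   # SwiGLU: gate, up, down projections
    norms = 2 * hidden                       # two RMSNorm weight vectors per layer
    final_norm = hidden                      # RMSNorm before the LM head
    return embed + lm_head + layers * (attn + mlp + norms) + final_norm


print(llama_param_count(vocab=32_768, hidden=768, layers=12, ffn=2048))  # 110119680
```

Roughly 25.2M of the total sit in the shared embedding matrix, which is why tying the embeddings matters at this scale: an untied LM head would add another 25,165,824 parameters.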
Tokenizer
Joint byte-level BPE with a 32,768-token vocabulary, trained on a balanced sample of 50M characters from each of EN, NL, and ZH. The same tokenizer is shared across all three languages (see the data card for why a joint tokenizer is required: 6.8% of the ZH data is Latin script).
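A balanced per-language sample like the one described above can be drawn by shuffling each corpus and taking documents until a character budget is met. This is a sketch under assumptions: `balanced_char_sample` is hypothetical, the budget counts Unicode code points, and the last document is truncated to hit the budget exactly; the card does not say how the sample was actually drawn.

```python
import random
from typing import Dict, List


def balanced_char_sample(corpora: Dict[str, List[str]], chars_per_lang: int, seed: int = 0) -> List[str]:
    """Take up to `chars_per_lang` characters from each language, in random document order."""
    rng = random.Random(seed)
    sample: List[str] = []
    for lang, docs in sorted(corpora.items()):  # sorted for reproducibility
        order = list(range(len(docs)))
        rng.shuffle(order)
        taken = 0
        for i in order:
            if taken >= chars_per_lang:
                break
            # Truncate the final document so each language contributes exactly the budget.
            doc = docs[i][: chars_per_lang - taken]
            sample.append(doc)
            taken += len(doc)
    rng.shuffle(sample)  # interleave languages for the tokenizer trainer
    return sample
```

The resulting list could then be fed to a byte-level BPE trainer, for example `ByteLevelBPETokenizer().train_from_iterator(sample, vocab_size=32_768)` from the Hugging Face `tokenizers` library.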
Training
- Data: BabyLM-community/babylm-eng + babylm-nld + babylm-zho (BabyBabelLM 2026, 100M tier). The full corpora are loaded into memory and shuffled: the Hub layout is clustered by category, so streaming with a shuffle buffer of reasonable size produces a biased sample.
- Mixture: byte-premium-uniform, i.e. an equal share of reference tokens per language (1/3 each). This is achieved by deficit-driven selection rather than uniform document sampling, because mean document sizes differ across languages.
- Optimizer: AdamW (β₁=0.9, β₂=0.95, weight decay 0.1), peak lr 6e-4 with a 100-step warmup, then cosine decay to 10% of peak
- Compute: 4× NVIDIA A10G (23 GB), bf16, DDP; micro-batch 16 × grad-accum 2 per GPU (effective batch 4 × 16 × 2 = 128 sequences ≈ 131k tokens/step)
- Tokens consumed at this checkpoint: 100,000,000 byte-premium-adjusted reference tokens
- Per-language epochs at this checkpoint: ≈1.0 each (within the BabyLM ≤10-epoch cap)
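The deficit-driven selection in the Mixture bullet can be sketched as a greedy loop: at each step, draw the next document from whichever language is furthest below its target share of the reference tokens consumed so far. This is a minimal illustration, not the training code: `deficit_mix` is hypothetical, and each document's cost is assumed to be already expressed in byte-premium-adjusted reference tokens.

```python
from typing import Dict, Iterator, List, Tuple

Doc = Tuple[str, float]  # (text, cost in byte-premium-adjusted reference tokens)


def deficit_mix(
    streams: Dict[str, Iterator[Doc]],
    shares: Dict[str, float],
    budget: float,
) -> Tuple[List[Tuple[str, str]], Dict[str, float]]:
    """Greedily pick the language furthest below its target share until the budget is spent."""
    consumed = {lang: 0.0 for lang in streams}
    selected: List[Tuple[str, str]] = []
    active = sorted(streams)  # deterministic tie-breaking
    total = 0.0
    while total < budget and active:
        # Deficit = target share of everything consumed so far minus what this language got.
        lang = max(active, key=lambda l: shares[l] * total - consumed[l])
        try:
            text, cost = next(streams[lang])
        except StopIteration:
            active.remove(lang)  # language exhausted; keep mixing the rest
            continue
        consumed[lang] += cost
        total += cost
        selected.append((lang, text))
    return selected, consumed


# Toy example: documents of very different sizes per language.
streams = {
    "eng": iter([("en doc", 10.0)] * 100),
    "nld": iter([("nl doc", 5.0)] * 100),
    "zho": iter([("zh doc", 20.0)] * 100),
}
selected, consumed = deficit_mix(streams, {l: 1 / 3 for l in streams}, budget=300.0)
print(consumed)
```

With these toy costs, uniform document sampling would give zho four times as many reference tokens as nld. The deficit rule instead picks more small nld documents than large zho ones and keeps each language within about two maximum document sizes of its 1/3 share.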
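The optimizer and compute bullets reduce to a few lines of arithmetic. The sketch below checks the effective batch size and writes out a warmup-plus-cosine learning-rate curve with the listed hyperparameters. The linear warmup shape and the `total_steps` value are assumptions: the card does not state the optimizer step count, and because the 100M budget is counted in byte-premium-adjusted reference tokens it need not equal 100,000,000 / 131,072 ≈ 763 steps. Both helpers are hypothetical.

```python
import math


def tokens_per_step(world_size: int = 4, micro_batch: int = 16, grad_accum: int = 2, seq_len: int = 1024) -> int:
    """Sequences per optimizer step across all GPUs, times tokens per sequence."""
    return world_size * micro_batch * grad_accum * seq_len


def lr_at(step: int, total_steps: int, peak: float = 6e-4, warmup: int = 100, floor_frac: float = 0.1) -> float:
    """Warmup (assumed linear) to `peak`, then cosine decay to `floor_frac * peak`."""
    floor = peak * floor_frac
    if step < warmup:
        return peak * (step + 1) / warmup
    progress = min(1.0, (step - warmup) / max(1, total_steps - warmup))
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))


total_steps = 763  # hypothetical, see lead-in
print(tokens_per_step())  # 131072
print(lr_at(0, total_steps), lr_at(100, total_steps), lr_at(total_steps, total_steps))
```

In PyTorch this curve could be attached to `torch.optim.AdamW(params, lr=6e-4, betas=(0.9, 0.95), weight_decay=0.1)` through `torch.optim.lr_scheduler.LambdaLR`, passing `lambda s: lr_at(s, total_steps) / 6e-4` as the multiplier.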
Revisions
The chck_{N}M revisions match the BabyLM eval pipeline's fast-eval naming:
```
chck_1M, chck_2M, ..., chck_9M, chck_10M, chck_20M, ..., chck_90M, chck_100M
```
Use revision=chck_NM to load any milestone. The default (main) is chck_100M.
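The milestone names follow a regular pattern (1M increments up to 9M, then 10M increments up to 100M), so they can be generated rather than typed. `milestone_revisions` and `load_milestone` are hypothetical helpers; the latter wraps the standard `transformers` `from_pretrained(..., revision=...)` call and assumes each revision also carries the tokenizer files.

```python
from typing import List

REPO_ID = "Shamima/babylm-2026-multilingual-uniform-100M"


def milestone_revisions() -> List[str]:
    """chck_1M..chck_9M in 1M steps, then chck_10M..chck_100M in 10M steps."""
    return [f"chck_{n}M" for n in range(1, 10)] + [f"chck_{n}M" for n in range(10, 101, 10)]


def load_milestone(revision: str = "main"):
    """Load tokenizer and model at a given revision (needs `transformers` and Hub access)."""
    # Imported lazily so milestone_revisions() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, revision=revision)
    return tokenizer, model


print(milestone_revisions())
```

Iterating `for rev in milestone_revisions(): tokenizer, model = load_milestone(rev)` would walk the learning curve in training order.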
How to evaluate
```bash
git clone https://github.com/babylm-org/babylm-eval
cd babylm-eval/multilingual
bash scripts/zeroshot_model.sh --model_name Shamima/babylm-2026-multilingual-uniform-100M
bash scripts/zeroshot_model_fast_all.sh --model_name Shamima/babylm-2026-multilingual-uniform-100M
```
Citation
```bibtex
@misc{babylm-2026-uniform,
  title  = {BabyLM 2026 MultiLingual baseline (byte-premium-uniform)},
  author = {Hossain, Shamima},
  year   = {2026},
  url    = {https://huggingface.co/Shamima/babylm-2026-multilingual-uniform-100M}
}
```
Companion repo with audit, scaffold, and ablation configs: https://github.com/silvererudite/bb-lm-challenge-sub
Model provider: Shamima
Modalities: text input, text output