README
License: mit
Architecture
- Llama (HF LlamaForCausalLM): RoPE, RMSNorm, SwiGLU, no biases, tied embeddings
- 12 layers · 768 hidden · 12 heads · 2048 FFN
- 1024 sequence length
- 110,119,680 parameters
- Tokenizer: joint byte-level BPE with a 32,768-token vocabulary (same as v1; reused so the two are directly comparable)
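The parameter count can be cross-checked from the listed dimensions. The sketch below is only a sanity check, not the author's code: it assumes standard multi-head attention (no grouped-query attention), two RMSNorm weight vectors per layer plus one final norm, and a 32,768-entry vocabulary; the function name `llama_param_count` is made up for illustration.

```python
# Dimensions taken from the architecture list; everything else is an assumption
# about a standard bias-free Llama-style decoder.
VOCAB_SIZE = 32_768
HIDDEN = 768
LAYERS = 12
FFN = 2_048


def llama_param_count(vocab, hidden, layers, ffn, tied_embeddings=True):
    """Count weights of a bias-free Llama-style decoder (hypothetical helper)."""
    embed = vocab * hidden        # token embedding matrix
    attn = 4 * hidden * hidden    # q, k, v, o projections (assumes full multi-head attention)
    mlp = 3 * hidden * ffn        # SwiGLU: gate, up, and down projections
    norms = 2 * hidden            # two RMSNorm weight vectors per layer
    final_norm = hidden           # RMSNorm before the output head
    # With tied embeddings the output head reuses the embedding matrix.
    lm_head = 0 if tied_embeddings else vocab * hidden
    return embed + layers * (attn + mlp + norms) + final_norm + lm_head


total = llama_param_count(VOCAB_SIZE, HIDDEN, LAYERS, FFN)
print(f"{total:,}")  # 110,119,680
```

Under these assumptions the result matches the 110,119,680 figure exactly; RoPE adds no learned weights, so it does not appear in the count.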
Training
- Data: BabyBabelLM 2026 100M tier (EN/NL/ZH); full corpora loaded in memory and shuffled
- Mixture: byte-premium-uniform via deficit-driven selection (1/3 of reference tokens per language)
- Optimiser: AdamW (β1=0.9, β2=0.95, wd=0.1)
- LR: 6e-4 peak, WSD schedule (warmup 200 → constant peak → linear 25% decay tail to 6e-5)
- Compute: 4× NVIDIA A10G (23 GB), bf16, DDP, micro-batch 16 per GPU × grad-accum 2 (effective batch 128 sequences = 131,072 tokens/step)
- Tokens consumed at this checkpoint: 100,016,896 byte-premium-adjusted reference tokens (= 1 epoch over the corpus)
- Per-language epochs at this checkpoint: ~1.0 each (well within the BabyLM ≤10-epoch cap)
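The batch arithmetic and the WSD (warmup, stable, decay) schedule described above can be sketched as follows. This is an illustration under stated assumptions rather than the training code: the warmup is assumed to be linear, the decay tail is assumed to cover the final 25% of steps, and `total_steps` is a hypothetical parameter because the card does not state the number of optimiser steps.

```python
# Effective batch: GPUs × micro-batch per GPU × gradient-accumulation steps.
SEQUENCES_PER_STEP = 4 * 16 * 2               # 128 sequences
TOKENS_PER_STEP = SEQUENCES_PER_STEP * 1024   # 131,072 tokens at sequence length 1024


def wsd_lr(step, total_steps, peak=6e-4, floor=6e-5, warmup=200, decay_frac=0.25):
    """Learning rate at a 0-indexed step under a warmup-stable-decay schedule.

    Assumptions (not confirmed by the model card): linear warmup, and a linear
    decay over the last `decay_frac` of `total_steps` from `peak` down to `floor`.
    """
    decay_start = total_steps - int(total_steps * decay_frac)
    if step < warmup:
        # Linear ramp that reaches the peak on the last warmup step.
        return peak * (step + 1) / warmup
    if step < decay_start:
        # Stable phase: hold the peak learning rate.
        return peak
    # Decay tail: interpolate linearly from peak to floor.
    progress = (step - decay_start) / max(1, total_steps - decay_start)
    return peak + (floor - peak) * progress


# Illustrative run length only; 1000 is not the real step count.
for s in (0, 199, 500, 875, 1000):
    print(s, wsd_lr(s, total_steps=1000))
```

Holding the peak rate for most of the run and decaying only at the end is what distinguishes WSD from a cosine schedule, which begins decaying immediately after warmup.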
Revisions
19 fast-eval branches: chck_1M, chck_2M, …, chck_9M, chck_10M, chck_20M, …, chck_90M, chck_100M.
main is chck_100M.
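A small helper can enumerate the branch names when looping over checkpoints. It assumes the elided names follow the pattern implied by the list (every 1M up to 9M, then every 10M up to 100M), which is consistent with the stated total of 19; the function name `checkpoint_branches` is hypothetical.

```python
def checkpoint_branches():
    """Return the assumed fast-eval branch names, smallest checkpoint first."""
    # 1M..9M in steps of 1M, then 10M..100M in steps of 10M.
    millions = list(range(1, 10)) + list(range(10, 101, 10))
    return [f"chck_{m}M" for m in millions]


branches = checkpoint_branches()
print(len(branches))              # 19
print(branches[0], branches[-1])  # chck_1M chck_100M
```

Each name can then be passed as the `revision` argument of `from_pretrained` in Hugging Face transformers to load that checkpoint instead of main.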
How to evaluate
```bash
git clone https://github.com/babylm-org/babylm-eval
cd babylm-eval/multilingual
bash scripts/zeroshot_model.sh --model_name Shamima/babylm-2026-multilingual-uniform-100M-v2
bash scripts/zeroshot_model_fast_all.sh --model_name Shamima/babylm-2026-multilingual-uniform-100M-v2
```
Comparison vs v1
See https://github.com/silvererudite/bb-lm-challenge-sub for the iteration log, scaffold, and ablation configs.
Citation
```bibtex
@misc{babylm-2026-uniform-v2,
  title  = {BabyLM 2026 MultiLingual baseline v2 (WSD schedule)},
  author = {Hossain, Shamima},
  year   = {2026},
  url    = {https://huggingface.co/Shamima/babylm-2026-multilingual-uniform-100M-v2}
}
```
Model provider
Shamima
Modalities
Input
Text
Output
Text