README

License: MIT

Architecture

  • Llama (HF LlamaForCausalLM) — RoPE, RMSNorm, SwiGLU, no biases, tied embeddings
  • 12 layers · hidden size 768 · 12 attention heads · FFN size 2048
  • Context length: 1024 tokens
  • 110,119,680 parameters
  • Tokenizer: joint byte-level BPE with a 32,768-token vocabulary (same as v1; reused so the two versions are directly comparable)
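The stated parameter count can be reproduced from the architecture figures. The sketch below is a back-of-the-envelope count, not the author's code; it assumes standard multi-head attention with 12 key/value heads (no grouped-query attention), bias-free projections, two RMSNorm weight vectors per layer plus a final RMSNorm, and an output layer tied to the input embedding. The helper name `llama_param_count` is hypothetical.

```python
def llama_param_count(vocab=32_768, hidden=768, layers=12, ffn=2_048, tied=True):
    """Count trainable parameters of a bias-free Llama-style decoder (hypothetical helper)."""
    embed = vocab * hidden                   # token embedding matrix
    attn = 4 * hidden * hidden               # q, k, v, o projections; assumes KV heads == attention heads
    mlp = 3 * hidden * ffn                   # SwiGLU: gate, up and down projections
    norms = 2 * hidden                       # input and post-attention RMSNorm weights
    per_layer = attn + mlp + norms
    final_norm = hidden                      # RMSNorm after the last decoder layer
    lm_head = 0 if tied else vocab * hidden  # tied: the output layer reuses the embedding matrix
    # RoPE contributes no learned parameters.
    return embed + layers * per_layer + final_norm + lm_head


if __name__ == "__main__":
    print(f"{llama_param_count():,}")  # 110,119,680
```

Under these assumptions the total matches the 110,119,680 listed in the card; untying the embeddings would add another 32,768 × 768 = 25,165,824 parameters.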

Training

  • Data: BabyBabelLM 2026 100M tier (EN/NL/ZH); full corpora loaded in memory and shuffled
  • Mixture: byte-premium-uniform, via deficit-driven selection (each language targets 1/3 of the reference tokens)
  • Optimiser: AdamW (β1=0.9, β2=0.95, wd=0.1)
  • Learning rate: 6e-4 peak, warmup-stable-decay (WSD) schedule (warmup 200 → constant at peak → linear decay to 6e-5 over the final 25% of training)
  • Compute: 4× NVIDIA A10G (23 GB), bf16, DDP, micro-batch 16 per GPU × gradient accumulation 2 (effective batch 128 sequences = 131,072 tokens/step)
  • Tokens consumed at this checkpoint: 100,016,896 byte-premium-adjusted reference tokens (= 1 epoch over the corpus)
  • Per-language epochs at this checkpoint: ~1.0 each (well within the BabyLM ≤10-epoch cap)
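The schedule and batch arithmetic in the training list can be sketched in plain Python. This is an illustration under stated assumptions, not the training script: it assumes the "200" in the warmup refers to optimiser steps, that warmup is linear, and that the decay tail covers the last 25% of optimiser steps. The card does not give the total step count (the consumed-token figure is byte-premium-adjusted), so `total_steps=1000` in the usage example is purely hypothetical, as are the helper names `wsd_lr` and `tokens_per_step`.

```python
def wsd_lr(step, total_steps, peak_lr=6e-4, final_lr=6e-5, warmup_steps=200, decay_frac=0.25):
    """Warmup-stable-decay learning rate for a 0-indexed optimiser step (hypothetical helper)."""
    decay_steps = max(1, round(decay_frac * total_steps))
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Assumed linear warmup, reaching peak_lr on step warmup_steps - 1.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable phase: hold the peak learning rate.
        return peak_lr
    # Decay phase: linear interpolation from peak_lr down to final_lr on the last step.
    progress = min(1.0, (step - decay_start) / max(1, decay_steps - 1))
    return peak_lr + (final_lr - peak_lr) * progress


def tokens_per_step(num_gpus=4, micro_batch=16, grad_accum=2, seq_len=1024):
    """Tokens per optimiser step under DDP with gradient accumulation."""
    return num_gpus * micro_batch * grad_accum * seq_len


if __name__ == "__main__":
    print(tokens_per_step())  # 131072
    # total_steps=1000 is an illustrative value, not this run's step count.
    for s in (0, 199, 500, 750, 999):
        print(s, wsd_lr(s, total_steps=1000))
```

The function returns an absolute learning rate; in PyTorch one could divide it by the peak and pass the result as the multiplier function to `torch.optim.lr_scheduler.LambdaLR`, which scales the optimiser's base learning rate by that factor.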

Revisions

19 fast-eval checkpoint branches: chck_1M, chck_2M, …, chck_9M, chck_10M, chck_20M, …, chck_90M, chck_100M. The main branch holds the chck_100M checkpoint.
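Because each checkpoint lives on its own branch, it can presumably be loaded by passing the branch name as `revision` to `from_pretrained` in Hugging Face `transformers`. The sketch below regenerates the 19 branch names from the pattern above and shows how one might load a single checkpoint. The helper names are hypothetical, and it assumes each branch holds a complete `transformers` checkpoint and that the tokenizer stored on main applies to every checkpoint.

```python
REPO_ID = "Shamima/babylm-2026-multilingual-uniform-100M-v2"


def checkpoint_branches():
    """Branch names in training order: 1M..9M in 1M steps, then 10M..100M in 10M steps."""
    millions = list(range(1, 10)) + list(range(10, 101, 10))
    return [f"chck_{m}M" for m in millions]


def load_checkpoint(branch="main"):
    """Load one checkpoint branch (hypothetical helper; needs `transformers` and network access)."""
    # Imported here so checkpoint_branches() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: every checkpoint shares the tokenizer stored on main.
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, revision=branch)
    return tokenizer, model


if __name__ == "__main__":
    branches = checkpoint_branches()
    print(len(branches), branches[0], branches[-1])  # 19 chck_1M chck_100M
```

Iterating over `checkpoint_branches()` and calling `load_checkpoint` on each name would give a learning-curve sweep over the same checkpoints the fast evaluation uses.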

How to evaluate

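```bash
# Fetch the BabyLM evaluation pipeline and change into its multilingual directory
git clone https://github.com/babylm-org/babylm-eval
cd babylm-eval/multilingual
# Zero-shot evaluation
bash scripts/zeroshot_model.sh --model_name Shamima/babylm-2026-multilingual-uniform-100M-v2
# Fast zero-shot evaluation
bash scripts/zeroshot_model_fast_all.sh --model_name Shamima/babylm-2026-multilingual-uniform-100M-v2
```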

Comparison vs v1

See https://github.com/silvererudite/bb-lm-challenge-sub for the iteration log, scaffold, and ablation configs.

Citation

```bibtex
@misc{babylm-2026-uniform-v2,
  title  = {BabyLM 2026 MultiLingual baseline v2 (WSD schedule)},
  author = {Hossain, Shamima},
  year   = {2026},
  url    = {https://huggingface.co/Shamima/babylm-2026-multilingual-uniform-100M-v2}
}
```

Model provider: Shamima


Modalities: text input, text output
