README

License: MIT

Architecture

  • Llama (HF LlamaForCausalLM) — RoPE, RMSNorm, SwiGLU, no biases, tied embeddings
  • 12 layers · 768 hidden · 12 heads · 2048 FFN
  • 1024 sequence length
  • 110,119,680 parameters
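The parameter count can be reproduced from the dimensions listed above. The sketch below is a back-of-the-envelope check, not code from the repository. It assumes standard multi-head attention (as many key/value heads as query heads) and uses the 32,768 vocabulary size from the Tokenizer section. `llama_param_count` is a hypothetical helper name.

```python
def llama_param_count(vocab, hidden, layers, ffn, tied=True):
    """Count weights in a bias-free Llama-style decoder (assumes full MHA, no GQA)."""
    embed = vocab * hidden            # token embedding matrix
    attn = 4 * hidden * hidden        # q, k, v, o projections
    mlp = 3 * hidden * ffn            # gate, up, down projections (SwiGLU)
    norms = 2 * hidden                # two RMSNorm weight vectors per layer
    final_norm = hidden               # RMSNorm before the output head
    lm_head = 0 if tied else vocab * hidden  # tied head reuses the embedding
    return embed + layers * (attn + mlp + norms) + final_norm + lm_head


total = llama_param_count(vocab=32_768, hidden=768, layers=12, ffn=2048)
print(f"{total:,}")  # 110,119,680
```

RoPE contributes no learned weights, so it does not appear in the count. Under these assumptions the total matches the figure in the list exactly.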

Tokenizer

Joint byte-level BPE with a 32,768-token vocabulary, trained on a balanced sample of 50M characters from each of EN, NL, and ZH. All three languages share this one tokenizer. The data card explains why a joint tokenizer is required: 6.8% of the ZH corpus is Latin script.
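A minimal sketch of how such a balanced joint tokenizer could be trained with the Hugging Face `tokenizers` library. This is not the author's script. The helper names (`take_chars`, `train_joint_tokenizer`) are hypothetical, the sample is taken from the front of each document stream rather than by whatever sampling the author used, and no special tokens are configured because the card does not list them.

```python
from itertools import chain


def take_chars(docs, budget):
    """Yield documents until `budget` characters are emitted, truncating the last one."""
    remaining = budget
    for doc in docs:
        if remaining <= 0:
            break
        piece = doc[:remaining]
        remaining -= len(piece)
        yield piece


def train_joint_tokenizer(corpora, chars_per_lang=50_000_000, vocab_size=32_768):
    """Train one byte-level BPE on an equal character budget per language.

    `corpora` maps a language code to an iterable of document strings.
    """
    from tokenizers import ByteLevelBPETokenizer  # imported lazily; optional dependency

    balanced = chain.from_iterable(
        take_chars(docs, chars_per_lang) for docs in corpora.values()
    )
    tokenizer = ByteLevelBPETokenizer()
    tokenizer.train_from_iterator(balanced, vocab_size=vocab_size)
    return tokenizer
```

Capping each language at the same character budget keeps the larger corpus from dominating the merge table, which is the point of the balanced sample described above.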

Training

  • Data: BabyLM-community/babylm-eng + babylm-nld + babylm-zho (BabyBabelLM 2026 100M tier). Full corpora loaded in memory and shuffled (the Hub layout is category-clustered; streaming with reasonable buffers produces a biased sample).
  • Mixture: byte-premium-uniform — equal share of reference tokens per language (1/3 each), achieved by deficit-driven selection, not uniform doc sampling (mean doc sizes differ across languages).
  • Optimizer: AdamW (β₁=0.9, β₂=0.95, wd=0.1), peak lr 6e-4, 100-step warmup, then cosine decay to 10% of peak
  • Compute: 4× NVIDIA A10G (23 GB), bf16, DDP, micro-batch 16 × grad-accum 2 × 4 GPUs (effective batch 128 sequences = 131,072 tokens/step)
  • Tokens consumed at this checkpoint: 100,000,000 byte-premium-adjusted reference tokens
  • Per-language epochs at this checkpoint: ≈1.0 each (within the BabyLM ≤10-epoch cap)
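Two parts of the recipe above lend themselves to a short sketch: the warmup-plus-cosine schedule and the deficit-driven language choice. Both are illustrative and assume details the card does not state. The step count assumes one reference token corresponds to one training token, the warmup is taken to be linear, and `lr_at` and `pick_language` are hypothetical names rather than functions from the companion repo.

```python
import math

SEQS_PER_STEP = 16 * 2 * 4                      # micro-batch × grad-accum × GPUs = 128
TOKENS_PER_STEP = SEQS_PER_STEP * 1024          # 131,072 tokens at sequence length 1024
TOTAL_STEPS = math.ceil(100_000_000 / TOKENS_PER_STEP)  # assumption: about 763 steps


def lr_at(step, peak=6e-4, warmup=100, total=TOTAL_STEPS, floor_frac=0.1):
    """Linear warmup to `peak`, then cosine decay to `floor_frac * peak`."""
    if step < warmup:
        return peak * (step + 1) / warmup
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    floor = peak * floor_frac
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))


def pick_language(consumed, target_share):
    """Return the language furthest below its target share of tokens so far."""
    total = sum(consumed.values())
    return max(consumed, key=lambda lang: target_share[lang] * total - consumed[lang])


# Toy simulation: documents differ in mean size per language (made-up numbers),
# yet the consumed reference tokens still end up split close to 1/3 each.
mean_doc_tokens = {"eng": 300, "nld": 500, "zho": 120}
target = {lang: 1 / 3 for lang in mean_doc_tokens}
consumed = {lang: 0 for lang in mean_doc_tokens}
while sum(consumed.values()) < 300_000:
    lang = pick_language(consumed, target)
    consumed[lang] += mean_doc_tokens[lang]

shares = {lang: n / sum(consumed.values()) for lang, n in consumed.items()}
```

Uniform document sampling would instead give each language a share proportional to its mean document size, which is why the mixture is driven by the token deficit rather than by document counts.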

Revisions

The chck_{N}M revisions match the BabyLM eval pipeline's fast-eval naming:

```
chck_1M, chck_2M, ..., chck_9M, chck_10M, chck_20M, ..., chck_90M, chck_100M
```

Use revision=chck_NM to load any milestone. The default (main) is chck_100M.
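As a sketch of how a milestone might be loaded with `transformers`: the example assumes the elided names follow the obvious pattern (every 1M up to 10M, then every 10M up to 100M) and that the repository id is the one used in the evaluation commands. `milestone_revisions` and `load_checkpoint` are hypothetical helper names.

```python
REPO_ID = "Shamima/babylm-2026-multilingual-uniform-100M"


def milestone_revisions():
    """List revision names, assuming 1M steps up to 10M and 10M steps up to 100M."""
    millions = list(range(1, 10)) + list(range(10, 101, 10))
    return [f"chck_{m}M" for m in millions]


def load_checkpoint(revision="main", repo_id=REPO_ID):
    """Download the tokenizer and model at the given revision from the Hub."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # lazy import

    tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(repo_id, revision=revision)
    return tokenizer, model


revisions = milestone_revisions()
# Example (requires network access): tokenizer, model = load_checkpoint("chck_10M")
```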

How to evaluate

```bash
git clone https://github.com/babylm-org/babylm-eval
cd babylm-eval/multilingual
bash scripts/zeroshot_model.sh --model_name Shamima/babylm-2026-multilingual-uniform-100M
bash scripts/zeroshot_model_fast_all.sh --model_name Shamima/babylm-2026-multilingual-uniform-100M
```

Citation

```bibtex
@misc{babylm-2026-uniform,
  title  = {BabyLM 2026 MultiLingual baseline (byte-premium-uniform)},
  author = {Hossain, Shamima},
  year   = {2026},
  url    = {https://huggingface.co/Shamima/babylm-2026-multilingual-uniform-100M}
}
```

Companion repo with audit, scaffold, and ablation configs: https://github.com/silvererudite/bb-lm-challenge-sub
