SupraLabs

Supra-1.5-50M-Base-exp

README

License: apache-2.0

Architecture

The model keeps the original Supra-50M architecture and tokenizer:

| Specification | Value |
|---|---|
| Architecture | `LlamaForCausalLM` |
| Parameters | ~50M |
| Vocabulary Size | 32,000 |
| Hidden Size | 512 |
| Layers | 12 |
| Attention Heads | 8 |
| KV Heads | 4 |
| Context Length | 5,120 tokens |
| Tokenizer | Original Supra byte-level BPE tokenizer |
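
As a rough illustration of what these numbers imply, the sketch below derives a few quantities from the table: the per-head dimension, the grouped-query attention group size, the embedding and attention weight counts, and the KV-cache size at full context. It assumes `head_dim = hidden_size / attention_heads` (the usual Llama default), bias-free attention projections, and 2-byte (fp16/bf16) cache values. None of these are stated in this README, and the MLP width is not listed, so no total parameter count is attempted.

```python
# Values copied from the specification table.
VOCAB_SIZE = 32_000
HIDDEN_SIZE = 512
NUM_LAYERS = 12
NUM_HEADS = 8
NUM_KV_HEADS = 4
CONTEXT_LENGTH = 5_120

# Assumption: head_dim = hidden_size / num_heads (not stated in the README).
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS


def gqa_group_size() -> int:
    """Number of query heads that share one key/value head."""
    return NUM_HEADS // NUM_KV_HEADS


def embedding_params() -> int:
    """Weights in the token embedding matrix alone."""
    return VOCAB_SIZE * HIDDEN_SIZE


def attention_params_per_layer() -> int:
    """Q, K, V and output projection weights, assuming no bias terms."""
    q = HIDDEN_SIZE * NUM_HEADS * HEAD_DIM
    k = HIDDEN_SIZE * NUM_KV_HEADS * HEAD_DIM
    v = HIDDEN_SIZE * NUM_KV_HEADS * HEAD_DIM
    o = NUM_HEADS * HEAD_DIM * HIDDEN_SIZE
    return q + k + v + o


def kv_cache_bytes(num_tokens: int, bytes_per_value: int = 2) -> int:
    """KV-cache size for one sequence (keys + values, all layers)."""
    per_token_per_layer = 2 * NUM_KV_HEADS * HEAD_DIM
    return per_token_per_layer * NUM_LAYERS * num_tokens * bytes_per_value


if __name__ == "__main__":
    print("head_dim:", HEAD_DIM)
    print("query heads per KV head:", gqa_group_size())
    print("embedding params:", embedding_params())
    print("attention params per layer:", attention_params_per_layer())
    mib = kv_cache_bytes(CONTEXT_LENGTH) / (1024 * 1024)
    print("KV cache at full context (MiB):", mib)
```

Under these assumptions, the embedding matrix accounts for about 16.4M of the roughly 50M parameters, and using 4 KV heads instead of 8 halves the cache compared with full multi-head attention.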

Continued Pretraining Objective

This is continued pretraining (CPT), not instruction fine-tuning. Training uses packed raw text with the standard causal language-modeling loss:

- `labels = input_ids`
- all non-pad tokens are trained
- no response-only masking
- no system/user/assistant masking
- no LoRA adapters in the default run
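
The objective listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the actual training code: the function name `pack_documents`, the EOS separator between documents, and the `-100` ignore index (the PyTorch cross-entropy default, which Hugging Face causal LM heads also use) are assumptions made for the example. Labels are a copy of `input_ids`; the one-position shift for next-token prediction is assumed to happen inside the model, as it does in Hugging Face `LlamaForCausalLM`.

```python
from typing import Dict, Iterable, List

# Assumption: padded label positions use -100, the PyTorch
# cross-entropy default ignore_index.
IGNORE_INDEX = -100


def pack_documents(
    docs: Iterable[List[int]],
    block_size: int,
    eos_id: int,
    pad_id: int,
) -> List[Dict[str, List[int]]]:
    """Concatenate tokenized documents and cut them into fixed-size blocks.

    Every real token is a training target (no role or response masking);
    only the padding at the end of the last block is excluded from the loss.
    """
    # Build one long token stream, separating documents with EOS.
    stream: List[int] = []
    for doc in docs:
        stream.extend(doc)
        stream.append(eos_id)

    blocks: List[Dict[str, List[int]]] = []
    for start in range(0, len(stream), block_size):
        chunk = stream[start:start + block_size]
        n_pad = block_size - len(chunk)
        blocks.append({
            "input_ids": chunk + [pad_id] * n_pad,
            # labels = input_ids, except that padding is ignored.
            "labels": chunk + [IGNORE_INDEX] * n_pad,
            "attention_mask": [1] * len(chunk) + [0] * n_pad,
        })
    return blocks


if __name__ == "__main__":
    example = pack_documents([[5, 6, 7], [8, 9]], block_size=4, eos_id=2, pad_id=0)
    for block in example:
        print(block)
```

Padding is tracked by position rather than by comparing against `pad_id`, so a tokenizer that reuses a real token id for padding would not accidentally mask genuine tokens.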

Data Mix

The current local training mix prepared for this run is:

3,000,000,062 CPT tokens
- 30% Tool Calling
- 30% ChatML Conversations
- 25% Factual Text (articles, essays, blogs)
- 15% Math & Logic Questions
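
For a sense of scale, the percentages can be turned into approximate per-category token budgets. The sketch below assumes the shares are measured in tokens (the README does not say whether they refer to tokens or documents), and the category keys and the `token_budgets` helper are hypothetical names chosen for the example. Integer division leaves a small remainder, which is assigned to the largest category so the budgets sum to the stated total.

```python
from typing import Dict

TOTAL_TOKENS = 3_000_000_062

# Shares from the list above; key names are illustrative only.
MIX_PERCENT: Dict[str, int] = {
    "tool_calling": 30,
    "chatml_conversations": 30,
    "factual_text": 25,
    "math_logic": 15,
}


def token_budgets(total: int, mix: Dict[str, int]) -> Dict[str, int]:
    """Split a token total by integer percentages, keeping the exact sum."""
    if sum(mix.values()) != 100:
        raise ValueError("mix percentages must sum to 100")
    budgets = {name: total * pct // 100 for name, pct in mix.items()}
    # Rounding down can leave a few tokens unassigned; give them to the
    # largest category (the first one listed, if there is a tie).
    remainder = total - sum(budgets.values())
    largest = max(mix, key=mix.get)
    budgets[largest] += remainder
    return budgets


if __name__ == "__main__":
    for name, tokens in token_budgets(TOTAL_TOKENS, MIX_PERCENT).items():
        print(f"{name}: {tokens:,}")
```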

Intended Use

This base checkpoint is intended as a starting point for supervised fine-tuning (SFT) and reinforcement learning.

Available on FriendliAI

Dedicated Endpoints

Run inference for this model on a single-tenant GPU with unmatched speed and reliability at scale.

Container

Run inference for this model with full control and performance in your own environment.

Model Details

Model Provider

SupraLabs

Model Tree

Base

this model

Input Modalities

Text

Output Modalities