
License: apache-2.0

🏆 Benchmarks

| Benchmark | Supra-50M (ours) | GPT-2 (124M) | SmolLM-135M | OpenELM-270M |
|---|---|---|---|---|
| Parameters | 50M | 124M (2.5×) | 135M (2.7×) | 270M (5.4×) |
| BLiMP (linguistics) | 76.3% | 63.0% | 69.8% | (N/A) |
| SciQ (science) | 77.2% | 53.2% | 73.4% | 84.70% |
| ARC-Easy (knowledge) | 52.2% | 42.0% | 49.2% | 45.08% |
| PIQA (logic) | 62.2% | 63.0% | 67.3% | 69.75% |
| HellaSwag (context) | 31.8% | 29.5% | 42.0% | 46.71% |

| Task | Metric | Value |
|---|---|---|
| arc_easy | acc,none | 0.5185 |
| arc_easy | acc_stderr,none | 0.0103 |
| arc_easy | acc_norm,none | 0.4600 |
| arc_easy | acc_norm_stderr,none | 0.0102 |
| arc_challenge | acc,none | 0.2159 |
| arc_challenge | acc_stderr,none | 0.0120 |
| arc_challenge | acc_norm,none | 0.2517 |
| arc_challenge | acc_norm_stderr,none | 0.0127 |
| hellaswag | acc,none | 0.2903 |
| hellaswag | acc_stderr,none | 0.0045 |
| hellaswag | acc_norm,none | 0.3172 |
| hellaswag | acc_norm_stderr,none | 0.0046 |
| winogrande | acc,none | 0.5154 |
| winogrande | acc_stderr,none | 0.0140 |
| piqa | acc,none | 0.6251 |
| piqa | acc_stderr,none | 0.0113 |
| piqa | acc_norm,none | 0.6219 |
| piqa | acc_norm_stderr,none | 0.0113 |
| openbookqa | acc,none | 0.1860 |
| openbookqa | acc_stderr,none | 0.0174 |
| openbookqa | acc_norm,none | 0.3080 |
| openbookqa | acc_norm_stderr,none | 0.0207 |
| boolq | acc,none | 0.5303 |
| boolq | acc_stderr,none | 0.0087 |
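
The `acc_stderr` rows give the standard error of each accuracy, so a rough 95% interval is accuracy ± 1.96 × stderr. The sketch below applies that normal approximation to a few rows copied from the per-task results; the helper name `ci95` is illustrative and not part of any evaluation tool.

```python
# Rough 95% confidence intervals from the accuracy and standard-error rows.
# Assumes a normal approximation; `ci95` is an illustrative helper.

RESULTS = {
    # task: (acc, acc_stderr), copied from the per-task results
    "arc_easy": (0.5185, 0.0103),
    "hellaswag": (0.2903, 0.0045),
    "winogrande": (0.5154, 0.0140),
    "piqa": (0.6251, 0.0113),
    "boolq": (0.5303, 0.0087),
}


def ci95(acc, stderr, z=1.96):
    """Return (low, high) bounds of an approximate 95% interval."""
    margin = z * stderr
    return round(acc - margin, 4), round(acc + margin, 4)


for task, (acc, stderr) in RESULTS.items():
    low, high = ci95(acc, stderr)
    print(f"{task:<12} acc={acc:.4f}  95% CI=[{low:.4f}, {high:.4f}]")
```

For winogrande the interval spans 0.5, the accuracy expected from guessing between its two options, so that score is best read with caution.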


🧠 Model Architecture & Hyperparameters

Supra-50M is based on the LlamaForCausalLM architecture with the following configuration:

| Hyperparameter | Value |
|---|---|
| Architecture | Llama (decoder-only transformer) |
| Parameters | ~50M |
| vocab_size | 32,000 |
| hidden_size | 512 |
| intermediate_size | 1,408 |
| num_hidden_layers | 12 |
| num_attention_heads | 8 |
| num_key_value_heads | 4 (GQA) |
| max_position_embeddings | 1,024 |
| rope_theta | 10,000 |
| tie_word_embeddings | True |
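
As a sanity check on the ~50M figure, the parameter count can be estimated from the configuration values alone. This is a back-of-the-envelope sketch, assuming the standard Llama layout (bias-free linear layers, a gated MLP with three projections, two RMSNorm weights per layer plus a final norm); it does not load the actual checkpoint.

```python
# Parameter count estimated from the configuration values listed above.
# Assumes the standard Llama layout: no biases, gated MLP (gate/up/down),
# two RMSNorm weight vectors per layer and one final norm.

vocab_size = 32_000
hidden_size = 512
intermediate_size = 1_408
num_hidden_layers = 12
num_attention_heads = 8
num_key_value_heads = 4

head_dim = hidden_size // num_attention_heads   # 64
kv_dim = num_key_value_heads * head_dim         # 256: GQA shrinks K and V

embeddings = vocab_size * hidden_size           # shared with the output head (tied)
attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim  # q, o + k, v
mlp = 3 * hidden_size * intermediate_size       # gate, up, down projections
norms = 2 * hidden_size                         # pre-attention and pre-MLP norms
per_layer = attention + mlp + norms

total = embeddings + num_hidden_layers * per_layer + hidden_size  # + final norm
print(f"{total:,} parameters")
```

Under these assumptions the total is 51,786,240, in line with the ~50M figure; the tied embedding matrix alone accounts for about 16.4M of it.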

📚 Training Data

| Property | Value |
|---|---|
| Dataset | HuggingFaceFW/fineweb-edu (sample-100BT split) |
| Total tokens | 20,000,000,000 (20B) |
| Sequence length | 1,024 tokens |
| Storage format | Memory-mapped binary (uint16, ~40 GB) |
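
The storage figures are consistent with each other: a 32,000-token vocabulary fits in `uint16`, and 20B tokens at 2 bytes each come to about 40 GB. The sketch below shows one way fixed-length sequences could be read from such a file with NumPy; the file name `tokens.bin`, the helper `get_sequence`, and the tiny random stand-in data are assumptions for illustration, not the project's actual loader.

```python
import os
import tempfile

import numpy as np

SEQ_LEN = 1_024
VOCAB_SIZE = 32_000

# uint16 holds ids up to 65,535, enough for a 32,000-token vocabulary.
assert VOCAB_SIZE - 1 <= np.iinfo(np.uint16).max

# 20B tokens at 2 bytes each, expressed in decimal gigabytes.
print(20_000_000_000 * np.dtype(np.uint16).itemsize / 1e9, "GB")

# Write a tiny stand-in file (8 sequences) so the example runs anywhere.
path = os.path.join(tempfile.mkdtemp(), "tokens.bin")  # hypothetical file name
rng = np.random.default_rng(0)
rng.integers(0, VOCAB_SIZE, size=8 * SEQ_LEN, dtype=np.uint16).tofile(path)

# Memory-map the file: data is only read from disk when a slice is accessed.
tokens = np.memmap(path, dtype=np.uint16, mode="r")
num_sequences = len(tokens) // SEQ_LEN


def get_sequence(i):
    """Return sequence i as int64 ids, the dtype embedding layers usually expect."""
    start = i * SEQ_LEN
    return np.asarray(tokens[start:start + SEQ_LEN], dtype=np.int64)


print(num_sequences, get_sequence(3).shape)
```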

🔤 Tokenizer

A custom Byte-Level BPE tokenizer was trained from scratch on 500,000 documents sampled from fineweb-edu (sample-10BT).

| Property | Value |
|---|---|
| Type | ByteLevelBPETokenizer |
| Vocabulary size | 32,000 |
| Min frequency | 2 |
| Special tokens | `<s>`, `<pad>`, `</s>`, `<unk>`, `<mask>` |
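
A tokenizer with these settings can be trained with the `tokenizers` library roughly as follows. This is a minimal sketch, assuming the `ByteLevelBPETokenizer` class from that library was used as the table suggests; the three-sentence corpus is a stand-in for the 500,000 sampled documents, so the resulting vocabulary is far smaller than 32,000.

```python
from tokenizers import ByteLevelBPETokenizer

# Tiny in-memory stand-in for the sampled fineweb-edu documents.
corpus = [
    "Education helps people understand the world around them.",
    "Science education builds on observation and experiment.",
    "People learn language by reading and writing every day.",
] * 50

tokenizer = ByteLevelBPETokenizer()
tokenizer.train_from_iterator(
    corpus,
    vocab_size=32_000,   # upper bound; a small corpus yields far fewer merges
    min_frequency=2,     # a pair must occur at least twice to be merged
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
    show_progress=False,
)

encoding = tokenizer.encode("Education builds understanding.")
print(encoding.tokens)
print(tokenizer.decode(encoding.ids))
```

Because the tokenizer works at the byte level, any input string can be encoded without falling back to `<unk>`, and decoding the ids reproduces the original text.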

⚙️ Training Configuration

| Parameter | Value |
|---|---|
| Epochs | 1 |
| Per-device batch size | 32 |
| Gradient accumulation steps | 4 |
| Effective batch size | 128 × 1,024 tokens |
| Learning rate | 6e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 2% |
| Optimizer | AdamW Fused (adam_beta1=0.9, adam_beta2=0.95) |
| Weight decay | 0.1 |
| Max grad norm | 1.0 |
| Precision | bfloat16 |
| torch.compile | Enabled |
| Hardware | Single GPU |
| Final loss | 3.259 |
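
The batch and schedule settings imply a step count and a learning-rate curve that can be worked out directly. The sketch below assumes the common shape for a cosine schedule with warmup (linear ramp to the peak rate, then cosine decay to zero) and rounds the warmup length up; the training script's exact step count and scheduler details are not stated here, so treat the numbers as estimates.

```python
import math

per_device_batch = 32
grad_accum_steps = 4
seq_len = 1_024
total_tokens = 20_000_000_000
peak_lr = 6e-4
warmup_ratio = 0.02

sequences_per_step = per_device_batch * grad_accum_steps   # 128 on a single GPU
tokens_per_step = sequences_per_step * seq_len              # 131,072
total_steps = total_tokens // tokens_per_step               # full steps in one epoch
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step):
    """Linear warmup to peak_lr, then cosine decay to zero (assumed schedule shape)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(tokens_per_step, total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

With these values each optimizer step covers 131,072 tokens, one epoch is about 152,587 steps, and warmup lasts roughly the first 3,052 of them.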

🚀 Inference

```python
from transformers import pipeline
import torch

print("[*] Loading Supra-50M model from Hugging Face Hub...")
pipe = pipeline(
    "text-generation",
    model="SupraLabs/Supra-50M_BASE",
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32
)


def generate_text(prompt, max_new_tokens=150):
    result = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.5,
        top_k=25,
        top_p=0.9,
        repetition_penalty=1.2,
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id
    )
    return result[0]['generated_text']


# Example
prompt = "The importance of education is"
print(f"\nPrompt: {prompt}")
print("-" * 40)
print("\nOutput:\n" + generate_text(prompt))
```

💬 Sample Outputs

Prompt: "The main concept of physics is "

The main concept of physics is iffy, and the idea that we can make things behave in a certain way. The most important part of physics is called quantum mechanics which states that all particles are made up of energy (energy) and matter (matter). In physics, there are two types of particles: elementary particles and exotic ones. These particles have properties like mass, speed or momentum but they don’t interact with each other to form new objects. This is because these particles do not exist independently from one another. In this case, an exotic particle might be created by adding more energy into its structure than it would take for a normal particle. However, when you add additional energy to an exotic particle, the new object will become smaller and larger until it becomes too large to fit within the existing structure. If you think about how light travels through space, it takes around 20 billion years before the light reaches our eyes. Light waves travel faster than light at high speeds so if we could create some kind of light wave, then we wouldn’t need any special equipment. It just needs a few hundred millionths of a second to produce light rays. So even though the light is moving along the same path as the current, the speed of light is different depending on where the light hits the

Prompt: "Artificial intelligence is "

Artificial intelligence is iffy, it can be used to make intelligent machines that could take over the world. What does Artificial Intelligence mean? AI refers to artificial intelligence and machine learning technology which is a type of computer science (also known as artificial intelligence) in which computers are programmed with knowledge about their environment or other objects. The term AI comes from the Greek word "art" meaning "to create." The most common uses for AI include: - Machine Learning This means using algorithms like natural language processing systems to learn how words work together to form sentences such as “I am going to go to the store.” These programs will then use these rules to decide whether they should buy something or not so that you know what’s being sold on the internet. For example, if you purchase an ebook at Amazon, you may want to check its price first before purchasing it. If this happens, your shopping cart might look different than it did when purchased by someone else who bought it earlier. You can also think of AI as a way to help people understand themselves better through training and reasoning rather than simply seeing them doing things differently. In fact, we often see AI models working very well because of the way humans interact with our minds. This ability makes us more effective

Prompt: "Once upon a time, "

Once upon a time, ...... I was so excited about the new school year and wanted to make some changes in my life. I had been looking for ways to help me become more self-aware. As an adult, I have always felt that there is no one way of doing things without thinking first. This has led me to start making small changes at home or at work. One such change was to create a space where I could be more mindful and aware of myself as well as other people around me. It’s important to remember that we all need our own personal growth and development. We can do this by taking responsibility for ourselves; being responsible for what happens outside us and keeping it within our control. By creating these smaller steps towards becoming more conscious of yourself, you will see how much better your future looks!The word "treaty" means something like "a treaty made with a king." The French word for "covenant," célèbre (French: cœle), comes from the Latin cecus ("to give up"). A covenant is not a binding agreement but rather an act of mutual understanding between two parties. In general terms, a contract is anything agreed on which someone agrees to agree to receive certain benefits. For example, if a person


📄 License

This model is released under the Apache 2.0 License.


Third-party validation benchmarks by Datdanboi25



© SupraLabs 2026 — Project Chimera
