README

License: apache-2.0

Model Config

  • Parameters: 1,410,688 (1M)
  • Architecture: Llama
  • Vocab Size: 4096 (custom BPE tokenizer)
  • Hidden Size: 128
  • Intermediate Size: 256
  • Hidden Layers: 6
  • Attention Heads: 4
  • Key Value Heads: 2
  • Max Position Embeddings: 1024
  • Learning rate: 6e-4
  • Weight Decay: 0.1
  • Trained in bfloat16
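The parameter count can be cross-checked from the config values above. This is a minimal sketch, assuming a standard Llama layout (no bias terms, two RMSNorm weight vectors per layer plus a final norm, grouped-query attention) and tied input/output embeddings. None of these assumptions is stated in the list, but the total matches the listed count under them.

```python
# Hyperparameters copied from the Model Config list.
vocab_size = 4096
hidden = 128
intermediate = 256
layers = 6
heads = 4
kv_heads = 2

head_dim = hidden // heads      # 32
kv_dim = kv_heads * head_dim    # 64, width of the shared key/value projections

# Assumption: q_proj and o_proj are hidden x hidden, k_proj and v_proj are hidden x kv_dim.
attention = 2 * hidden * hidden + 2 * hidden * kv_dim
# Assumption: gated MLP with gate, up, and down projections.
mlp = 3 * hidden * intermediate
# Assumption: two RMSNorm weight vectors per layer.
norms = 2 * hidden
per_layer = attention + mlp + norms

# Assumption: the output head shares weights with the token embedding.
embedding = vocab_size * hidden
final_norm = hidden

total = layers * per_layer + final_norm + embedding
print(f"{total:,}")  # 1,410,688
```

Under these assumptions the embedding table accounts for 524,288 of the parameters, a little over a third of the model.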

Final Loss

This model reached a final cross-entropy loss of 3.79 on the training set.
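For intuition, a cross-entropy loss can be converted to a per-token perplexity. The sketch below assumes the loss is a mean over tokens measured in nats (natural logarithm), which is the usual convention for PyTorch-style cross-entropy but is not stated here.

```python
import math

final_loss = 3.79  # reported training cross-entropy, assumed to be in nats per token

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(final_loss)
print(f"token perplexity ~ {perplexity:.1f}")
```

A perplexity in the mid-40s means the model is, on average, about as uncertain as a uniform choice among that many tokens out of its 4096-token vocabulary.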

Benchmarks

All benchmarks were executed using lm_eval.

Task                    Value     Random level
Arc_Easy ↑              0.3026    0.25 (25%)
Wikitext (byte PPL) ↓   3.0043    -
BLiMP ↑                 0.6186    0.5 (50%)
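To put these numbers in context, the sketch below computes each accuracy's margin over the random level and converts the Wikitext byte perplexity to bits per byte. The conversion assumes byte perplexity is the exponential of the mean negative log-likelihood per byte, so bits per byte is its base-2 logarithm.

```python
import math

# (score, random level) pairs copied from the table above.
accuracy_tasks = {
    "Arc_Easy": (0.3026, 0.25),
    "BLiMP": (0.6186, 0.50),
}

# Absolute margin over chance for each accuracy task.
margins = {name: score - chance for name, (score, chance) in accuracy_tasks.items()}
for name, margin in margins.items():
    print(f"{name}: {margin:+.4f} over random")

# Assumption: byte_ppl = exp(mean negative log-likelihood per byte).
wikitext_byte_ppl = 3.0043
bits_per_byte = math.log2(wikitext_byte_ppl)
print(f"Wikitext: {bits_per_byte:.3f} bits per byte")
```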

For further benchmarks, see benchmarks.md in this repo's files list.

Usage

To use the model, run the following Python 3 code:

    from transformers import pipeline
    import torch

    print("Loading Supra Mini v6 1M model from Hugging Face...")
    pipe = pipeline(
        "text-generation",
        model="SupraLabs/Supra-Mini-v6-1M",
        device_map="auto",
        # Half precision on GPU, full precision on CPU.
        torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    )

    def generate_text(prompt, max_new_tokens=150):
        """Sample a continuation and return the prompt plus the generated text."""
        result = pipe(
            prompt,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.5,
            top_k=25,
            top_p=0.9,
            repetition_penalty=1.2,
            pad_token_id=pipe.tokenizer.pad_token_id,
            eos_token_id=pipe.tokenizer.eos_token_id,
        )
        return result[0]["generated_text"]

    test_prompt = "The importance of education is"
    print(f"\nPrompt: {test_prompt}")
    print("-" * 30)
    print("\nOutput:\n" + generate_text(test_prompt))
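The sampling settings in the usage code (temperature, top_k, top_p) each narrow the next-token distribution in a different way. The following is a simplified pure-Python sketch of how the three filters can be combined. It is an illustration only, not the transformers implementation: `filter_logits` is a hypothetical helper, and the repetition penalty is left out.

```python
import math

def filter_logits(logits, temperature=0.5, top_k=25, top_p=0.9):
    """Return {token_id: probability} after temperature, top-k, and top-p filtering."""
    # Temperature below 1 sharpens the distribution, above 1 flattens it.
    scaled = [x / temperature for x in logits]

    # Top-k: keep only the k highest-scoring token ids, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]

    # Softmax over the surviving tokens (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = {i: math.exp(scaled[i] - peak) for i in ranked}
    total = sum(weights.values())
    probs = {i: w / total for i, w in weights.items()}

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = {}, 0.0
    for i in ranked:
        kept[i] = probs[i]
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1.
    norm = sum(kept.values())
    return {i: p / norm for i, p in kept.items()}

# Toy example with four made-up logits and a smaller top_k.
example = filter_logits([2.0, 1.0, 0.0, -1.0], temperature=0.5, top_k=3, top_p=0.9)
print(sorted(example))  # [0, 1]
```

In the toy example, top-k drops the lowest-scoring token and top-p then drops the third, so sampling would choose between only the two most likely tokens.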

Use cases

  1. Educational research
  2. Deployment, testing, or fine-tuning in edge environments
  3. Or, more simply, for fun

Limitations

  1. Cannot reason, chat, or write code
  2. Output is incoherent more often than not
  3. Output is mostly not factual

Training guide

We trained Supra Mini v6 1M for 1 epoch on a single NVIDIA RTX 5060 Ti 16GB in ~3 hours. The full training code is in this repo: train_tokenizer.py trains the custom BPE tokenizer (vocab size 4096, as listed in the Model Config) and train_model.py trains the model. The training data is the first 5 billion tokens of a mix of 70% Sample-10BT from FineWeb-Edu and 30% Cosmopedia-v2.
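The figures above imply a rough training throughput and data budget. The sketch below is back-of-the-envelope arithmetic only: it treats "~3 hours" as exactly 3 hours and assumes the 70/30 split is measured in tokens, neither of which is confirmed here.

```python
total_tokens = 5_000_000_000   # first 5 billion tokens of the data mix
train_hours = 3                # "~3 hours", treated as exact for this estimate
parameters = 1_410_688         # from the Model Config list

# Approximate sustained throughput on the single GPU.
tokens_per_second = total_tokens / (train_hours * 3600)

# Assumption: the 70/30 mix is a split by token count.
fineweb_tokens = total_tokens * 70 // 100
cosmopedia_tokens = total_tokens - fineweb_tokens

# Training tokens seen per model parameter.
tokens_per_parameter = total_tokens / parameters

print(f"~{tokens_per_second:,.0f} tokens/s")
print(f"FineWeb-Edu: {fineweb_tokens:,} tokens, Cosmopedia-v2: {cosmopedia_tokens:,} tokens")
print(f"~{tokens_per_parameter:,.0f} tokens per parameter")
```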

Model provider

SupraLabs

Modalities

  • Input: Text
  • Output: Text
