
License: apache-2.0

Training summary

Approximate training stages:

  • 1B tokens: Cosmopedia v2 bootstrap pretraining.
  • +1.5B tokens: mixed continuation using Cosmopedia-v2 repository configs including cosmopedia-v2, fineweb-edu-dedup, and python-edu.
  • +2.5B tokens: return to Cosmopedia v2, with the context length increased from 512 to 1024 tokens.
  • Total: about 5B pretraining tokens.
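The stage sizes above can be checked with a few lines of arithmetic. This is a minimal sketch: the stage labels are shorthand, the parameter count is the one listed under Architecture, and the sequence count assumes every training sequence in the final stage is packed to the full 1024 tokens, which the summary does not state.

```python
# Stage sizes in millions of tokens, taken from the training summary.
STAGES_M = {
    "cosmopedia-v2 bootstrap": 1_000,
    "mixed continuation": 1_500,
    "cosmopedia-v2 at 1024 context": 2_500,
}
PARAMS = 31_988_224  # parameter count listed under Architecture

total_tokens = sum(STAGES_M.values()) * 1_000_000
tokens_per_param = total_tokens / PARAMS

# Assumption: final-stage sequences are fully packed to 1024 tokens.
final_stage_sequences = STAGES_M["cosmopedia-v2 at 1024 context"] * 1_000_000 // 1024

print(f"total tokens: {total_tokens:,}")
print(f"tokens per parameter: {tokens_per_param:.1f}")
print(f"final-stage sequences (if fully packed): {final_stage_sequences:,}")
```

At roughly 156 tokens per parameter, the checkpoint has seen far more data relative to its size than the commonly cited figure of about 20 tokens per parameter for compute-optimal training.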

Architecture

Veyra-30M is a small attention-sparse decoder-only language model.

Key details:

  • Parameters: 31,988,224 (about 31.99M)
  • Vocabulary: 8,192 tokens
  • Hidden size: 512
  • Layers: 8
  • Attention heads: 8 query heads, 2 KV heads
  • MLP intermediate size: 2048
  • Activation: SwiGLU
  • Normalization: RMSNorm
  • Position encoding: RoPE
  • Token embeddings tied with the LM head
  • Context in this checkpoint: 1024 tokens
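The listed dimensions can be turned into a rough parameter count. The sketch below makes several assumptions that the card does not confirm: bias-free linear layers, a head dimension of 512 / 8 = 64, one RMSNorm weight vector per sublayer, and one final norm. Under those assumptions, attention in all eight layers gives 34,611,712 parameters, while attention in only four of the eight layers reproduces the stated 31,988,224 exactly. That is one possible reading of "attention-sparse", not a documented fact; the repository's modeling code is the authority.

```python
VOCAB, HIDDEN, LAYERS = 8192, 512, 8
Q_HEADS, KV_HEADS, MLP_DIM = 8, 2, 2048
HEAD_DIM = HIDDEN // Q_HEADS  # 64; assumed, not stated in the card


def attention_params() -> int:
    """Bias-free Q, K, V, O projections with grouped KV heads."""
    q = HIDDEN * Q_HEADS * HEAD_DIM
    kv = 2 * HIDDEN * KV_HEADS * HEAD_DIM
    o = Q_HEADS * HEAD_DIM * HIDDEN
    return q + kv + o


def mlp_params() -> int:
    """SwiGLU MLP: gate, up, and down projections, no biases."""
    return 3 * HIDDEN * MLP_DIM


def total_params(attention_layers: int) -> int:
    """Total count, assuming only `attention_layers` of the layers have attention."""
    embeddings = VOCAB * HIDDEN  # counted once because the LM head is tied
    mlp_blocks = LAYERS * (mlp_params() + HIDDEN)  # + one RMSNorm weight each
    attn_blocks = attention_layers * (attention_params() + HIDDEN)
    final_norm = HIDDEN
    return embeddings + mlp_blocks + attn_blocks + final_norm


print(total_params(8))  # attention in every layer
print(total_params(4))  # hypothetical: attention in half of the layers
```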

Loading

This repository ships custom Transformers modeling code, so loading requires trust_remote_code=True.

Minimal usage:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo = "veyra-ai/veyra-30m-base-5b-tokens"

# trust_remote_code=True is needed because the repository ships custom model code.
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
# Older Transformers releases expect torch_dtype= instead of dtype=.
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True, dtype=torch.float32)
model.eval()

prompt = "Photosynthesis is the process by which"
# Raw completion prompt: encode without special tokens.
input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

with torch.no_grad():
    out = model.generate(
        input_ids,
        do_sample=True,
        temperature=0.5,
        top_k=30,
        repetition_penalty=1.15,
        no_repeat_ngram_size=2,
        max_new_tokens=80,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))
```

For raw completion prompts, use add_special_tokens=False.
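To make the sampling settings in the usage snippet more concrete, here is a pure-Python sketch of how a repetition penalty, temperature, and top-k filter can reshape one step's next-token distribution. It approximates the general technique and is not the Transformers implementation; the function name `next_token_probs` and the toy logits are hypothetical, and `no_repeat_ngram_size` is left out for brevity.

```python
import math


def next_token_probs(logits, seen_ids, temperature=0.5, top_k=30, repetition_penalty=1.15):
    """Return next-token probabilities after penalty, temperature, and top-k."""
    scores = list(logits)

    # Repetition penalty: push already-generated tokens toward lower scores.
    for i in set(seen_ids):
        if scores[i] > 0:
            scores[i] /= repetition_penalty
        else:
            scores[i] *= repetition_penalty

    # Temperature below 1 sharpens the distribution.
    scores = [s / temperature for s in scores]

    # Top-k: mask everything below the k-th highest score (ties are kept).
    k = min(top_k, len(scores))
    cutoff = sorted(scores, reverse=True)[k - 1]
    kept = [s if s >= cutoff else -math.inf for s in scores]

    # Numerically stable softmax over the surviving scores.
    peak = max(kept)
    exps = [math.exp(s - peak) for s in kept]
    norm = sum(exps)
    return [e / norm for e in exps]


# Toy example with a 4-token vocabulary; token 0 was already generated.
probs = next_token_probs([2.0, 1.0, 0.5, -1.0], seen_ids=[0], top_k=2)
print([round(p, 3) for p in probs])
```

With `top_k=2`, only the two highest-scoring tokens keep non-zero probability, and the penalized token 0 ends up less likely than it would be without the penalty.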

Optimizer

Training used:

  • CosineGatedAdam / CGA-v0 on 2D projection matrices
  • AdamW on embeddings, norms, tied head, and auxiliary parameters
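The split described above can be sketched as a simple partition of parameters by shape and name. This is an illustration under stated assumptions, not the project's training code: the parameter names are hypothetical, and CosineGatedAdam is not a public optimizer known to this sketch, so only the grouping logic is shown. In PyTorch, the (name, shape) pairs would typically come from `model.named_parameters()`.

```python
def split_parameter_groups(named_shapes):
    """Partition (name, shape) pairs into 2D projection matrices and everything else."""
    matrix_names, other_names = [], []
    for name, shape in named_shapes:
        # Embeddings and the tied head are 2D but belong to the AdamW group.
        is_embedding = "embed" in name or "lm_head" in name
        if len(shape) == 2 and not is_embedding:
            matrix_names.append(name)
        else:
            other_names.append(name)
    return matrix_names, other_names


# Hypothetical parameter names with shapes matching the listed dimensions.
example = [
    ("embed_tokens.weight", (8192, 512)),
    ("layers.0.attn.q_proj.weight", (512, 512)),
    ("layers.0.attn.k_proj.weight", (128, 512)),
    ("layers.0.mlp.gate_proj.weight", (2048, 512)),
    ("layers.0.norm.weight", (512,)),
]

matrix_names, other_names = split_parameter_groups(example)
print("matrix optimizer group:", matrix_names)
print("AdamW group:", other_names)
```

Each list would then be passed to its own optimizer instance, so the two update rules never touch the same tensor.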

Intended use

This checkpoint is primarily for:

  • continued pretraining
  • research / ablations
  • tracking Veyra training milestones
  • testing tiny model behavior

It is not intended for production use or reliable factual answering.

Known limitations

This model can:

  • hallucinate confidently
  • repeat phrases
  • fail arithmetic
  • fail simple factual questions
  • produce fake code
  • continue in textbook-like or tutorial-like styles

Further continuation pretraining and post-training are planned.

Model provider: veyra-ai

Model tree: base (this model)

Modalities: text input, text output
