
README

License: MIT

Model Details

  • Parameters: ~362M (F32) — marketed as 250M class
  • Architecture: LlamaForCausalLM (custom reconfiguration)
  • Hidden size: 960
  • Layers: 32
  • Attention heads: 15
  • KV heads: 5 (GQA)
  • Intermediate size: 2560
  • Max context: 8192 tokens
  • Vocab size: 49,152
  • Activation: SiLU
  • Tokenizer: SmolLM2 tokenizer with ChatML formatting (<|im_start|> / <|im_end|>)
  • License: MIT
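
The ~362M figure can be reproduced from the hyperparameters listed above. The following is a minimal sketch, assuming the standard Llama layout (bias-free linear layers, a gated MLP with three projections, two RMSNorm weights per layer plus a final one) and tied input/output embeddings; neither assumption is stated in this card, and `llama_param_count` is a hypothetical helper, not part of any library.

```python
# Hyperparameters copied from the Model Details list.
HIDDEN = 960
LAYERS = 32
N_HEADS = 15
N_KV_HEADS = 5
INTERMEDIATE = 2560
VOCAB = 49_152


def llama_param_count(hidden, layers, n_heads, n_kv_heads, intermediate, vocab,
                      tied_embeddings=True):
    """Estimate the parameter count of a Llama-style decoder.

    Assumes no biases and (by default) tied embeddings; both are assumptions.
    """
    head_dim = hidden // n_heads               # 960 // 15 = 64
    kv_dim = n_kv_heads * head_dim             # width of the K and V projections
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim   # q_proj, o_proj + k_proj, v_proj
    mlp = 3 * hidden * intermediate            # gate, up and down projections
    norms = 2 * hidden                         # two RMSNorm weight vectors per layer
    embed = vocab * hidden                     # token embedding matrix
    lm_head = 0 if tied_embeddings else vocab * hidden
    final_norm = hidden
    return layers * (attn + mlp + norms) + embed + lm_head + final_norm


total = llama_param_count(HIDDEN, LAYERS, N_HEADS, N_KV_HEADS, INTERMEDIATE, VOCAB)
print(f"{total:,} parameters, ~{total * 4 / 1e9:.2f} GB in F32")
# prints: 361,821,120 parameters, ~1.45 GB in F32
```

Under these assumptions the estimate lands at about 361.8M, which agrees with the ~362M stated above.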

Key Differences from Source

Unlike the base SmolLM2-360M, Axon 250M was created through architectural merging and reconfiguration:

  • Restructured layer count and attention configuration
  • GQA with 5 KV heads for efficient inference
  • Custom head dimension of 64
  • RoPE with theta=100000
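
To illustrate why 5 KV heads help at inference time, the sketch below estimates the key/value cache size at the full 8192-token context and compares it with a hypothetical variant that keeps one KV head per attention head. It assumes F32 cache entries (4 bytes each) and a batch size of 1; `kv_cache_bytes` is an illustrative helper, not a library function.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=4, batch=1):
    """Size of the KV cache: one K and one V tensor per layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value * batch


MIB = 1024 ** 2

# Values from this card: 32 layers, head dimension 64, 8192-token context.
gqa = kv_cache_bytes(layers=32, kv_heads=5, head_dim=64, seq_len=8192)
# Hypothetical comparison: 15 KV heads, i.e. no grouping.
mha = kv_cache_bytes(layers=32, kv_heads=15, head_dim=64, seq_len=8192)

print(f"GQA (5 KV heads):  {gqa / MIB:.0f} MiB")   # 640 MiB
print(f"MHA (15 KV heads): {mha / MIB:.0f} MiB")   # 1920 MiB
```

With 15 query heads sharing 5 KV heads, each KV head serves 3 query heads, so the cache is one third the size of the ungrouped case.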

Usage

python

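from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the weights and tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("axonlabsai/axon-250m", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("axonlabsai/axon-250m")

messages = [{"role": "user", "content": "Hey, what's up?"}]
# add_generation_prompt=True appends the assistant header so the model replies as the assistant
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))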
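
For reference, here is a rough sketch of what a ChatML prompt looks like once the messages are rendered to text. It assumes the plain ChatML layout built from the `<|im_start|>` and `<|im_end|>` markers named in Model Details; the template shipped with the tokenizer may differ (for example, by inserting a default system message), so `to_chatml` is a hypothetical helper for illustration only and `apply_chat_template` remains the call to use in practice.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a plain ChatML string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([{"role": "user", "content": "Hey, what's up?"}])
print(prompt)
```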

Limitations

  • NOT fine-tuned: no task-specific training was performed
  • Very small model with limited reasoning and factual knowledge
  • Prone to hallucination and incoherent outputs on complex prompts
  • Best suited for simple chat and experimentation, not production use
  • The "250M" branding reflects its model class; the actual parameter count is ~362M

About Axon Labs

Axon Labs builds AI models and tools. This is our tiny model — small enough to run anywhere, dumb enough to be funny.

Model provider: axonlabsai

Model tree

  • Base: HuggingFaceTB/SmolLM2-360M
  • Quantized: this model

Modalities

  • Input: Text
  • Output: Text
