License: MIT
Model Details
- Parameters: ~362M (F32) — marketed as 250M class
- Architecture: LlamaForCausalLM (custom reconfiguration)
- Hidden size: 960
- Layers: 32
- Attention heads: 15
- KV heads: 5 (GQA)
- Intermediate size: 2560
- Max context: 8192 tokens
- Vocab size: 49,152
- Activation: SiLU
- Tokenizer: SmolLM2 tokenizer with ChatML formatting (<|im_start|>/<|im_end|>)
- License: MIT
Key Differences from Source
Unlike the base SmolLM2-360M, Axon 250M was created through architectural merging and reconfiguration:
- Restructured layer count and attention configuration
- GQA with 5 KV heads for efficient inference
- Custom head dimension of 64
- RoPE with theta=100000
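To make the "efficient inference" point concrete, the sketch below estimates the key/value cache size at the full 8192-token context for 5 KV heads versus a hypothetical variant that keeps one KV head per attention head (15). It assumes F32 cache entries, a batch size of 1, and the standard layout of one key and one value tensor per layer; real runtimes may use a different cache dtype or add overhead.

```python
# Values taken from the Model Details and Key Differences lists.
LAYERS = 32
ATTENTION_HEADS = 15
KV_HEADS = 5
HEAD_DIM = 64
MAX_CONTEXT = 8192
BYTES_PER_VALUE = 4  # assumption: cache stored in F32


def kv_cache_bytes(num_kv_heads: int, tokens: int) -> int:
    """Bytes needed to cache keys and values for `tokens` positions (batch size 1)."""
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * LAYERS * num_kv_heads * HEAD_DIM * BYTES_PER_VALUE * tokens


gqa_bytes = kv_cache_bytes(KV_HEADS, MAX_CONTEXT)
mha_bytes = kv_cache_bytes(ATTENTION_HEADS, MAX_CONTEXT)  # hypothetical non-GQA variant

print(f"GQA (5 KV heads):  {gqa_bytes / 2**20:.0f} MiB")   # 640 MiB
print(f"MHA (15 KV heads): {mha_bytes / 2**20:.0f} MiB")   # 1920 MiB
print(f"Reduction factor:  {mha_bytes // gqa_bytes}x")     # 3x
```

Under these assumptions, sharing each KV head across three query heads cuts the cache to a third of the multi-head size.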
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("axonlabsai/axon-250m", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("axonlabsai/axon-250m")

messages = [{"role": "user", "content": "Hey, what's up?"}]
# add_generation_prompt=True appends the assistant turn header so the model replies
# instead of continuing the user message; return_dict=True also yields the attention mask.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
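For readers unfamiliar with ChatML, the following is a minimal sketch of the text that a ChatML-style chat template typically produces before tokenization. `render_chatml` is a hypothetical helper written for illustration, not part of `transformers`, and the template shipped with this tokenizer may differ (for example, by inserting a default system message).

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a plain ChatML string.

    Illustrative only: the tokenizer's real chat template is authoritative.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|>.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Hey, what's up?"}])
print(prompt)
```

To see the exact string the real template produces, call `tokenizer.apply_chat_template(messages, tokenize=False)` and compare.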
Limitations
- NOT fine-tuned — no task-specific training was performed
- Very small model with limited reasoning and factual knowledge
- Prone to hallucination and incoherent outputs on complex prompts
- Best suited for simple chat and experimentation, not production use
- The "250M" branding reflects its model class; the actual parameter count is ~362M
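The ~362M figure can be reproduced from the configuration listed under Model Details. The sketch below assumes a standard bias-free Llama layout (separate q/k/v/o projections, a gated MLP with three matrices, two RMSNorm weight vectors per layer) and input/output embeddings that share weights; these layout details are assumptions, not stated in this README.

```python
# Config values copied from the Model Details list.
HIDDEN = 960
LAYERS = 32
HEADS = 15
KV_HEADS = 5
INTERMEDIATE = 2560
VOCAB = 49_152
HEAD_DIM = HIDDEN // HEADS  # 64, matching the stated head dimension


def count_parameters(tie_embeddings: bool = True) -> int:
    """Estimate the parameter count of a bias-free Llama-style decoder."""
    q_proj = HIDDEN * HEADS * HEAD_DIM
    k_proj = HIDDEN * KV_HEADS * HEAD_DIM
    v_proj = HIDDEN * KV_HEADS * HEAD_DIM
    o_proj = HEADS * HEAD_DIM * HIDDEN
    attention = q_proj + k_proj + v_proj + o_proj
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate, up, and down projections
    norms = 2 * HIDDEN               # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = VOCAB * HIDDEN
    lm_head = 0 if tie_embeddings else VOCAB * HIDDEN  # assumption: tied by default
    final_norm = HIDDEN
    return LAYERS * per_layer + embeddings + lm_head + final_norm


total = count_parameters()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")  # 361,821,120 parameters (~362M)
```

With untied embeddings the same arithmetic gives about 409M, so the tied-weight assumption is the one consistent with the reported ~362M.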
About Axon Labs
Axon Labs builds AI models and tools. This is our tiny model — small enough to run anywhere, dumb enough to be funny.
Model provider: axonlabsai
Model tree: base HuggingFaceTB/SmolLM2-360M; quantized: this model
Modalities: text input, text output