FlameF0X
TinyMoE-100m-2x8
License: MIT
Model Details
- Architecture: Sparse Mixture of Experts (MoE)
- Total Parameters: 99,809,280 (~100M total parameters)
- Active Parameters per Token: 22,544,640 (~22.5M active parameters)
- Expert Configuration: 8 total local experts, 2 active experts routed per token (`"num_experts_per_tok": 2`)
- Context Length: 1024 tokens
- Base Architecture: Mixtral / Mistral For Causal LM
- License: MIT
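To make "2 active experts routed per token" concrete, here is a minimal sketch of top-2 gating over 8 experts in plain Python. It assumes a common scheme (softmax over router scores, keep the top 2, renormalize); the model's actual router may differ in detail. The function name `route_top_k` and the example scores are hypothetical.

```python
import math


def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights.

    router_logits: one score per expert for a single token.
    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    # Softmax over all experts (shifted by the max for numerical stability).
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the k most probable experts.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

    # Renormalize so the selected weights sum to 1.
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


# Hypothetical router scores for one token over 8 experts.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2]
selected = route_top_k(logits, k=2)
print(selected)
```

Only the two selected experts would run their feed-forward block for this token; the other six stay idle, which is why the active parameter count is well below the total.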
Parameter Breakdown
Unlike a standard dense model, an MoE model stores every expert's parameters on disk but activates only a subset of them for any given token during a forward pass:
| Component | Total Parameters | Status During Inference |
|---|---|---|
| Embeddings (Input + LM Head) | 24,576,000 | Always Active |
| Attention Blocks (10 Layers) | 4,423,680 | Always Active |
| MoE Routers (10 Layers) | 30,720 | Always Active |
| Experts (8 per Layer across 10 Layers) | 70,778,880 | 2 of 8 Active per Layer (~17.7M active) |
| Overall Footprint | 99,809,280 | 22,544,640 Active per Token |
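The arithmetic behind the table above can be checked with a few lines of Python. This is a sketch that only uses the per-component counts listed in the table; the variable names are illustrative. The card's 22,544,640 active-per-token figure is not re-derived here, since the card does not say how the embedding parameters are counted in it.

```python
# Parameter counts copied from the table above.
embeddings = 24_576_000
attention = 4_423_680
routers = 30_720
experts = 70_778_880

num_experts = 8        # experts per MoE layer
experts_per_token = 2  # experts routed per token

# The four components should add up to the overall footprint.
total = embeddings + attention + routers + experts

# Only 2 of every 8 expert blocks run for a given token.
active_experts = experts * experts_per_token // num_experts

print(f"total parameters:         {total:,}")
print(f"active expert parameters: {active_experts:,}")
```

The component rows sum to 99,809,280, matching the stated footprint, and the experts routed for a single token account for 17,694,720 parameters.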
Training Data
This model was trained on a mixture of two datasets, chosen to balance narrative fluency with factual, well-structured language:
- TinyStories: For coherent, creative synthetic narrative generation.
- WikiText-103: For general-knowledge content, vocabulary diversity, and structural language understanding.
Quick Start
You can load and experiment with this model using the Hugging Face transformers library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "FlameF0X/TinyMoE-100M-2x8"

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize a prompt and generate up to 50 new tokens.
input_text = "Once upon a time,"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
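Because the stated context length is 1024 tokens, it can help to cap `max_new_tokens` so that prompt plus generation stays within that window. The helper below is a hypothetical sketch, not part of the model or of `transformers`; the card does not say how the model behaves past 1024 tokens, so this simply treats the stated length as a hard budget.

```python
CONTEXT_LENGTH = 1024  # context length stated in Model Details


def clamp_new_tokens(prompt_tokens, requested_new_tokens,
                     context_length=CONTEXT_LENGTH):
    """Return how many new tokens fit after the prompt, at most the number requested."""
    if prompt_tokens >= context_length:
        # Nothing can be generated without exceeding the window.
        raise ValueError("prompt already fills the context window; truncate it first")
    return min(requested_new_tokens, context_length - prompt_tokens)


# A short prompt leaves plenty of room; a long one gets clamped.
print(clamp_new_tokens(5, 50))
print(clamp_new_tokens(1000, 50))
```

With the Quick Start code, this could be used as `max_new_tokens=clamp_new_tokens(inputs["input_ids"].shape[1], 50)`.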