FlameF0X

TinyMoE-100m-2x8

License: mit

Model Details

  • Architecture: Sparse Mixture of Experts (MoE)
  • Total Parameters: 99,809,280 (~100M total parameters)
  • Active Parameters per Token: 22,544,640 (~22.5M active parameters)
  • Expert Configuration: 8 total local experts, 2 active experts routed per token ("num_experts_per_tok": 2)
  • Context Length: 1024 tokens
  • Base Architecture: Mixtral / Mistral For Causal LM
  • License: MIT
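
To illustrate what "2 active experts routed per token" means, here is a minimal pure-Python sketch of top-2 routing. It is not the model's actual implementation: the helper names (`route_token`, `moe_forward`), the toy scalar experts, and the example logits are all hypothetical, and the weights are assumed to be a softmax over the selected logits, which is a common convention in top-k MoE layers.

```python
import math

NUM_EXPERTS = 8        # from the model card: 8 local experts
EXPERTS_PER_TOKEN = 2  # from the model card: num_experts_per_tok = 2


def route_token(router_logits, k=EXPERTS_PER_TOKEN):
    """Pick the top-k experts for one token; return (index, weight) pairs.

    Weights are a softmax over the selected logits only, so they sum to 1.
    (Assumed convention for this sketch.)
    """
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the selected experts.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, router_logits, experts):
    """Weighted sum of the selected experts' outputs for a scalar input x."""
    return sum(w * experts[i](x) for i, w in route_token(router_logits))


# Toy experts (hypothetical): expert i scales its input by (i + 1).
experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]
logits = [0.1, 2.0, -1.0, 0.5, 1.0, 0.0, -0.3, 1.2]

selected = route_token(logits)
output = moe_forward(1.0, logits, experts)
print(selected, output)
```

Only the two selected experts are evaluated for the token; the other six contribute nothing to the output, which is why the per-token compute is much smaller than the total parameter count suggests.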

Parameter Breakdown

Unlike a standard dense model, an MoE model stores a larger footprint of parameters on disk but selectively activates only a subset for any given token during a forward pass:

| Component | Total Parameters | Status During Inference |
|---|---|---|
| Embeddings (Input + LM Head) | 24,576,000 | Always Active |
| Attention Blocks (10 Layers) | 4,423,680 | Always Active |
| MoE Routers (10 Layers) | 30,720 | Always Active |
| Experts (8 Total across 10 Layers) | 70,778,880 | 2 of 8 Active per Layer (~17.7M active) |
| Overall Footprint | 99,809,280 | 22,544,640 Active per Token |
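
The arithmetic behind the table can be checked with a short script. This is only a sketch that re-adds the figures listed above; the variable names are illustrative, and the way the rows are grouped into "active" parameters is an assumption rather than the author's stated method.

```python
# Parameter counts copied from the table above.
embeddings = 24_576_000
attention = 4_423_680
routers = 30_720
experts = 70_778_880

num_experts = 8
experts_per_token = 2

# Everything stored on disk.
total = embeddings + attention + routers + experts

# Only 2 of the 8 experts in each layer run for a given token.
active_experts = experts * experts_per_token // num_experts

# Assumed grouping: attention, routers, and the active experts.
non_embedding_active = attention + routers + active_experts

print(f"total parameters:                     {total:,}")
print(f"active expert parameters per token:   {active_experts:,}")
print(f"attention + routers + active experts: {non_embedding_active:,}")
```

The four component rows sum to the stated total of 99,809,280, and two of eight experts correspond to 17,694,720 expert parameters per token. The grouping used here gives 22,149,120, close to but not identical to the 22,544,640 reported in the table; the source does not spell out exactly which tensors that figure counts.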

Training Data

This model was trained on a mixture of two datasets, chosen to balance narrative fluency with exposure to factual, well-structured language:

  • TinyStories: For coherent, creative synthetic narrative generation.
  • WikiText-103: For general-knowledge text, vocabulary diversity, and structural language understanding.

Quick Start

You can load and experiment with this model using the Hugging Face transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model weights from the Hugging Face Hub.
model_id = "FlameF0X/TinyMoE-100M-2x8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize a prompt and generate up to 50 new tokens.
input_text = "Once upon a time,"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
