README
License: apache-2.0

Bundle
- Format: JANGTQ
- Profile: JANGTQ_2
- Family: mimo_v2
- Layers: 48
- Routed experts: 256
- TQ layout: prestacked_switch_mlp
- Routed expert bits: 2-bit packed TQ experts
- Runtime sidecar: jangtq_runtime.safetensors
- Tokenizer: Qwen2Tokenizer
- Chat template: chat_template.jinja
- Verified local size: 79G
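"2-bit packed" means each stored byte carries four quantized codes instead of one. The sketch below only illustrates that general bit-packing idea. The low-bits-first order and the helper names `pack_2bit` / `unpack_2bit` are hypothetical; the actual JANGTQ/TurboQuant layout, including bit order, scales, and any codebook, is not described in this README and may differ.

```python
def pack_2bit(values):
    """Pack ints in [0, 3] into bytes, four codes per byte.

    Hypothetical layout: code i goes into byte i // 4 at bit offset
    2 * (i % 4), i.e. low bits first. Not the documented JANGTQ format.
    """
    if any(v < 0 or v > 3 for v in values):
        raise ValueError("2-bit codes must be in the range [0, 3]")
    out = bytearray((len(values) + 3) // 4)  # round up to whole bytes
    for i, v in enumerate(values):
        out[i // 4] |= v << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(data, count):
    """Recover `count` 2-bit codes packed by pack_2bit."""
    return [(data[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]


codes = [0, 1, 2, 3, 3, 2, 1, 0, 1]
packed = pack_2bit(codes)
print(len(codes), "codes ->", len(packed), "bytes")
print(unpack_2bit(packed, len(codes)) == codes)
```

Dequantizing these codes back to weights would additionally require per-group scales or a lookup table, which this sketch deliberately leaves out.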
Modalities
The text runtime path is the primary target. Vision and audio weights and tokenizer assets are preserved in the bundle for runtimes that wire up MiMo multimodal execution.
Files
This repo includes the safetensors shard index, quantization metadata, tokenizer files, generation_config.json, MiMo configuration code, and the TurboQuant runtime sidecar required by JANGTQ loaders.
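Before pointing a JANGTQ loader at a local download, it can help to confirm the bundle is complete. This is a minimal sketch, assuming the shard index uses the common Hugging Face name `model.safetensors.index.json` with a `weight_map` of tensor name to shard file; that filename and the helper `missing_bundle_files` are assumptions, while `jangtq_runtime.safetensors`, `chat_template.jinja`, and `generation_config.json` are the names given in this README.

```python
import json
from pathlib import Path

# Assumed index filename (common Hugging Face convention, not stated in this README).
INDEX_NAME = "model.safetensors.index.json"
# Files this README names explicitly.
REQUIRED = ["jangtq_runtime.safetensors", "chat_template.jinja", "generation_config.json"]


def missing_bundle_files(bundle_dir):
    """Return the names of expected files that are absent from bundle_dir."""
    root = Path(bundle_dir)
    missing = [name for name in [INDEX_NAME, *REQUIRED] if not (root / name).is_file()]
    index_path = root / INDEX_NAME
    if index_path.is_file():
        # weight_map maps each tensor name to the shard file that stores it.
        weight_map = json.loads(index_path.read_text())["weight_map"]
        for shard in sorted(set(weight_map.values())):
            if not (root / shard).is_file():
                missing.append(shard)
    return missing


if __name__ == "__main__":
    import sys

    problems = missing_bundle_files(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("bundle looks complete" if not problems else f"missing: {problems}")
```

The check only tests for file presence; it does not validate shard contents or the quantization metadata.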
Model provider: OsaurusAI

Model tree
- Base: XiaomiMiMo/MiMo-V2.5
- Fine-tuned: this model
Supported modalities
- Input: video, audio, text, image
- Output: text