License: mit
Model Details
- Model Name: TIGER-OM (SKT-OM)
- Architecture: Mixture of Experts (MoE)
- Total Parameters: 13B (fewer parameters are active per token because of MoE sparsity)
- Base Models:
- Primary Base: Shrijanagain/ST-X-0
- Expert Integration: Mistral-7B
- Format: Safetensors (Safe & Fast loading)
- Quantization: FP16 / BF16 (Original) + Q4_K_M GGUF available in separate repo
- Context Length: 8192 tokens
- Training Hardware: AMD Developer Cloud GPUs ($100 developer credits)
- Inference Optimized: ROCm 7.0 + vLLM + AMD MI300X
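As a rough sizing aid, the weight footprint follows from parameter count times bits per weight. The sketch below is only an estimate: it assumes "13B" means exactly 13 billion parameters and counts weights only (no KV cache, activations, or runtime overhead). The 4.85 bits per weight used for Q4_K_M is an approximate, assumed figure and is not taken from this card.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 13e9  # assumption: "13B" taken as exactly 13 billion parameters

# FP16 and BF16 both store 16 bits per weight.
fp16_gb = weight_memory_gb(TOTAL_PARAMS, 16)

# Assumption: roughly 4.85 bits per weight for Q4_K_M (approximate, varies by model).
q4_gb = weight_memory_gb(TOTAL_PARAMS, 4.85)

print(f"FP16/BF16 weights: {fp16_gb:.1f} GB")
print(f"Q4_K_M weights (approx.): {q4_gb:.1f} GB")
```

Under these assumptions the 16-bit weights come to about 26 GB, which is why the quantized GGUF build is the more practical option on smaller GPUs.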
Key Features
- True MoE Architecture: Sparse activation for better efficiency and performance
- Think Mode Reasoning: Advanced Chain-of-Thought, Planning, Self-Reflection & Verification
- Dynamic Plugin System: Intelligent routing to Code, Math, Search, Data Analysis plugins
- Agentic Capabilities: Full LangGraph multi-agent workflow
- Advanced RAG Integration: SKT RAG + Query Rewriting + Multi-hop + Reranking
- Stateful Memory: Persistent conversation context
Architecture Breakdown
TIGER-OM is built on a 13B MoE backbone:
- Base: Shrijanagain/ST-X-0 (strong foundational model)
- Experts: Fine-tuned using Mistral-7B as expert layers for specialized reasoning and tool-use capabilities
- Router Network: Learned gating mechanism for expert selection
- Think Mode Layer: Custom system prompt + reasoning controller
- Plugin Head: Tool calling & execution layer
This hybrid approach (ST-X-0 + Mistral-7B experts) gives excellent reasoning, code understanding, and general intelligence while maintaining MoE efficiency.
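To make the router idea concrete, here is a minimal sketch of top-k softmax gating, a common way to implement a learned gating mechanism in MoE layers. This card does not state the number of experts, the value of k, or the exact gating function, so the four toy experts, `k=2`, and the function names below are all illustrative assumptions rather than TIGER-OM's actual implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of router logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(router_logits: Sequence[float], k: int = 2) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_output(x: float, experts: Sequence[Callable[[float], float]],
               router_logits: Sequence[float], k: int = 2) -> float:
    """Weighted sum of the selected experts only; unselected experts are never run."""
    return sum(w * experts[i](x) for i, w in top_k_gate(router_logits, k))


# Hypothetical toy experts standing in for expert feed-forward blocks.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
logits = [0.1, 2.0, -1.0, 2.0]  # made-up router scores for a single token

selected = top_k_gate(logits, k=2)
result = moe_output(3.0, experts, logits, k=2)
print(selected, result)
```

Only the selected experts execute for a given token, which is where the sparse-activation savings come from.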
Files in this Repo (Safetensors)
- model-00001-of-0000X.safetensors: Main model weights
- config.json
- tokenizer.json / tokenizer_config.json
- generation_config.json
- special_tokens_map.json
- model.safetensors.index.json
All weights are in safetensors format, so there is no pickle risk.
How to Use (Safetensors)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Shrijanagain/TIGER-OM"

# Load the tokenizer and the BF16 safetensors weights.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = """You are SKT-OM, an advanced agentic AI with Think Mode enabled.
User Query: Calculate training cost comparison and suggest best option..."""

# Tokenize the prompt and move the tensors to the model's device.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 1024 new tokens.
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
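Because the context length is 8192 tokens, long prompts leave less room for generation than the 1024 new tokens requested above. The helper below is a small sketch that caps the request accordingly. It assumes the 8192-token window covers prompt and generated tokens together, which is typical for causal language models but is not spelled out in this card; the function name is hypothetical.

```python
CONTEXT_LENGTH = 8192  # from the Model Details list


def max_new_tokens_budget(prompt_tokens: int, requested: int,
                          context_length: int = CONTEXT_LENGTH) -> int:
    """Cap the requested generation length so prompt + output fits in the context window."""
    if prompt_tokens >= context_length:
        raise ValueError(
            f"Prompt uses {prompt_tokens} tokens, leaving no room in a "
            f"{context_length}-token context"
        )
    return min(requested, context_length - prompt_tokens)


# With the transformers example, the prompt length would be
# inputs["input_ids"].shape[1]; plain integers are used here instead.
print(max_new_tokens_budget(prompt_tokens=500, requested=1024))
print(max_new_tokens_budget(prompt_tokens=7800, requested=1024))
```

The returned value can be passed as `max_new_tokens` to `model.generate`.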
Important Links
- Live Demo: SKT-OM Space
- GGUF Quantized (Q4_K_M): Shrijanagain/TIGER-GGUF
- GitHub (RAG + ADK Code): SHRIJANAGAIN/SKT-AMD-FILES
Technologies & Stack
- Base Models: Shrijanagain/ST-X-0 + Mistral-7B Experts
- RAG: SKT RAG + AMD ADK Kit
- Agents: LangGraph
- Hardware: AMD MI300X + ROCm 7.0
- Inference: vLLM (FP16) + transformers (Safetensors)
- Training: AMD Developer Cloud
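For the vLLM (FP16) path listed above, offline inference might look like the sketch below. It reuses the sampling settings from the transformers example and the 8192-token context length. Whether vLLM can load this model's custom architecture has not been verified here, so treat this as a starting point under that assumption; `generate_with_vllm` is a hypothetical helper name.

```python
MODEL_NAME = "Shrijanagain/TIGER-OM"

# Same sampling settings as the transformers example; vLLM calls the
# generation limit max_tokens rather than max_new_tokens.
SAMPLING_KWARGS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": 1024,
    "repetition_penalty": 1.1,
}


def generate_with_vllm(prompt: str) -> str:
    """Run one prompt through vLLM and return the generated text."""
    # Imported lazily so this file can be loaded on machines without vLLM.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model=MODEL_NAME,
        dtype="float16",         # FP16, as listed in the stack
        max_model_len=8192,      # context length from Model Details
        trust_remote_code=True,  # assumption: needed as in the transformers example
    )
    outputs = llm.generate([prompt], SamplingParams(**SAMPLING_KWARGS))
    return outputs[0].outputs[0].text
```

Calling `generate_with_vllm("...")` requires vLLM installed with a supported GPU backend (ROCm on AMD hardware); the engine is created inside the function for brevity, and a long-running service would create it once and reuse it.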
Performance
- Excellent balance of quality vs efficiency due to MoE architecture
- Strong performance on reasoning, tool-use, code, and multi-step tasks
- Significantly lower inference cost compared to dense 13B+ models
Use Cases
- Complex technical Q&A
- Agentic workflows & tool calling
- Research assistance
- Code generation & debugging
- Mathematical & logical reasoning
- Comparative analysis
- Data analysis with plugins
Hackathon
AMD Developer Hackathon 2026
Trained entirely on AMD Developer Cloud
Fully built in public with multiple technical updates.
License
MIT License
Model provider
- Shrijanagain
Model tree
- Base: mistralai/Mistral-7B-Instruct-v0.3
- Fine-tuned: this model
Modalities
- Input: Text
- Output: Text