- December 11, 2025
- 3 min read
GLM-4.6, MiniMax-M2, and Ministral-3 Now Available on FriendliAI

As part of our ongoing commitment to supporting the latest frontier models, we’re excited to announce full support for GLM-4.6, MiniMax-M2, and Ministral-3 across Serverless and Dedicated Endpoints.
The open-source frontier is evolving faster than ever, and GLM-4.6, MiniMax-M2, and the Ministral-3 series stand out as three of the most capable model families for reasoning, long-context understanding, and tool use.
All these models are now available on FriendliAI through our OpenAI-compatible API, with full tool-calling support for agentic workflows across Serverless Endpoints and Dedicated Endpoints. For teams building agentic AI systems, this means you can deploy these highly efficient, cutting-edge models with FriendliAI’s signature performance, reliability, and cost efficiency.
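To give a quick sense of the developer experience, here is a minimal sketch of calling one of these models through the OpenAI-compatible API using the official openai Python SDK. The base URL and the model identifier below are illustrative assumptions; check the FriendliAI docs for the exact values for your endpoint.

```python
from openai import OpenAI

# Assumed base URL and model ID for illustration; confirm both in the
# FriendliAI documentation for your Serverless or Dedicated Endpoint.
client = OpenAI(
    base_url="https://api.friendli.ai/serverless/v1",
    api_key="YOUR_FRIENDLI_TOKEN",
)

response = client.chat.completions.create(
    model="zai-org/GLM-4.6",  # hypothetical model ID
    messages=[
        {"role": "user", "content": "Summarize the trade-offs of MoE models."}
    ],
)
print(response.choices[0].message.content)
```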
Why These New Models Matter
GLM-4.6: Longer Context and Superior Reasoning
- One of the most capable open large-context models available.
- Supports a 200k-token context window.
- Strong at reasoning, coding, and tool use.
- Excels in tasks requiring:
  - Sustained long-context understanding
  - Structured outputs (see the sketch after this list)
  - Multi-step reasoning
  - Analysis of long documents
  - Coordination of complex workflows
- Efficient and practical for production systems needing transparency, customization, and scalability.
- Well-suited for tool-using, RAG, and agentic applications requiring both intelligence and operational flexibility.
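As an illustration of the structured-outputs use case above, here is a hedged sketch that asks GLM-4.6 for JSON conforming to a schema via the OpenAI-compatible response_format parameter. The base URL and model ID are illustrative, and whether a given endpoint enforces JSON schemas should be confirmed in the FriendliAI docs.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.friendli.ai/serverless/v1",  # assumed base URL
    api_key="YOUR_FRIENDLI_TOKEN",
)

# Ask the model to extract fields into a fixed JSON shape. The
# json_schema response format follows the OpenAI-compatible spec;
# support on a given endpoint should be verified in the docs.
response = client.chat.completions.create(
    model="zai-org/GLM-4.6",  # hypothetical model ID
    messages=[
        {
            "role": "user",
            "content": "Extract the invoice number and total from: Invoice #4821, total $310.50.",
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "schema": {
                "type": "object",
                "properties": {
                    "invoice_number": {"type": "string"},
                    "total": {"type": "number"},
                },
                "required": ["invoice_number", "total"],
            },
        },
    },
)
print(response.choices[0].message.content)  # e.g. {"invoice_number": "4821", "total": 310.5}
```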
MiniMax-M2: Compact, Efficient, and Scalable
- Sparse Mixture-of-Experts (MoE) model: 230B total parameters, 10B active.
- Delivers state-of-the-art performance on reasoning, coding, and agentic tasks while staying efficient.
- Supports a 128k-token context window.
- Strong at tasks such as:
  - Handling long documents or full codebases (see the sketch after this list)
  - Multi-file edits
  - Code generation and fixing
  - Long-horizon planning and tool-using pipelines (shell, browser, code runner, retrieval, etc.)
- Offers cutting-edge capability with lower latency, lower cost, and easier scaling than dense frontier models.
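To make the long-context point concrete, here is a minimal sketch of feeding an entire source file to MiniMax-M2 for review inside its 128k-token window. The base URL, model ID, and file path are illustrative assumptions; verify the real values against the FriendliAI model catalog.

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI(
    base_url="https://api.friendli.ai/serverless/v1",  # assumed base URL
    api_key="YOUR_FRIENDLI_TOKEN",
)

# Read a large file and place it directly in the prompt; the 128k-token
# window leaves ample room for full modules or long documents.
source = Path("service/main.py").read_text()  # hypothetical path

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2",  # hypothetical model ID
    messages=[
        {"role": "system", "content": "You are a careful code reviewer."},
        {"role": "user", "content": f"Review this file and list any bugs:\n\n{source}"},
    ],
)
print(response.choices[0].message.content)
```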
Ministral-3 Series: Efficient, Multimodal Reasoning with Vision
- Optimized transformer-based model designed for efficient reasoning and instruction following.
- Delivers strong performance on reasoning, coding, and tool-using tasks with a focus on stability and control.
- Designed for low-latency, cost-efficient deployment compared to large dense frontier models.
- Strong at tasks such as:
  - Multi-step reasoning with structured outputs
  - Tool calling and function-driven workflows
  - Agentic pipelines requiring predictable behavior
  - RAG and automation systems operating at scale
- Offers a practical balance of intelligence, efficiency, and reliability for real-world production environments.
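Since the Ministral-3 series adds vision, here is a hedged sketch of sending an image alongside text using the standard OpenAI-compatible image_url content part. The model ID and image URL are placeholders to swap for real values from the FriendliAI docs.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.friendli.ai/serverless/v1",  # assumed base URL
    api_key="YOUR_FRIENDLI_TOKEN",
)

# Multimodal chat: mix text and image parts in a single user message,
# following the OpenAI-compatible content-part format.
response = client.chat.completions.create(
    model="mistralai/Ministral-3",  # hypothetical model ID
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this chart?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```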
Why Run These Models on FriendliAI?
While models like GLM-4.6 and MiniMax-M2 excel at reasoning, coding, and tool use, real-world workflows require more than raw model power. FriendliAI provides the orchestration, integration, and scalability needed to turn these capabilities into practical, production-ready solutions.
By using FriendliAI, you get:
High Throughput & Low Latency
- Advanced batching & scheduling algorithms
- Optimized GPU kernels
- 50% cost reduction with online quantization
- Continuous batching for high-demand workloads
Production-Grade Reliability
- Customizable request-based autoscaling
- Full logs & metrics observability
- Enterprise-grade SLAs
- Globally geo-distributed infrastructure
- SOC2 certified
Flexible Deployment Options
- Serverless Endpoints: Use instantly with no infrastructure setup
- Dedicated Endpoints: Exclusive access to high-demand GPUs
- Container: Run on your public cloud or on-prem clusters
No matter how you deploy, you get FriendliAI’s hallmark scalability, reliability, speed, and cost efficiency.
Get Started with GLM-4.6, MiniMax-M2 & Ministral-3 on FriendliAI
Try on Serverless (Instant Access)
Explore all available models immediately on FriendliAI Suite, no setup needed.
Deploy a Dedicated Endpoint
Choose your GPU, deploy in minutes, and scale to production workloads effortlessly.
Build Agents with Tools
Use our OpenAI-compatible API to connect your tools and power agentic workflows with reliability and speed.
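As a starting point, here is a hedged sketch of one tool-calling round trip over the OpenAI-compatible API: the model decides to call a function, we execute it locally, and we return the result so the model can answer. The tool, base URL, and model ID are all illustrative assumptions.

```python
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.friendli.ai/serverless/v1",  # assumed base URL
    api_key="YOUR_FRIENDLI_TOKEN",
)

# A hypothetical local tool the model may choose to invoke.
def get_weather(city: str) -> str:
    return f"Sunny, 21°C in {city}"  # stubbed result for illustration

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Seoul?"}]
first = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2",  # hypothetical model ID
    messages=messages,
    tools=tools,
)

# Assuming the model chose to call the tool, run it locally and
# feed the result back so the model can produce a final answer.
call = first.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
messages.append(first.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": get_weather(**args),
})

final = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2",  # hypothetical model ID
    messages=messages,
    tools=tools,
)
print(final.choices[0].message.content)
```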
👉 Try the models today on FriendliAI and start building your next generation of intelligent agents.
Written by
FriendliAI Tech & Research
General FAQ
What is FriendliAI?
FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.
How does FriendliAI help my business?
Friendli Inference lets you squeeze more tokens per second out of every GPU. Because you need fewer GPUs to serve the same load, the metric that matters, tokens per dollar, comes out higher even if the hourly GPU rate looks similar on paper. View pricing
Which models and modalities are supported?
Over 380,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models
Can I deploy models from Hugging Face directly?
Yes. Selecting “Friendli Endpoints” on the Hugging Face Hub gives you one-click deployment, taking you straight to our model deployment page. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership
Still have questions?
If you want a customized solution for the key issue that is slowing your growth, email contact@friendli.ai or click Talk to an engineer; our engineers (not a bot) will reply within one business day.

