- December 11, 2025
- 3 min read
GLM-4.6, MiniMax-M2, and Ministral-3 Now Available on FriendliAI

As part of our ongoing commitment to supporting the latest frontier models, we’re excited to announce full support for GLM-4.6, MiniMax-M2, and Ministral-3 across Serverless and Dedicated Endpoints.
The open-source frontier is evolving faster than ever, and GLM-4.6, MiniMax-M2, and the Ministral-3 series stand out as three of the most capable model families for reasoning, long-context understanding, and tool use.
All these models are now available on FriendliAI through our OpenAI-compatible API, with full tool-calling support for agentic workflows across Serverless Endpoints and Dedicated Endpoints. For teams building agentic AI systems, this means you can deploy these highly efficient, cutting-edge models with FriendliAI’s signature performance, reliability, and cost efficiency.
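To give a quick sense of the developer experience, here is a minimal sketch of calling one of these models through the OpenAI-compatible API using the official openai Python SDK. The base URL and the model identifier below are illustrative assumptions; check the FriendliAI docs for the exact values for your endpoint.

```python
from openai import OpenAI

# Assumed base URL and model ID for illustration; confirm both in the
# FriendliAI documentation for your Serverless or Dedicated Endpoint.
client = OpenAI(
    base_url="https://api.friendli.ai/serverless/v1",
    api_key="YOUR_FRIENDLI_TOKEN",
)

response = client.chat.completions.create(
    model="zai-org/GLM-4.6",  # hypothetical model ID
    messages=[
        {"role": "user", "content": "Summarize the trade-offs of MoE models."}
    ],
)
print(response.choices[0].message.content)
```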
Why These New Models Matter
GLM-4.6: Longer Context and Superior Reasoning
- One of the most capable open large-context models available.
- Supports a 200k-token context window.
- Strong at reasoning, coding, and tool use.
- Excels in tasks requiring:
  - Sustained long-context understanding
  - Structured outputs (see the sketch after this list)
  - Multi-step reasoning
  - Analysis of long documents
  - Coordination of complex workflows
- Efficient and practical for production systems needing transparency, customization, and scalability.
- Well-suited for tool-using, RAG, and agentic applications requiring both intelligence and operational flexibility.
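As an illustration of the structured-outputs use case above, here is a hedged sketch that asks GLM-4.6 for JSON conforming to a schema via the OpenAI-compatible response_format parameter. The base URL and model ID are illustrative, and whether a given endpoint enforces JSON schemas should be confirmed in the FriendliAI docs.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.friendli.ai/serverless/v1",  # assumed base URL
    api_key="YOUR_FRIENDLI_TOKEN",
)

# Ask the model to extract fields into a fixed JSON shape. The
# json_schema response format follows the OpenAI-compatible spec;
# support on a given endpoint should be verified in the docs.
response = client.chat.completions.create(
    model="zai-org/GLM-4.6",  # hypothetical model ID
    messages=[
        {
            "role": "user",
            "content": "Extract the invoice number and total from: Invoice #4821, total $310.50.",
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "schema": {
                "type": "object",
                "properties": {
                    "invoice_number": {"type": "string"},
                    "total": {"type": "number"},
                },
                "required": ["invoice_number", "total"],
            },
        },
    },
)
print(response.choices[0].message.content)  # e.g. {"invoice_number": "4821", "total": 310.5}
```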
MiniMax-M2: Compact, Efficient, and Scalable
- Sparse Mixture-of-Experts (MoE) model: 230B total parameters, 10B active.
- Delivers state-of-the-art performance on reasoning, coding, and agentic tasks while staying efficient.
- Supports a 128k-token context window.
- Strong at tasks such as:
  - Handling long documents or full codebases (see the sketch after this list)
  - Multi-file edits
  - Code generation and fixing
  - Long-horizon planning and tool-using pipelines (shell, browser, code runner, retrieval, etc.)
- Offers cutting-edge capability with lower latency, lower cost, and easier scaling than dense frontier models.
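To make the long-context point concrete, here is a minimal sketch of feeding an entire source file to MiniMax-M2 for review inside its 128k-token window. The base URL, model ID, and file path are illustrative assumptions; verify the real values against the FriendliAI model catalog.

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI(
    base_url="https://api.friendli.ai/serverless/v1",  # assumed base URL
    api_key="YOUR_FRIENDLI_TOKEN",
)

# Read a large file and place it directly in the prompt; the 128k-token
# window leaves ample room for full modules or long documents.
source = Path("service/main.py").read_text()  # hypothetical path

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2",  # hypothetical model ID
    messages=[
        {"role": "system", "content": "You are a careful code reviewer."},
        {"role": "user", "content": f"Review this file and list any bugs:\n\n{source}"},
    ],
)
print(response.choices[0].message.content)
```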
Ministral-3 Series: Efficient, Multimodal Reasoning with Vision
- Optimized transformer-based model designed for efficient reasoning and instruction following.
- Delivers strong performance on reasoning, coding, and tool-using tasks with a focus on stability and control.
- Designed for low-latency, cost-efficient deployment compared to large dense frontier models.
- Strong at tasks such as:
  - Multi-step reasoning with structured outputs
  - Tool calling and function-driven workflows
  - Agentic pipelines requiring predictable behavior
  - RAG and automation systems operating at scale
- Offers a practical balance of intelligence, efficiency, and reliability for real-world production environments.
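Since the Ministral-3 series adds vision, here is a hedged sketch of sending an image alongside text using the standard OpenAI-compatible image_url content part. The model ID and image URL are placeholders to swap for real values from the FriendliAI docs.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.friendli.ai/serverless/v1",  # assumed base URL
    api_key="YOUR_FRIENDLI_TOKEN",
)

# Multimodal chat: mix text and image parts in a single user message,
# following the OpenAI-compatible content-part format.
response = client.chat.completions.create(
    model="mistralai/Ministral-3",  # hypothetical model ID
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this chart?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```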
Why Run These Models on FriendliAI?
While models like GLM-4.6 and MiniMax-M2 excel at reasoning, coding, and tool use, real-world workflows require more than raw model power. FriendliAI provides the orchestration, integration, and scalability needed to turn these capabilities into practical, production-ready solutions.
By using FriendliAI, you get:
High Throughput & Low Latency
- Advanced batching & scheduling algorithms
- Optimized GPU kernels
- 50% cost reduction with online quantization
- Continuous batching for high-demand workloads
Production-Grade Reliability
- Customizable request-based autoscaling
- Full logs & metrics observability
- Enterprise-grade SLAs
- Globally geo-distributed infrastructure
- SOC2 certified
Flexible Deployment Options
- Serverless Endpoints: Use instantly with no infrastructure setup
- Dedicated Endpoints: Exclusive access to high-demand GPUs
- Container: Run on your public cloud or on-prem clusters
No matter how you deploy, you get FriendliAI’s hallmark scalability, reliability, speed, and cost efficiency.
Get Started with GLM-4.6, MiniMax-M2 & Ministral-3 on FriendliAI
Try on Serverless (Instant Access)
Explore all available models immediately on FriendliAI Suite, no setup needed.
Deploy a Dedicated Endpoint
Choose your GPU, deploy in minutes, and scale to production workloads effortlessly.
Build Agents with Tools
Use our OpenAI-compatible API to connect your tools and power agentic workflows with reliability and speed.
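As a starting point, here is a hedged sketch of one tool-calling round trip over the OpenAI-compatible API: the model decides to call a function, we execute it locally, and we return the result so the model can answer. The tool, base URL, and model ID are all illustrative assumptions.

```python
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.friendli.ai/serverless/v1",  # assumed base URL
    api_key="YOUR_FRIENDLI_TOKEN",
)

# A hypothetical local tool the model may choose to invoke.
def get_weather(city: str) -> str:
    return f"Sunny, 21°C in {city}"  # stubbed result for illustration

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Seoul?"}]
first = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2",  # hypothetical model ID
    messages=messages,
    tools=tools,
)

# Assuming the model chose to call the tool, run it locally and
# feed the result back so the model can produce a final answer.
call = first.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
messages.append(first.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": get_weather(**args),
})

final = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2",  # hypothetical model ID
    messages=messages,
    tools=tools,
)
print(final.choices[0].message.content)
```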
👉 Try the models today on FriendliAI and start building your next generation of intelligent agents.
Written by
FriendliAI Tech & Research
General FAQ
What is FriendliAI?
FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.
How does FriendliAI help my business?
Friendli Inference lets you squeeze more tokens per second out of every GPU. Because you need fewer GPUs to serve the same load, the metric that matters, tokens per dollar, comes out higher even if the hourly GPU rate looks similar on paper. View pricing
Which models and modalities are supported?
Over 380,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models
Can I deploy models from Hugging Face directly?
Yes. Selecting “Friendli Endpoints” on the Hugging Face Hub gives you one-click deployment, taking you straight to our model deployment page. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership
Still have questions?
If you want a customized solution for the key issue that is slowing your growth, email contact@friendli.ai or click Talk to an engineer; our engineers (not a bot) will reply within one business day.

