- February 11, 2026
- 3 min read
GLM-5: The Open-Source Model for Production-Grade Coding Agents

The FriendliAI team is continuing our partnership with Z.ai to offer Day-0 support for GLM-5. The most advanced open-source foundation model built for complex systems engineering and long-horizon agent workflows is now available on Friendli Serverless and Dedicated Endpoints.
Advances for GLM-5
GLM-5 introduces new features that strengthen its ability to reason, execute long-horizon agentic tasks, and write frontend and backend code:
- Agentic long-horizon planning and execution: GLM-5 is purpose-built for complex, multi-stage tasks. It can autonomously decompose system-level requirements with an architect-level approach while maintaining context coherence and goal alignment across automated workflows that run for hours.
- Backend refactoring and deep debugging: GLM-5 demonstrates deep reasoning in backend architecture design, complex algorithm implementation, and difficult bug resolution. Its robust self-reflection and error-correction mechanisms let it analyze logs, identify root causes, and iteratively fix issues after compilation or runtime failures until the system runs end-to-end.
- Automation of knowledge work with Office by Z.ai: GLM-5 can turn text or source materials directly into ready-to-use .docx, .pdf, and .xlsx files, such as PRDs, lesson plans, exams, spreadsheets, financial reports, run sheets, and menus, delivered end-to-end. It also supports multi-turn collaboration to refine outputs into final deliverables.
- An open-source alternative with Opus-level intelligence: GLM-5 directly benchmarks against Claude Opus 4.5 in code logic density and systems engineering capability, while providing open-source deployment flexibility and strong cost efficiency.
As a result, GLM-5 delivers production-grade productivity and performance gains over GLM-4.7, comparable to top closed-source alternatives such as Claude Opus 4.5.
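The debug-until-it-runs behavior described above can be sketched as a minimal agent harness. This is an illustrative sketch, not Z.ai's implementation: `ask_model` is a hypothetical helper you would wire to any chat-completions client, and the loop simply runs the program, feeds failures back to the model, and retries.

```python
import subprocess
import sys

def ask_model(prompt: str) -> str:
    """Hypothetical helper: send `prompt` to GLM-5 through any
    chat-completions client and return the revised source code."""
    raise NotImplementedError  # wire up your own client here

def fix_until_green(source: str, path: str = "main.py", max_rounds: int = 5) -> bool:
    """Run the program; on failure, feed the source plus stderr back to
    the model and retry with its fix -- the self-correction loop in miniature."""
    for _ in range(max_rounds):
        with open(path, "w") as f:
            f.write(source)
        result = subprocess.run([sys.executable, path], capture_output=True, text=True)
        if result.returncode == 0:
            return True  # the system runs end-to-end
        source = ask_model(
            f"This program failed.\n\nSource:\n{source}\n\n"
            f"stderr:\n{result.stderr}\nReturn the corrected full source."
        )
    return False
```

A real harness would add test execution and diff-based patching, but the analyze-fix-rerun cycle is the core of what "until the system runs end-to-end" means.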

In a world with Claude Code and OpenClaw, developers desperately need an open-source model that can build complex systems and power agentic applications, not just write code. That’s why FriendliAI serves GLM-5 with the lowest latency, highest throughput, and best compute efficiency.
Large Model, Efficient Architecture
GLM-5 is an MIT-licensed large multimodal model with 740 billion total parameters (40 billion active per token), pre-trained on 28.5 trillion tokens of data. Its Mixture-of-Experts (MoE) architecture routes each token through only a small subset of experts, letting the model retain proper context across multi-turn conversations and tasks while paying the inference cost of only the active parameters.
DeepSeek Sparse Attention (DSA) helps GLM-5 decide which tokens to prioritize during inference, adapting the sparsity pattern to the input content to balance efficiency with accuracy. Z.ai also trained GLM-5 with slime, a novel asynchronous reinforcement-learning infrastructure that substantially improves training throughput and efficiency, enabling more fine-grained post-training iterations.
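The gap between 740B total and 40B active parameters comes from top-k expert routing: a gate scores all experts, only the k best run, and their outputs are combined. A toy sketch (illustrative scalar "experts", not GLM-5's actual router):

```python
import math

def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their
    softmax weights so they sum to 1 (standard top-k MoE gating)."""
    m = max(gate_logits)
    exps = [math.exp(x - m) for x in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

# Toy experts: each is just a scalar function of the input.
experts = [lambda x, w=w: w * x for w in (0.5, 1.0, 2.0, 4.0)]

def moe_forward(x, gate_logits, k=2):
    route = top_k_route(gate_logits, k)
    # Only the selected experts run; the rest stay idle, which is
    # why active parameters are far fewer than total parameters.
    return sum(weight * experts[i](x) for i, weight in route.items())

y = moe_forward(3.0, gate_logits=[0.1, 2.0, 1.5, -1.0], k=2)
```

With k=2 of 4 experts selected, only half the expert weights participate in this forward pass; GLM-5 applies the same idea at the scale of 40B active out of 740B total.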
One-Click Deployment for GLM-5
Deploy GLM-5 with one click on FriendliAI by creating your endpoint. Choose between:
- Friendli Dedicated Endpoints for consistent performance and guaranteed availability on reserved GPUs, or
- Friendli Serverless Endpoints to run inference with low latency and high throughput for testing and developing AI applications
Then, all you need to do is:
- Configure your model and compute instances
- Enable (or disable) LoRA adapters, quantization, and autoscaling, and tune engine configurations
API calls can be made for $1.00 per million input tokens and $3.20 per million output tokens.
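Once the endpoint is up, calls follow the familiar OpenAI-style chat-completions shape. The sketch below is an assumption-laden example, not official client code: the serverless URL should be verified against the Friendli docs, and the model id string ("zai-org/GLM-5") is hypothetical; check the model catalog for the exact identifier.

```python
import json
import os
import urllib.request

API_URL = "https://api.friendli.ai/serverless/v1/chat/completions"  # verify in the Friendli docs
MODEL_ID = "zai-org/GLM-5"  # hypothetical id; check the model catalog for the exact string

def build_chat_request(prompt, model=MODEL_ID, max_tokens=512):
    """Assemble an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def send(payload, token):
    """POST the payload with bearer-token auth and return the parsed JSON reply."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("Refactor this function to be tail-recursive.")
# send(payload, os.environ["FRIENDLI_TOKEN"])  # requires a valid Friendli API token
```

At the listed rates, this 512-token-capped request would cost at most a fraction of a cent in output tokens.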
Getting Started with FriendliAI
We’re currently offering up to $50,000 in credits to switch to the most performant engine. Contact our team to determine your eligibility while the GPUs are still hot! Visit this page to learn more.
Written by
FriendliAI Tech & Research
General FAQ
What is FriendliAI?
FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.
How does FriendliAI help my business?
Friendli Inference squeezes more tokens per second out of every GPU. Because you need fewer GPUs to serve the same load, the true metric, tokens per dollar, comes out higher even if the hourly GPU rate looks similar on paper. View pricing
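The tokens-per-dollar arithmetic is simple enough to write down. The figures below are hypothetical, chosen only to show how higher throughput changes the metric at an identical hourly GPU rate:

```python
def tokens_per_dollar(tokens_per_sec: float, gpu_hourly_rate: float) -> float:
    """Tokens served per dollar of GPU time: throughput x 3600 s / hourly rate."""
    return tokens_per_sec * 3600 / gpu_hourly_rate

# Hypothetical numbers: the same $4.00/hr GPU, but one engine
# sustains 2.5x the throughput of the other.
baseline = tokens_per_dollar(1_000, 4.00)   # 900,000 tokens per dollar
optimized = tokens_per_dollar(2_500, 4.00)  # 2,250,000 tokens per dollar
```

Same hourly rate on paper, 2.5x the tokens per dollar, which is the comparison that actually matters for serving cost.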
Which models and modalities are supported?
Over 380,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models
Can I deploy models from Hugging Face directly?
Yes. Selecting “Friendli Endpoints” on the Hugging Face Hub takes you to our model deployment page in one click. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership
Still have questions?
If you want a customized solution for the key issue slowing your growth, email contact@friendli.ai or click Talk to an engineer; our engineers (not a bot) will reply within one business day.

