June 17, 2026
5 min read

GLM-5.2, Day-0 on FriendliAI Model APIs: The Strongest Open Model for Long-Horizon Agentic and Coding Tasks

TL;DR

FriendliAI has launched Day-0 support for GLM-5.2, Z.ai's new flagship open-weight model, available now through FriendliAI Model APIs — serverless, pay-per-token, and OpenAI-compatible. Start sending requests in minutes, with no GPUs to manage.
Built for long-horizon agents and code. GLM-5.2 is a 744B-parameter Mixture-of-Experts model (~40B active per token) with a usable 1M-token context window and two thinking-effort levels (High and Max), purpose-built to plan, execute, test, and fix across long agentic sessions.
The strongest open-weight model for agentic coding. GLM-5.2 is the first open-weight model to cross 80% on Terminal-Bench 2.1 (81.0)

GLM-5.2, Day-0 on FriendliAI Model APIs: The Strongest Open Model for Long-Horizon Agentic and Coding Tasks thumbnail

GLM-5.2 is Z.ai's most capable open-weight model to date, and it is built for the workloads that define modern AI: autonomous coding agents, multi-step tool use, and long-horizon tasks that have to stay coherent over hours, not seconds. It inherits the 744B-parameter Mixture-of-Experts foundation of the GLM-5 series — activating roughly 40B parameters per token — and pairs it with a genuinely usable 1M-token context window and a new effort-level system that lets you dial reasoning depth up or down per request.

FriendliAI is proud to launch Day-0 support for GLM-5.2 via Model APIs. OpenAI-compatible, production-grade, and tuned for the MoE and long-context patterns that make this model efficient. No capacity planning, no GPU orchestration, no cold starts to babysit. Point your agent at FriendliAI and go.

Whether you're building a coding agent that refactors an entire repository in a single session, an orchestration layer that fans out across dozens of tools, or a long-running research agent, GLM-5.2 on FriendliAI gives you frontier-class open intelligence at a fraction of closed-model cost.

How GLM-5.2 performs

Across a broad suite of coding, repository, and agentic tool-use benchmarks — all evaluated under maximum thinking effort, GLM-5.2 establishes itself as the leading open-weight model and trades blows with frontier closed systems.

Image 1) Z.ai GLM-5.2 technical blog: z.ai/blog/glm-5.2

A few results stand out for agent builders:

Terminal-Bench 2.1: 81.0 — GLM-5.2 is the first open-weight model to cross the 80% mark, landing within a few points of the best closed models and far ahead of every other open option.
MCP-Atlas: 77.0 — essentially tied with the top frontier model on tool orchestration over the Model Context Protocol, and ahead of GPT-5.5 and Gemini 3.1 Pro.
SWE-bench Pro: 62.1 and ProgramBench: 63.7 — leading the open field and ahead of GPT-5.5 and Gemini 3.1 Pro on real-world software engineering and program synthesis.
Humanity's Last Exam (with tools): 54.7 — competitive with frontier closed models on hard, tool-augmented reasoning.

Where GLM-5.2 really shows its character, though, is in how it scales with effort. Reasoning effort buys accuracy, and GLM-5.2 converts additional output tokens into real gains.

Image 2) Z.ai GLM-5.2 technical blog: z.ai/blog/glm-5.2

Averaged over Terminal-Bench 2.1, DeepSWE, and SWE-Atlas QnA, GLM-5.2 climbs from its Non-Thinking baseline to roughly 75% at Max effort, a steep improvement over GLM-5.1 and competitive with much larger closed models on the same effort curve. The takeaway for production: you control the trade-off. Run High effort for fast, economical agent loops, and reach for Max only on the hard problems that justify it — exactly the kind of cost-versus-quality tuning that matters when you're running thousands of agent sessions a day.

What you can build with GLM-5.2 on FriendliAI

GLM-5.2 is purpose-built for agents that operate over long horizons and large contexts. FriendliAI's high-performance Model API unlocks these capabilities at production scale, pay-per-token.

Autonomous coding agents

Point Claude Code, Kilo, Cursor, Cline, or any OpenAI-compatible coding agent at GLM-5.2 on FriendliAI and let it run. With Max effort and a 1M-token context, the agent can reason over an entire codebase, track cross-file dependencies, and carry a task through implementation, testing, and iteration in a single coherent session.

Repository-scale refactors

Load a mid-sized repo into one context window and let GLM-5.2 reason across it holistically — migrations, framework upgrades, large-scale refactors — without the accuracy loss that comes from chunking a project across many short calls.

Long-horizon, multi-tool agents

GLM-5.2's strength on MCP-Atlas and Tool-Decathlon makes it a strong engine for orchestration agents that fan out across many tools and APIs over long sessions, maintaining state and making sound decisions step after step.

Cost-sensitive agentic pipelines at scale

When you're running thousands of agent sessions a day, token economics decide what's viable. GLM-5.2's open-model pricing on FriendliAI — combined with effort-level control — lets you ship agentic features that would be cost-prohibitive on closed frontier APIs.

Why run GLM-5.2 on FriendliAI Model APIs

A frontier open model is only as good as the infrastructure serving it. GLM-5.2's MoE architecture, sparse-attention long-context path, and long agentic sessions all demand an inference stack engineered for exactly these patterns — which is what FriendliAI is built for.

With FriendliAI Model APIs, you get:

Serverless, pay-per-token access — no GPU provisioning, no capacity planning, no idle cost. Call GLM-5.2 like any API and pay only for the tokens you use.
2–5× faster output token speed — FriendliAI's inference engine is tuned for high-throughput MoE serving and efficient long-context decoding, so your agents finish sooner.
50–90% lower inference cost — more tokens per GPU means a better tokens-per-dollar number, on top of GLM-5.2's already open-model economics.
99.99% uptime SLA — production-grade reliability for agent workloads you can't afford to have stall.
OpenAI-compatible API — drop GLM-5.2 into your existing agent stack with a model-name change.

Get started with GLM-5.2 on FriendliAI Model APIs

GLM-5.2 is available now through FriendliAI Model APIs. You can plug it straight into your coding agent or call it directly from your own code.

Prerequisites

A FriendliAI account
A FriendliAI API key (create one in the Friendli Suite dashboard)

Use GLM-5.2 in your coding agent

GLM-5.2 works out of the box with Claude Code, Cline, Cursor, Kilo Code, OpenClaw, OpenCode, and Hermes — any agent that can point at a custom endpoint. Connect your agent to FriendliAI's Model API base URL, https://api.friendli.ai/serverless, and set the model to zai-org/GLM-5.2.

For example, to use GLM-5.2 in Claude Code, open your ~/.claude/settings.json file and add:

json

{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "<YOUR_FRIENDLIAI_API_KEY>",
    "ANTHROPIC_BASE_URL": "https://api.friendli.ai/serverless"
  },
  "model": "zai-org/GLM-5.2"
}

Save the file, open Claude Code, and send a prompt — your requests now run on GLM-5.2 via FriendliAI. For per-agent setup, see FriendliAI Docs › Use Your Agent with FriendliAI.

Take open frontier intelligence to production with FriendliAI

GLM-5.2 represents a new high-water mark for open-weight models — frontier-class coding and agentic reasoning, a usable 1M-token context, and an MIT license that puts you fully in control. FriendliAI is the platform to take it to production: fast, cost-efficient, reliable inference, available the moment the model launches.

👉 Start building with GLM-5.2 on FriendliAI today — call it through Model APIs in minutes!

Written by

FriendliAI Tech & Research

General FAQ

What is FriendliAI?

FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: Unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you in here.

How does FriendliAI help my business?

Our Friendli Inference allows you to squeeze more tokens-per-second out of every GPU. Because you need fewer GPUs to serve the same load, the true metric—tokens per dollar—comes out higher even if the hourly GPU rate looks similar on paper. View pricing

Which models and modalities are supported?

Over 570,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models

Can I deploy models from Hugging Face directly?

Yes. A one-click deploy by selecting “Friendli Endpoints” on the Hugging Face Hub will take you to our model deployment page. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership

Still have questions?

If you want a customized solution for that key issue that is slowing your growth, support@friendli.ai or click Talk to an engineer — our engineers (not a bot) will reply within one business day.