• April 15, 2026
  • 8 min read

FriendliAI Now Supports Anthropic Messages API

TL;DR
  • API Compatibility: FriendliAI now supports the Anthropic Messages API across both Serverless and Dedicated Endpoints.
  • Seamless Transition: Developers can run open-weight models like GLM-5.1, MiniMax-M2.5, and DeepSeek-V3.2 in Claude-based applications without rewriting any application logic; only the base URL and authentication token change.
  • Significant Cost Reductions: Switching to open-weight models provides massive cost savings over Claude. This is especially relevant for coding agents, which are the largest driver of API costs in developer workflows.
  • Migration Incentive: FriendliAI is offering up to $50,000 in inference credits to help teams migrate away from Anthropic, OpenAI, or other inference providers.

Run open models like GLM, MiniMax, and Kimi through the API compatible with your Claude-based applications — on Serverless or Dedicated Endpoints, for a fraction of the cost.

The Anthropic Messages API has become one of the most widely adopted interfaces in AI development. It powers Anthropic models running on Claude Code, Kilo Code, and Cline, along with a growing ecosystem of coding agents and AI-powered applications. Developers send structured conversation turns to POST /v1/messages, get back structured responses, and build from there.

FriendliAI now supports the Messages API across both Serverless and Dedicated Endpoints. Developers can point any application that speaks this protocol at FriendliAI and run open-weight models — GLM-5.1, MiniMax-M2.5, Kimi-K2.5, DeepSeek-V3.2, and more — without rewriting a single line of application logic. And to make the switch easier, FriendliAI is offering up to $50,000 in inference credits for teams moving off Anthropic or other providers.

What the Messages API is, and why it matters

The Messages API is the protocol for building LLM-powered applications on top of the Claude platform. Developers can send a JSON payload with a list of conversational messages (each with a role and content), along with parameters like model, max_tokens, and system. The model returns the next assistant message. It supports multi-turn conversations, tool use, streaming, and extended thinking.
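For illustration, a request body like the one described above can be assembled in a few lines of Python (the model name and prompts below are placeholders, not prescribed values):

```python
import json

# Build a Messages API request body: optional system prompt plus a list of
# conversational turns, each with a role and content.
def build_messages_request(model, user_text, max_tokens=1024, system=None):
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": user_text}],
    }
    if system:
        body["system"] = system
    return body

request = build_messages_request(
    "zai-org/GLM-5.1",
    "Summarize this changelog.",
    system="You are a release manager.",
)
print(json.dumps(request, indent=2))
```

Multi-turn conversations extend the `messages` list with alternating `user` and `assistant` turns; the server returns the next assistant message.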

Unlike the OpenAI Chat Completions format, which was the first widely adopted LLM API standard, the Anthropic Messages API has its own conventions: stop reasons use terms like end_turn rather than stop, and the request shape differs in subtle but breaking ways.

Applications built for the Messages API aren’t compatible with an OpenAI endpoint without an adaptation layer. This matters because a large and growing number of developer tools are built to support this format; Claude Code is the obvious example. Kilo Code, Roo Code, and many other coding agents also speak the Messages API natively (in addition to the OpenAI API) when running Claude models.
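As a concrete example of one such difference, here is a sketch of the stop-reason translation an adaptation layer would need. The mapping reflects the two documented vocabularies; a full adapter would also have to remap the request and response shapes:

```python
# Anthropic reports why generation stopped via stop_reason; OpenAI uses
# finish_reason with a different vocabulary. A minimal translation shim:
ANTHROPIC_TO_OPENAI_STOP = {
    "end_turn": "stop",        # model finished its turn naturally
    "stop_sequence": "stop",   # a user-supplied stop sequence was hit
    "max_tokens": "length",    # the max_tokens budget was exhausted
    "tool_use": "tool_calls",  # the model is requesting a tool invocation
}

def translate_stop_reason(stop_reason: str) -> str:
    # Fall back to "stop" for any value the shim does not recognize.
    return ANTHROPIC_TO_OPENAI_STOP.get(stop_reason, "stop")

print(translate_stop_reason("end_turn"))  # -> stop
```

This is only one field; streaming event formats and content-block structures differ as well, which is why a protocol-compatible endpoint beats an adapter.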

What this means for developers

FriendliAI’s support for the Messages API enables interoperability and freedom of choice. Any tool or application that makes requests to Anthropic's /v1/messages endpoint can be redirected to FriendliAI by changing the base URL and authentication token. The request and response formats are the same. Tool calling, streaming, reasoning, and prompt caching — with discounted pricing for cached prompts — are all supported.

There are two ways to use it, depending on your needs:

Serverless Endpoint

/serverless/v1/messages

Pick a model from the catalog and start sending requests immediately. It’s best for experimentation, development, and variable workloads.

Dedicated Endpoints

/dedicated/v1/messages

Deploy a model on reserved GPUs with an endpoint you control. The model field takes your endpoint ID. It’s best for production workloads where you need consistent latency, high throughput, and no shared resource contention.

In both cases, only three minor things change: the URL, the authentication header, and the model identifier. The messages field, the response shape, the streaming event format — all of them stay the same. Anthropic-specific fields like metadata, cache_control, and service_tier are accepted and parsed without error, so you don't need to strip these fields from your requests.
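The three differences can be captured in a small configuration table. The sketch below is illustrative: the Friendli paths are the ones documented in this post, while the Anthropic header and model names are shown only for contrast and are placeholders:

```python
# The only request-level differences between providers: URL, auth header,
# and model identifier. Everything else in the Messages request is shared.
TARGETS = {
    "anthropic": {
        "url": "https://api.anthropic.com/v1/messages",
        "auth": {"x-api-key": "sk-ant-..."},          # placeholder key
        "model": "claude-model-id",                   # placeholder ID
    },
    "friendli-serverless": {
        "url": "https://api.friendli.ai/serverless/v1/messages",
        "auth": {"Authorization": "Bearer flp_YOUR_API_KEY"},
        "model": "zai-org/GLM-5.1",
    },
    "friendli-dedicated": {
        "url": "https://api.friendli.ai/dedicated/v1/messages",
        "auth": {"Authorization": "Bearer flp_YOUR_API_KEY"},
        "model": "YOUR_ENDPOINT_ID",
    },
}

def make_request(target, body):
    # Merge the shared body with the target-specific URL, auth, and model.
    cfg = TARGETS[target]
    return {
        "url": cfg["url"],
        "headers": {**cfg["auth"], "Content-Type": "application/json"},
        "json": {**body, "model": cfg["model"]},
    }
```

Switching providers is then a one-line change to the `target` argument; the `body` your application builds never changes.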

For tools that support a custom base URL — which include Kilo Code, Cline, and most open-source coding agents — the switch is a settings change, not a code change. Review this tutorial to learn how to use FriendliAI with Claude Code or your open-source coding agent.

The cost case for open models

Claude models are excellent, but also expensive. At the API level, Claude Sonnet 4.6 costs $3 per million input tokens and $15 per million output tokens. Claude Opus 4.6, the flagship, costs $5/$25. For developers using Claude Code heavily, monthly bills of $100–$200 on Max plans — or significantly more under direct API billing — are common. Open-weight models have closed the quality gap dramatically. Here's what's available on FriendliAI Serverless right now:

Model         | Input / Output per 1M tokens | Strengths
MiniMax-M2.5  | $0.30 / $1.20                | Agentic workflows, coding, strong benchmarks. Cached input at $0.06.
DeepSeek-V3.2 | $0.50 / $1.50                | Low-cost, general-purpose workhorse. Cached input at $0.25.
GLM-5.1       | $1.40 / $4.40                | Most intelligent open-weight model, according to benchmarks by Artificial Analysis. Strong at frontend, logic, scientific coding. Cached input at $0.26.

To put this in perspective: output tokens dominate the bill for most agent workflows, and that's where the gap is widest. Claude Sonnet 4.6 charges $15 per million output tokens. GLM-5.1, the most capable model on the list, charges $4.40 — less than a third of the price. MiniMax-M2.5 charges $1.20, which is over 12x cheaper. On the input side, the savings are smaller but still significant: GLM-5.1 at $1.40 vs. Sonnet's $3, MiniMax-M2.5 at $0.30 vs. Sonnet's $3.

The gap widens if you're on Claude Opus 4.6 at $5/$25. GLM-5.1's output rate is less than a fifth of Opus's, and MiniMax-M2.5 is over 20x cheaper. In a coding agent session with hundreds of round-trips, it's the difference between a manageable bill and a runaway one. The GLM model family has only become more capable and is approaching Opus-level intelligence on coding benchmarks. Here’s how GLM-5.1 compares to Claude Opus 4.6 and other models on SWE-Bench Pro, Terminal-Bench 2.0, and NL2Repo.
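The arithmetic behind these comparisons is easy to check with the per-million-token rates quoted above (prices as listed at the time of writing; the 2M-token session size is an illustrative assumption):

```python
# Output-token rates from the pricing discussion above, in $ per 1M tokens.
PRICES_OUT = {
    "claude-sonnet-4.6": 15.00,
    "claude-opus-4.6": 25.00,
    "GLM-5.1": 4.40,
    "MiniMax-M2.5": 1.20,
    "DeepSeek-V3.2": 1.50,
}

def session_output_cost(model, output_tokens):
    # Dollar cost of generating output_tokens at the model's listed rate.
    return PRICES_OUT[model] * output_tokens / 1_000_000

# Cost of 2M output tokens, e.g. a heavy day of agent sessions:
for m in ("claude-opus-4.6", "GLM-5.1", "MiniMax-M2.5"):
    print(m, round(session_output_cost(m, 2_000_000), 2))
# claude-opus-4.6 -> 50.0, GLM-5.1 -> 8.8, MiniMax-M2.5 -> 2.4
```

The ratios in the prose fall out directly: Sonnet's $15 over MiniMax-M2.5's $1.20 is 12.5x, and Opus's $25 over the same rate is about 20.8x.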

Why coding agents make this especially relevant

Coding agents are the single largest driver of API cost in developer workflows. Unlike a simple chat interaction, an agent session involves dozens or hundreds of round-trips: reading files, proposing edits, running tests, and iterating. A single prompt in a coding tool can trigger 5–30 calls to the model API behind the scenes. Context windows fill up fast, and the same large context gets re-sent on every turn.
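A back-of-the-envelope sketch shows why this compounds: if each turn re-sends the full, growing context, input tokens scale roughly quadratically with session length. All numbers below are illustrative assumptions, not measurements:

```python
# Estimate total input tokens for an agent session that re-sends its whole
# (growing) context on every round-trip to the model.
def session_input_tokens(turns, base_context, growth_per_turn):
    total = 0
    ctx = base_context
    for _ in range(turns):
        total += ctx              # the full context is sent again this turn
        ctx += growth_per_turn    # file reads, diffs, and test output accrue
    return total

# 30 round-trips, starting from a 20k-token context, +2k tokens per turn:
print(session_input_tokens(30, 20_000, 2_000))  # -> 1470000
```

Nearly 1.5M input tokens for what the user experiences as a single prompt, which is why per-token input rates and cached-prompt discounts matter so much for agents.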

This is where the economics of open models become compelling — and where Anthropic's ecosystem is getting more restrictive, not less. Claude Code only supports three classes of Claude models: Opus, Sonnet, and Haiku. And as of April 4, Anthropic cut off subscription-based access for third-party agent harnesses, pushing users of tools like OpenClaw to pay-as-you-go API rates. The direction is clear: if you build outside Anthropic's first-party tools, you pay full price.

Tools like Kilo Code offer a different path. With support for 500+ models, many of which are also hosted on Friendli Dedicated Endpoints, they give you the freedom to route each task to the right model at the right price. Architecture decisions go to a strong reasoning model. Routine edits go to something fast and cheap. You match the model to the task instead of paying Opus rates for a docstring.

FriendliAI lets you run open-weight models on applications built with the Anthropic Messages API. Any tool that can point at a Claude-compatible endpoint can now use FriendliAI — on Serverless for quick experimentation, or on Dedicated for sustained production workloads. No adapter layer. No protocol translation. Just a URL change.

Beyond cost: what FriendliAI gives you

No provisioning barrier. On Serverless, there's nothing to set up. Pick a model, get a token, send a request. You can test whether GLM-5.1 or MiniMax-M2.5 works for your use case in minutes, not days.

No rate limit surprises. On Dedicated Endpoints, the GPU is yours. Anthropic's API has tiered rate limits that throttle you based on account level — and as recent events show, subscription terms can change overnight. On dedicated infrastructure, throughput is predictable and under your control.

Model choice that keeps pace. New open-weight models are released constantly. GLM-5.1, MiniMax-M2.5, and Kimi-K2.5 are today's options; next month, there will be more. On FriendliAI, you deploy what you want, when you want, without waiting for a single provider to add it to their lineup.

Your data, your infrastructure. For teams with compliance requirements, running inference on FriendliAI means your prompts and completions aren't flowing through a third party's shared API. You choose the isolation level that fits your needs.

Getting started

If you already have an application, agent, or tool that calls Anthropic's Messages API, here's how to try it on FriendliAI:

Option 1: Serverless (fastest start)

No provisioning required. Choose a supported model, generate a Friendli API key, and send requests.

curl --request POST \
  --url https://api.friendli.ai/serverless/v1/messages \
  --header 'Authorization: Bearer flp_YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "zai-org/GLM-5.1",
    "max_tokens": 1024,
    "system": "You are a senior software engineer.",
    "messages": [
      { "role": "user", "content": "Review this pull request for security issues." }
    ]
  }'

Note that not all models hosted on Serverless Endpoints support the Messages API yet — check the pricing table for current availability.

Option 2: Dedicated Endpoints (production)

Deploy a model on reserved GPU infrastructure through the FriendliAI dashboard. You'll receive an endpoint ID to use as your model identifier.

curl --request POST \
  --url https://api.friendli.ai/dedicated/v1/messages \
  --header 'Authorization: Bearer flp_YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "YOUR_ENDPOINT_ID",
    "max_tokens": 1024,
    "system": "You are a senior software engineer.",
    "messages": [
      { "role": "user", "content": "Review this pull request for security issues." }
    ]
  }'

For coding agents such as Kilo Code or Cline, look for the "Custom Base URL" option in your API provider settings. Point it at the appropriate FriendliAI URL, enter your API key, and select your model. That's it.

The Anthropic Messages API established a strong protocol for building with LLMs. FriendliAI's support for that protocol means you're no longer choosing between the API format your tools expect and the models and infrastructure you actually want. Run open models at open-model prices, through the interface your stack already understands.

The Messages API on FriendliAI is now available in beta across Serverless and Dedicated Endpoints. If you're currently building on the Claude API, FriendliAI is offering up to $50,000 in inference credits to help you make the switch. Credits are based on your current spend and apply to both Serverless and Dedicated Endpoints. No migration required before approval — just submit your current provider bill and our team will review your credit amount. Apply here.


Written by

FriendliAI Tech & Research




General FAQ

What is FriendliAI?

FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.

How does FriendliAI help my business?

Our Friendli Inference allows you to squeeze more tokens-per-second out of every GPU. Because you need fewer GPUs to serve the same load, the true metric—tokens per dollar—comes out higher even if the hourly GPU rate looks similar on paper. View pricing

Which models and modalities are supported?

Over 530,000 text, vision, audio, and multimodal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models

Can I deploy models from Hugging Face directly?

Yes. Selecting “Friendli Endpoints” on the Hugging Face Hub takes you straight to our model deployment page for a one-click deploy. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership

Still have questions?

If you want a customized solution for the key issue that is slowing your growth, email contact@friendli.ai or click Talk to an engineer; our engineers (not a bot) will reply within one business day.


Explore FriendliAI today