December 19, 2025
6 min read

MCP: Ushering in the Era of AI Agents

TL;DR

The Model Context Protocol (MCP) acts as a "universal adapter," replacing brittle code with a standard contract for AI agents to connect with data and tools.
Built on JSON-RPC 2.0, MCP enables dynamic tool discovery and real-time, bi-directional communication between models and external systems.
Reliable AI agents require precise, schema-faithful data; FriendliAI provides the high-performance infrastructure needed to prevent malformed tool calls.
The FriendliAI Edge: By offering optimized GPU kernels and 99.99% uptime, FriendliAI ensures MCP-powered agents are production-ready with low latency and structured accuracy.

MCP: Ushering in the Era of AI Agents thumbnail

FriendliAI is building AI inference infrastructure for reliable, production-grade AI agents, enabling precise, predictable, and structured outputs at scale. As agentic systems grow more complex, consistent machine-readable responses become foundational where FriendliAI plays a key role in the emerging MCP ecosystem.

The Model Context Protocol (MCP) is at the forefront of accelerating the AI agents industry, marking 2025 as the year of Agentic AI. In this blog, we will explore the technical details of MCP, including its specifications, architecture, benefits, and various use cases. We will also examine the foundational capability that makes MCP possible—Structured Outputs—and explain how FriendliAI’s proven expertise in generating robust, reliable Structured Outputs uniquely positions it as an optimal provider of MCP-powered services.

What is MCP?

The Model Context Protocol (MCP), introduced by Anthropic in late 2024, is an emerging open standard intended to facilitate the seamless integration of artificial intelligence applications with external data sources and tools. This protocol addresses the inherent complexities associated with interfacing Large Language Models (LLMs) with external systems, thereby ensuring a streamlined and efficacious integration procedure.

Essentially, MCP acts as a “universal adapter” for AI applications, enabling them to access data repositories, APIs, and tools without bespoke solutions for each integration and allowing AI systems to retrieve information and perform actions dynamically. MCP provides a standard for sharing contextual information, exposing tools and capabilities, and building composable integrations and workflows.

Why MCP Exists: The Integration Problem Behind AI Agents

As AI systems evolve from single-turn chatbots into autonomous, multi-step agents, the hardest problem is no longer model quality—it’s integration. Modern agents must reason over proprietary data, invoke tools safely, and act across heterogeneous systems. Until recently, this integration layer was built with brittle, bespoke glue code: hand-written function schemas, prompt conventions, and one-off APIs that broke as soon as tools or models changed. The Model Context Protocol (MCP) addresses this problem directly by providing an open, standardized contract between large language models and the external systems they must interact with.

What MCP Enables That Was Previously Infeasible

MCP externalizes tools, data, and actions into discoverable servers. This enables dynamic tool discovery, composable workflows, stateful and secure interactions, and interoperability across models and environments. MCP turns tool use from an application concern into infrastructure.

Architecture of MCP

MCP follows a client-server architecture with four primary components:

Hosts orchestrate user interaction and reasoning
Clients manage sessions, permissions, and security
Servers expose tools, resources, and prompts
A JSON-RPC transport layer enables communication

This separation is critical for safety, auditability, and scale.

Component	Description	Role
MCP Hosts	LLMs that interact with users and initiate connections	Manages user inputs and requests
MCP Clients	Connection managers integrated into host applications	Handles communication with MCP servers
MCP Servers	Lightweight servers exposing specific functionalities	Provides tools and data access
Base Protocol (Transport Layer)	Mechanism for communication (STDIO or Streamable HTTP)	Handles message exchange, request/response linking, and abstracted communication behaviors

Figure 1: Core components of MCP. [Online]. Available: https://modelcontextprotocol.io/specification/2025-11-25. Accessed: Dec. 18, 2025.

Figure 2: Diagram of the core components and its interactions within MCP. [Online]. Available: https://modelcontextprotocol.io/specification/2025-11-25/architecture. Accessed: Dec. 18, 2025.

Each component plays a unique role in ensuring smooth interactions between AI applications and external systems.

Hosts

Hosts are LLM applications, like chatbots or IDEs, that initiate connections to servers. The host process acts as the container and coordinator:

Oversees client connections and permissions, including lifecycle management
Manages and integrates LLM sampling
Coordinates context aggregation for multiple clients
Creates and manages multiple client instances
Enforces security policies and consent requirements
Handles user authorization and access control

Clients

Clients are created by the host and maintain stateful one-to-one connections with servers:

Establishes and manages a single stateful session for each server
Bidirectionally routes protocol messages
Handles protocol negotiation and capability exchange
Manages subscriptions and notifications
Maintains security boundaries between servers

In addition, clients can implement more features to further enrich the MCP servers.

Roots

Clients can expose filesystem “roots” to servers, defining where servers can access within the given boundaries of the client's filesystem. This enables more structured interaction, ensuring that servers understand which directories and files they can access.

Roots are generally presented to users through workspace or project configuration interfaces, where users can select directories or files that should be accessible by the server. Clients must declare the roots capability during initialization, and if they support root list changes, they notify the server whenever the list of roots changes.

MCP specifies messages like roots/list for retrieving available roots, and notifications/roots/list_changed to notify servers of changes. Each root is defined by a uri (a file URI) and an optional human-readable name.

For security, clients must ensure proper access control and validation, while servers should respect root boundaries during operations. The "roots" feature is essential for controlled and secure interaction between clients and servers within MCP.

Sampling

The "sampling" feature allows servers to request generative tasks, such as completions or content generation, from language models via clients. This feature allows servers to leverage AI capabilities without needing API keys, while clients maintain control over model access and selection.

MCP facilitates the request for text, image, or audio-based outputs and includes mechanisms for ensuring trust and safety. Notably, the system promotes a human-in-the-loop model, requiring users to review and approve sampling requests and results.

To utilize this feature, clients must declare the sampling capability. Servers initiate sampling via the sampling/createMessage request, including a prompt and model preferences. The process ensures that responses are vetted by the user before final delivery, enhancing security and reliability.

Servers

Servers are the fundamental building blocks of MCP, providing specialized context and capabilities:

Expose resources, tools and prompts via MCP primitives
Function independently with dedicated responsibilities
Initiate sampling via client interfaces
Adhere to security restrictions
Operate as local processes or remote services

The servers consist of primitives that enable rich interactions between clients, servers, and LLMs, which can be categorized as below:

Primitive	Control	Description	Example
Prompts	User-controlled	Interactive templates invoked by user choice	Menu options
Resources	Application-controlled	Contextual data attached and managed by the client	Files, Git history
Tools	Model-controlled	Functions exposed to the LLM to take actions	API requests, writing files

Figure 3: Primitives of MCP Servers. [Online]. Available: https://modelcontextprotocol.io/specification/2025-11-25/server. Accessed: Dec. 18, 2025.

Prompts

The "prompts" enable servers to expose prompt templates to clients, allowing for structured interaction with language models. Prompts typically include instructions or templates that guide how clients should interact with the model, such as generating code reviews or answering specific questions.

Servers supporting prompts must declare the prompts capability during initialization and can notify clients about changes in available prompts through listChanged. Clients can list available prompts via the prompts/list request and retrieve specific prompts using the prompts/get request, often passing arguments to customize them.

The system promotes a user-driven approach, where prompts are exposed in user interfaces, such as through slash commands. This flexibility allows clients to present prompts in a manner that fits their user experience needs, while ensuring seamless model interactions.

Resources

The "resources" allow servers to expose data that provides context to language models, such as files, application data, or database schemas. This feature is critical for enabling AI systems to access external knowledge when generating responses.

Servers supporting resources must declare the resources capability, with optional features like subscribe for client notifications and listChanged to notify clients of resource updates. Clients can discover resources through the resources/list request and retrieve their contents via resources/read.

Additionally, resource templates enable the dynamic exposure of parameterized resources, while servers can use listChanged notifications to keep clients informed about resource changes. This feature enhances model context by facilitating seamless interaction between clients and servers.

Tools

The "tools" allow servers to expose external tools that can be invoked by language models. These tools enable models to interact with external systems, such as databases, APIs, or computational services, enriching the model's responses by providing additional capabilities.

Servers that support tools must declare the tools capability. Clients can discover available tools through the tools/list request, which supports pagination. To invoke a tool, clients send a tools/call request, specifying the tool name and necessary input arguments. The server responds with the tool's result, which may include text, image, or other content types.

Additionally, when the list of available tools changes, servers can notify clients via the tools/list_changed notification. For trust and safety, it’s recommended that users be involved in the tool invocation process, ensuring they are aware of which tools are being used. This human-in-the-loop approach helps maintain security and control over tool usage.

Utilities

The "utilities" feature provides optional capabilities that enhance server functionality. This includes tools like completion handling, logging, and pagination, which can be crucial for improving user experience and operational efficiency.

1. Completion: This utility helps servers handle and return completion requests, ensuring efficient interaction with language models for generating outputs.

2. Logging: Servers can implement logging functionality, recording interactions for auditing, troubleshooting, or performance tracking purposes.

3. Pagination: This utility assists in managing large datasets, enabling servers to paginate results, improving the scalability and performance of requests that involve extensive data retrieval.

These utilities are not mandatory but offer flexibility for servers looking to provide more robust features and better user experiences within the MCP ecosystem.

Base Protocol (Transport Layer)

The transport layer is how these components communicate over various transport layers (e.g., STDIO or Streamable HTTP). It uses JSON-RPC message format encoded in UTF-8 and has a stateful session.

Transports

MCP defines the communication mechanisms used between clients and servers. MCP currently supports two standard transport methods:

stdio: This transport involves communication through standard input and output. The client launches the server as a subprocess, sending and receiving JSON-RPC messages via stdin and stdout. This method is ideal for local interactions where both client and server are on the same machine.
Streamable HTTP: This transport allows servers to handle multiple client connections through HTTP POST and GET requests. It also supports server-to-client notifications via Server-Sent Events (SSE), enabling dynamic interactions. This transport is used for more scalable, network-based communications.

Both transports support JSON-RPC with UTF-8 for message encoding, and custom transports can also be implemented for specific needs.

Authorization

MCP secures interactions between clients and servers by specifying an authorization flow for accessing restricted resources. The authorization mechanisms are built on OAuth 2.1, with additional provisions for both public and confidential clients.

When an MCP client attempts to access a protected resource, the server responds with a 401 Unauthorized status if the client has not yet proven its authorization. The client then initiates the OAuth 2.1 authorization flow, using the Proof Key for Code Exchange (PKCE) method for public clients. This flow involves a series of steps where the client generates a code challenge and receives an access token after successful authorization by the resource owner.

OAuth 2.1 Authorization Flow: MCP’s authorization process is based on OAuth 2.1, ensuring compatibility with widely recognized security standards.
Server Metadata Discovery: Clients can discover the server's authorization metadata via the OAuth 2.0 Metadata protocol, simplifying integration and enhancing security.

This ensures that only authorized clients can access sensitive resources while maintaining high security standards.

Lifecycle Management

MCP defines a structured lifecycle for client-server connections:

Initialization Phase: Clients negotiate protocol versions with servers. Exchange capability information (tools/resources/prompts).
Operation Phase: Normal communication occurs based on negotiated capabilities.
Shutdown Phase: Graceful termination ensures clean disconnection

1. Initialization

The lifecycle begins with the initialization phase, where the client and server agree on the protocol version and exchange capabilities. The client initiates this phase by sending an initialize request that includes supported protocol versions, capabilities, and client details. Upon receiving this, the server responds with its own supported capabilities and version. Once this negotiation completes, the client sends an initialized notification to signal readiness for normal operations.

Figure 4: Diagram showing capability negotiation in MCP. [Online]. Available: https://modelcontextprotocol.io/specification/2025-11-25/architecture. [Accessed Dec. 18, 2025].

2. Operation

During the operation phase, the client and server engage in regular protocol communication, exchanging messages based on the negotiated capabilities. Both sides must respect the agreed-upon version and capabilities.

3. Shutdown

When the communication is complete, the shutdown phase gracefully terminates the connection. The client sends a disconnect request, closing the connection between both parties.

This lifecycle ensures a robust, reliable client-server interaction framework with clear phases for capability and version negotiation, operational communication, and safe termination.

Utilities

MCP offers several utilities designed to extend and enhance its base functionality of the transport layer. These utilities provide additional capabilities that simplify and streamline interactions between clients and servers. The key utilities are:

Ping: This utility allows clients and servers to verify connectivity and ensure the protocol is operational. It is commonly used to check the health of the connection and confirm that both parties are still active.
Cancellation: This feature enables clients to cancel operations if they are no longer needed or if the context changes. It is important for managing resource usage efficiently and ensuring that unnecessary work is not performed.
Progress: The progress utility provides real-time updates on long-running operations, keeping clients informed of the task's progress. This helps in managing user expectations and improving the overall user experience.

These utilities are optional, but they greatly enhance the MCP’s robustness and flexibility, providing tools to maintain communication, optimize resource usage, and improve interactivity.

Advantages of MCP

Without the assistance of an MCP, developers were burdened with the tedious and error-prone task of manually documenting the capabilities and constraints of their available tools. This process involved meticulously crafting descriptions for each tool, outlining its intended functionality, input requirements, expected outputs, and potential limitations. Furthermore, developers had to ensure that this documentation remained up-to-date as tools evolved or new tools were introduced, adding another layer of complexity to the maintenance overhead.

In contrast, MCP provides a universal framework for communication between LLMs and external tools, offering plug-and-play integration that reduces development time and complexity. MCP’s dynamic tool discovery and context aggregation mechanisms streamline the process, making it easier to scale AI systems without increasing integration effort.

Standardized Integration: MCP replaces fragmented integrations with a single protocol for all tools. It uses JSON-RPC 2.0 as its messaging format for requests, responses, and notifications.
Dynamic Tool Discovery: Automatically identifies available tools and resources during initialization. Simplifies configuration compared to traditional APIs.
Real-Time Bi-Directional Communication: Supports dynamic updates to ensure real-time data exchange. Enables bidirectional communication between clients and servers.
Context Awareness: Built-in mechanisms for aggregating context across multiple clients. Ensures AI models operate with relevant data.
Scalability: Plug-and-play expansion allows easy integration of new tools, without linear effort required, which comes with lifecycle management for secure connections.

By simplifying integration efforts, MCP reduces development time, improves scalability, enhances security, and streamlines debugging processes. Its advantages over traditional function calling—flexibility, interoperability, and long-term sustainability—make it a promising foundation for the next generation of autonomous AI agents.

Potential Advancements and Future Directions of MCP

The future of MCP holds immense potential, with several promising advancements on the horizon:

Service Discovery: Establish a mechanism for clients to discover and connect to remote MCP servers.
Stateless Operations: Enable support for serverless environments.
Complex Agentic Workflows
- Hierarchical Agent Systems: Enhance support for trees of agents through namespacing and topology awareness.
- Interactive Workflows: Improve handling of user permissions and information requests across agent hierarchies, including mechanisms to send output to users instead of models.
- Streaming Results: Deliver real-time updates from long-running agent operations.
Scalable MCP Server Deployment & Maintenance
Enhanced Security Features: Implement advanced security protocols, including sandboxing, to ensure the safe exchange of sensitive data between AI models and external systems.
Integration with Multimodal AI: As AI systems evolve to handle multiple types of data, including images, text, audio, and video, MCP is positioned to play a crucial role. Future iterations of MCP could support seamless integration across diverse modalities, enabling developers to create richer, more robust AI systems. By facilitating the exchange of data across multiple AI models, MCP could open doors to the next generation of AI-powered applications, from autonomous vehicles to advanced medical diagnostics.
Edge AI and IoT Applications: Extend MCP's capabilities to embedded systems.
Ethical Governance: Form a multi-company consortium to establish ethical standards for MCP's evolution.

Significance of MCP in the AI Agents Industry

Model Context Protocol (MCP) is transforming the AI agents industry by enabling seamless integration between AI models and external systems. This plays a pivotal role in the emergence of Agentic AI, where AI systems evolve from isolated chatbots into context-aware, interoperable agents capable of performing complex tasks in real-world environments. MCP addresses fundamental limitations of standalone AI models, such as restricted context awareness and inability to act on external data, thereby unlocking new possibilities for automation and intelligent decision-making.

According to Andreessen Horowitz (a16z), a leading venture capital firm, the MCP market is already rapidly expanding worldwide.

Figure 5: MCP Market Map. [Online]. Available: https://a16z.com/a-deep-dive-into-mcp-and-the-future-of-ai-tooling. Accessed: Dec. 18, 2025.

Example Applications

The increasing use cases of both official and community-developed MCP servers and clients for a wide range of applications are contributing to the burgeoning of the AI agents industry.

Airbnb
AWS KB
Blender
Brave Search
Claude Desktop App
Cline
Cursor
Docker
Figma
Firecrawl
Git
GitHub
Gmail
Google Drive
Google Maps
Kubernetes
LinkedIn
Notion
PostgreSQL
Redis
Shopify
Slack
Spotify
Sqlite
Stripe
Weaviate
… and many more!

The continued development of AI agents and MCP infrastructure promises to unlock even more innovative and impactful applications across various industries. As AI agents become more sophisticated and integrated into our daily lives, we can anticipate a future where they seamlessly augment human capabilities, automate mundane tasks, and enable us to achieve new levels of productivity, creativity, and efficiency.

Hands-On Example: Querying Documentation with MCP

This example shows how MCP can query external sources to answer questions. We demonstrate retrieving instructions for creating an API token for Serverless Endpoints.

Save and run the code to try it yourself:

shell

uv run example.py

python

# /// script
# requires-python = ">=3.13"
# dependencies = [
#     "friendli>=0.12.2",
#     "mcp>=1.24.0",
#     "uvloop>=0.22.1",
# ]
# ///
import json
import os

import uvloop
from mcp.client.session import ClientSession
from mcp.client.streamable_http import streamable_http_client
from mcp.types import TextContent

from friendli import AsyncFriendli
from friendli.models import Function, Tool

FRIENDLI_TOKEN: str = os.environ["FRIENDLI_TOKEN"]
MODEL: str = "zai-org/GLM-4.6"
QUESTION: str = "How can I create an API token to use Friendli Serverless Endpoints?"


async def main() -> None:
    async with streamable_http_client(url="https://friendli.ai/docs/mcp") as (r, w, _):
        async with ClientSession(read_stream=r, write_stream=w) as mcp:
            # Initialize MCP session
            _ = await mcp.initialize()

            # Convert MCP tool schema to OpenAI-compatible tool schema
            tools: list[Tool] = [
                Tool(
                    function=Function(
                        name=t.name,
                        description=t.description,
                        parameters=t.inputSchema or {},
                    )
                )
                for t in (await mcp.list_tools()).tools
            ]

            async with AsyncFriendli(token=FRIENDLI_TOKEN) as client:
                # Send initial user message
                messages = [
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": QUESTION},
                ]

                res = await client.serverless.chat.complete(
                    model=MODEL,
                    messages=messages,
                    tools=tools,
                    chat_template_kwargs={"enable_thinking": True},
                    parse_reasoning=True,
                )

                # Handle tool calls
                if res.choices[0].message.tool_calls:
                    for tc in res.choices[0].message.tool_calls:
                        # Call MCP tool
                        result = await mcp.call_tool(
                            tc.function.name,
                            json.loads(s=tc.function.arguments),
                        )
                        print(f"\nTool result: {result.content}")

                        # Format tool result as content
                        content = "\n".join(
                            i.text for i in result.content if isinstance(i, TextContent)
                        )
                        messages.extend(
                            [
                                {
                                    "role": "assistant",
                                    "tool_calls": [
                                        {
                                            "id": tc.id,
                                            "type": "function",
                                            "function": {
                                                "name": tc.function.name,
                                                "arguments": tc.function.arguments,
                                            },
                                        }
                                    ],
                                },
                                {
                                    "role": "tool",
                                    "tool_call_id": tc.id,
                                    "content": content,
                                },
                            ]
                        )

                    res = await client.serverless.chat.complete(
                        model=MODEL,
                        messages=messages,
                        tools=tools,
                        chat_template_kwargs={"enable_thinking": True},
                        parse_reasoning=True,
                    )

                # Print final assistant message
                if res.choices[0].message.content:
                    print(f"\nAssistant: {res.choices[0].message.content}")


if __name__ == "__main__":
    uvloop.run(main())

This example demonstrates how MCP enables AI agents to fetch and integrate information from external resources dynamically.

The Backbone of MCP: Structured Output

At the heart of the Model Context Protocol (MCP) lies the concept of Structured Output, which is essential for enabling the seamless integration of AI models with external systems and tools. Structured output ensures that the data generated by AI systems follows a consistent and predictable format, which is critical for applications requiring precision, accuracy, and interoperability.

In MCP, the ability to deliver structured output underpins many of its advanced features, including dynamic tool discovery, real-time communication, and context aggregation. Without well-organized, consistent output, these features would be ineffective or unreliable, as they depend on having clear, defined structures to interact with external systems and make sense of data.

Why Structured Output Is the Real Bottleneck

Most agent failures stem from malformed tool arguments, schema violations, and inconsistent outputs. Structured output is not an implementation detail—it is the foundation that makes MCP viable in production.

Where MCP Breaks Without Strong Inference Infrastructure

Agent workflows require deterministic, low-latency, schema-faithful inference. Inference quality directly affects correctness, cost, and reliability.

FriendliAI Advantages

FriendliAI is purpose-built for structured output, making it a strong fit for MCP integrations. Its robust architecture produces accurate, well-structured outputs that are easy for downstream systems and tools to consume, enabling high-quality, context-aware AI interactions with greater efficiency.

By delivering consistent, actionable structured data, FriendliAI allows developers to build AI agents that seamlessly integrate with external data sources, APIs, and tools, unlocking richer functionality across AI-driven workflows.

In addition, FriendliAI delivers industry-leading inference performance and production-grade reliability:

High Throughput & Low Latency

Optimized GPU kernels
Advanced batching and scheduling algorithms
50%+ cost reduction with Online Quantization and Multi-LoRA Adapters

Production-Grade Reliability

Enterprise-grade 99.99% uptime and SLAs
Customizable, request-based autoscaling
Full logs and metrics for end-to-end observability
Globally geo-distributed infrastructure
SOC 2 certified

Flexible Deployment Options

Serverless Endpoints: Instant usage with no infrastructure setup
Dedicated Endpoints: Exclusive access to high-demand GPUs
Container: Run on your public cloud or on-premises clusters

Together, FriendliAI and MCP provide a powerful combination of speed, accuracy, and scalability, enabling AI agents to operate more effectively and at higher levels of performance.

Conclusion

The Model Context Protocol marks a paradigm shift in the AI agents industry by establishing a de facto industry standard for seamlessly integrating models with external systems. Its ability to simplify complex integrations while maintaining scalability makes it indispensable in modern software development.

As adoption grows across industries, MCP is poised to become the foundation of interoperable AI ecosystems, bridging the gap between LLMs and the real-world data they need to thrive.

At FriendliAI, we strive to be at the forefront of this transformative shift. Our platform’s superior performance, reliability, and ease of integration with MCP empowers developers and organizations to harness the full potential of AI agents. By combining fast inference, robust structured outputs, and scalable architecture, we are committed to helping our users build the next generation of intelligent, context-aware AI systems. Together with MCP, FriendliAI is shaping the future of AI-powered applications and helping to unlock new possibilities for businesses and industries worldwide.

References

[1] Model Context Protocol, “Model Context Protocol Specification,” 2025. [Online]. Available: https://modelcontextprotocol.io/specification/2025-11-25. Accessed: Dec. 18, 2025.

[2] Model Context Protocol, “MCP Architecture,” 2025. [Online]. Available: https://modelcontextprotocol.io/specification/2025-11-25/architecture. Accessed: Dec. 18, 2025.

[3] Model Context Protocol, “MCP Server Features,” 2025. [Online]. Available: https://modelcontextprotocol.io/specification/2025-11-25/server. Accessed: Dec. 18, 2025.

[4] Andreessen Horowitz, “A Deep Dive into MCP and the Future of AI Tooling,” Andreessen Horowitz, 2025. [Online]. Available: https://a16z.com/a-deep-dive-into-mcp-and-the-future-of-ai-tooling. Accessed: Dec. 18, 2025.

Written by

FriendliAI Tech & Research

General FAQ

What is FriendliAI?

FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: Unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you in here.

How does FriendliAI help my business?

Our Friendli Inference allows you to squeeze more tokens-per-second out of every GPU. Because you need fewer GPUs to serve the same load, the true metric—tokens per dollar—comes out higher even if the hourly GPU rate looks similar on paper. View pricing

Which models and modalities are supported?

Over 520,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models

Can I deploy models from Hugging Face directly?

Yes. A one-click deploy by selecting “Friendli Endpoints” on the Hugging Face Hub will take you to our model deployment page. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership

Still have questions?

If you want a customized solution for that key issue that is slowing your growth, contact@friendli.ai or click Talk to an engineer — our engineers (not a bot) will reply within one business day.