• December 2, 2025
  • 2 min read

Enterprise Features Now Available on Friendli Dedicated Endpoints (Basic Plan)


At FriendliAI, our mission is to make high-performance AI inference effortless, scalable, and cost-efficient for every team. As a major step toward this, Friendli Dedicated Endpoints now make Enterprise-level features available to all Basic plan users.

Starting today, several capabilities previously exclusive to the Enterprise plan are now included in the Basic plan, such as full access to metrics and logs, request-count-based autoscaling, and support for attaching hundreds of LoRA adapters to each endpoint. These upgrades give smaller teams access to the same production-grade tooling previously reserved for large-scale deployments.

What’s New?

  • Observability with real-time performance metrics
  • Customizable request-count-based autoscaling
  • Request and response content logging (opt-in)
  • Support for hundreds of LoRA adapters
  • SOC2-compliant infrastructure

Why Do These Changes Matter?

These upgrades bring deeper visibility, smarter scaling, stronger security, and more flexibility to model deployment. Real-time metrics and logs surface system behavior immediately, making it far easier to understand and resolve issues. With request-count-based autoscaling, teams set their own queue thresholds, gaining fine-grained control over when and how their endpoints scale. Support for hundreds of LoRA adapters lets teams run more specialized, fine-tuned models on a single endpoint. And with SOC2-compliant infrastructure, teams can deploy these workloads with confidence, meeting modern security and governance standards.

Taken together, these upgrades make the Basic plan significantly more capable, giving teams access to the advanced tooling and operational controls used in large-scale deployments.

What Can Basic Users Do Now That They Couldn’t Before?

With this update, Basic users gain access to a set of capabilities that were previously reserved for Enterprise deployments. These features give teams far more operational control, observability, and scalability when running production-grade AI workloads.

1. View Full Metrics and Request Activity

Previously:
  • Basic users had no visibility into these operational insights

Now:
  • Monitor real-time throughput, latency, tokens processed, and replica counts over time
  • Review request activity and troubleshoot issues more quickly

2. Adjust Autoscaling Control Parameters

Previously:
  • Autoscaling could not be customized

Now:
  • Users can adjust scaling based on how many requests are queued
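To make the idea concrete, request-count-based autoscaling can be thought of as: when the number of queued requests per replica exceeds a user-set threshold, add replicas; when queues drain, scale back down. The sketch below is an illustration only, not Friendli's actual scaling algorithm, and the threshold and replica bounds are hypothetical parameters you would set in the endpoint configuration.

```python
def desired_replicas(queued_requests: int, threshold: int,
                     min_replicas: int = 1, max_replicas: int = 8) -> int:
    """Toy request-count-based scaling decision (illustrative only).

    Scale so that no replica has more than `threshold` queued requests,
    clamped to the [min_replicas, max_replicas] range.
    """
    # Replicas needed so each handles at most `threshold` queued requests.
    needed = -(-queued_requests // threshold)  # ceiling division
    return max(min_replicas, min(max_replicas, needed))
```

With a queue threshold of 10, an endpoint holding 25 queued requests would be scaled to 3 replicas, while an idle endpoint falls back to the minimum. Lowering the threshold makes scaling more aggressive; raising it trades latency for cost.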


3. Support for Many LoRA Adapters

Previously:
  • Endpoints allowed only one LoRA adapter, which had to be fixed at creation time

Now:
  • Run hundreds of LoRA adapters on a single endpoint
  • Add or remove adapters in real time with zero service disruption
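As a hedged sketch of what multi-adapter serving looks like from the caller's side: each request names the adapter it wants, so one endpoint can serve many fine-tuned variants. Friendli Dedicated Endpoints expose an OpenAI-compatible chat completions API, but the "ENDPOINT_ID:adapter-route" model-naming scheme and the endpoint/adapter names below are assumptions for illustration; consult the Friendli documentation for the exact conventions.

```python
import json

# Hypothetical identifiers -- replace with your own endpoint and adapter names.
ENDPOINT_ID = "my-endpoint-id"

def chat_payload(adapter_route: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion request body that targets
    one LoRA adapter on a multi-adapter endpoint. The
    "ENDPOINT_ID:adapter" model-naming scheme is an assumption."""
    return {
        "model": f"{ENDPOINT_ID}:{adapter_route}",
        "messages": [{"role": "user", "content": prompt}],
    }

# Each request can route to a different fine-tuned adapter on the same
# endpoint; switching adapters requires no redeploy.
legal = chat_payload("legal-summarizer", "Summarize this contract clause...")
support = chat_payload("support-tone", "Draft a reply to this ticket...")
print(json.dumps(legal, indent=2))
```

Because the adapter is chosen per request rather than per deployment, specialized models share one pool of GPU capacity instead of each needing a dedicated endpoint.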


4. Enable Request and Response Content Logging

Previously:
  • This capability was not available on the Basic plan

Now:
  • View specific request and response content (when explicitly enabled)
  • Get a clearer view of how the model is behaving
  • Spot and investigate requests that may require attention


Start Exploring the New Basic Plan Today

This upgrade makes it easier to:

  • Gain real-time observability with detailed performance metrics
  • Run a large number of LoRA adapters on a single endpoint
  • Exercise finer control over scaling
  • Understand how the model is behaving through request and response content

If you’re already on the Basic plan, these features are ready for you to explore.

👉 Check out the full breakdown of updated Basic plan capabilities: https://friendli.ai/pricing.

Whether you're running large-scale inference for production AI products or exploring new models and configurations, these upgrades give you more capability at no additional cost. Experience FriendliAI’s enterprise-grade AI inference infrastructure, now expanded with Enterprise-level features accessible to Basic plan users.


Written by

FriendliAI Tech & Research


General FAQ

What is FriendliAI?

FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.

How does FriendliAI help my business?

Our Friendli Inference allows you to squeeze more tokens per second out of every GPU. Because you need fewer GPUs to serve the same load, the true metric, tokens per dollar, comes out higher even if the hourly GPU rate looks similar on paper. View pricing

Which models and modalities are supported?

Over 380,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models

Can I deploy models from Hugging Face directly?

Yes. Selecting “Friendli Endpoints” on the Hugging Face Hub takes you straight to our model deployment page for a one-click deploy. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership

Still have questions?

If you want a customized solution for the key issue that is slowing your growth, email contact@friendli.ai or click Talk to an engineer; our engineers (not a bot) will reply within one business day.


Explore FriendliAI today