• December 2, 2025
  • 2 min read

Enterprise Features Now Available on Friendli Dedicated Endpoints (Basic Plan)


At FriendliAI, our mission is to make high-performance AI inference effortless, scalable, and cost-efficient for every team. As a major step toward this, Friendli Dedicated Endpoints now make Enterprise-level features available to all Basic plan users.

Starting today, several capabilities previously exclusive to the Enterprise plan are now included in the Basic plan, such as full access to metrics and logs, request-count-based autoscaling, and support for attaching hundreds of LoRA adapters to each endpoint. These upgrades give smaller teams access to the same production-grade tooling previously reserved for large-scale deployments.

What’s New?

  • Observability with real-time performance metrics
  • Customizable request-count-based autoscaling
  • Request and response content logging (opt-in)
  • Support for hundreds of LoRA adapters
  • SOC2-compliant infrastructure

Why Do These Changes Matter?

These upgrades bring deeper visibility, smarter scaling, stronger security, and more flexibility to model deployment. Real-time metrics and logs surface system behavior immediately, making it far easier to understand and resolve issues. With request-count-based autoscaling, teams set their own queue thresholds, gaining fine-grained control over when and how their endpoints scale. Support for hundreds of LoRA adapters lets teams run more specialized, fine-tuned models on a single endpoint. And with SOC2-compliant infrastructure, teams can deploy these workloads with confidence, meeting modern security and governance standards.

Taken together, these upgrades make the Basic plan significantly more capable, giving teams access to the advanced tooling and operational controls used in large-scale deployments.

What Can Basic Users Do Now That They Couldn’t Before?

With this update, Basic users gain access to a set of capabilities that were previously reserved for Enterprise deployments. These features give teams far more operational control, observability, and scalability when running production-grade AI workloads.

1. View Full Metrics and Request Activity

Previously:
  • Basic users had no visibility into these operational insights

Now:
  • Monitor real-time throughput, latency, tokens processed, and replica counts over time
  • Review request activity and troubleshoot issues more quickly

2. Adjust Autoscaling Control Parameters

Previously:
  • Autoscaling could not be customized

Now:
  • Users can adjust scaling based on how many requests are queued
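To make the idea concrete, request-count-based autoscaling can be thought of as: when the number of queued requests per replica exceeds a user-set threshold, add replicas; when queues drain, scale back down. The sketch below is an illustration only, not Friendli's actual scaling algorithm, and the threshold and replica bounds are hypothetical parameters you would set in the endpoint configuration.

```python
def desired_replicas(queued_requests: int, threshold: int,
                     min_replicas: int = 1, max_replicas: int = 8) -> int:
    """Toy request-count-based scaling decision (illustrative only).

    Scale so that no replica has more than `threshold` queued requests,
    clamped to the [min_replicas, max_replicas] range.
    """
    # Replicas needed so each handles at most `threshold` queued requests.
    needed = -(-queued_requests // threshold)  # ceiling division
    return max(min_replicas, min(max_replicas, needed))
```

With a queue threshold of 10, an endpoint holding 25 queued requests would be scaled to 3 replicas, while an idle endpoint falls back to the minimum. Lowering the threshold makes scaling more aggressive; raising it trades latency for cost.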


3. Support for Many LoRA Adapters

Previously:
  • Endpoints allowed only one LoRA adapter, which had to be fixed at creation time

Now:
  • Run hundreds of LoRA adapters on a single endpoint
  • Add or remove adapters in real time with zero service disruption
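As a hedged sketch of what multi-adapter serving looks like from the caller's side: each request names the adapter it wants, so one endpoint can serve many fine-tuned variants. Friendli Dedicated Endpoints expose an OpenAI-compatible chat completions API, but the "ENDPOINT_ID:adapter-route" model-naming scheme and the endpoint/adapter names below are assumptions for illustration; consult the Friendli documentation for the exact conventions.

```python
import json

# Hypothetical identifiers -- replace with your own endpoint and adapter names.
ENDPOINT_ID = "my-endpoint-id"

def chat_payload(adapter_route: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion request body that targets
    one LoRA adapter on a multi-adapter endpoint. The
    "ENDPOINT_ID:adapter" model-naming scheme is an assumption."""
    return {
        "model": f"{ENDPOINT_ID}:{adapter_route}",
        "messages": [{"role": "user", "content": prompt}],
    }

# Each request can route to a different fine-tuned adapter on the same
# endpoint; switching adapters requires no redeploy.
legal = chat_payload("legal-summarizer", "Summarize this contract clause...")
support = chat_payload("support-tone", "Draft a reply to this ticket...")
print(json.dumps(legal, indent=2))
```

Because the adapter is chosen per request rather than per deployment, specialized models share one pool of GPU capacity instead of each needing a dedicated endpoint.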


4. Enable Request and Response Content Logging

Previously:
  • This capability was not available on the Basic plan

Now:
  • View specific request and response content (when explicitly enabled)
  • Get a clearer view of how the model is behaving
  • Spot and investigate requests that may require attention


Start Exploring the New Basic Plan Today

This upgrade makes it easier to:

  • Gain real-time observability with detailed performance metrics
  • Run a large number of LoRA adapters on a single endpoint
  • Exercise finer control over scaling
  • Understand how the model is behaving through request and response content

If you’re already on the Basic plan, these features are ready for you to explore.

👉 Check out the full breakdown of updated Basic plan capabilities: https://friendli.ai/pricing.

Whether you're running large-scale inference for production AI products or exploring new models and configurations, these upgrades give you more capability at no additional cost. Experience FriendliAI’s enterprise-grade AI inference infrastructure, now expanded with Enterprise-level features accessible to Basic plan users.


Written by

FriendliAI Tech & Research


General FAQ

What is FriendliAI?

FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.

How does FriendliAI help my business?

Our Friendli Inference allows you to squeeze more tokens per second out of every GPU. Because you need fewer GPUs to serve the same load, the true metric, tokens per dollar, comes out higher even if the hourly GPU rate looks similar on paper. View pricing

Which models and modalities are supported?

Over 380,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models

Can I deploy models from Hugging Face directly?

Yes. Selecting “Friendli Endpoints” on the Hugging Face Hub takes you straight to our model deployment page for a one-click deploy. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership

Still have questions?

If you want a customized solution for the key issue that is slowing your growth, email contact@friendli.ai or click Talk to an engineer; our engineers (not a bot) will reply within one business day.


Explore FriendliAI today