
Friendli Container:
Serve your generative AI in your private environment


FEATURES & BENEFITS

Accelerate AI inference with Friendli Container

Unlock the full potential of your generative AI services with cutting-edge inference optimizations. Deploy seamlessly in your own GPU environment or private cloud for blazing-fast inference and reduced operational costs.

Friendli Container empowers you with full control over your model and data, meeting your privacy and security needs.

Full control of your data

Built to meet privacy and security needs

Integration with internal systems

Significant savings on GPU costs

SUPPORTED MODELS

Generative AI models with Container

The current version of Friendli Container supports all major generative language models, including Llama 3.3, Mixtral, Mistral, MPT, Gemma, Command R+, and more!

Friendli Container on EKS

Simply deploy your model on Friendli Container in your EKS workflow with a single command!
You can now find Friendli Dedicated Endpoints on the AWS Marketplace, making building and serving LLMs seamless and efficient.

Get started here
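If you manage your EKS resources programmatically, here is a minimal sketch of what such a deployment could look like using the official Kubernetes Python client. The image name, model flag, and resource labels below are hypothetical placeholders rather than Friendli's actual artifacts; the documentation has the real one-command workflow.

```python
# Minimal sketch: deploying a (hypothetical) Friendli Container image to EKS
# with the official Kubernetes Python client. Image name, args, and labels
# are illustrative placeholders -- see the Friendli docs for actual values.
from kubernetes import client, config

config.load_kube_config()  # uses your kubeconfig, e.g. from `aws eks update-kubeconfig`

container = client.V1Container(
    name="friendli-container",
    image="registry.example.com/friendli-container:latest",  # hypothetical image
    args=["--hf-model-name", "meta-llama/Llama-3.3-70B-Instruct"],  # hypothetical flag
    resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="friendli-llm"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "friendli-llm"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "friendli-llm"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```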

Customer stories

Conversational Chatbot

NextDay AI's personalized character chatbot, ranked among the top 15 generative AI web products by a16z, processes over 3 trillion tokens monthly.

Read more
CHALLENGE

NextDay AI processes over 3 trillion tokens monthly, leading to high H100 GPU costs.

SOLUTION

By leveraging Friendli Container, NextDay AI effectively managed its traffic, achieving 3x higher LLM throughput and over 50% GPU cost savings.

Conversational Chatbot

ScatterLab's 'Zeta' ranked among the top 10 mobile applications for South Korean teenagers. Zeta users spend an average of 140 minutes daily, generating 800 million conversations monthly.

CHALLENGE

GPU costs account for up to 70% of ScatterLab's operational costs.

SOLUTION

Using Friendli Container, ScatterLab reduced its GPU cost by over 50%, and Friendli Inference's powerful optimization eliminated the need for additional optimization tests.


How to use Friendli Container

Friendli Container enables you to effortlessly deploy your generative AI model on your own machine.

Visit our documentation to learn how to start with Friendli Container.
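Once a container is running, interacting with it is ordinary HTTP. As a minimal sketch, assuming the container exposes an OpenAI-compatible chat completions endpoint on a local port (the port and model id below are placeholders; check the documentation for your actual values):

```python
# Minimal sketch: querying a locally running Friendli Container, assuming an
# OpenAI-compatible chat completions endpoint. The port (8000) and model id
# are placeholders -- consult the docs for your deployment's values.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-3.3-70B-Instruct",  # placeholder model id
        "messages": [{"role": "user", "content": "Summarize what an LLM is in one sentence."}],
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```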


Frequently asked questions

How does the pricing for Friendli Container work?

Friendli Container offers a flexible pricing structure. Please contact us at sales@friendli.ai for a custom quote. You can also try the 60-day free trial to experience the full capabilities of Friendli Container, and serve your LLM in your development environment free of charge.

Can I use Friendli Container for enterprise purposes?

Yes, Friendli Container offers an Enterprise version tailored to the needs of larger organizations. To access the Enterprise version and discuss pricing options, please contact our sales team at sales@friendli.ai.

Is my data secure when using Friendli Container?

Yes, ensuring the security and privacy of your data is our top priority. Friendli Container allows you to serve your LLM in a secure and private environment, safeguarding your sensitive information throughout the process. We adhere to industry-standard security protocols and continuously update our platform to address any potential vulnerabilities.

How much performance gain should I expect using Friendli Container?

Our engine provides 10x faster token generation and 5x faster initial response times compared to vLLM. Actual performance may vary depending on your GPU, LLM, and traffic. Please contact us at contact@friendli.ai for help measuring the performance gain in your environment.
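For a first-order measurement of your own, a rough sketch like the following estimates time-to-first-token and generation throughput against any OpenAI-compatible streaming endpoint. The URL and model id are placeholders, and counting stream chunks only approximates token counts:

```python
# Rough sketch: measuring time-to-first-token (TTFT) and tokens/sec against
# an OpenAI-compatible streaming endpoint. URL and model id are placeholders;
# each SSE data line typically carries about one token's worth of text, so
# counting chunks approximates tokens generated.
import time
import requests

URL = "http://localhost:8000/v1/chat/completions"  # placeholder endpoint

start = time.perf_counter()
first_token_at = None
chunks = 0
with requests.post(
    URL,
    json={
        "model": "meta-llama/Llama-3.3-70B-Instruct",  # placeholder model id
        "messages": [{"role": "user", "content": "Write a short paragraph about GPUs."}],
        "max_tokens": 256,
        "stream": True,
    },
    stream=True,
    timeout=120,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: ") or line == b"data: [DONE]":
            continue
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first generated chunk arrived
        chunks += 1

if first_token_at is None:
    raise SystemExit("no tokens received")
elapsed = time.perf_counter() - start
ttft = first_token_at - start
print(f"time to first token: {ttft:.3f}s")
print(f"~{chunks / max(elapsed - ttft, 1e-9):.1f} tokens/sec (chunk-approximated)")
```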

Experience superior inference performance for all kinds of LLMs with Friendli Inference.

Learn more
Iteration Batching

A groundbreaking optimization technique we developed

(Also known as Continuous Batching)
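This page doesn't detail the scheduler, but the general idea behind iteration-level (continuous) batching can be shown as a toy simulation: the batch is re-formed at every decoding step, so finished sequences exit and queued requests join immediately instead of waiting for the whole batch to complete. All names below are illustrative, not Friendli's implementation:

```python
# Toy sketch of iteration-level (continuous) batching: the batch is re-formed
# at every decoding iteration, so completed sequences free their slots for
# queued requests right away. Generic illustration of the technique only.
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    rid: int
    remaining_tokens: int  # tokens still to generate

def serve(queue: deque, max_batch: int) -> None:
    active: list = []
    step = 0
    while queue or active:
        # Admit queued requests into any free batch slots (per-iteration scheduling).
        while queue and len(active) < max_batch:
            active.append(queue.popleft())
        # One decoding iteration: every active sequence emits one token.
        for req in active:
            req.remaining_tokens -= 1
        # Finished sequences leave the batch immediately.
        done = [r for r in active if r.remaining_tokens == 0]
        active = [r for r in active if r.remaining_tokens > 0]
        step += 1
        for r in done:
            print(f"step {step}: request {r.rid} finished")

serve(deque(Request(i, n) for i, n in enumerate([3, 8, 2, 5, 4])), max_batch=2)
```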

Friendli DNN Library

Optimized GPU kernels for generative AI

Friendli TCache

Intelligently reusing computational results
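TCache's internals aren't published on this page; as a generic illustration of reusing computational results, here is a toy prefix cache that skips recomputation for prompts sharing a previously processed prefix. All names are hypothetical and this is not TCache's actual design:

```python
# Toy illustration of reusing computational results across requests via a
# prefix cache: if a new prompt starts with a previously processed prefix,
# the cached state is reused and only the new suffix is computed. Generic
# sketch of the idea, not Friendli TCache's actual design.
cache = {}  # prefix tokens (tuple) -> stand-in for a cached intermediate state

def process(tokens: list) -> str:
    # Find the longest cached prefix of this prompt.
    for cut in range(len(tokens), 0, -1):
        state = cache.get(tuple(tokens[:cut]))
        if state is not None:
            print(f"cache hit: reused {cut}/{len(tokens)} tokens")
            break
    else:
        cut, state = 0, ""
    # "Compute" only the uncached suffix, caching intermediate states as we go.
    for i in range(cut, len(tokens)):
        state += tokens[i] + "|"
        cache[tuple(tokens[: i + 1])] = state
    return state

process(["You", "are", "a", "helpful", "assistant", "Hi"])
process(["You", "are", "a", "helpful", "assistant", "What", "is", "2+2"])
```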

Native Quantization

Efficient serving without compromising accuracy
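As a generic illustration of the trade-off quantization navigates (not Friendli's native quantization scheme), here is a minimal symmetric per-tensor int8 weight quantization in NumPy; the reconstruction error shows how little accuracy a well-chosen scale gives up for a 4x smaller weight footprint:

```python
# Minimal sketch of symmetric per-tensor int8 weight quantization -- a generic
# illustration of the accuracy/efficiency trade-off, not Friendli's scheme.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)  # stand-in weight matrix

scale = np.abs(w).max() / 127.0                       # map widest value into int8 range
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)  # 4x smaller than fp32
w_hat = q.astype(np.float32) * scale                  # dequantize for comparison

rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"relative reconstruction error: {rel_err:.4f}")  # small for well-behaved weights
```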

EXPLORE FRIENDLI SUITE

Other ways to run generative AI models with Friendli

Friendli Dedicated Endpoints

Build and run generative AI models on autopilot

Learn more

Friendli Serverless Endpoints

Call our fast and affordable API for open-source generative AI models

Learn more