Friendli Container: Serve LLM and LMM inference
with Friendli Engine in your GPU environment
Supercharge your LLM compute with
Friendli Container’s accelerated inference solutions
Friendli Container simplifies the process of containerizing your generative model for efficient serving.
Our engine ensures better user experiences while cutting down on LLM inference costs.
With Friendli Container, you can perform high-speed LLM inference in a secure and private environment.
Full control of your data
Maximum privacy and security
Integration with internal systems
Save on huge GPU costs
Generative AI models with Friendli Container
The current version of Friendli Container supports all major generative language models,
including Llama 3, Mixtral, Mistral, MPT, Gemma, Command R+, and more!
An LLM-powered chatbot company was able to cut GPU costs by more than 50% instantly.
PROBLEM
High H100 GPU costs from processing ~0.5 trillion tokens per month.
SOLUTION
Use Friendli Container for LLM serving.
RESULTS
Costs were instantly cut by more than 50%.
Zeta 2.0 blooms with Friendli Container
PROBLEM
The generative model is expensive to run.
SOLUTION
Use Friendli Container for Zeta.
RESULTS
Cut costs by 50%.
How to use Friendli Container
Friendli Container enables you to effortlessly deploy your generative AI model on your own machine.
Visit our documentation to learn how to start with Friendli Container.
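For illustration, here is a minimal sketch in Python of sending an inference request to a model served by Friendli Container. The endpoint URL, port, and request fields are assumptions for this example; the actual values depend on how you launch the container, so check the documentation for your setup.

```python
import requests

# Hypothetical local endpoint; the actual host, port, and route depend
# on how the Friendli Container was launched (see the documentation).
ENDPOINT = "http://localhost:8000/v1/completions"

# Example request body; the field names here are illustrative assumptions.
payload = {
    "prompt": "Summarize the benefits of on-premises LLM serving.",
    "max_tokens": 128,
    "temperature": 0.7,
}

# Send the request to the locally served model and print the result.
response = requests.post(ENDPOINT, json=payload, timeout=60)
response.raise_for_status()
print(response.json())
```

Because the model runs inside your own environment, requests like this never leave your infrastructure.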
Frequently asked questions
How does the pricing for Friendli Container work?
Friendli Container offers a flexible pricing structure. Please contact us at sales@friendli.ai for a custom quote. You can also try the 60-day free trial to experience the full capabilities of Friendli Container, serving your LLM in your development environment free of charge.
Can I use Friendli Container for enterprise purposes?
Yes, Friendli Container offers an Enterprise version tailored to the needs of larger organizations. To access the Enterprise version and discuss pricing options, please contact our sales team at sales@friendli.ai.
Is my data secure when using Friendli Container?
Yes, ensuring the security and privacy of your data is our top priority. Friendli Container allows you to serve your LLM in a secure and private environment, safeguarding your sensitive information throughout the process. We adhere to industry-standard security protocols and continuously update our platform to address any potential vulnerabilities.
How much performance gain should I expect using Friendli Container?
Our engine provides 10x faster token generation and 5x faster initial response times compared to vLLM. Actual performance varies depending on your GPU, model, and traffic. Please reach out to contact@friendli.ai for help measuring the performance gain in your own environment.
Experience superior inference performance
for all kinds of LLMs with Friendli Engine.
Learn more
Friendli DNN library
Optimized GPU kernels for generative AI
Iteration batching (aka continuous batching)
We invented this technique and have further innovated it.
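As a rough sketch of the idea (an illustration, not the engine's actual implementation), iteration batching schedules work at the granularity of a single decoding step: every active request generates one token per iteration, finished requests leave the batch immediately, and waiting requests join as soon as a slot frees up, instead of waiting for an entire batch to drain.

```python
from collections import deque

def iteration_batching(requests, max_batch_size):
    """Toy simulation of iteration-level (continuous) batching.

    Each request is a (request_id, steps_remaining) pair, where
    steps_remaining stands in for the number of tokens to generate.
    """
    waiting = deque(requests)
    active = []  # requests currently occupying batch slots
    step = 0
    while waiting or active:
        # Admit waiting requests as soon as slots free up, rather than
        # waiting for the whole batch to finish (static batching).
        while waiting and len(active) < max_batch_size:
            active.append(list(waiting.popleft()))
        # One decoding iteration: each active request emits one token.
        for req in active:
            req[1] -= 1
        # Finished requests leave the batch immediately.
        done = [req[0] for req in active if req[1] == 0]
        active = [req for req in active if req[1] > 0]
        step += 1
        if done:
            print(f"step {step}: completed {done}")

iteration_batching([("a", 3), ("b", 5), ("c", 2), ("d", 4)], max_batch_size=2)
```

In a real engine the same principle lets short requests finish and return early while long generations continue, which is what drives the throughput and latency gains.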
Learn more
Supports a wide range of generative AI models
See full list
Other ways to run generative AI models with Friendli
Friendli Serverless Endpoints
Call our fast and affordable API for open-source generative AI models
Learn more