
Friendli Container:
Serve your generative AI
in your private environment


FEATURES & BENEFITS

Accelerate AI inference with Friendli Container

Unlock the full potential of your generative AI services with cutting-edge inference optimizations. Deploy seamlessly in your own GPU environment or private cloud for blazing-fast inference and reduced operational costs.

Friendli Container empowers you with full control over your model and data, ensuring maximum security and privacy.

Friendli Container

Full control of your data

Maximum privacy and security

Integration with internal systems

Significant GPU cost savings

SUPPORTED MODELS

Generative AI models with Container

The current version of Friendli Container supports all major generative language models,
including Llama 3.3, Mixtral, Mistral, MPT, Gemma, Command R+, and more!


Friendli Container on EKS

Simply deploy your model on Friendli Container in your EKS workflow with a single command!
You can now find Friendli Dedicated Endpoints on the AWS Marketplace, making building and serving LLMs seamless and efficient.

Get started here
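
For a concrete picture, here is a minimal, unofficial sketch of launching Friendli Container on an EKS cluster with the official Kubernetes Python client. The image tag, serving port, environment variable, and model name are illustrative assumptions, not the documented launch recipe; refer to the Friendli documentation for the actual values.

```python
# Unofficial sketch: run Friendli Container as a Kubernetes Deployment on EKS.
# The image tag, env var, model name, and port below are ASSUMPTIONS made for
# illustration; consult the Friendli documentation for the real launch options.
from kubernetes import client, config

config.load_kube_config()  # uses your current kubectl context (e.g., an EKS cluster)

container = client.V1Container(
    name="friendli-container",
    image="registry.friendli.ai/container:latest",        # hypothetical image tag
    ports=[client.V1ContainerPort(container_port=8000)],  # assumed serving port
    env=[client.V1EnvVar(name="MODEL_NAME",               # hypothetical env var
                         value="meta-llama/Llama-3.3-70B-Instruct")],
    resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="friendli-llm"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "friendli-llm"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "friendli-llm"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```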

Customer stories

Conversational Chatbot


NextDay AI's personalized character chatbot, ranked among the top 15 generative AI web products by a16z, processes over 0.5 trillion tokens monthly.

Read more
RESULT

With Friendli Container: increased LLM throughput and 50%+ GPU cost savings

Conversational Chatbot


ScatterLab's "Zeta", a top 10 mobile app for South Korean teenagers, leveraged Friendli Container to serve real-time responses with a model that has 17 times more parameters than their previous RAG version.

Friendli Engine is an irreplaceable solution for generative AI serving, both in terms of speed and cost-effectiveness. It eliminates the need for serving optimization tests.

Content Generation


NaCloud specializes in LLMs for novel writing services.

CHALLENGE

Novel-writing LLMs face significant hurdles in maintaining coherence and context across long-form narratives.

SOLUTION

Friendli Container optimized context window utilization, allowing for better retention and faster generation.


How to use Friendli Container


Friendli Container enables you to effortlessly deploy your generative AI model on your own machine.

Visit our documentation to learn how to start with Friendli Container.
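
As a rough illustration of what "deploy and query" looks like, the snippet below assumes the container is already running on your machine and exposes an OpenAI-compatible chat completions endpoint on port 8000; the port, path, and request schema are assumptions here, and the documentation is the authoritative reference.

```python
# Sketch: query a locally running Friendli Container, assuming it serves an
# OpenAI-compatible /v1/chat/completions endpoint on port 8000 (an assumption;
# see the documentation for the actual port and schema).
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-3.3-70B-Instruct",  # whichever model you serve
        "messages": [
            {"role": "user", "content": "Explain LLM serving in one sentence."}
        ],
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```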


Frequently asked questions

How does the pricing for Friendli Container work?

Friendli Container offers a flexible pricing structure. Please contact us at sales@friendli.ai for a custom quote. You can also try the 60-day free trial to experience the full capabilities of Friendli Container and serve your LLM in your development environment free of charge.

Can I use Friendli Container for enterprise purposes?

Yes, Friendli Container offers an Enterprise version tailored to the needs of larger organizations. To access the Enterprise version and discuss pricing options, please contact our sales team at sales@friendli.ai.

Is my data secure when using Friendli Container?

Yes, ensuring the security and privacy of your data is our top priority. Friendli Container allows you to serve your LLM in a secure and private environment, safeguarding your sensitive information throughout the process. We adhere to industry-standard security protocols and continuously update our platform to address any potential vulnerabilities.

How much performance gain should I expect using Friendli Container?

Our engine provides 10x faster token generation and 5x faster initial response time compared to vLLM. Actual performance may vary depending on your GPU, model, and traffic. Please contact contact@friendli.ai for help measuring the performance gain in your own environment.
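
If you would like a quick self-service estimate first, the sketch below measures time-to-first-token and post-first-token streaming rate against any OpenAI-compatible endpoint, so you can run the same probe against Friendli Container and your current stack on the same GPU and model. The URL, model name, and server-sent-events framing are assumptions to adapt to your deployment.

```python
# Sketch: measure time-to-first-token (TTFT) and streaming rate for any
# OpenAI-compatible endpoint. URL, model, and SSE framing are assumptions.
import time
import requests

URL = "http://localhost:8000/v1/chat/completions"  # hypothetical endpoint
payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",  # whichever model you serve
    "messages": [{"role": "user", "content": "Write a 200-word story."}],
    "max_tokens": 256,
    "stream": True,  # ask for server-sent events, one chunk per generated piece
}

start = time.perf_counter()
first_token_at, chunks = None, 0
with requests.post(URL, json=payload, stream=True, timeout=120) as resp:
    resp.raise_for_status()
    for raw in resp.iter_lines():
        if not raw or not raw.startswith(b"data: ") or raw == b"data: [DONE]":
            continue
        if first_token_at is None:
            first_token_at = time.perf_counter()
        chunks += 1
elapsed = time.perf_counter() - start

if first_token_at is not None:
    ttft = first_token_at - start
    print(f"TTFT: {ttft:.3f}s over {chunks} chunks")
    if chunks > 1 and elapsed > ttft:
        print(f"~{(chunks - 1) / (elapsed - ttft):.1f} chunks/s after the first token")
```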

Experience superior inference performance
for all kinds of LLMs with Friendli Engine.

Learn more
Iteration Batching

A groundbreaking optimization technique we developed, also known as continuous batching (see the toy sketch after this list)

Friendli DNN Library

Optimized GPU kernels for generative AI

Friendli TCache

Intelligently reusing computational results

Native Quantization

Efficient serving without compromising accuracy
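
To give a flavor of what iteration batching means, here is a toy, self-contained simulation (not Friendli Engine's implementation): finished sequences leave the batch and waiting requests join it at every decoding iteration, instead of the whole batch draining before new work is admitted.

```python
# Toy illustration of iteration-level (continuous) batching: the scheduler
# admits and retires sequences between every decoding step. This is a
# simulation for intuition, not the engine's actual scheduler.
from collections import deque
import random

random.seed(0)
# Each request needs a random number of decoding steps to complete.
waiting = deque({"id": i, "remaining": random.randint(1, 6)} for i in range(8))
active, MAX_BATCH, step = [], 4, 0

while waiting or active:
    # Admit new requests at every iteration, not only when the batch empties.
    while waiting and len(active) < MAX_BATCH:
        active.append(waiting.popleft())

    step += 1
    for seq in active:
        seq["remaining"] -= 1  # one decoding iteration for the whole batch

    # Retire finished sequences immediately; their slots refill next step.
    done = [seq["id"] for seq in active if seq["remaining"] == 0]
    active = [seq for seq in active if seq["remaining"] > 0]
    if done:
        print(f"step {step}: finished {done}; freed slots refill next iteration")
```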

EXPLORE FRIENDLI SUITE

Other ways to run generative AI
models with Friendli

Friendli Dedicated Endpoints

Build and run generative AI models on autopilot

Learn more

Friendli Serverless Endpoints

Call our fast and affordable API for open-source generative AI models

Learn more