
Friendli Container:
Serve LLM and LMM inference with Friendli Engine
in your GPU environment


FEATURES & BENEFITS

Supercharge your LLM compute with
Friendli Container’s accelerated inference solutions

Friendli Container simplifies the process of containerizing your generative model for efficient serving.
Our engine ensures better user experiences while cutting down on LLM inference costs.
With Friendli Container, you can perform high-speed LLM inference in a secure and private environment.

Full control of your data

Maximum privacy and security

Integration with internal systems

Save on huge GPU costs

SUPPORTED MODELS

Generative AI models with Container

The current version of Friendli Container supports all major generative language models,
including Llama 3, Mixtral, Mistral, MPT, Gemma, Command R+, and more!

CUSTOMER STORY

LLM-powered chatbot company cuts GPU costs by more than 50% instantly.


PROBLEM

High H100 GPU costs from processing ~0.5 trillion tokens per month.

SOLUTION

Use Friendli Container for LLM serving.

RESULTS

Costs were instantly cut by more than 50%.

Zeta 2.0 blooms with Friendli Container


PROBLEM

The generative model is expensive to run.

SOLUTION

Use Friendli Container for Zeta.

RESULTS

Costs were cut by 50%.

How to use Friendli Container


Friendli Container enables you to effortlessly deploy your generative AI model on your own machine.

Visit our documentation to learn how to start running a Friendli Container.
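For illustration, here is a minimal Python sketch of querying a deployed container, assuming it exposes an OpenAI-compatible chat completions endpoint on localhost port 8000. The host, port, path, and model name below are placeholders; check the documentation for the actual values used by your deployment.

# Minimal sketch: query a locally running Friendli Container.
# Assumption: an OpenAI-compatible endpoint at localhost:8000
# (host, port, path, and model name are placeholders).
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",  # whichever model you deployed
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])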

Frequently asked questions

How does the pricing for Friendli Container work?

Friendli Container offers a flexible pricing structure. Please contact us at sales@friendli.ai for a custom quote. You can also try the 60-day free trial to experience the full capabilities of Friendli Container and serve your LLM in your development environment free of charge.

Can I use Friendli Container for enterprise purposes?

Yes, Friendli Container offers an Enterprise version tailored to the needs of larger organizations. To access the Enterprise version and discuss pricing options, please contact our sales team at sales@friendli.ai.

Is my data secure when using Friendli Container?

Yes, ensuring the security and privacy of your data is our top priority. Friendli Container allows you to serve your LLM in a secure and private environment, safeguarding your sensitive information throughout the process. We adhere to industry-standard security protocols and continuously update our platform to address any potential vulnerabilities.

How much performance gain should I expect using Friendli Container?

Our engine provides 10x faster token generation and 5x faster initial response time compared to vLLM. Actual performance varies with your GPU, model, and traffic. Please reach out to contact@friendli.ai for help measuring the performance gain in your environment.
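As a rough starting point, a sketch like the one below measures time-to-first-token and streaming throughput against your own deployment. It assumes the same OpenAI-compatible streaming endpoint as in the earlier example (an assumption; adjust to your setup), and it counts each server-sent chunk as roughly one token.

# Rough benchmark sketch: time-to-first-token and chunks/sec for one
# streaming request. Assumes an OpenAI-compatible SSE endpoint at
# localhost:8000; each "data:" chunk is treated as ~one token.
import time
import requests

start = time.perf_counter()
first_token_at = None
n_chunks = 0
with requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model name
        "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
        "max_tokens": 128,
        "stream": True,
    },
    stream=True,
    timeout=120,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: ") or line == b"data: [DONE]":
            continue
        if first_token_at is None:
            first_token_at = time.perf_counter()
        n_chunks += 1
total = time.perf_counter() - start
if first_token_at is not None:
    print(f"time to first token: {first_token_at - start:.3f}s")
print(f"~{n_chunks / total:.1f} chunks/sec (~tokens/sec) over {total:.2f}s")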

Experience superior inference performance
for all kinds of LLMs with Friendli Engine.

Learn more

Friendli DNN library

Optimized GPU kernels for generative AI

Iteration batching (aka continuous batching)

We invented this technique and have further innovated it.

Learn more
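To illustrate the idea (a toy sketch, not Friendli's actual scheduler): with iteration-level batching, the batch is re-formed at every decoding step, so finished requests leave immediately and queued requests join without waiting for the whole batch to drain.

# Toy sketch of iteration-level (continuous) batching. The scheduler
# re-forms the batch at every decoding iteration; generate_one_token()
# stands in for a real engine forward pass.
from collections import deque

def generate_one_token(request):
    # Placeholder: one decoding step produces one token for this request.
    request["generated"] += 1
    return request["generated"] >= request["max_tokens"]  # True when finished

def serve(pending, max_batch_size=8):
    pending, active, done = deque(pending), [], []
    while pending or active:
        # Admit new requests at iteration granularity, not batch granularity.
        while pending and len(active) < max_batch_size:
            active.append(pending.popleft())
        still_running = []
        for req in active:  # one decoding iteration over the current batch
            if generate_one_token(req):
                done.append(req)          # leaves the batch immediately
            else:
                still_running.append(req)
        active = still_running
    return done

jobs = [{"id": i, "generated": 0, "max_tokens": 4 + i} for i in range(10)]
print([r["id"] for r in serve(jobs)])  # shorter requests finish first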

Friendli TCache

Intelligently reuse computational results

Learn more
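Friendli has not published TCache's internals; purely as an illustration of reusing computational results, the sketch below memoizes an expensive per-prefix computation so that requests sharing a prompt prefix skip recomputation.

# Illustrative only: generic prefix caching, not Friendli's algorithm.
# encode_prefix() stands in for expensive prefill work (e.g. KV cache build).
cache = {}

def encode_prefix(prefix_tokens):
    # Placeholder for the expensive computation over a prompt prefix.
    return {"kv": list(prefix_tokens)}

def prefill(tokens):
    key = tuple(tokens)
    if key not in cache:
        cache[key] = encode_prefix(tokens)
    return cache[key]  # a cache hit skips recomputation entirely

state_a = prefill([1, 2, 3])
state_b = prefill([1, 2, 3])  # reused, not recomputed
print(state_a is state_b)     # True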

Native quantization support

Efficient serving without sacrificing accuracy

Learn more
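As a conceptual illustration of the trade-off (not Friendli's quantization scheme), this sketch stores weights in int8 with a per-tensor scale and dequantizes on the fly; production engines use finer-grained formats such as per-channel scales, FP8, or AWQ.

# Conceptual sketch of weight quantization with a per-tensor scale.
import numpy as np

def quantize(w):
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale  # 4x smaller storage than float32

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())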

Multi-LoRA serving on a single GPU

Serve multiple LoRA models on a single GPU

Learn more
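The mechanism behind multi-LoRA serving can be sketched in a few lines: a single base weight matrix is shared across all requests, while each request applies its own small low-rank adapter, so many fine-tuned variants fit on one GPU alongside one copy of the base model. The code below is a toy NumPy illustration, not Friendli's implementation.

# Toy sketch of multi-LoRA serving: shared base weight W plus
# per-tenant low-rank adapters (A_i, B_i): y = x @ W + x @ A_i @ B_i.
import numpy as np

d, r = 8, 2                      # hidden size, LoRA rank
W = np.random.randn(d, d)        # shared base weight, loaded once
adapters = {                     # tiny per-tenant adapters
    "tenant_a": (np.random.randn(d, r), np.random.randn(r, d)),
    "tenant_b": (np.random.randn(d, r), np.random.randn(r, d)),
}

def forward(x, adapter_id):
    A, B = adapters[adapter_id]
    return x @ W + x @ A @ B     # base path + low-rank correction

x = np.random.randn(1, d)
print(forward(x, "tenant_a").shape, forward(x, "tenant_b").shape)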

Supports a wide range of generative AI models

See full list
EXPLORE FRIENDLI SUITE

Other ways to run generative AI
models with Friendli

Friendli Dedicated Endpoints

Build and run LLMs on autopilot

Learn more

Friendli Serverless Endpoints

Fast and affordable API for open-source generative AI models

Learn more