Friendli Container: Serve LLM and LMM inference
with Friendli Engine in your GPU environment
Supercharge your LLM compute with
Friendli Container’s accelerated inference solutions
Friendli Container simplifies the process of containerizing your generative model for efficient serving.
Our engine ensures better user experiences while cutting down on LLM inference costs.
With Friendli Container, you can perform high-speed LLM inference in a secure and private environment.
Full control of your data
Maximum privacy and security
Integration with internal systems
Save on huge GPU costs
Generative AI models with Friendli Container
The current version of Friendli Container supports all major generative language models,
including Llama 3, Mixtral, Mistral, MPT, Gemma, Command R+, and more!
An LLM-powered chatbot company was able to cut GPU costs by more than 50% instantly.
PROBLEM
High H100 GPU costs from processing ~0.5 trillion tokens per month.
SOLUTION
Use Friendli Container for LLM serving.
RESULTS
Costs were instantly cut by more than 50%.
Zeta 2.0 blooms with Friendli Container
PROBLEM
The generative model is expensive to run.
SOLUTION
Use Friendli Container for Zeta.
RESULTS
Cut costs by 50%.
How to use Friendli Container
Friendli Container enables you to effortlessly deploy your generative AI model on your own machine.
Visit our documentation to learn how to start with Friendli Container.
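For illustration, here is a minimal sketch in Python of sending an inference request to a model served by Friendli Container. The endpoint URL, port, and request fields are assumptions for this example; the actual values depend on how you launch the container, so check the documentation for your setup.

```python
import requests

# Hypothetical local endpoint; the actual host, port, and route depend
# on how the Friendli Container was launched (see the documentation).
ENDPOINT = "http://localhost:8000/v1/completions"

# Example request body; the field names here are illustrative assumptions.
payload = {
    "prompt": "Summarize the benefits of on-premises LLM serving.",
    "max_tokens": 128,
    "temperature": 0.7,
}

# Send the request to the locally served model and print the result.
response = requests.post(ENDPOINT, json=payload, timeout=60)
response.raise_for_status()
print(response.json())
```

Because the model runs inside your own environment, requests like this never leave your infrastructure.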
Frequently asked questions
How does the pricing for Friendli Container work?
Friendli Container offers a flexible pricing structure. Please contact us at sales@friendli.ai for a custom quote. You can also try the 60-day free trial to experience the full capabilities of Friendli Container, serving your LLM in your development environment free of charge.
Can I use Friendli Container for enterprise purposes?
Yes, Friendli Container offers an Enterprise version tailored to the needs of larger organizations. To access the Enterprise version and discuss pricing options, please contact our sales team at sales@friendli.ai.
Is my data secure when using Friendli Container?
Yes, ensuring the security and privacy of your data is our top priority. Friendli Container allows you to serve your LLM in a secure and private environment, safeguarding your sensitive information throughout the process. We adhere to industry-standard security protocols and continuously update our platform to address any potential vulnerabilities.
How much performance gain should I expect using Friendli Container?
Our engine provides 10x faster token generation and 5x faster initial response times compared to vLLM. Actual performance varies depending on your GPU, model, and traffic. Please reach out to contact@friendli.ai for help measuring the performance gain in your own environment.
Experience superior inference performance
for all kinds of LLMs with Friendli Engine.
Learn more
Friendli DNN library
Optimized GPU kernels for generative AI
Iteration batching (aka continuous batching)
We invented this technique and have further innovated it.
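As a rough sketch of the idea (an illustration, not the engine's actual implementation), iteration batching schedules work at the granularity of a single decoding step: every active request generates one token per iteration, finished requests leave the batch immediately, and waiting requests join as soon as a slot frees up, instead of waiting for an entire batch to drain.

```python
from collections import deque

def iteration_batching(requests, max_batch_size):
    """Toy simulation of iteration-level (continuous) batching.

    Each request is a (request_id, steps_remaining) pair, where
    steps_remaining stands in for the number of tokens to generate.
    """
    waiting = deque(requests)
    active = []  # requests currently occupying batch slots
    step = 0
    while waiting or active:
        # Admit waiting requests as soon as slots free up, rather than
        # waiting for the whole batch to finish (static batching).
        while waiting and len(active) < max_batch_size:
            active.append(list(waiting.popleft()))
        # One decoding iteration: each active request emits one token.
        for req in active:
            req[1] -= 1
        # Finished requests leave the batch immediately.
        done = [req[0] for req in active if req[1] == 0]
        active = [req for req in active if req[1] > 0]
        step += 1
        if done:
            print(f"step {step}: completed {done}")

iteration_batching([("a", 3), ("b", 5), ("c", 2), ("d", 4)], max_batch_size=2)
```

In a real engine the same principle lets short requests finish and return early while long generations continue, which is what drives the throughput and latency gains.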
Learn more
Supports a wide range of generative AI models
See full list
Other ways to run generative AI models with Friendli
Friendli Serverless Endpoints
Call our fast and affordable API for open-source generative AI models
Learn more