Friendli Container: Serve your generative AI
in your private environment
Accelerate AI inference with Friendli Container
Unlock the full potential of your generative AI services with cutting-edge inference optimizations. Deploy seamlessly in your own GPU environment or in your private cloud for blazing-fast inference and reduced operational costs.
Friendli Container empowers you with full control over your model and data, ensuring maximum security and privacy.
Full control of your data
Maximum privacy and security
Integration with internal systems
Save on huge GPU costs
Generative AI models with Friendli Container
The current version of Friendli Container supports all major generative language models,
including Llama 3.3, Mixtral, Mistral, MPT, Gemma, Command R+, and more!
Friendli Container on EKS
Simply deploy your model on Friendli Container in your EKS workflow with a single command!
Now you can find Friendli Dedicated Endpoints on AWS Marketplace, making building and serving LLMs seamless and efficient.
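As a rough illustration of what an EKS deployment can look like, here is a minimal Kubernetes manifest sketch. The image path, model argument, and port below are placeholders, not the official Friendli Container interface; refer to the documentation for the actual values.

```yaml
# Hypothetical sketch: the image path, args, and port are placeholders,
# not the documented Friendli Container interface.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: friendli-container
spec:
  replicas: 1
  selector:
    matchLabels:
      app: friendli-container
  template:
    metadata:
      labels:
        app: friendli-container
    spec:
      containers:
        - name: friendli-container
          image: <your-registry>/friendli-container:latest  # placeholder image
          args: ["--model", "meta-llama/Llama-3.3-70B-Instruct"]  # example model
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 1  # one GPU per replica
```

Applying a manifest like this with `kubectl apply -f` is the "single command" step in an EKS workflow.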
Customer stories
Conversational Chatbot
NextDay AI's personalized character chatbot, ranked among the top 15 generative AI web products by a16z, processes over 0.5 trillion tokens monthly.
With Friendli Container:
3× LLM throughput
50%+ GPU cost saving
Conversational Chatbot
ScatterLab's "Zeta", a top 10 mobile app for South Korean teenagers, leveraged Friendli Container to manage real-time responses with 17 times more parameters than their previous RAG version.
Friendli Engine is an irreplaceable solution for generative AI serving, both in terms of speed and cost-effectiveness. It eliminates the need for serving optimization tests.
Content Generation
NaCloud specializes in LLMs for novel writing services.
Novel-writing LLMs face significant hurdles in maintaining coherence and context over long-form narratives.
Friendli Container optimized context window utilization, allowing for better retention and faster generation.
How to use Friendli Container
Friendli Container enables you to effortlessly deploy your generative AI model on your own machine.
Visit our documentation to learn how to start with Friendli Container.
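As a non-authoritative sketch of the deployment flow, running the container locally might look like the command below. The registry path, environment variable, and flags are placeholders; the documentation has the actual image name and license setup.

```sh
# Illustrative only: the image path, env var, and flags are placeholders,
# not the documented Friendli Container interface.
docker run --gpus all -p 8000:8000 \
  -e HF_TOKEN=$HF_TOKEN \
  <friendli-registry>/friendli-container:latest \
  --model meta-llama/Llama-3.3-70B-Instruct
```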
Frequently asked questions
How does the pricing for Friendli Container work?
Friendli Container offers a flexible pricing structure. Please contact us at sales@friendli.ai for a custom quote. You can also try the 60-day free trial to experience the full capabilities of Friendli Container. Serve your LLM in your development environment without any charges.
Can I use Friendli Container for enterprise purposes?
Yes, Friendli Container offers an Enterprise version tailored to the needs of larger organizations. To access the Enterprise version and discuss pricing options, please contact our sales team at sales@friendli.ai.
Is my data secure when using Friendli Container?
Yes, ensuring the security and privacy of your data is our top priority. Friendli Container allows you to serve your LLM in a secure and private environment, safeguarding your sensitive information throughout the process. We adhere to industry-standard security protocols and continuously update our platform to address any potential vulnerabilities.
How much performance gain should I expect using Friendli Container?
Our engine provides 10x faster token generation and 5x faster initial response time compared to vLLM. The actual performance may vary depending on your GPU, LLM model, and traffic. Please contact us at contact@friendli.ai to get help measuring your performance gain in your environment.
Experience superior inference performance
for all kinds of LLMs with Friendli Engine.
Iteration Batching
Groundbreaking optimization technique developed by us
(Also known as Continuous Batching)
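To make the idea concrete, here is a toy Python sketch of iteration-level (continuous) batching. This is an assumption-laden illustration of the general technique, not Friendli's implementation: finished requests free their batch slot after every decoding iteration, so waiting requests join immediately instead of waiting for the whole batch to drain.

```python
# Toy sketch of iteration-level (continuous) batching -- NOT Friendli's
# implementation. New requests are admitted every iteration, as soon as
# any slot frees up, rather than only between full batches.
from collections import deque

def continuous_batching(requests, max_batch=2):
    """requests: list of (id, tokens_to_generate).
    Returns the batch contents at each decoding iteration."""
    queue = deque(requests)
    active = {}   # request id -> tokens remaining
    trace = []
    while queue or active:
        # Admit waiting requests at every iteration, not only between batches.
        while queue and len(active) < max_batch:
            rid, toks = queue.popleft()
            active[rid] = toks
        trace.append(sorted(active))
        # One decoding iteration: each active request emits one token.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                del active[rid]   # slot freed mid-batch
    return trace
```

With static batching, request "c" below would wait until the entire first batch finished; here it slips into the slot "a" frees after a single iteration.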
Friendli DNN Library
Optimized GPU kernels for generative AI
Friendli TCache
Intelligently reusing computational results
Native Quantization
Efficient serving without compromising accuracy
Other ways to run generative AI models with Friendli
Friendli Serverless Endpoints
Call our fast and affordable API for open-source generative AI models
Learn more