Use Cases
NaCloud
Reducing LLM serving costs for a novel writing service.
Friendli Container helped NaCloud reduce the cost of serving LLMs.
PROBLEM
Operating a writing service powered by LLMs
NaCloud's generative-AI-powered writing service required serving LLMs in production.
SOLUTION
Uses Friendli Container for LLM serving
Friendli Container enabled NaCloud to run Friendli Engine in their own GPU environment.
RESULT
Cuts LLM serving cost instantly
NaCloud was able to cut GPU serving costs.
Upstage
Upstage LLMs with Friendli Dedicated Endpoints
Upstage’s Solar LLMs are operated cost-efficiently without any operational burden, thanks to Friendli Dedicated Endpoints.
PROBLEM
Operating LLMs cost-efficiently under varying input traffic
Upstage needed to manage large language model serving efficiently under varying input traffic.
SOLUTION
Uses Friendli Dedicated Endpoints for running LLMs
To solve this, Upstage adopted Friendli Dedicated Endpoints, an easy-to-use service for operating large language models.
RESULT
Cost-efficient LLM offering without any operational burden
As a result, Upstage was able to serve their proprietary large language model without any operational hassle.
Chatbot Company A
LLM-powered chatbot company A cuts GPU costs by more than 50% instantly
A client company operates an LLM-powered chatbot service. Friendli Container helped them reduce their operating costs.
PROBLEM
Processing ~0.5 trillion tokens per month incurs high H100 GPU costs
The client used many H100 GPUs to power the chatbot service, which was expensive.
SOLUTION
Uses Friendli Container for LLM serving
Friendli Container enabled our client to use Friendli Engine in the client’s own GPU environment.
RESULT
Cuts costs by more than 50% instantly
The client was able to cut GPU operating costs by more than half because our engine could handle more traffic with fewer GPUs.
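As a rough illustration of why higher per-GPU throughput translates directly into a smaller GPU fleet, here is a back-of-envelope sketch. The throughput figures are hypothetical placeholders, not measurements from this case study; only the ~0.5 trillion tokens per month comes from the text above.

```python
import math

SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000

def gpus_needed(tokens_per_month: float, tokens_per_sec_per_gpu: float) -> int:
    """Minimum GPUs required to sustain the average token rate."""
    avg_rate = tokens_per_month / SECONDS_PER_MONTH
    return math.ceil(avg_rate / tokens_per_sec_per_gpu)

tokens = 0.5e12  # ~0.5 trillion tokens/month, as in the case study

baseline = gpus_needed(tokens, 2_000)   # hypothetical baseline throughput per GPU
optimized = gpus_needed(tokens, 4_000)  # hypothetical 2x throughput per GPU

print(baseline, optimized)  # → 97 49
```

With these illustrative numbers, doubling per-GPU throughput roughly halves the required fleet, which is how a throughput improvement becomes a greater-than-50% cost reduction.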
Integration of Friendli Engine with Amazon SageMaker JumpStart
Friendli Engine supports running NCSOFT VARCO LLMs in Amazon SageMaker JumpStart.
Amazon SageMaker JumpStart users can run VARCO LLMs with Friendli Engine, FriendliAI’s cutting-edge generative AI engine. This opens the door to integrating other JumpStart foundation models with Friendli.
PROBLEM
Serving JumpStart Foundation Models incurs performance and cost challenges
It is challenging to serve JumpStart Foundation Models efficiently on Amazon SageMaker. The models are computationally heavy, incurring high costs and performance problems.
SOLUTION
Friendli Engine has been integrated with Amazon SageMaker JumpStart to serve JumpStart Foundation Models
Friendli Engine can be used with NCSOFT VARCO LLMs, giving VARCO users high-speed, low-cost LLM serving.
RESULT
Harness the power of Friendli Engine to serve JumpStart Foundation Models
Users can effortlessly run NCSOFT VARCO LLMs on Friendli Engine, reducing costs within Amazon SageMaker JumpStart.
TUNiB’s emotional chatbots with Friendli Dedicated Endpoints
Launch diverse generative AI models, managed by Friendli Dedicated Endpoints
TUNiB's emotional chatbot services are earning accolades with Friendli Dedicated Endpoints, FriendliAI's managed service for serving LLMs.
PROBLEM
Managing multiple AI models incurs significant time and costs
TUNiB needed to oversee the deployment of various generative AI models to handle unpredictable real-time request traffic.
SOLUTION
Uses Friendli Dedicated Endpoints for various models
Friendli Dedicated Endpoints has enabled TUNiB to handle real-time requests with ease while keeping the number of deployments to the minimum needed, reducing operating costs. In addition, Friendli Engine has further improved their model deployments, significantly reducing both costs and latency.
RESULT
Convenient, dependable service without the need for self-management
With Friendli Dedicated Endpoints, TUNiB is able to deliver enhanced performance in interactive and creative AI models, all while maintaining low operating costs and request latency.
Zeta blooms with Friendli Engine
Realize efficient generative AI with Friendli Engine
Scatter Lab's renewed chatbot service is earning praise with Friendli Engine, FriendliAI's LLM serving engine that speeds up generative AI.
PROBLEM
The quality and size of a generative model come at a cost
Scatter Lab wanted their model to produce real-time responses based on the current context, which required 17 times more parameters than the original version.
SOLUTION
Uses Friendli Engine for Zeta
Scatter Lab adopted Friendli Engine to serve their model. Friendli handled the real-time workload while dramatically reducing cost and latency.
RESULT
Reliable service with much improved efficiency
With Friendli Engine, Zeta launched successfully and is now used in production. Its enhanced interactive and creative communication is earning praise while keeping service costs and latency low.
Training a Large Language Model (LLM) with Friendli training
Swift and sound: develop your own large-scale AI with Friendli training
We developed a GPT-3 13B model to show what it's like to train an LLM with Friendli training.
PROBLEM
The high cost of large-scale AI training
Training a large-scale model normally takes substantial resources, and with distributed training, the burden of faults and load imbalance only grows.
SOLUTION
Automated and optimized training experience
Friendli Engine offered dedicated support for distributed training along with various optimization techniques, and it handled faults and performance problems to keep training sound.
RESULT
Made large-scale AI simple
Thanks to Friendli’s automatic matching of state-of-the-art training techniques, training a 13-billion-parameter model felt like a breeze.