Use Cases



Reducing LLM serving costs for a novel writing service.

Friendli Container helped NaCloud reduce the cost of serving LLMs.


Operating a writing service powered by LLMs

NaCloud's generative AI-powered writing service required serving LLMs.


Uses Friendli Container for LLM serving

Friendli Container enabled our client to use Friendli Engine.


Cuts LLM serving cost instantly

NaCloud was able to cut GPU serving costs.



Upstage LLMs with Friendli Dedicated Endpoints

Upstage’s Solar LLMs are operated cost-efficiently without any operational burden, thanks to Friendli Dedicated Endpoints.


Operating LLMs cost-efficiently under varying input traffic

Upstage needed to manage large language model serving efficiently under varying input traffic.


Uses Friendli Dedicated Endpoints for running LLMs

To solve this problem, Upstage chose Friendli Dedicated Endpoints, which makes operating large language models easy.


Cost-efficient LLM offering without any operational burden

As a result, Upstage was able to serve their proprietary large language model without any operational hassle.


Company A

LLM-powered chatbot company cuts GPU costs by more than 50% instantly

A client company operates a chatbot service. Friendli Container helped them reduce their operating costs.


Processing ~0.5 trillion tokens per month incurs high H100 GPU costs

The client used many H100 GPUs to power the chatbot service, which was expensive.
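To put the monthly volume in perspective, the arithmetic below converts ~0.5 trillion tokens per month into an average per-second rate. It is a rough sketch: the 30-day month is an assumption, and real traffic peaks well above any average.

```python
# Rough conversion of the monthly token volume into an average rate.
TOKENS_PER_MONTH = 0.5e12               # ~0.5 trillion tokens, from the heading above
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumption: a 30-day month


def average_tokens_per_second(tokens_per_month: float,
                              seconds_per_month: int = SECONDS_PER_MONTH) -> float:
    """Return the mean token rate implied by a monthly token volume."""
    return tokens_per_month / seconds_per_month


avg_tps = average_tokens_per_second(TOKENS_PER_MONTH)
print(f"{avg_tps:,.0f} tokens per second on average")
```

Under these assumptions the service sustains on the order of a couple hundred thousand tokens per second, around the clock.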


Uses Friendli Container for LLM serving

Friendli Container enabled our client to use Friendli Engine in the client’s own GPU environment.


Cuts costs by more than 50% instantly

The client was able to cut GPU operation costs by more than half because our engine was able to handle more traffic with fewer GPUs.
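The link between per-GPU throughput and cost can be sketched as follows. All numbers here are hypothetical and chosen only to illustrate the relationship; they are not the client's figures. The sketch assumes cost scales linearly with GPU count, in which case cutting costs by more than 50% requires each GPU to handle more than twice the traffic it did before.

```python
import math


def gpus_needed(total_tokens_per_s: float, tokens_per_s_per_gpu: float) -> int:
    """Smallest number of GPUs that can sustain the given traffic."""
    return math.ceil(total_tokens_per_s / tokens_per_s_per_gpu)


def cost_reduction(baseline_gpus: int, new_gpus: int) -> float:
    """Fractional saving, assuming cost is proportional to GPU count."""
    return 1 - new_gpus / baseline_gpus


# Hypothetical figures for illustration only.
traffic = 200_000.0          # tokens per second to serve
baseline_per_gpu = 2_000.0   # tokens per second per GPU before
improved_per_gpu = 5_000.0   # tokens per second per GPU after (2.5x)

before = gpus_needed(traffic, baseline_per_gpu)
after = gpus_needed(traffic, improved_per_gpu)
saving = cost_reduction(before, after)
print(f"{before} GPUs -> {after} GPUs, saving {saving:.0%}")
```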


Integration of Friendli Engine with Amazon SageMaker JumpStart

Friendli Engine supports running NCSOFT VARCO LLMs in Amazon SageMaker JumpStart.

Amazon SageMaker JumpStart users can run VARCO LLMs with Friendli Engine, FriendliAI’s cutting-edge generative AI engine. This opens the door to integrating other JumpStart foundation models with Friendli.


Serving JumpStart Foundation Models incurs performance and cost challenges

It is challenging to serve JumpStart Foundation Models efficiently in Amazon SageMaker. The models are computationally heavy, which leads to high costs and performance problems.


Friendli Engine has been integrated with Amazon SageMaker JumpStart to serve JumpStart Foundation Models

Friendli Engine can be used with NCSOFT VARCO LLMs. Users of VARCO LLMs enjoy fast, low-cost LLM serving.


Harness the power of Friendli Engine to serve JumpStart Foundation Models

Users can effortlessly run NCSOFT VARCO LLMs on Friendli Engine, reducing costs within Amazon SageMaker JumpStart.


TUNiB’s emotional chatbots with Friendli Dedicated Endpoints

Launch diverse generative AI models, managed by Friendli Dedicated Endpoints

TUNiB's emotional chatbot services are earning accolades with Friendli Dedicated Endpoints, FriendliAI's managed service for serving LLMs.


Managing multiple AI models incurs significant time and costs

The client needed to oversee the deployment of various generative AI models to handle unpredictable real-time requests.


Uses Friendli Dedicated Endpoints for various models

Friendli Dedicated Endpoints has enabled TUNiB to handle real-time executions with ease while also managing the number of deployments necessary to minimize operating costs. In addition, Friendli Engine has further improved TUNiB's model deployments, significantly reducing both costs and latency.


Convenience and dependable service without the need for self-management

With Friendli Dedicated Endpoints, TUNiB is able to deliver enhanced performance in interactive and creative AI models, all while maintaining low operating costs and request latency.


Lee Luda (이루다) 2.0 blooms with Friendli Engine

Realize efficient generative AI with Friendli Engine

Scatter Lab's renewed chatbot service is receiving praise with Friendli Engine, FriendliAI's LLM serving engine that speeds up generative AI.


The quality and size of a generative model come at a cost

The client company wanted their model to produce real-time responses based on current context, which required 17 times more parameters than the original version.


Uses Friendli Engine for Lee Luda 2.0

Scatter Lab adopted Friendli Engine to serve their model. Friendli Engine handled the real-time executions while dramatically reducing cost and latency.


Reliable service with much improved efficiency

With Friendli Engine, Lee Luda 2.0 launched successfully and is now used in production. Its improved interactive and creative communication is receiving praise, while the cost and latency of the service are kept in check.


Training a Large Language Model (LLM) with Friendli training

Swift and sound: develop your own large-scale AI with Friendli training

We developed a GPT-3 13B model to show what it's like to train an LLM on Friendli training.


Too much cost for large-scale AI training

Normally, training a large-scale model takes a lot of resources. With distributed learning, the burden of handling faults and load only increases.


Automated and optimized training experience

On Friendli Engine, we benefited from its dedicated support for distributed learning along with various optimization techniques. Friendli Engine also handled errors and performance problems to keep training sound.


Made large-scale AI simple

With Friendli’s automatic matching of state-of-the-art training techniques, training a 13-billion-parameter model felt like a breeze.


Friendli Engine

Use Friendli Engine for your generative language model

A client company using Transformer-based models was in a difficult situation, and we were able to help them with our efficient serving system.


Resource-consuming generative models

Transformer-based generative models are very costly to serve, making companies reluctant to use them.


Tens of times higher efficiency with Friendli Engine

We offered Friendli Engine, our new serving system for generative models. With Friendli Engine, the company's generative model achieved much improved throughput with low and steady latency.


More affordable use of generative models

The company can now use the models without cost concerns. Friendli Engine can help any kind of generative model that needs high performance at low cost.