Use Cases

NaCloud

Reducing LLM serving costs for a novel-writing service.

Friendli Container helped NaCloud reduce the cost of serving LLMs.

PROBLEM

Operating a writing service powered by LLMs

The generative AI-powered writing service required serving LLMs.

SOLUTION

Uses Friendli Container for LLM serving

Friendli Container enabled our client to use Friendli Engine.

RESULT

Cuts LLM serving cost instantly

NaCloud was able to cut GPU serving costs.

Upstage

Upstage LLMs with Friendli Dedicated Endpoints

Upstage’s Solar LLMs are operated cost-efficiently without any operational burden, thanks to Friendli Dedicated Endpoints.

PROBLEM

Operating LLMs cost-efficiently under varying input traffic

Upstage needed to manage large language model serving efficiently under varying input traffic.

SOLUTION

Uses Friendli Dedicated Endpoints for running LLMs

To solve this problem, Upstage chose Friendli Dedicated Endpoints, which make it easy to operate large language models.

RESULT

Cost-efficient LLM offering without any operational burden

As a result, Upstage was able to serve its proprietary large language model without any operational hassle.

Chatbot Company A

LLM-powered chatbot company A cuts GPU costs by more than 50% instantly

A client company operates a chatbot service. Friendli Container helped them reduce their operating costs.

PROBLEM

Processing ~0.5 trillion tokens per month incurs high H100 GPU costs

The client used many H100 GPUs to power the chatbot service, which was expensive.

SOLUTION

Uses Friendli Container for LLM serving

Friendli Container enabled our client to use Friendli Engine in the client’s own GPU environment.

RESULT

Cuts costs by more than 50% instantly

The client was able to cut GPU operating costs by more than half because our engine handles more traffic with fewer GPUs.

Integration of Friendli Engine with Amazon SageMaker JumpStart

Friendli Engine supports running NCSOFT VARCO LLMs in Amazon SageMaker JumpStart.

Amazon SageMaker JumpStart users can run VARCO LLMs with Friendli Engine, FriendliAI’s cutting-edge generative AI engine. This opens the door to integrating other JumpStart foundation models with Friendli.

PROBLEM

Serving JumpStart Foundation Models poses performance and cost challenges

It is challenging to serve JumpStart Foundation Models efficiently in Amazon SageMaker. The models are computationally heavy, which leads to high costs and performance problems.

SOLUTION

Friendli Engine has been integrated with Amazon SageMaker JumpStart to serve JumpStart Foundation Models

Friendli Engine can be used with NCSOFT VARCO LLMs. Users of VARCO LLMs benefit from fast, low-cost LLM serving.

RESULT

Harness the power of Friendli Engine to serve JumpStart Foundation Models

Users can effortlessly run NCSOFT VARCO LLMs on Friendli Engine, reducing their costs within Amazon SageMaker JumpStart.

TUNiB’s emotional chatbots with Friendli Dedicated Endpoints

Launch diverse generative AI models, managed by Friendli Dedicated Endpoints

TUNiB's emotional chatbot services are earning accolades with Friendli Dedicated Endpoints, FriendliAI's managed service for serving LLMs.

PROBLEM

Managing multiple AI models incurs significant time and costs

The client needed to oversee the deployment of various generative AI models to handle unpredictable real-time requests.

SOLUTION

Uses Friendli Dedicated Endpoints for various models

Friendli Dedicated Endpoints has enabled TUNiB to handle real-time executions with ease while also managing the number of deployments necessary to minimize operating costs. In addition, Friendli Engine has further improved TUNiB's model deployments, significantly reducing both costs and latency.

RESULT

Convenience and dependable service without the need for self-management

With Friendli Dedicated Endpoints, TUNiB is able to deliver enhanced performance in interactive and creative AI models, all while maintaining low operating costs and request latency.

Zeta blooms with Friendli Engine

Realize efficient generative AI with Friendli Engine

Scatter Lab's renewed chatbot service is earning praise with Friendli Engine, FriendliAI's LLM serving engine that speeds up generative AI.

PROBLEM

The quality and size of a generative model come at a cost

The client company wanted their model to produce real-time responses based on current context, which required 17 times more parameters than the original version.

SOLUTION

Uses Friendli Engine for Zeta

Scatter Lab adopted Friendli Engine to serve their model. Friendli Engine was able to handle the real-time executions while dramatically reducing cost and latency.

RESULT

Reliable service with much improved efficiency

With Friendli Engine, Zeta launched successfully and is now in production use. Its improved interactive and creative communication is earning praise, while the service maintains its cost and latency.

Training a Large Language Model (LLM) with Friendli training

Swift and sound: develop your own large-scale AI with Friendli training

We developed a GPT-3 13B model to show what it's like to train an LLM on Friendli training.

PROBLEM

Too much cost for large-scale AI training

Normally, training a large-scale model takes a lot of resources. With distributed learning, the burden of handling faults and load only increases.

SOLUTION

Automated and optimized training experience

On Friendli Engine, we benefited from dedicated support for distributed learning along with various optimization techniques. Friendli Engine also handled errors and performance problems to keep training sound.

RESULT

Made large-scale AI simple

With Friendli’s automatic matching of state-of-the-art training techniques, training a 13-billion-parameter model felt like a breeze.