
Use Cases

Save operational cost

50–90%: GPU cost savings
6×: fewer GPUs needed 1

Experience blazing speed

10.7×: higher throughput 2
6.2×: lower latency 3

With FriendliAI

Streamlined workflow

Deploy and monitor in one workflow:

  • Fully managed service
  • Autoscaling GPU resources
  • Automatic fault recovery


USE CASES

Industry

Explore detailed use cases tailored to various industries, including healthcare, finance, and e-commerce. Learn how industry-specific applications of AI can drive efficiency, innovation, and growth in your sector.


Customer Stories

Discover how we’ve helped businesses and individuals achieve their goals by exploring our customers’ success stories. Learn about the unique challenges they faced, the solutions we provided, and the lasting impact we made.


Challenge

High-traffic conversational chatbots incur significant GPU costs, with users averaging two hours of usage per day.

Solution

FriendliAI's optimization techniques reduce GPU costs by
50% to 90% for chatbot companies.

Conversational Chatbot

NextDay AI's personalized character chatbot, ranked among the top 15 generative AI web products by a16z, processes over 3 trillion tokens monthly.

CHALLENGE

NextDay AI processes over 3 trillion tokens monthly, leading to high H100 GPU costs.

SOLUTION

By leveraging Friendli Container, NextDay AI effectively managed its traffic, achieving 3x higher LLM throughput and over 50% GPU cost savings.

Conversational Chatbot


ScatterLab's 'Zeta' ranked among the top 10 mobile applications for South Korean teenagers. Zeta users spend an average of 140 minutes daily, generating 800 million conversations monthly.

CHALLENGE

GPU costs account for up to 70% of ScatterLab's operational costs.

SOLUTION

Using Friendli Container, ScatterLab reduced its GPU cost by over 50%, and Friendli Inference's powerful optimization eliminated the need for additional optimization tests.

Conversational Chatbot


TUNiB’s DearMate chatbot service offers various personas like friend, lover, counselor, and coach.

CHALLENGE

With limited engineering resources, TUNiB aimed to prioritize model development.

SOLUTION

Friendli Dedicated Endpoints’ managed platform enabled TUNiB to focus on model training by automating GPU resource management and fault recovery.

Telecom Service

SKT is South Korea’s leading telecom operator known for its innovative mobile services, extensive 5G infrastructure and advancements in AI development.

CHALLENGE

Building and serving AI agents for SKT’s massive customer base required strict SLAs, high reliability, and the ability to efficiently handle heavy traffic.

SOLUTION

Friendli Dedicated Endpoints delivered exceptional reliability and traffic efficiency while reducing operational costs. Within hours of onboarding, SKT achieved a 5x increase in LLM throughput and 3x cost savings.

Productivity

Upstage's Solar Pro 22B is an advanced LLM that excels at processing extensive documents and structured text, such as HTML and Markdown. It offers multilingual support with strong domain expertise in finance, healthcare, and legal sectors.

Translation

Upstage's Solar Mini 10.7B powers translation, chat, document parsing, and OCR capabilities.

CHALLENGE

Upstage needed to manage LLM serving efficiently under fluctuating input traffic while maintaining stable performance and reliability.

SOLUTION

Friendli Dedicated Endpoints easily managed varying traffic with autoscaling and automatic fault recovery systems.


MORE TO READ

Read more of our customers’ success stories

NextDay AI instantly saves GPU costs for LLM serving
  • June 3, 2024
  • 4 min read

How TUNiB easily managed & scaled their emotional chatbot service
  • June 2, 2024
  • 3 min read


Are you ready to build and deploy your generative AI product effortlessly?

Get started free

1. Testing conducted by FriendliAI in October 2023 using Llama-2-13B running on Friendli Inference. See the detailed results and methodology here.
2. Performance compared to vLLM on a single NVIDIA A100 80GB GPU running AWQ-ed Mixtral 8x7B from Mistral AI with the following settings: mean input token length = 500, mean output token length = 150. Evaluation conducted by FriendliAI.
3. Performance of Friendli Container compared to vLLM on a single NVIDIA A100 80GB GPU running AWQ-ed Mixtral 8x7B from Mistral AI with the following settings: mean input token length = 500, mean output token length = 150, mean request per second = 0.5. Evaluation conducted by FriendliAI.
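The multipliers quoted above are ratios between two systems measured under identical load: throughput compares tokens generated per second of wall time, and latency compares mean end-to-end request times. As a minimal illustrative sketch of that arithmetic (the function and all numbers below are made up to reproduce the quoted figures, not FriendliAI's or vLLM's actual measurements):

```python
import statistics

def speedup_ratios(baseline_latencies, optimized_latencies,
                   baseline_tokens, optimized_tokens, wall_time_s):
    """Reduce raw load-test measurements to the ratios quoted in benchmarks.

    *_latencies: per-request end-to-end latencies (seconds)
    *_tokens:    total tokens generated by each system in the same wall time
    """
    # Throughput = tokens generated per second of wall time.
    baseline_tps = baseline_tokens / wall_time_s
    optimized_tps = optimized_tokens / wall_time_s
    throughput_ratio = optimized_tps / baseline_tps
    # Latency ratio compares mean end-to-end request latencies.
    latency_ratio = (statistics.mean(baseline_latencies)
                     / statistics.mean(optimized_latencies))
    return throughput_ratio, latency_ratio

# Hypothetical numbers chosen to reproduce the quoted 10.7x / 6.2x figures.
tr, lr = speedup_ratios(
    baseline_latencies=[6.2, 6.0, 6.4],
    optimized_latencies=[1.0, 1.0, 1.0],
    baseline_tokens=15_000,
    optimized_tokens=160_500,
    wall_time_s=60.0,
)
print(f"{tr:.1f}x higher throughput, {lr:.1f}x lower latency")
# prints: 10.7x higher throughput, 6.2x lower latency
```

Note that both ratios depend on the load profile (input/output lengths, request rate), which is why the footnotes fix those parameters explicitly.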


Copyright © 2025 FriendliAI Corp. All rights reserved