
FriendliAI
Use Cases


Save operational cost

50–90%

Save GPU cost

Use fewer GPUs 1

Experience blazing speed

10.7×

Higher throughput 2

6.2×

Lower latency 3

With FriendliAI


Streamlined workflow

Fine-tune, deploy, and monitor in one workflow

Fully managed service

Autoscaling GPU resources

Automatic fault recovery


USE CASES

Industry

Explore detailed use cases tailored to various industries, including healthcare, finance, and e-commerce. Learn how industry-specific applications of AI can drive efficiency, innovation, and growth in your sector.


Customer Stories

Discover how we’ve helped businesses and individuals achieve their goals by exploring our customers’ success stories. Learn about the unique challenges they faced, the solutions we provided, and the lasting impact we made.



Challenge

High-traffic conversational chatbots incur significant GPU costs, with users averaging 2 hours of daily usage.

Solution

FriendliAI's optimization techniques reduce GPU costs by
50% to 90% for chatbot companies.


Conversational Chatbot

NextDay AI

NextDay AI's personalized character chatbot, ranked among the top 15 generative AI web products by a16z, processes over 0.5 trillion tokens monthly.

RESULT

with Friendli Container

LLM
throughput

50%+

GPU
cost saving

Conversational Chatbot

ScatterLab

ScatterLab's 'Zeta', a top 10 mobile app for South Korean teenagers, leveraged Friendli Container to manage real-time responses with 17 times more parameters than their previous RAG version.

Friendli Engine is an irreplaceable solution for generative AI serving, both in terms of speed and cost-effectiveness. It eliminates the need for serving optimization tests.

Conversational Chatbot

TUNiB

TUNiB's DearMate chatbot service offers various personas like friend, lover, counselor, and coach. Friendli Dedicated Endpoints' managed platform allows TUNiB to focus on model training while automating GPU resource management and fault recovery.

Friendli Dedicated Endpoints simplifies generative AI model serving and optimizes our service development process.

Telecom Service

SKT

SKT is South Korea's leading telecom operator, known for its innovative mobile services, extensive 5G infrastructure, and advancements in AI development.

CHALLENGE

Building and serving AI agents for SKT’s massive customer base required strict SLAs, high reliability, and the ability to efficiently handle heavy traffic.

SOLUTION

Friendli Dedicated Endpoints enabled exceptional reliability and traffic efficiency while reducing operational costs.

RESULT

Within a few hours of onboarding

LLM
throughput

Cost
saving

Upstage

Productivity

Upstage's Solar Pro 22B is an advanced LLM that excels at processing extensive documents and structured text, such as HTML and Markdown. It offers multilingual support with strong domain expertise in finance, healthcare, and legal sectors.

Translation

Upstage's Solar Mini 10.7B powers translation, chat, document parsing, and OCR capabilities.

CHALLENGE

Efficiently managing LLM serving under fluctuating input traffic for stable performance and reliability.

SOLUTION

Friendli Dedicated Endpoints easily managed varying traffic with autoscaling and automatic fault recovery systems.




1. Testing conducted by FriendliAI in October 2023 using Llama-2-13B running on Friendli Engine. See the detailed results and methodology here.
2. Performance compared to vLLM on a single NVIDIA A100 80GB GPU running AWQ-ed Mixtral 8x7B from Mistral AI with the following settings: mean input token length = 500, mean output token length = 150. Evaluation conducted by FriendliAI.
3. Performance of Friendli Container compared to vLLM on a single NVIDIA A100 80GB GPU running AWQ-ed Mixtral 8x7B from Mistral AI with the following settings: mean input token length = 500, mean output token length = 150, mean request per second = 0.5. Evaluation conducted by FriendliAI.