
Supercharge building and serving generative AI

GROUNDBREAKING PERFORMANCE

8.9× Cheaper with Friendli Container [1, 2]

11.0× Cheaper with Friendli Dedicated Endpoints [1, 3]

10.7× Higher Throughput [1]

6.2× Lower Latency [4]

HOW TO USE

Three ways to run generative AI models with Friendli Engine:

01

Friendli Container

Serve LLM/LMM inference with Friendli Engine in your own GPU environment

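For illustration, a minimal sketch of querying a model served by Friendli Container, assuming the container is already running in your GPU environment and exposes an OpenAI-compatible chat API on localhost port 8000 (the port and model name below are assumptions, not documented defaults):

# Sketch: query a locally running Friendli Container.
# Assumes an OpenAI-compatible server on localhost:8000 and the
# `openai` Python package; port and model name are illustrative.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local serving address
    api_key="EMPTY",  # a local server typically does not check the key
)

response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # example model from the benchmark notes
    messages=[{"role": "user", "content": "Summarize what an LLM inference engine does."}],
    max_tokens=150,
)
print(response.choices[0].message.content)
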

02

Friendli Dedicated Endpoints

Build and run LLMs/LMMs on autopilot with Friendli Dedicated Endpoints

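As a sketch of what calling a deployed model could look like, assuming Friendli Dedicated Endpoints exposes an OpenAI-compatible HTTP API and addresses each deployment by an endpoint ID (the base URL, token variable, and endpoint ID below are placeholders, not documented values):

# Sketch: call a model deployed on Friendli Dedicated Endpoints.
# The base URL, endpoint ID, and auth scheme are assumptions.
import os
import requests

FRIENDLI_TOKEN = os.environ["FRIENDLI_TOKEN"]  # hypothetical token variable

resp = requests.post(
    "https://api.friendli.ai/dedicated/v1/chat/completions",  # assumed URL
    headers={"Authorization": f"Bearer {FRIENDLI_TOKEN}"},
    json={
        "model": "YOUR_ENDPOINT_ID",  # placeholder for the deployed endpoint
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
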

03

Friendli Serverless Endpoints

Fast and affordable API for open-source generative AI models

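And a hedged sketch of streaming a completion from an open-source model on Friendli Serverless Endpoints, again assuming an OpenAI-compatible API (the base URL and model identifier are illustrative assumptions):

# Sketch: stream a chat completion from Friendli Serverless Endpoints.
# The base URL and model identifier are illustrative assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.friendli.ai/serverless/v1",  # assumed
    api_key=os.environ["FRIENDLI_TOKEN"],  # hypothetical token variable
)

stream = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",  # example open-source model id
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    stream=True,  # tokens arrive incrementally as they are generated
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
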
CUSTOMER STORIES

NextDay AI

LLM-powered chatbot company cuts GPU costs by more than 50% instantly.

Problem

High H100 GPU costs from processing ~0.5 trillion tokens per month.

Solution

Use Friendli Container for LLM serving

Result

Costs were instantly cut by more than 50%.


SK Telecom Elevates LLM Operations with Friendli Dedicated Endpoints

SKT’s custom LLMs were deployed seamlessly with Friendli Dedicated Endpoints, achieving 5x higher throughput and 3x lower operational costs.

Problem

Running and operating custom LLMs in-house requires long hours and increases operational costs.

Solution

Use Friendli Dedicated Endpoints to serve and operate the custom LLMs.

Result

Onboarding within a few hours, 3x cost savings, and a 5x increase in throughput.


Reducing LLM serving costs for a novel-writing service

Friendli Container helped NaCloud reduce the cost of serving LLMs.

Problem

Operating an LLM-powered writing service incurs high inference costs

Solution

Use Friendli Container for LLM serving

Result

LLM serving costs were cut instantly


Upstage LLMs with Friendli Dedicated Endpoints

Upstage’s Solar LLMs are operated cost-efficiently without any operational burden, thanks to Friendli Dedicated Endpoints.

Problem

Translation inference traffic (~100k/day) requires cost-efficient operation

Solution

Use Friendli Dedicated Endpoints for running LLMs

Result

Cost-efficient LLM offering without any operational burden


TUNiB’s emotional chatbots with Friendli Dedicated Endpoints

TUNiB’s chatbots handle LLM inference requests smoothly with Friendli Dedicated Endpoints.

Problem

Managing chatbot LLMs requires significant engineering effort

Solution

Use Friendli Dedicated Endpoints for the models

Result

Convenient, reliable, and cost-efficient service without the need for self-management


1. Performance compared to vLLM on a single NVIDIA A100 80GB GPU running AWQ-ed Mixtral 8x7B from Mistral AI with the following settings: mean input token length = 500, mean output token length = 150. Evaluation conducted by FriendliAI.
2. Prices are based on running Friendli Container and vLLM on a CoreWeave A100 80GB GPU.
3. The price of the competitive service is $2.21 per hour.
4. Performance of Friendli Container compared to vLLM on a single NVIDIA A100 80GB GPU running AWQ-ed Mixtral 8x7B from Mistral AI with the following settings: mean input token length = 500, mean output token length = 150, mean request per second = 0.5. Evaluation conducted by FriendliAI.