
Supercharge building and serving generative AI

GROUNDBREAKING PERFORMANCE

8.9× Cheaper with Friendli Container [1, 2]

11.0× Cheaper with Friendli Dedicated Endpoints [1, 3]

10.7× Higher Throughput [1]

6.2× Lower Latency [4]
HOW TO USE

Three ways to run generative AI models with Friendli Engine:

01

Friendli Container

Serve LLMs/LMMs with Friendli Engine in your GPU environment

Learn more
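As an illustration only, serving a model with Friendli Container follows the usual pattern of a GPU-enabled docker run; the image name, port, and flags below are assumptions for the sketch, not the documented invocation — consult the Friendli Container docs for the actual ones.

```shell
# Hypothetical invocation sketch (image name and flags are assumptions):
# expose the inference server on port 8000 and reuse the local
# Hugging Face model cache so weights are not re-downloaded.
docker run --gpus all -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  registry.friendli.ai/container \
  --hf-model-name mistralai/Mixtral-8x7B-Instruct-v0.1
```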

02

Friendli Dedicated Endpoints

Build and run LLMs/LMMs on autopilot with Friendli Dedicated Endpoints

Learn more

03

Friendli Serverless Endpoints

Fast and affordable API for open-source generative AI models

Learn more
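Serverless LLM APIs of this kind are typically called with an OpenAI-style chat completions request. As a minimal sketch — the model name and endpoint path here are illustrative assumptions, not the documented Friendli API surface — the request body looks like this:

```python
# Build the JSON body for a POST to an OpenAI-compatible
# /v1/chat/completions endpoint. The model name below is a placeholder.
import json

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = build_chat_request("mixtral-8x7b-instruct", "Summarize AWQ in one sentence.")
print(json.dumps(body))
```

Sending this body with any HTTP client (plus the service's auth header) is all an application needs to integrate.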
CUSTOMER STORIES
NextDay AI logo

NextDay AI

LLM-powered chatbot company cuts GPU costs by more than 50% instantly.

Problem

High H100 GPU costs from processing ~0.5 trillion tokens per month.

Solution

Use Friendli Container for LLM serving

Result

Costs were instantly cut by more than 50%.

Read the full story
Tunib logo

TUNiB’s emotional chatbots with Friendli Dedicated Endpoints

TUNiB’s chatbot services operate smoothly with Friendli Dedicated Endpoints.

Problem

Managing chatbot LLMs incurs significant engineering effort

Solution

Use Friendli Dedicated Endpoints for the models

Result

Convenient, reliable, and cost-efficient service without the need for self-management

Read the full story
NaCloud logo

Reducing LLM serving costs for a novel writing service

Friendli Container helped NaCloud reduce the cost of serving LLMs.

Problem

High cost of serving LLMs for an LLM-powered novel writing service

Solution

Use Friendli Container for LLM serving

Result

Cut LLM serving cost instantly

Upstage logo

Upstage LLMs with Friendli Dedicated Endpoints

Upstage’s Solar LLMs are operated cost-efficiently without any operation burden, thanks to Friendli Dedicated Endpoints.

Problem

Operating LLMs cost-efficiently under varying input traffic

Solution

Use Friendli Dedicated Endpoints for running LLMs

Result

Cost-efficient LLM offering without any operational burden

Scatter Lab logo

Zeta blooms with Friendli Container

Scatter Lab’s chatbot serves its users with Friendli Engine.

Problem

The generative model is expensive to run

Solution

Use Friendli Container for Zeta 2.0

Result

Cut costs by 50%


1. Performance compared to vLLM on a single NVIDIA A100 80GB GPU running AWQ-ed Mixtral 8x7B from Mistral AI with the following settings: mean input token length = 500, mean output token length = 150. Evaluation conducted by FriendliAI.
2. Prices are based on running Friendli Container and vLLM on a CoreWeave A100 80GB GPU.
3. The price of the competitive service is $2.21 per hour.
4. Performance of Friendli Container compared to vLLM on a single NVIDIA A100 80GB GPU running AWQ-ed Mixtral 8x7B from Mistral AI with the following settings: mean input token length = 500, mean output token length = 150, mean requests per second = 0.5. Evaluation conducted by FriendliAI.