Supercharge building and serving generative AI

GROUNDBREAKING PERFORMANCE

7.5× cheaper than OpenAI GPT-3.5 [1]

6× higher throughput [2]

4× lower latency [3]

HOW TO USE

Three ways to run generative AI models with Friendli Engine:

01

Friendli Container

Serve LLMs in your private environment


02

Friendli Dedicated Endpoints

Build and serve custom LLMs on autopilot with Friendli Dedicated Endpoints


03

Friendli Serverless Endpoints

Fast and affordable API for open-source LLMs and LMMs

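As a rough illustration of how a serverless LLM API like the one above is typically called, here is a minimal sketch that assembles a chat-completion request body. The base URL, model identifier, and token environment variable are illustrative assumptions, not documented Friendli values; consult the official docs for the real endpoint and schema.

```python
# Sketch of calling a serverless LLM endpoint via an OpenAI-style
# chat-completions API. All endpoint-specific names below are assumptions.

API_BASE = "https://api.friendli.ai/serverless/v1"  # assumed base URL
MODEL = "llama-2-13b-chat"                          # assumed model id


def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble the JSON body for a single-turn chat-completion call."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


# Actually sending it would look roughly like this (needs an API token
# in FRIENDLI_TOKEN and the third-party `requests` package):
#
#   import os, requests
#   headers = {"Authorization": f"Bearer {os.environ['FRIENDLI_TOKEN']}"}
#   resp = requests.post(f"{API_BASE}/chat/completions",
#                        headers=headers,
#                        json=build_chat_request("Hello!"))
#   print(resp.json()["choices"][0]["message"]["content"])

payload = build_chat_request("Hello!")
print(payload["model"])  # prints "llama-2-13b-chat"
```

The request/response shape mirrors the widely used OpenAI-compatible convention, which many serving stacks adopt so existing client code works unchanged.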
CUSTOMER STORIES
NaCloud

Reducing LLM serving costs for a novel writing service

Friendli Container helped NaCloud reduce the cost of serving LLMs.

Problem

Operating an LLM-powered writing service is costly

Solution

Uses Friendli Container for LLM serving

Result

Cuts LLM serving cost instantly

Upstage

Upstage LLMs with Friendli Dedicated Endpoints

Upstage’s Solar LLMs run cost-efficiently without any operational burden, thanks to Friendli Dedicated Endpoints.

Problem

Operating LLMs cost-efficiently under varying input traffic

Solution

Uses Friendli Dedicated Endpoints for running LLMs

Result

Cost-efficient LLM offering without any operational burden

Chatbot Company A

LLM-powered chatbot company A cuts GPU costs by more than 50% instantly.

Problem

Processing ~0.5 trillion tokens per month incurs high H100 GPU costs

Solution

Uses Friendli Container for LLM serving

Result

Cuts costs by more than 50% instantly

NCSOFT

Integration of Friendli Engine with Amazon SageMaker JumpStart

Amazon SageMaker JumpStart users can run VARCO LLMs with Friendli Engine.

Problem

Serving JumpStart Foundation Models poses performance and cost challenges

Solution

Friendli Engine has been integrated with Amazon SageMaker JumpStart to serve JumpStart Foundation Models

Result

JumpStart users can harness the power of Friendli Engine to serve Foundation Models

TUNiB

TUNiB’s emotional chatbots with Friendli Dedicated Endpoints

TUNiB’s chatbot services operate smoothly with Friendli Dedicated Endpoints.

Problem

Managing chatbot LLMs incurs significant engineering effort

Solution

Uses Friendli Dedicated Endpoints for the models

Result

Convenient, reliable, and cost-efficient service without the need for self-management

Scatter Lab

Zeta blooms with Friendli Container

Scatter Lab’s chatbot Zeta serves its users with Friendli Engine.

Problem

The generative model is expensive to run

Solution

Uses Friendli Container for Zeta 2.0

Result

Cuts costs by 50%


1. Based on published pricing as of November 8, 2023, comparing OpenAI GPT-3.5-Turbo to Llama-2-13B using Friendli Serverless Endpoints. Assumes an equal number of input and output tokens.
2. Testing conducted by FriendliAI in October 2023 using Llama-2-13B running on Friendli Engine. See the detailed results and methodology here.
3. Testing conducted by FriendliAI in October 2023 using Llama-2-70B running on Friendli Engine. See the detailed results and methodology here.