Supercharge
generative AI serving

Cut costs with the fastest LLM serving engine on the market

GROUNDBREAKING PERFORMANCE

7.5× Cheaper
than OpenAI GPT-3.5¹

6× Higher
Throughput²

4× Lower
Latency³
HOW TO USE

Three ways to run generative AI models with Friendli Engine:

01

Friendli Serverless Endpoints

Fast and affordable API for open-source LLMs and LMMs

Learn more

02

Friendli Dedicated Endpoints

Run custom LLMs on autopilot with Friendli Dedicated Endpoints

Learn more

03

Friendli Container

Serve LLMs in your private environment

Learn more
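As a rough illustration of the first option, Friendli Serverless Endpoints can be called over HTTP much like other hosted LLM APIs. This is a minimal sketch only: the base URL (`api.friendli.ai/serverless/v1/chat/completions`), the model name, and the `FRIENDLI_TOKEN` environment variable are assumptions for illustration — check the Friendli documentation for the current endpoint and model identifiers.

```python
import json
import os
import urllib.request

# Assumed endpoint URL for illustration; verify against the Friendli docs.
API_URL = "https://api.friendli.ai/serverless/v1/chat/completions"


def build_request(model: str, prompt: str, token: str) -> urllib.request.Request:
    """Build a chat-completion HTTP request for a serverless endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# "llama-2-13b-chat" is a placeholder model name, not a confirmed identifier.
req = build_request(
    "llama-2-13b-chat",
    "Hello!",
    os.environ.get("FRIENDLI_TOKEN", "demo-token"),
)
print(req.full_url)
# The request would be sent with urllib.request.urlopen(req) once a real
# token is set; it is only constructed here so the sketch runs offline.
```

The request body follows the widely used chat-completions shape (a `model` name plus a list of role-tagged `messages`), so swapping in a different hosted model is a one-line change.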
USE CASES
ncsoft

Integration of Friendli Engine with Amazon SageMaker JumpStart

Amazon SageMaker JumpStart users can run VARCO LLMs with Friendli Engine, FriendliAI’s cutting-edge generative AI engine. This opens the door to integrating other JumpStart foundation models with Friendli.

Problem

Serving JumpStart foundation models poses performance and cost challenges.

Solution

Friendli Engine has been integrated with Amazon SageMaker JumpStart to serve JumpStart foundation models.

Result

JumpStart users can harness the power of Friendli Engine to serve foundation models efficiently.

tunib

TUNiB’s emotional chatbots with Friendli Dedicated Endpoints

TUNiB’s emotional chatbot services are earning accolades with Friendli Dedicated Endpoints, FriendliAI’s managed service for serving LLMs.

Problem

Managing multiple AI models demands significant time and cost.

Solution

Use Friendli Dedicated Endpoints for various models.

Result

Convenience and dependable service without the need for self-management.

scatter lab

Lee Luda (이루다) 2.0 blooms with Friendli Engine

Scatter Lab’s renewed chatbot service is earning praise with Friendli Engine, FriendliAI’s LLM serving engine that speeds up generative AI.

Problem

The quality and size of a generative model come with their own costs.

Solution

Use Friendli Engine for Lee Luda 2.0.

Result

Reliable service with much improved efficiency.


1. Based on pricing published as of November 8, 2023, comparing OpenAI GPT-3.5-Turbo to Llama-2-13B on Friendli Serverless Endpoints. Assumes an equal number of input and output tokens.
2. Testing conducted by FriendliAI in October 2023 using Llama-2-13B running on Friendli Engine. See the detailed results and methodology here.
3. Testing conducted by FriendliAI in October 2023 using Llama-2-70B running on Friendli Engine. See the detailed results and methodology here.