
Fast and affordable API for open-source LLMs and LMMs: Friendli Serverless Endpoints


Try Serverless Endpoints for blazing-fast responses with powerful built-in tools

Sign up for free

250 tokens/sec at $0.1/1M tokens

Serverless Endpoints delivers output tokens at a staggering 250 tokens per second with per-token billing as low as $0.1 per million tokens for the Llama 3.1 8B model.
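
For reference, a call to a Serverless Endpoint can look like the sketch below. It assumes an OpenAI-compatible chat completions API; the base URL, environment variable name, and model ID are illustrative assumptions, so check the Friendli documentation for the exact values.

```python
# Minimal sketch of a streaming call to a Serverless Endpoint.
# Assumes an OpenAI-compatible API; the base URL, env var, and model ID
# below are illustrative assumptions, not confirmed Friendli specifics.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["FRIENDLI_TOKEN"],              # assumed env var name
    base_url="https://api.friendli.ai/serverless/v1",  # assumed base URL
)

stream = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",  # assumed model ID
    messages=[{"role": "user", "content": "Explain serverless LLM inference in two sentences."}],
    stream=True,  # tokens arrive as they are generated; billing stays per token
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Streaming lets your application start rendering output as soon as the first tokens arrive, which is where per-second throughput matters most.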

Supports 128K context length

Build complex applications that require in-depth understanding and context retention on Serverless Endpoints. Our Llama 3.1 endpoints support the full 128K context length.
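
With a 128K window, long documents can often be passed directly in the prompt instead of being chunked. The sketch below reuses the client from the previous example; the model ID, file name, and question are illustrative assumptions.

```python
# Minimal sketch: question answering over a long document in a single 128K-token prompt.
# Reuses the `client` object from the previous snippet; file and model ID are illustrative.
long_report = open("annual_report.txt", encoding="utf-8").read()  # assumed long input document

response = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",  # assumed model ID
    messages=[
        {"role": "system", "content": "Answer strictly from the provided document."},
        {"role": "user", "content": long_report + "\n\nQuestion: What are the key findings?"},
    ],
)
print(response.choices[0].message.content)
```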

Easily build AI agents with tool-assist

Are you building an AI agent that can search the web, integrate knowledge bases, and solve complex problems using many tools? Serverless Endpoints has it all.
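
Tool use works through the standard tools parameter of an OpenAI-compatible chat completions API. In the sketch below, the get_weather function, base URL, and model ID are illustrative assumptions rather than Friendli-specific names.

```python
# Minimal sketch of tool-assisted generation with an OpenAI-compatible client.
# The get_weather tool, base URL, and model ID are illustrative assumptions.
import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["FRIENDLI_TOKEN"],              # assumed env var name
    base_url="https://api.friendli.ai/serverless/v1",  # assumed base URL
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool your agent implements
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="meta-llama-3.1-70b-instruct",  # assumed model ID
    messages=[{"role": "user", "content": "Do I need an umbrella in Seoul today?"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:
    # The model asked for a tool instead of answering directly; your agent runs it
    # and feeds the result back as a `tool` message to get the final answer.
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)
```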

SUPPORTED MODELS

LLAMA 3.1 8B INSTRUCT

LLAMA 3.1 70B INSTRUCT

MIXTRAL 8X7B INSTRUCT V0.1

Stay tuned for new model support

PRICING

Free trial


Sign up and get $5 in free trial credits!

Basic


Pricing details

Model name                      Price per unit
Llama 3.1 8B Instruct           $0.1/1M tokens
Llama 3.1 70B Instruct          $0.6/1M tokens
Mixtral 8x7B Instruct v0.1      $0.4/1M tokens
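
Because billing is per token, estimating cost is simple arithmetic. The snippet below is a back-of-the-envelope helper based on the prices listed above.

```python
# Back-of-the-envelope cost estimate from the per-1M-token prices in the table above.
PRICE_PER_1M_TOKENS = {
    "Llama 3.1 8B Instruct": 0.1,
    "Llama 3.1 70B Instruct": 0.6,
    "Mixtral 8x7B Instruct v0.1": 0.4,
}

def cost_usd(model: str, tokens: int) -> float:
    """USD cost of processing `tokens` tokens on `model`."""
    return tokens / 1_000_000 * PRICE_PER_1M_TOKENS[model]

# Example: 20 million tokens on Llama 3.1 8B Instruct -> $2.00
print(f"${cost_usd('Llama 3.1 8B Instruct', 20_000_000):.2f}")
```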