Fast and affordable API for open-source LLMs and LMMs: Friendli Serverless Endpoints
Try Serverless Endpoints for blazing-fast responses with powerful built-in tools
Sign up for free
250 tokens/sec at $0.1/1M tokens
Serverless Endpoints delivers output tokens at a staggering 250 tokens per second with per-token billing as low as $0.1 per million tokens for the Llama 3.1 8B model.
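As a minimal sketch, a chat completion request to an OpenAI-compatible endpoint of this kind can be built like this (the base URL, model id, and `FRIENDLI_TOKEN` environment variable are assumptions for illustration; check the API docs for the exact values):

```python
import json

# Assumed OpenAI-compatible base URL and model id -- verify against the docs.
BASE_URL = "https://api.friendli.ai/serverless/v1"
MODEL = "meta-llama-3.1-8b-instruct"

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build the JSON body for a chat completion call."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": True,  # stream tokens to benefit from the high output rate
    }

payload = build_chat_request("Summarize serverless inference in one line.")
# Send with e.g.:
#   requests.post(f"{BASE_URL}/chat/completions",
#                 headers={"Authorization": f"Bearer {FRIENDLI_TOKEN}"},
#                 json=payload)
print(json.dumps(payload, indent=2))
```

Streaming keeps perceived latency low even for long generations, since tokens arrive as they are produced.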
Supports 128K context length
Build complex applications that require in-depth understanding and context retention on Serverless Endpoints. Our Llama 3.1 endpoints support the full 128K context length.
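Before sending a large document, it helps to budget tokens against the 128K window. A quick sketch, using the rough 4-characters-per-token heuristic (a crude approximation; use the model's actual tokenizer for exact counts):

```python
# 128K context window shared between input and generated output.
CONTEXT_WINDOW = 128_000

def rough_token_count(text: str) -> int:
    """Crude estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_context(document: str, reserved_for_output: int = 4_096) -> bool:
    """Check whether the document plus an output budget fits the window."""
    return rough_token_count(document) + reserved_for_output <= CONTEXT_WINDOW

print(fits_in_context("word " * 50_000))   # ~62.5K tokens: fits
print(fits_in_context("word " * 120_000))  # ~150K tokens: does not fit
```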
Easily build AI agents with tool-assist
Are you building an AI agent that can search the web, integrate knowledge bases, and solve complex problems using many tools? Serverless Endpoints has it all.
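Tool-assist endpoints of this kind typically accept tool definitions in the OpenAI function-calling schema. A sketch of attaching a web-search tool to a request (the tool name, its parameters, and the model id here are hypothetical, for illustration only):

```python
# Hypothetical web-search tool in the OpenAI function-calling schema.
web_search_tool = {
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the web for up-to-date information.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
            },
            "required": ["query"],
        },
    },
}

request_body = {
    "model": "meta-llama-3.1-70b-instruct",  # assumed model id
    "messages": [
        {"role": "user", "content": "What is the latest Llama release?"}
    ],
    "tools": [web_search_tool],
    "tool_choice": "auto",  # let the model decide when to call the tool
}
```

With `tool_choice` set to `"auto"`, the model returns a tool call (name plus JSON arguments) when it decides the tool is needed; your application executes it and sends the result back as a `tool` message.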
SUPPORTED MODELS
LLAMA 3.1 8B INSTRUCT
LLAMA 3.1 70B INSTRUCT
MIXTRAL 8X7B INSTRUCT V0.1
Stay tuned for new model support