Friendli Serverless Endpoints: Unleashing Generative AI for Everyone

FriendliAI, the world’s leading generative AI engine company, has launched Friendli Serverless Endpoints, unlocking a new era of accessibility for generative AI inference. Users can access open-source generative AI models via simple API calls at the lowest cost on the market. This innovative service brings the power of Friendli Engine, our GPU-optimized inference engine, to anyone, regardless of their technical expertise.

Say goodbye to deployment headaches: Gone are the days of wrestling with infrastructure and optimizing models on complex GPU machines. Friendli Serverless Endpoints takes care of everything, allowing you to harness the transformative potential of generative AI models right within your applications.

Who is it for? Whether you're:

  • A curious developer eager to experiment with cutting-edge LLMs like Llama-2 and image creation models like Stable Diffusion,
  • A product manager seeking to integrate text generation or image creation into your product, or
  • A researcher exploring LLM capabilities before committing to in-depth fine-tuning,

Friendli Serverless Endpoints provides the perfect platform to unlock the magic of generative AI.

No more barriers: Friendli Serverless Endpoints removes the technical hurdles that often block the adoption of generative AI. You no longer need to worry about setting up infrastructure, optimizing model serving, or even choosing the right GPU. Simply connect your application to Friendli's secure endpoints and start weaving generative AI magic into your workflow.
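To make the "simple API call" concrete, here is a minimal sketch of what a text-generation request could look like. The endpoint URL, model identifier, payload fields, and response shape below are illustrative assumptions, not the official interface; consult the Friendli documentation for the actual values.

```python
import os
import requests  # third-party HTTP client: pip install requests

# Hypothetical endpoint URL and model name -- check the Friendli docs
# for the real values before relying on this sketch.
API_URL = "https://api.friendli.ai/v1/completions"


def build_request(prompt: str, model: str = "llama-2-13b-chat",
                  max_tokens: int = 256) -> dict:
    """Assemble the JSON payload for a text-generation call."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }


def generate(prompt: str) -> str:
    """Send the prompt to the serverless endpoint and return the output text."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['FRIENDLI_TOKEN']}"},
        json=build_request(prompt),
        timeout=30,
    )
    resp.raise_for_status()
    # The response schema is assumed here for illustration only.
    return resp.json()["choices"][0]["text"]
```

Because the service is serverless, this is the whole integration: no GPU provisioning, no model loading, no serving stack to tune.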

Power under the hood: While Friendli Serverless Endpoints simplifies your experience, Friendli Engine, the beating heart of the service, delivers unparalleled inference serving performance and cost-efficiency.

  • Reduced costs: $0.2/M tokens for Llama-2 13B and $0.8/M tokens for Llama-2 70B, thanks to Friendli Engine.
  • Low latency: 2-4x faster than other leading solutions built on vLLM, ensuring a smooth and responsive generative AI experience.

Open doors to diverse models: Get started with a curated selection of popular open-source models including:

  • Large language models: Llama-2 and Llama-2-Chat (13B and 70B), Mistral-7B and Mistral-7B-Instruct
  • Image generation models: Stable Diffusion v1.5
  • And more models soon to come!