Friendli Serverless Endpoints: Unleashing Generative AI for Everyone

FriendliAI, the world’s leading generative AI engine company, has launched Friendli Serverless Endpoints, unlocking a new era of accessibility for generative AI inference. Users can access open-source generative AI models via simple API calls at the lowest cost on the market. This innovative service brings the power of Friendli Engine, our GPU-optimized inference engine, to anyone, regardless of their technical expertise.

Say goodbye to deployment headaches: Gone are the days of wrestling with infrastructure and optimizing models on complex GPU machines. Friendli Serverless Endpoints takes care of everything, allowing you to harness the transformative potential of generative AI models right within your applications.

Who is it for? Whether you're:

  • A curious developer eager to experiment with cutting-edge LLMs like Llama-2 and image creation models like Stable Diffusion,
  • A product manager seeking to integrate text generation or image creation into your product, or
  • A researcher exploring LLM capabilities before committing to in-depth fine-tuning,

Friendli Serverless Endpoints provides the perfect platform to unlock the magic of generative AI.

No more barriers: Friendli Serverless Endpoints removes the technical hurdles that often block the adoption of generative AI. You no longer need to worry about setting up infrastructure, optimizing model serving, or even choosing the right GPU. Simply connect your application to Friendli's secure endpoints and start weaving generative AI magic into your workflow.
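To make the "simple API call" concrete, here is a minimal sketch of what a text-generation request could look like. The endpoint URL, model identifier, payload fields, and response shape below are illustrative assumptions, not the official interface; consult the Friendli documentation for the actual values.

```python
import os
import requests  # third-party HTTP client: pip install requests

# Hypothetical endpoint URL and model name -- check the Friendli docs
# for the real values before relying on this sketch.
API_URL = "https://api.friendli.ai/v1/completions"


def build_request(prompt: str, model: str = "llama-2-13b-chat",
                  max_tokens: int = 256) -> dict:
    """Assemble the JSON payload for a text-generation call."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }


def generate(prompt: str) -> str:
    """Send the prompt to the serverless endpoint and return the output text."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['FRIENDLI_TOKEN']}"},
        json=build_request(prompt),
        timeout=30,
    )
    resp.raise_for_status()
    # The response schema is assumed here for illustration only.
    return resp.json()["choices"][0]["text"]
```

Because the service is serverless, this is the whole integration: no GPU provisioning, no model loading, no serving stack to tune.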

Power under the hood: While Friendli Serverless Endpoints simplifies your experience, Friendli Engine, the beating heart of the service, delivers unparalleled inference serving performance and cost-efficiency.

  • Reduced costs: $0.2/M tokens for Llama-2 13B and $0.8/M tokens for Llama-2 70B, thanks to Friendli Engine.
  • Low latency: 2-4x faster than other leading solutions built on vLLM, ensuring a smooth and responsive generative AI experience.

Open doors to diverse models: Get started with a curated selection of popular open-source models including:

  • Large language models: Llama-2 and Llama-2-Chat (13B and 70B), Mistral-7B and Mistral-7B-Instruct
  • Image generation models: Stable Diffusion v1.5
  • And more models soon to come!