Friendli Dedicated Endpoints let you run custom or open-source generative AI models on dedicated GPU hardware — without sharing resources or managing infrastructure.

What are Friendli Dedicated Endpoints?

  • Powered by the Friendli Engine: Serve models effortlessly with the Friendli Engine, our patented GPU-optimized serving technology. Friendli Dedicated Endpoints automatically orchestrate resources for high-performance inference.
  • Bring Your Own Model: Run your own model or choose any available model from HuggingFace and Weights & Biases.
  • Dedicated Resources: Select the GPU type for your workload. Each instance is fully dedicated to your model.
  • Reliable at Scale: Trusted by leading companies, Friendli Dedicated Endpoints deliver robust performance for production workloads.
  • Per-second Billing: Pay only for the time your model runs. No manual optimization required — Friendli handles efficiency for you.

Getting Started:

  1. Sign Up: Create a Friendli Suite account with free credits.
  2. Choose Your Model: Upload your own or choose one from HuggingFace and Weights & Biases.
  3. Launch an Instance: Select the perfect GPU for your model.
  4. Get Your Endpoint Address: Use it to send requests to your model.
  5. Send Your Input: Prompt your model and receive responses.
Friendli Dedicated Endpoints is more than just an AI serving platform — it provides a reliable, high-performance, and cost-efficient way to run your own models. Explore more in our documentation:

Additional Resources: