This tutorial walks you through Friendli Dedicated Endpoints, which let you deploy custom or open-source AI models on dedicated GPU hardware with full control and zero infrastructure overhead. Whether you’re fine-tuning your own model or scaling a production workload, get ready to experience high-performance inference powered by Friendli’s optimized serving engine.

What are Friendli Dedicated Endpoints?

  • Powered by the Friendli Engine: Serve models effortlessly with the Friendli Engine, our patented GPU-optimized serving technology. Friendli Dedicated Endpoints automatically orchestrate resources for high-performance inference.
  • Bring Your Own Model: Run your own model or choose any available model from Hugging Face and Weights & Biases.
  • Dedicated Resources: Select the GPU type for your workload. Each instance is fully dedicated to your model.
  • Reliable at Scale: Trusted by leading companies, Friendli Dedicated Endpoints deliver robust performance for production workloads.
  • Per-second Billing: Pay only for the time your model runs. No manual optimization required — Friendli handles efficiency for you.

Getting started

  1. Sign Up: Create a Friendli Suite account with free credits.
  2. Choose Your Model: Upload your own or choose one from Hugging Face and Weights & Biases.
  3. Launch an Instance: Select the perfect GPU for your model.
  4. Get Your Endpoint Address: Use it to send requests to your model.
  5. Send Your Input: Prompt your model and receive responses.
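Steps 4 and 5 above can be sketched in Python. This is a minimal illustration, assuming an OpenAI-compatible chat-completions interface; the `API_URL`, the `YOUR_ENDPOINT_ID` placeholder, and the `FRIENDLI_TOKEN` environment variable name are assumptions, so check your endpoint's actual address and credentials in the Friendli Suite console before using it.

```python
import json
import os
from urllib.request import Request, urlopen

# Assumed address for a dedicated endpoint; verify against your
# endpoint's details page in the Friendli Suite console.
API_URL = "https://api.friendli.ai/dedicated/v1/chat/completions"


def build_request(token: str, endpoint_id: str, prompt: str) -> Request:
    """Build an HTTP request for a deployed endpoint.

    `endpoint_id` is the ID of your dedicated endpoint (a placeholder here),
    and `token` is your Friendli personal access token.
    """
    payload = {
        "model": endpoint_id,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Only send the request when a token is actually configured.
token = os.environ.get("FRIENDLI_TOKEN")
if token:
    with urlopen(build_request(token, "YOUR_ENDPOINT_ID", "Hello!")) as resp:
        reply = json.load(resp)
        print(reply["choices"][0]["message"]["content"])
```

The request body follows the widely used chat-completions shape (a `model` field plus a list of `messages`); if your endpoint exposes a different schema, only `build_request` needs to change.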

Friendli Dedicated Endpoints is more than just an AI serving platform — it provides a reliable, high-performance, and cost-efficient way to run your own models. Explore more in our documentation.

Additional resources

Last modified on April 20, 2026