This tutorial guides you through Friendli Dedicated Endpoints, which let you deploy custom or open-source AI models on dedicated GPU hardware with full control and zero infrastructure overhead. Whether you're fine-tuning your own model or scaling a production workload, get ready to experience high-performance inference powered by Friendli's optimized serving engine.
Documentation index
Fetch the complete documentation index at: https://friendli.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
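As a sketch, the index can be fetched with Python's standard library; the URL is the one given above, and no authentication is assumed:

```python
from urllib.error import URLError
from urllib.request import urlopen

INDEX_URL = "https://friendli.ai/docs/llms.txt"

def fetch_index(url: str = INDEX_URL) -> list[str]:
    """Return the non-empty lines of the documentation index."""
    try:
        with urlopen(url, timeout=10) as resp:
            text = resp.read().decode("utf-8")
    except (URLError, TimeoutError):
        return []  # offline or unreachable: caller can retry later
    return [line for line in text.splitlines() if line.strip()]

pages = fetch_index()
print(f"fetched {len(pages)} index lines")
```

Each line of the file points at one documentation page, so iterating over the result is enough to discover what is available.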
What are Friendli Dedicated Endpoints?
- Powered by the Friendli Engine: Serve models effortlessly with the Friendli Engine, our patented GPU-optimized serving technology. Friendli Dedicated Endpoints automatically orchestrate resources for high-performance inference.
- Bring Your Own Model: Run your own model or choose any available model from Hugging Face and Weights & Biases.
- Dedicated Resources: Select the GPU type for your workload. Each instance is fully dedicated to your model.
- Reliable at Scale: Trusted by leading companies, Friendli Dedicated Endpoints deliver robust performance for production workloads.
- Per-second Billing: Pay only for the time your model runs. No manual optimization required — Friendli handles efficiency for you.
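To make per-second billing concrete, here is a sketch of the cost arithmetic; the hourly rate below is a hypothetical placeholder, not an actual Friendli price:

```python
# Hypothetical GPU rate; real prices depend on the GPU type you select.
RATE_PER_HOUR = 3.60  # USD, placeholder

def cost_usd(seconds_running: int, rate_per_hour: float = RATE_PER_HOUR) -> float:
    """Per-second billing: you pay only for the seconds the instance runs."""
    return round(rate_per_hour / 3600 * seconds_running, 4)

# 90 seconds at a $3.60/hour rate costs $0.09
print(cost_usd(90))
```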
Getting started
- Sign Up: Create a Friendli Suite account with free credits.
- Choose Your Model: Upload your own or choose one from Hugging Face and Weights & Biases.
- Launch an Instance: Select the perfect GPU for your model.
- Get Your Endpoint Address: Use it to send requests to your model.
- Send Your Input: Prompt your model and receive responses.
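The steps above end with sending a prompt to your endpoint. A minimal sketch, assuming the endpoint speaks an OpenAI-compatible chat completions API; the base URL, token variable, and endpoint ID are placeholders to replace with the values from your Friendli Suite dashboard:

```python
import json
import os
from urllib.request import Request, urlopen

# Placeholders: take these from your Friendli Suite dashboard.
TOKEN = os.environ.get("FRIENDLI_TOKEN", "")
ENDPOINT_ID = "YOUR_ENDPOINT_ID"
BASE_URL = "https://api.friendli.ai/dedicated/v1"  # assumed base URL

def build_request(prompt: str) -> Request:
    """Build an OpenAI-style chat completions request for the endpoint."""
    payload = {
        "model": ENDPOINT_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("Hello! What can you do?")
if TOKEN:  # only send when a token is configured
    with urlopen(req, timeout=30) as resp:
        reply = json.load(resp)
        print(reply["choices"][0]["message"]["content"])
```

Swapping in an SDK such as the official `openai` client works the same way: point its base URL at your endpoint and pass the endpoint ID as the model name.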
Additional resources
- FriendliAI website: https://friendli.ai
- FriendliAI blog: https://friendli.ai/blog