June 5, 2025

One Click from W&B to FriendliAI: Deploy Models as Live Endpoints

At FriendliAI, our mission is to make AI deployment fast, reliable, and developer-friendly. Today, we’re excited to announce a new integration with Weights & Biases (W&B) that makes transitioning from experimentation to production easier than ever.

With our webhook-based deployment integration, W&B users can now deploy models directly from the W&B Registry to Friendli Dedicated Endpoints—all within the W&B UI.

👉 Explore the step-by-step tutorial here.

Why Integrate Weights & Biases with FriendliAI?

W&B is a go-to platform for tracking experiments and managing model artifacts. Integrating it with Friendli Dedicated Endpoints extends that workflow into seamless, automated model deployment.

Here’s what you gain:

One-click production deployment of model versions, triggered by assigning an alias in W&B
Automated rollouts without custom scripts or DevOps complexity
Idempotent execution with support for idempotencyKey, ensuring conflict-free, exactly-once operations

This integration bridges the gap between training and serving, giving AI teams an end-to-end, production-ready workflow that’s fast, robust, and scalable.

CI/CD for AI

Integrating FriendliAI with Weights & Biases brings continuous integration and delivery (CI/CD) principles to your machine learning workflow—without the complexity. This setup automates the journey from model development to production deployment, making your AI lifecycle truly end-to-end.

For example, let’s say your team assigns the alias production to the top-performing model in the W&B Registry. With the integration in place, every time that alias is reassigned to a new model version, FriendliAI automatically deploys the updated model to a production-ready endpoint—no manual intervention required.
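Here is a minimal Python sketch of that trigger rule. It uses a toy in-memory model of one registry collection. The class `AliasRegistry` and the constant `WATCHED_ALIAS` are hypothetical and are not part of the W&B or FriendliAI APIs. The sketch relies on aliases being unique within a collection: assigning `production` to a new version moves it off the old one. Only moves of the watched alias count as rollouts.

```python
WATCHED_ALIAS = "production"  # assumption: the alias your team promotes with


class AliasRegistry:
    """Hypothetical toy model of one registry collection: each alias points at one version."""

    def __init__(self) -> None:
        self.aliases: dict[str, str] = {}  # alias -> version currently holding it
        self.rollouts: list[str] = []      # versions a watched-alias move would deploy

    def assign(self, alias: str, version: str) -> None:
        previous = self.aliases.get(alias)
        # Assigning an alias moves it off whichever version held it before.
        self.aliases[alias] = version
        if alias == WATCHED_ALIAS and version != previous:
            # This is the moment the automation would fire a deployment.
            self.rollouts.append(version)


registry = AliasRegistry()
registry.assign("production", "v0")
registry.assign("staging", "v1")     # not the watched alias: no rollout
registry.assign("production", "v1")  # reassignment: rollout of v1
print(registry.rollouts)  # ['v0', 'v1']
```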

This level of automation means your AI engineering team can ship models faster, reduce deployment overhead, and operate with greater confidence.

How It Works

Setting up automated model deployment from Weights & Biases to Friendli Dedicated Endpoints takes just a few minutes. Here’s how to get started:

Prerequisites

Before you begin, make sure you have:

A Friendli Suite account with access to Dedicated Endpoints
A personal access token generated from Friendli Suite
Admin access to your W&B team settings

Setup Steps

1. Create a secret

Issue a personal access token in Friendli Suite and add it as a secret under your W&B Team Settings.

2. Configure a webhook

In the W&B UI, set up a webhook that points to the FriendliAI API, using the access token secret for authentication.
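The sketch below shows what the webhook sends. It builds an equivalent HTTP request with Python's standard `urllib.request` but never sends it. It assumes the access token travels as a bearer token in the `Authorization` header and that the body is JSON. `WEBHOOK_URL` is a placeholder and the payload fields are hypothetical. Take the real URL and payload from the tutorial.

```python
import json
import os
import urllib.request

# Placeholder only: use the FriendliAI webhook URL given in the tutorial.
WEBHOOK_URL = "https://example.invalid/friendli-webhook"


def build_webhook_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request shaped like the webhook call.

    Assumes bearer-token authentication and a JSON body.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        WEBHOOK_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Read the token from the environment instead of hard-coding it,
# much as W&B keeps it in a team secret. FRIENDLI_TOKEN is a hypothetical name.
token = os.environ.get("FRIENDLI_TOKEN", "dummy-token")
req = build_webhook_request(token, {"alias": "production", "version": "v1"})
print(req.get_method(), req.full_url)
```

Note that `urllib` normalizes header names with `str.capitalize()`, so the content type is stored under `Content-type`.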

Configuring webhook URL and access token in W&B

3. Create an automation

In the W&B Model Registry, define an automation that triggers when an alias is added to a model artifact.

Creating an automation in W&B Model Registry

Configuring webhook payload for W&B automation
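W&B fills event details into the payload through template variables written as `${...}`. The sketch below imitates that substitution with Python's `string.Template`, which uses the same placeholder syntax. The JSON field names are hypothetical. The variable names are modeled on W&B's payload variables, but check the W&B automation docs and the FriendliAI tutorial for the exact set.

```python
import json
from string import Template

# Hypothetical payload template: field names are illustrative, and each
# ${...} placeholder stands in for a variable W&B fills in per event.
PAYLOAD_TEMPLATE = """{
  "model": "${artifact_collection_name}",
  "version": "${artifact_version_string}",
  "alias": "${alias}"
}"""


def render_payload(variables: dict[str, str]) -> dict:
    """Substitute event variables into the template and parse the result as JSON."""
    # Escape each value as a JSON string body so quotes or backslashes
    # in a name cannot break the document.
    escaped = {k: json.dumps(v)[1:-1] for k, v in variables.items()}
    # substitute() raises KeyError if any placeholder is left unfilled.
    return json.loads(Template(PAYLOAD_TEMPLATE).substitute(escaped))


payload = render_payload({
    "artifact_collection_name": "my-llm",
    "artifact_version_string": "my-llm:v1",
    "alias": "production",
})
print(payload["alias"])  # production
```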

4. Trigger deployment

Simply assign an alias (e.g., production) to the model version you want to deploy. Friendli automatically creates or updates the endpoint.

Deploying a model version by assigning an alias in W&B

FriendliAI project generated after W&B deployment

Model rollout status in W&B after alias assignment
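The "creates or updates" behavior amounts to an upsert keyed by endpoint. The sketch below assumes that behavior and uses a hypothetical in-memory `EndpointStore`. It only illustrates the semantics and makes no claim about how Friendli Dedicated Endpoints are implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    name: str
    model_version: str


@dataclass
class EndpointStore:
    """Hypothetical in-memory stand-in for the endpoint service."""

    endpoints: dict[str, Endpoint] = field(default_factory=dict)

    def create_or_update(self, name: str, model_version: str) -> str:
        """Create the endpoint on first deploy; otherwise point it at the new version.

        Returns "created" or "updated" so the caller can log what happened.
        """
        existing = self.endpoints.get(name)
        if existing is None:
            self.endpoints[name] = Endpoint(name, model_version)
            return "created"
        existing.model_version = model_version
        return "updated"


store = EndpointStore()
print(store.create_or_update("my-llm-production", "v0"))  # created
print(store.create_or_update("my-llm-production", "v1"))  # updated
```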

5. Manage versions

To roll out an update, reassign the alias to a new model version. The endpoint updates instantly, with no downtime and no manual redeployment steps.

Managing model versions and updating to v1 in W&B

6. Track history

Friendli automatically versions each deployment, giving you full visibility and easy rollback capabilities.

Tracking deployment history in FriendliAI after updating to v1
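One way to picture versioned deployments with rollback is an append-only history. In that model, a rollback is itself a new deployment of the previous model version. `DeploymentHistory` below is a hypothetical sketch of that idea, not FriendliAI's data model.

```python
class DeploymentHistory:
    """Hypothetical append-only record of which model version each deployment served."""

    def __init__(self) -> None:
        self._versions: list[str] = []  # deployment n served self._versions[n - 1]

    def record(self, model_version: str) -> int:
        """Append a deployment and return its 1-based deployment number."""
        self._versions.append(model_version)
        return len(self._versions)

    @property
    def current(self) -> str:
        return self._versions[-1]

    def rollback(self) -> str:
        """Redeploy the version served before the current one, as a new entry."""
        if len(self._versions) < 2:
            raise ValueError("no earlier deployment to roll back to")
        previous = self._versions[-2]
        self.record(previous)  # append, never delete: history stays a full audit trail
        return previous


history = DeploymentHistory()
history.record("v0")
history.record("v1")
print(history.rollback(), history.current)  # v0 v0
```

Because rollback appends instead of deleting, a second rollback returns to `v1`. In effect, it behaves like an undo of the last change.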

👉 Follow the full tutorial to get started.

Built for Production-Grade Reliability

Friendli Dedicated Endpoints are engineered for high-performance, low-latency inference at scale—ideal for production AI workloads.

With built-in support for idempotent deployments via idempotencyKey, our webhook integration guarantees that each deployment event is processed exactly once. Even in the case of retries or concurrent updates, you avoid duplicated actions, partial rollouts, and conflicting states.
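The exactly-once effect comes from pairing retries (at-least-once delivery) with deduplication on the receiver. The first delivery for a key runs the deployment and records the result. Any later delivery with the same key gets that stored result back. Below is a minimal sketch of the deduplication side, using a hypothetical `IdempotentProcessor` and an in-memory store. A real service would persist keys durably and would likely lock per key rather than globally.

```python
import threading
from typing import Callable


class IdempotentProcessor:
    """Hypothetical receiver that runs the action for each idempotency key at most once."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}
        self._lock = threading.Lock()  # serialize concurrent deliveries

    def handle(self, idempotency_key: str, action: Callable[[], str]) -> str:
        with self._lock:
            if idempotency_key in self._results:
                return self._results[idempotency_key]  # duplicate: replay stored result
            # If action() raises, nothing is stored, so a retry can try again.
            result = action()
            self._results[idempotency_key] = result
            return result


calls = []


def deploy_v1() -> str:
    calls.append("v1")
    return "deployed v1"


processor = IdempotentProcessor()
processor.handle("evt-123", deploy_v1)
processor.handle("evt-123", deploy_v1)  # retry with the same key
print(len(calls))  # 1
```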

The result? A deployment pipeline that’s predictable, robust, and production-safe—just what modern AI teams need for fast-changing, mission-critical applications.

Ready to Deploy Smarter?

Whether you're managing experiments or scaling foundation models, this integration simplifies your workflow and accelerates delivery. Make model deployment effortless so you can focus solely on building.

Have questions or need help setting it up? Contact us—we're happy to assist.

Written by

FriendliAI Tech & Research

General FAQ

What is FriendliAI?

FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.

How does FriendliAI help my business?

Friendli Inference lets you squeeze more tokens per second out of every GPU. Because you need fewer GPUs to serve the same load, the metric that matters, tokens per dollar, comes out higher even if the hourly GPU rate looks similar on paper.
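The tokens-per-dollar argument is simple arithmetic: a GPU's hourly throughput divided by its hourly price. The numbers below are hypothetical and were chosen only to make the calculation easy to follow. They are not FriendliAI benchmarks or prices.

```python
import math


def tokens_per_dollar(tokens_per_second_per_gpu: float, gpu_hourly_rate_usd: float) -> float:
    """Tokens generated per dollar of GPU time for one fully utilized GPU."""
    tokens_per_hour = tokens_per_second_per_gpu * 3600
    return tokens_per_hour / gpu_hourly_rate_usd


def gpus_needed(load_tokens_per_second: float, tokens_per_second_per_gpu: float) -> int:
    """Whole GPUs required to sustain a target load."""
    return math.ceil(load_tokens_per_second / tokens_per_second_per_gpu)


# Hypothetical numbers: the same $4/hour GPU price, but one serving
# stack gets twice the throughput per GPU.
baseline = tokens_per_dollar(1_000, 4.0)  # 900,000 tokens per dollar
faster = tokens_per_dollar(2_000, 4.0)    # 1,800,000 tokens per dollar
print(gpus_needed(10_000, 1_000), gpus_needed(10_000, 2_000))  # 10 5
```

At the same hourly rate, doubling per-GPU throughput halves the number of GPUs needed for a 10,000 tokens/s load and doubles tokens per dollar.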

Which models and modalities are supported?

Over 380,000 text, vision, audio, and multimodal models are deployable out of the box. You can also upload custom models or LoRA adapters.

Can I deploy models from Hugging Face directly?

Yes. Selecting “Friendli Endpoints” on the Hugging Face Hub takes you to our model deployment page for one-click deployment. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference.

Still have questions?

If you want a customized solution for a key issue that is slowing your growth, email contact@friendli.ai or click Talk to an engineer. Our engineers (not a bot) will reply within one business day.