June 20, 2024
4 min read

Deploying Weights & Biases Model Checkpoints on Friendli Dedicated Endpoints

In this blog post, we'll be exploring our new exciting integration feature between Weights & Biases (W&B) and Friendli Dedicated Endpoints. For those who may not be familiar with the services, Friendli Dedicated Endpoints is our SaaS offering for deploying generative AI models on the Friendli Inference, the fastest LLM serving engine on the market, while W&B is a leading MLOps platform especially for machine learning experiments. W&B provides the tools to enable machine learning engineers and data scientists to build LLM models faster. Together, Friendli Dedicated Endpoints and W&B offer developers with a powerful end-to-end solution to build LLM models with confidence, and easily deploy them using the Friendli Inference.

Managing Model Artifacts with the W&B Platform

Before diving into the integration, let's first take a moment to discuss the W&B artifacts. Artifacts are a key feature of W&B, serving as a central repository for all your machine learning experiments. They store not only the final model but also all the datasets, and metadata associated with each experiment. This versioning and easy sharing capability make W&B artifacts invaluable assets for data scientists and machine learning engineers. Using W&B artifacts offers several advantages, including versioning, easy sharing, and collaboration. By storing all experiment data in a single location, W&B enables users to quickly access and compare the different versions of models, making it easier to reproduce the experiments, track progress and identify the trends among the experiments.

Easy Model Deployment on Friendli Dedicated Endpoints

Now let's turn our attention to the Friendli Suite, a versatile platform for model building and serving. Friendli Dedicated Endpoints enable users to easily deploy models for inference at scale with a few simple clicks, ensuring fast and reliable responses for your custom generative LLM models of your choice. All you need to do is select the GPU hardware and the LLM model that you wish to serve, and Friendli Dedicated Endpoints will orchestrate the rest of the work to create and manage an endpoint that accepts inference requests.

Step-by-Step Guide to Deploying Your W&B Model Artifacts on Friendli Dedicated Endpoint

W&B integration with Friendli Dedicated Endpoints

To fully unlock the potential of both platforms, W&B and Friendli have joined forces to enable users to create dedicated endpoints directly from the W&B artifacts, in order to easily deploy models straight out from your experimental workspaces. In this section, we'll walk you step by step through the integration process.

Before we begin, be sure to have the following prerequisites in place:

You need an account at the cloud-hosted W&B platform. Also you need an API key for uploading the model artifacts and integrating your W&B account on the Friendli Suite, which you can obtain here.
You need an account for the Friendli Suite.

Uploading your model as an W&B artifact

Uploading your model as an W&B artifact is easy. In the script for your model training, simply add a line of code logging your model artifact within your W&B run.

python
import wandb

# Initialize a new W&B run to upload the model
run = wandb.init(project="friendli-quickstart", job_type="model-training")

# Suppose your training code goes here and trained model files are stored in ./model/,
# Log the model artifact to save it as an output of this run
run.log_artifact("./model/", name="my-model", type="model")
wandb.finish()

Before executing the script, make sure that you are logged in using the wandb login command with the W&B API key.

You can locate your model artifact and the associated metadata in the W&B web app. For production models, we suggest publishing the artifact to the W&B Model Registry.

Artifact

Locating the W&B artifact in Model Registry

Launching a Friendli Dedicated Endpoint using your W&B Model Artifact

Before launching the endpoint, you should add the Weights & Biases account integration to Friendli Suite. Visit your account settings and add the W&B API key.

Within your Friendli Suite project, you will be able to launch a dedicated endpoint by providing the full name of the W&B model artifact of your choice.

Artifact (FDE)

Using the W&B model artifact in FDE

After initializing the endpoint, you can easily test the deployed endpoint through the playground interface. For general usages, it is recommended to request for an inference task through the API using the provided Endpoint URL and the Endpoint ID. To correctly use the API, enter the Endpoint ID in the model parameter within your inference requests.

To wrap up, the integration between Friendli Dedicated Endpoints with W&B Artifact enables for a streamlined, quick and easy deployment of your trained models. To briefly recap, you can configure your W&B API key within the Friendli Suite account to access your model artifacts. After setting up the connection, you can launch a Friendli Dedicated Endpoint using W&B model artifact. After then, you can sit back as your model runs and processes inference requests on autopilot. Through this, you can leverage the rich benefits of W&B artifacts, including versioning, easy sharing, and collaboration, while launching the deployment in a production-ready environment with just a few clicks on Friendli Dedicated Endpoints, straight out of your experiments, without the need for manual exporting and importing of the model files.

Looks interesting? Try deploying your W&B artifacts on Friendli Dedicated Endpoints today.

Written by

FriendliAI Tech & Research

General FAQ

What is FriendliAI?

FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: Unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you in here.

How does FriendliAI help my business?

Our Friendli Inference allows you to squeeze more tokens-per-second out of every GPU. Because you need fewer GPUs to serve the same load, the true metric—tokens per dollar—comes out higher even if the hourly GPU rate looks similar on paper. View pricing

Which models and modalities are supported?

Over 380,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models

Can I deploy models from Hugging Face directly?

Yes. A one-click deploy by selecting “Friendli Endpoints” on the Hugging Face Hub will take you to our model deployment page. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership

Still have questions?

If you want a customized solution for that key issue that is slowing your growth, contact@friendli.ai or click Contact Sales — our experts (not a bot) will reply within one business day.