Roll Out and Roll Back Endpoints Without Downtime

The versioning feature in Friendli Dedicated Endpoints helps you manage all changes to your deployed endpoints safely and transparently. When you update the configuration—like changing the model, engine settings, or autoscaling—a new version is created instead of replacing the current one.

Each version captures a full snapshot of the deployment, including:

  • Model name and artifact source
  • Accelerator type and count
  • Autoscaling and engine settings
  • Metadata (creator, timestamps, comments)
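
To make the snapshot idea concrete, the fields listed above can be pictured as a single immutable record. This is only an illustrative sketch: the class name `EndpointVersion`, every field name, and the example values are assumptions made for this page, not the schema Friendli Dedicated Endpoints uses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class EndpointVersion:
    """Hypothetical model of one version snapshot (not the real schema)."""

    number: int              # 0 for v0, 1 for v1, and so on
    model_name: str          # model name
    artifact_source: str     # where the model artifact comes from
    accelerator_type: str
    accelerator_count: int
    autoscaling: dict        # autoscaling settings (keys are made up here)
    engine: dict             # engine settings (keys are made up here)
    created_by: str          # metadata: creator
    comment: str = ""        # metadata: free-form comment
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )                        # metadata: timestamp

    @property
    def label(self) -> str:
        """Human-readable version label such as 'v0'."""
        return f"v{self.number}"


v0 = EndpointVersion(
    number=0,
    model_name="example-model",
    artifact_source="example-source",
    accelerator_type="example-gpu",
    accelerator_count=1,
    autoscaling={"min_replicas": 1, "max_replicas": 2},
    engine={"max_batch_size": 8},
    created_by="alice",
    comment="Initial deployment",
)
print(v0.label)  # v0
```

Marking the record as frozen mirrors the idea that a version is a snapshot: a change produces a new record rather than editing an old one.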

Why Use Versioning?

  • Zero-Downtime Updates: Safely apply changes while the current version continues to serve traffic.

  • One-Click Rollbacks: Instantly revert to a previous stable configuration if issues occur.

  • Easy-to-Follow History: Each version shows who made the change, when it was made, and what was changed. This makes audits and debugging easier.

How to Use Versioning

  1. Initial Deployment: Deploy your model for the first time via the platform or webhook. This creates version v0.

  2. Apply Configuration Updates: Changing any setting, such as the model, accelerator type, or autoscaling, creates a new version (v1, v2, etc.).

  3. Browse Version History: View the full version list by clicking the “Versions” tab on the endpoint detail page. The list shows which version is current and which is in progress.

  4. View Configuration Details: Click “View configs” to see a version’s full settings. Settings that changed from the previous version are marked with a blue badge for easy comparison.
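
The comparison that the blue badges convey can be thought of as a diff between two consecutive configurations. The sketch below is a minimal illustration of that idea, assuming configurations are flat dictionaries; the function `changed_settings` and the setting names are hypothetical and are not part of the platform.

```python
def changed_settings(previous: dict, current: dict) -> dict:
    """Return {setting: (old_value, new_value)} for every setting that differs."""
    keys = previous.keys() | current.keys()
    return {
        key: (previous.get(key), current.get(key))
        for key in sorted(keys)
        if previous.get(key) != current.get(key)
    }


# Hypothetical settings for two consecutive versions.
v0_config = {"model": "example-model", "accelerator_count": 1, "max_replicas": 2}
v1_config = {"model": "example-model", "accelerator_count": 2, "max_replicas": 2}

print(changed_settings(v0_config, v1_config))
# {'accelerator_count': (1, 2)}
```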

How to Roll Back to a Previous Version

To roll back, select a previous version from the version history and click “Rollback”.

The system creates a new version (vN+1) from the selected version’s settings. This new version becomes the current one, so you can quickly return to a known-good state.
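
Because a rollback adds a version instead of reactivating an old one, the history only ever grows. The following sketch models that behavior with a plain list, where the list index stands in for the version number. The `rollback` function and the `max_replicas` setting are assumptions for illustration, not the platform's API.

```python
def rollback(history: list[dict], target: int) -> list[dict]:
    """Return a new history with a copy of version `target` appended as vN+1."""
    if not 0 <= target < len(history):
        raise IndexError(f"no such version: v{target}")
    # Copy the settings so the new version is independent of the old one.
    return history + [dict(history[target])]


# Index in the list doubles as the version number in this sketch.
history = [
    {"max_replicas": 2},   # v0
    {"max_replicas": 8},   # v1
    {"max_replicas": 16},  # v2 (current)
]

history = rollback(history, target=1)
print(f"current is v{len(history) - 1}: {history[-1]}")
# current is v3: {'max_replicas': 8}
```

Note that v1 and v2 are still present after the rollback, which is what keeps the change history complete.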

When an Update Fails

Updates can fail for several reasons, such as:

  • Configuration Errors: Invalid settings or unsupported configurations can prevent the update.
  • Resource Limitations: Insufficient resources (like GPU availability) can block the update.
  • Network Issues: Temporary network problems can interrupt the update process.

When an update fails, the system does not apply the changes. Instead, it logs the error so that you can troubleshoot the issue without affecting the live endpoint, which stays operational on its current version.
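
The failure behavior can be summarized as "validate first, switch only on success". The sketch below is a simplified in-memory model of that rule. The `Endpoint` class, the `validate` check, and the choice not to record failed attempts as versions are all assumptions for illustration; the platform's actual handling may differ.

```python
class Endpoint:
    """Hypothetical in-memory model; not the platform's implementation."""

    def __init__(self, config: dict):
        self.versions = [config]  # v0
        self.current = 0          # index of the version serving traffic
        self.errors = []          # logged failure messages

    def update(self, new_config: dict, validate) -> bool:
        """Try to apply new_config; leave the current version alone on failure."""
        try:
            validate(new_config)
        except ValueError as exc:
            self.errors.append(str(exc))  # log the error, change nothing else
            return False
        self.versions.append(new_config)
        self.current = len(self.versions) - 1
        return True


def validate(config: dict) -> None:
    """Example check standing in for a configuration error."""
    if config["accelerator_count"] < 1:
        raise ValueError("accelerator_count must be at least 1")


endpoint = Endpoint({"accelerator_count": 1})
ok = endpoint.update({"accelerator_count": 0}, validate)
print(ok, endpoint.current, endpoint.errors)
# False 0 ['accelerator_count must be at least 1']
```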