Update a Friendli Dedicated Endpoint with a new model, GPU type, or replica count. Changes are applied as a new version in the deployment history.

Request:

curl --request PUT \
  --url https://api.friendli.ai/dedicated/beta/endpoint/{endpoint_id} \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{}'

Response example:

{
  "name": "endpoint-name",
  "gpuType": "NVIDIA H100",
  "numGpu": 1,
  "instanceId": "instance-id",
  "projectId": "project-id",
  "creatorId": "creator-id",
  "teamId": "team-id",
  "autoscalingMin": 0,
  "autoscalingMax": 1,
  "autoscalingCooldown": 300,
  "maxBatchSize": 10,
  "maxInputLength": 1024,
  "tokenizerSkipSpecialTokens": true,
  "tokenizerAddSpecialTokens": true,
  "currReplicaCnt": 1,
  "desiredReplicaCnt": 1,
  "updatedReplicaCnt": 1
}
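As a sketch, the same PUT call can be assembled in Python. The endpoint ID, environment-variable name, and helper function below are illustrative placeholders, not part of the API:

```python
import json
import os

API_BASE = "https://api.friendli.ai/dedicated/beta/endpoint"

def build_update_request(endpoint_id: str, token: str, body: dict):
    """Assemble the URL, headers, and JSON payload for the update call."""
    url = f"{API_BASE}/{endpoint_id}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return url, headers, json.dumps(body)

# Placeholder values; in practice, read the real key from the environment.
url, headers, payload = build_update_request(
    "my-endpoint-id", os.environ.get("FRIENDLI_TOKEN", "<token>"), {}
)
print(url)  # https://api.friendli.ai/dedicated/beta/endpoint/my-endpoint-id

# To actually send the request (requires the third-party `requests` package):
# import requests
# resp = requests.put(url, headers=headers, data=payload)
# print(resp.status_code, resp.json())
```

The request body is left empty here, matching the curl example above; populate it with the update fields described in the parameter section.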
To make this request successfully, you must supply a Personal API Key (e.g. flp_XXX) as the Bearer token. Refer to the authentication section on our introduction page to learn how to acquire and generate your API Key.

Documentation Index
Fetch the complete documentation index at: https://friendli.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
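One common way to supply the Personal API Key described above is through an environment variable. This sketch echoes the command as a dry run rather than sending it; the key and endpoint ID are placeholders you must replace:

```shell
# Placeholder values; substitute your real Personal API Key and endpoint ID.
export FRIENDLI_TOKEN="flp_XXX"
ENDPOINT_ID="my-endpoint-id"

# Dry run: print the command instead of executing it.
# Remove the leading `echo` to actually send the request.
echo curl --request PUT \
  --url "https://api.friendli.ai/dedicated/beta/endpoint/${ENDPOINT_ID}" \
  --header "Authorization: Bearer ${FRIENDLI_TOKEN}" \
  --header "Content-Type: application/json" \
  --data '{}'
```

Keeping the key in an environment variable avoids pasting it into shell history or scripts.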
Parameters:
- endpoint_id (path): The ID of the endpoint.
- Team ID (optional): ID of the team to run requests as.

Body — dedicated endpoint update request:
- Name: The name of the endpoint.
- Advanced config: Endpoint advanced config.
- Autoscaling: Autoscaling policy.
  - Minimum replicas (x >= 0): Setting minReplica to 0 allows the endpoint to sleep when idle, reducing costs. The minimum value is 0.
  - Maximum replicas (x <= 10): The maximum replicas that the endpoint can scale up to. The maximum value is 10.
  - Cooldown: Determines how long the endpoint waits before scaling down after the last request.
- HF ID of the model.
- HF commit hash of the model.
- Comment for the new version.
- Instance option ID: The ID of the instance option.
Response (200): Successfully updated the endpoint specification.
Response fields — dedicated endpoint specification:
- name: The name of the endpoint.
- gpuType: The type of GPU to use for the endpoint.
- numGpu: The number of GPUs to use per replica.
- projectId: The ID of the project that owns the endpoint.
- creatorId: The ID of the user who created the endpoint.
- teamId: The ID of the team that owns the endpoint.
- autoscalingMin: The minimum number of replicas to maintain.
- autoscalingMax: The maximum number of replicas allowed.
- autoscalingCooldown: The cooldown period in seconds between scaling operations.
- maxBatchSize: The maximum batch size for inference requests.
- tokenizerSkipSpecialTokens: Whether to skip special tokens in tokenizer output.
- tokenizerAddSpecialTokens: Whether to add special tokens in tokenizer input.
- instanceId: The ID of the instance.
- maxInputLength: The maximum allowed input length.
- currReplicaCnt: The current number of replicas.
- desiredReplicaCnt: The desired number of replicas.
- updatedReplicaCnt: The updated number of replicas.
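Since an update rolls out as a new version, a client may want to compare the replica-count fields from the response. The sketch below assumes that a rollout has settled once the current count matches the desired count; the sample dict mirrors the response example above, and the helper name is illustrative:

```python
# Sample response body, mirroring the fields documented above.
spec = {
    "name": "endpoint-name",
    "autoscalingMin": 0,
    "autoscalingMax": 1,
    "currReplicaCnt": 1,
    "desiredReplicaCnt": 1,
    "updatedReplicaCnt": 1,
}

def rollout_settled(spec: dict) -> bool:
    """Interpretation (not documented here): the rollout is done once the
    current replica count has caught up with the desired replica count."""
    return spec["currReplicaCnt"] == spec["desiredReplicaCnt"]

print(rollout_settled(spec))
```

A client polling the endpoint after an update could loop on this check until it returns true.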