Update a Friendli Dedicated Endpoint with a new model, GPU type, or replica count. Changes are applied as a new version in the deployment history.

Request:

curl --request PUT \
  --url https://api.friendli.ai/dedicated/beta/endpoint/{endpoint_id} \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{}'

Response example:

{
  "name": "endpoint-name",
  "gpuType": "NVIDIA H100",
  "numGpu": 1,
  "instanceId": "instance-id",
  "projectId": "project-id",
  "creatorId": "creator-id",
  "teamId": "team-id",
  "autoscalingMin": 0,
  "autoscalingMax": 1,
  "autoscalingCooldown": 300,
  "maxBatchSize": 10,
  "maxInputLength": 1024,
  "tokenizerSkipSpecialTokens": true,
  "tokenizerAddSpecialTokens": true,
  "currReplicaCnt": 1,
  "desiredReplicaCnt": 1,
  "updatedReplicaCnt": 1
}
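As a sketch, the same PUT call can be assembled in Python. The endpoint ID, environment-variable name, and helper function below are illustrative placeholders, not part of the API:

```python
import json
import os

API_BASE = "https://api.friendli.ai/dedicated/beta/endpoint"

def build_update_request(endpoint_id: str, token: str, body: dict):
    """Assemble the URL, headers, and JSON payload for the update call."""
    url = f"{API_BASE}/{endpoint_id}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return url, headers, json.dumps(body)

# Placeholder values; in practice, read the real key from the environment.
url, headers, payload = build_update_request(
    "my-endpoint-id", os.environ.get("FRIENDLI_TOKEN", "<token>"), {}
)
print(url)  # https://api.friendli.ai/dedicated/beta/endpoint/my-endpoint-id

# To actually send the request (requires the third-party `requests` package):
# import requests
# resp = requests.put(url, headers=headers, data=payload)
# print(resp.status_code, resp.json())
```

The request body is left empty here, matching the curl example above; populate it with the update fields described in the parameter section.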
To make this request successfully, you must supply a Personal API Key (e.g. flp_XXX) as the Bearer token. Refer to the authentication section on our introduction page to learn how to acquire and generate your API Key.

Documentation Index
Fetch the complete documentation index at: https://friendli.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
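One common way to supply the Personal API Key described above is through an environment variable. This sketch echoes the command as a dry run rather than sending it; the key and endpoint ID are placeholders you must replace:

```shell
# Placeholder values; substitute your real Personal API Key and endpoint ID.
export FRIENDLI_TOKEN="flp_XXX"
ENDPOINT_ID="my-endpoint-id"

# Dry run: print the command instead of executing it.
# Remove the leading `echo` to actually send the request.
echo curl --request PUT \
  --url "https://api.friendli.ai/dedicated/beta/endpoint/${ENDPOINT_ID}" \
  --header "Authorization: Bearer ${FRIENDLI_TOKEN}" \
  --header "Content-Type: application/json" \
  --data '{}'
```

Keeping the key in an environment variable avoids pasting it into shell history or scripts.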
Parameters:
- endpoint_id (path): The ID of the endpoint.
- Team ID (optional): ID of the team to run requests as.

Body — dedicated endpoint update request:
- Name: The name of the endpoint.
- Advanced config: Endpoint advanced config.
- Autoscaling: Autoscaling policy.
  - Minimum replicas (x >= 0): Setting minReplica to 0 allows the endpoint to sleep when idle, reducing costs. The minimum value is 0.
  - Maximum replicas (x <= 10): The maximum replicas that the endpoint can scale up to. The maximum value is 10.
  - Cooldown: Determines how long the endpoint waits before scaling down after the last request.
- HF ID of the model.
- HF commit hash of the model.
- Comment for the new version.
- Instance option ID: The ID of the instance option.
Response (200): Successfully updated the endpoint specification.
Response fields — dedicated endpoint specification:
- name: The name of the endpoint.
- gpuType: The type of GPU to use for the endpoint.
- numGpu: The number of GPUs to use per replica.
- projectId: The ID of the project that owns the endpoint.
- creatorId: The ID of the user who created the endpoint.
- teamId: The ID of the team that owns the endpoint.
- autoscalingMin: The minimum number of replicas to maintain.
- autoscalingMax: The maximum number of replicas allowed.
- autoscalingCooldown: The cooldown period in seconds between scaling operations.
- maxBatchSize: The maximum batch size for inference requests.
- tokenizerSkipSpecialTokens: Whether to skip special tokens in tokenizer output.
- tokenizerAddSpecialTokens: Whether to add special tokens in tokenizer input.
- instanceId: The ID of the instance.
- maxInputLength: The maximum allowed input length.
- currReplicaCnt: The current number of replicas.
- desiredReplicaCnt: The desired number of replicas.
- updatedReplicaCnt: The updated number of replicas.
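Since an update rolls out as a new version, a client may want to compare the replica-count fields from the response. The sketch below assumes that a rollout has settled once the current count matches the desired count; the sample dict mirrors the response example above, and the helper name is illustrative:

```python
# Sample response body, mirroring the fields documented above.
spec = {
    "name": "endpoint-name",
    "autoscalingMin": 0,
    "autoscalingMax": 1,
    "currReplicaCnt": 1,
    "desiredReplicaCnt": 1,
    "updatedReplicaCnt": 1,
}

def rollout_settled(spec: dict) -> bool:
    """Interpretation (not documented here): the rollout is done once the
    current replica count has caught up with the desired replica count."""
    return spec["currReplicaCnt"] == spec["desiredReplicaCnt"]

print(rollout_settled(spec))
```

A client polling the endpoint after an update could loop on this check until it returns true.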