Dedicated Get Endpoint Version

{
  "data": {
    "0": {
      "name": "endpoint-name",
      "gpuType": "NVIDIA H100",
      "numGpu": 1,
      "instanceId": "instance-id",
      "projectId": "project-id",
      "creatorId": "creator-id",
      "teamId": "team-id",
      "autoscalingMin": 0,
      "autoscalingMax": 1,
      "autoscalingCooldown": 300,
      "maxBatchSize": 10,
      "maxInputLength": 1024,
      "tokenizerSkipSpecialTokens": true,
      "tokenizerAddSpecialTokens": true,
      "currReplicaCnt": 1,
      "desiredReplicaCnt": 1,
      "updatedReplicaCnt": 1
    }
  }
}

Authorizations

Authorization

string

header

required

Requests to the Friendli Suite API must include a Friendli Token for authentication and authorization.

For more detailed information, please refer here.

Headers

X-Friendli-Team

string | null

ID of team to run requests as (optional parameter).
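As a minimal sketch of the two headers described above, the helper below assembles them into a dictionary. It assumes the Friendli Token is sent with the Bearer scheme in the Authorization header, which this page does not state explicitly. `build_headers` is a hypothetical name used only for illustration.

```python
from typing import Dict, Optional


def build_headers(token: str, team_id: Optional[str] = None) -> Dict[str, str]:
    """Assemble request headers (illustrative helper, not part of any SDK)."""
    if not token:
        raise ValueError("A Friendli Token is required")
    # Assumption: the token is passed using the Bearer scheme.
    headers = {"Authorization": f"Bearer {token}"}
    # X-Friendli-Team is optional, so it is omitted when no team ID is given.
    if team_id is not None:
        headers["X-Friendli-Team"] = team_id
    return headers
```

Leaving `team_id` unset omits the header entirely rather than sending an empty value.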

Path Parameters

endpoint_id

string

required

The ID of the endpoint

Query Parameters

cursor

file | null

Cursor for pagination

limit

integer | null

default: 20

Limit of items per page

Required range: 1 <= x <= 100
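The path and query parameters above could be combined into a request URL roughly as follows. This is a sketch under stated assumptions: the base URL and route template are placeholders, because this page does not list them, and the cursor is treated as an opaque string. `build_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Assumption: placeholder host and route; substitute the real values
# from the API documentation.
BASE_URL = "https://api.example.com"
ROUTE_TEMPLATE = "/endpoints/{endpoint_id}/versions"


def build_url(
    endpoint_id: str,
    cursor: Optional[str] = None,
    limit: Optional[int] = None,
) -> str:
    """Build the request URL, validating parameters on the client side."""
    if not endpoint_id:
        raise ValueError("endpoint_id is required")
    # The documented range for limit is 1 <= x <= 100.
    if limit is not None and not 1 <= limit <= 100:
        raise ValueError("limit must satisfy 1 <= limit <= 100")
    params = {}
    if cursor is not None:
        params["cursor"] = cursor
    if limit is not None:
        params["limit"] = limit
    # Percent-encode the ID so it is safe to place in the path.
    path = ROUTE_TEMPLATE.format(endpoint_id=quote(endpoint_id, safe=""))
    query = urlencode(params)
    return BASE_URL + path + ("?" + query if query else "")
```

Omitting `limit` leaves it to the server, which applies the documented default of 20.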

Response

Successfully retrieved the endpoint version history.

Dedicated endpoint version history response.

data

Data · object

The response data containing endpoint versions.

data.{key}

DedicatedEndpointSpec · object

Dedicated endpoint specification.

data.{key}.name

string

required

The name of the endpoint.

data.{key}.gpuType

string

required

The type of GPU to use for the endpoint.

data.{key}.numGpu

integer

required

The number of GPUs to use per replica.

data.{key}.projectId

string

required

The ID of the project that owns the endpoint.

data.{key}.creatorId

string

required

The ID of the user who created the endpoint.

data.{key}.teamId

string

required

The ID of the team that owns the endpoint.

data.{key}.autoscalingMin

integer

required

The minimum number of replicas to maintain.

data.{key}.autoscalingMax

integer

required

The maximum number of replicas allowed.

data.{key}.autoscalingCooldown

integer

required

The cooldown period in seconds between scaling operations.

data.{key}.maxBatchSize

integer

required

The maximum batch size for inference requests.

data.{key}.tokenizerSkipSpecialTokens

boolean

required

Whether to skip special tokens in tokenizer output.

data.{key}.tokenizerAddSpecialTokens

boolean

required

Whether to add special tokens in tokenizer input.

data.{key}.instanceId

string | null

The ID of the instance.

data.{key}.maxInputLength

integer | null

The maximum allowed input length.

data.{key}.maxContextLength

integer | null

The maximum context length (input + output tokens) to serve with.

data.{key}.currReplicaCnt

integer | null

The current number of replicas.

data.{key}.desiredReplicaCnt

integer | null

The desired number of replicas.

data.{key}.updatedReplicaCnt

integer | null

The updated number of replicas.
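To illustrate how the required and nullable fields listed above might be handled on the client side, here is a sketch that checks each entry under `data` and fills absent nullable fields with `None`. The field names come from this page. `parse_versions` and the two constants are hypothetical names, and treating an absent nullable field as `None` is an assumption.

```python
from typing import Any, Dict

# Fields marked "required" on this page.
REQUIRED_FIELDS = (
    "name", "gpuType", "numGpu", "projectId", "creatorId", "teamId",
    "autoscalingMin", "autoscalingMax", "autoscalingCooldown",
    "maxBatchSize", "tokenizerSkipSpecialTokens", "tokenizerAddSpecialTokens",
)
# Fields typed "... | null" on this page.
NULLABLE_FIELDS = (
    "instanceId", "maxInputLength", "maxContextLength",
    "currReplicaCnt", "desiredReplicaCnt", "updatedReplicaCnt",
)


def parse_versions(body: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Validate each spec under "data" and normalize nullable fields."""
    versions: Dict[str, Dict[str, Any]] = {}
    for key, spec in body.get("data", {}).items():
        missing = [field for field in REQUIRED_FIELDS if field not in spec]
        if missing:
            raise KeyError(f"version {key!r} is missing fields: {missing}")
        normalized = dict(spec)
        # Assumption: a nullable field that is absent is treated as None.
        for field in NULLABLE_FIELDS:
            normalized.setdefault(field, None)
        versions[key] = normalized
    return versions
```

Applied to the sample response at the top of this page, the single entry keyed `"0"` passes validation and gains `maxContextLength: None`, since the sample omits that field.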

nextCursor

file | null

The next cursor for pagination.
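The `cursor` query parameter and the `nextCursor` response field suggest the usual cursor-pagination loop, sketched below. It assumes that a null or missing `nextCursor` marks the last page, which this page does not state. `iter_versions` and the `fetch_page` callable are hypothetical; `fetch_page` stands in for whatever function performs the HTTP request and returns the decoded JSON body.

```python
from typing import Any, Callable, Dict, Iterator, Optional, Tuple

# A callable that takes the current cursor (None for the first page)
# and returns the decoded response body.
FetchPage = Callable[[Optional[str]], Dict[str, Any]]


def iter_versions(fetch_page: FetchPage) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (key, spec) pairs across all pages of the version history."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        for key, spec in page.get("data", {}).items():
            yield key, spec
        cursor = page.get("nextCursor")
        # Assumption: a null, empty, or missing nextCursor means no more pages.
        if not cursor:
            break
```

Passing the fetch step in as a callable keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned pages.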
