curl --request POST \
  --url https://api.friendli.ai/dedicated/beta/endpoint \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "projectId": "<string>",
  "name": "<string>",
  "instanceOptionId": "<string>",
  "advanced": {
    "tokenizer_skip_special_tokens": true,
    "tokenizer_add_special_tokens": true,
    "max_batch_size": 123,
    "max_token_count": 2560,
    "enable_content_logging": true,
    "max_input_length": 123
  },
  "hfModelRepo": "<string>"
}
'

Example response:

{
  "status": "INITIALIZING",
  "createdAt": "2025-01-01T00:00:00Z",
  "updatedAt": "2025-01-01T00:00:00Z",
  "phase": "DOWNLOADING_MODEL"
}

Create a Friendli Dedicated Endpoint deployment for a Hugging Face model via the API. Specify GPU type, replica count, and model configuration.
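The request body shown in the curl example can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; the function name `build_create_endpoint_body` and all placeholder values are hypothetical, and treating `advanced` as omittable is an assumption rather than something this page states.

```python
import json


def build_create_endpoint_body(project_id, name, instance_option_id,
                               hf_model_repo, advanced=None):
    """Serialize a create-endpoint request body to a JSON string."""
    body = {
        "projectId": project_id,
        "name": name,
        "instanceOptionId": instance_option_id,
        "hfModelRepo": hf_model_repo,
    }
    # Assumption: "advanced" may be left out when no settings are supplied.
    if advanced:
        body["advanced"] = dict(advanced)
    return json.dumps(body)


# Placeholder values only; substitute real IDs from your own project.
example_body = build_create_endpoint_body(
    project_id="<project-id>",
    name="my-endpoint",
    instance_option_id="<instance-option-id>",
    hf_model_repo="<org>/<model>",
    advanced={"max_batch_size": 123, "max_input_length": 123},
)
print(example_body)
```

The resulting string is what would be passed as the `--data` argument in the curl command.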
Creating a Dedicated Endpoint deployment for a Hugging Face model requires a Personal API Key (e.g. flp_XXX) in the Bearer Token field. Refer to the authentication section on our introduction page to learn how to acquire this key and where to generate it.
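As an illustration of the Bearer token requirement, the sketch below builds (but does not send) an authenticated POST request with Python's `urllib.request`. The environment variable name `FRIENDLI_TOKEN` and the helper `build_request` are hypothetical; the URL and header names are taken from the curl example on this page.

```python
import os
import urllib.request

ENDPOINT_URL = "https://api.friendli.ai/dedicated/beta/endpoint"


def build_request(body: bytes, token: str) -> urllib.request.Request:
    """Build a POST request carrying the Personal API Key as a Bearer token."""
    return urllib.request.Request(
        ENDPOINT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# FRIENDLI_TOKEN is an assumed variable name; "flp_XXX" is the page's placeholder.
token = os.environ.get("FRIENDLI_TOKEN", "flp_XXX")
request = build_request(b'{"name": "my-endpoint"}', token)

# Sending is left commented out so the sketch has no side effects:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode())
```

Reading the key from the environment keeps it out of source files and shell history.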
ID of team to run requests as (optional parameter).

Dedicated endpoint create request.

projectId: The ID of the project that owns the endpoint.
name: The name of the endpoint.
instanceOptionId: The ID of the instance option.
  Available options: ShbPuOs4tfGbmrAHuYt7T40oJkNob0NMdoF3sYH4kHmAcA5PTwD5AqnBSVN0zfTutSiLn0HqlfkRz5G48REcGUA4qYFmsYz8LnK1wTaKc7WOTu6GjBnfHPe4OhTzYtZuomzIahBzWtOuomsI8GiQTLKfJNOrbrTZGIuYgVrsAFoZMFXZnAdDdrbc6G9FxJWZ
advanced: The advanced configuration of the endpoint.
hfModelRepo: HF ID of the model.

Autoscaling policy.
  Setting minReplica to 0 allows the endpoint to sleep when idle, reducing costs. The minimum value is 0 (x >= 0).
  The maximum replicas that the endpoint can scale up to. The maximum value is 10 (x <= 10).
  Determines how long the endpoint waits before scaling down after the last request.

HF commit hash of the model.
The comment for the initial version.
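The replica limits described for the autoscaling policy can be checked on the client before a request is sent. This is a rough sketch: `validate_replica_bounds` is a hypothetical helper, the bounds 0 and 10 come from the text above, and the requirement that the minimum not exceed the maximum is an assumption not stated on this page.

```python
MIN_REPLICA_FLOOR = 0     # documented: the minimum value is 0
MAX_REPLICA_CEILING = 10  # documented: the maximum value is 10


def validate_replica_bounds(min_replica: int, max_replica: int) -> None:
    """Raise ValueError if the replica settings fall outside the documented limits."""
    if min_replica < MIN_REPLICA_FLOOR:
        raise ValueError(f"minimum replicas must be >= {MIN_REPLICA_FLOOR}")
    if max_replica > MAX_REPLICA_CEILING:
        raise ValueError(f"maximum replicas must be <= {MAX_REPLICA_CEILING}")
    # Assumption: the API expects min <= max; this page does not say so explicitly.
    if min_replica > max_replica:
        raise ValueError("minimum replicas must not exceed maximum replicas")


# A minimum of 0 lets the endpoint sleep when idle.
validate_replica_bounds(min_replica=0, max_replica=10)
```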
Successfully created the endpoint.

Dedicated endpoint status.

status: The current status of the endpoint deployment.
  Available options: UNKNOWN, INITIALIZING, RUNNING, UPDATING, SLEEPING, AWAKING, FAILED, STOPPING, TERMINATING, TERMINATED, READY
createdAt: When the endpoint was created.
ErrorCode type.
  Available options: WORKLOAD_INIT_UNKNOWN_ERROR, WORKLOAD_INIT_SETTINGS_ERROR, WORKLOAD_INIT_GRPC_ERROR, WORKLOAD_INIT_MANIFEST_NOT_FOUND_ERROR, WORKLOAD_INIT_MANIFEST_TYPE_ERROR, WORKLOAD_INIT_DOWNLOAD_ERROR, WORKLOAD_INIT_INVALID_TOKEN_ERROR, WORKLOAD_INIT_CANNOT_ACCESS_REPO_ERROR, WORKLOAD_INIT_HF_WANDB_API_ERROR, WORKLOAD_INIT_INSUFFICIENT_DISK_ERROR, INFERENCE_ENGINE_UNKNOWN_ERROR, INFERENCE_ENGINE_INVALID_ARGUMENT_ERROR, INFERENCE_ENGINE_MEMORY_ERROR, INFERENCE_ENGINE_METERING_CLIENT_CONFIG_ERROR
updatedAt: When the endpoint was last updated.
phase: The current phase of the endpoint.
  Available options: REQUESTING_VIRTUAL_MACHINE, DOWNLOADING_MODEL, ENGINE_INITIALIZING
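A client might parse the response and check it against the status and phase values listed above. The sketch below is one possible approach, not an official client: `parse_endpoint_status` is a hypothetical helper, and allowing `phase` to be absent is an assumption. The sample JSON is the example response from this page.

```python
import json
from datetime import datetime

KNOWN_STATUSES = {
    "UNKNOWN", "INITIALIZING", "RUNNING", "UPDATING", "SLEEPING", "AWAKING",
    "FAILED", "STOPPING", "TERMINATING", "TERMINATED", "READY",
}
KNOWN_PHASES = {"REQUESTING_VIRTUAL_MACHINE", "DOWNLOADING_MODEL", "ENGINE_INITIALIZING"}


def _to_datetime(value: str) -> datetime:
    # Rewrite the trailing "Z" so fromisoformat also works on older Python versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_endpoint_status(raw_json: str) -> dict:
    """Parse a create-endpoint response and validate its enum fields."""
    data = json.loads(raw_json)
    status = data["status"]
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    # Assumption: "phase" may be missing or null once initialization is over.
    phase = data.get("phase")
    if phase is not None and phase not in KNOWN_PHASES:
        raise ValueError(f"unexpected phase: {phase!r}")
    return {
        "status": status,
        "phase": phase,
        "created_at": _to_datetime(data["createdAt"]),
        "updated_at": _to_datetime(data["updatedAt"]),
    }


sample = (
    '{"status": "INITIALIZING", "createdAt": "2025-01-01T00:00:00Z", '
    '"updatedAt": "2025-01-01T00:00:00Z", "phase": "DOWNLOADING_MODEL"}'
)
parsed = parse_endpoint_status(sample)
print(parsed["status"], parsed["phase"])
```

Rejecting unknown values early makes it easier to notice when the API adds a status or phase that the client does not yet handle.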