Dedicated Endpoints FAQ and Troubleshooting

Integrations

How do I integrate a Hugging Face account?

Log in to Hugging Face, then navigate to Access Tokens.
Create a new token. A fine-grained token also works; in that case, make sure it has view permission for the repository you'd like to use.
Register the token in Friendli Suite > Personal Settings > Integrations.

If you revoke or invalidate the token, update it in Friendli Suite. Otherwise, ongoing deployments may be disrupted and you will not be able to launch new inference deployments.

Using a 3rd-Party Model

How can I use a Hugging Face repository as a model?

Use the repository ID of the model. You can select it from the list of autocompleted model repositories.
You can also select a specific branch or manually enter a commit hash.
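A minimal sketch of the two inputs above, assuming simplified rules: a repository ID is an `owner/name` pair, and a revision of exactly 40 lowercase hexadecimal characters is treated as a full commit hash, anything else as a branch (or tag) name. `describe_model_source` is a hypothetical helper that only illustrates the distinction; Hugging Face applies stricter naming rules than this regex.

```python
import re

# Simplified, hypothetical rule: "owner/name", each part starting with a letter or digit
# and containing only letters, digits, ".", "_" or "-".
REPO_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*")
# A full Git commit hash: 40 lowercase hexadecimal characters.
COMMIT_HASH = re.compile(r"[0-9a-f]{40}")


def describe_model_source(repo_id: str, revision: str = "main") -> str:
    """Describe which repository and revision a deployment would use.

    "main" is the default branch name on the Hugging Face Hub.
    """
    if not REPO_ID.fullmatch(repo_id):
        raise ValueError(f"expected an 'owner/name' repository ID, got {repo_id!r}")
    kind = "commit" if COMMIT_HASH.fullmatch(revision) else "branch"
    return f"{repo_id} @ {kind} {revision}"
```

Pinning a commit hash keeps the weights fixed even if the branch later receives new commits, while a branch name refers to whatever commit the branch points to when the model is fetched.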

Format Requirements

What are the format requirements for a model?

A model must be in safetensors format.
The model files must NOT be nested inside another directory.
Other files not listed below may be included. However, those files are neither downloaded nor used.

Required	Filename	Description
Yes	*.safetensors	Model weights, e.g. model.safetensors. For split safetensors files, also include model.safetensors.index.json
Yes	config.json	Model config that includes the architecture (see Supported Models on Friendli)
No	tokenizer.json	Tokenizer for the model
No	tokenizer_config.json	Tokenizer config. Must be present and contain a `chat_template` field for the Friendli Engine to provide chat APIs
No	special_tokens_map.json	Maps the tokenizer's special tokens to their corresponding token strings
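The requirements above can be checked locally before deploying. The sketch below is a hypothetical helper, not part of any Friendli tooling: it takes the repository-relative file list (for example, as shown on the Hugging Face Hub), plus the optional parsed contents of model.safetensors.index.json and tokenizer_config.json, and reports errors for missing required files and warnings for a missing chat template. It assumes that "split" means more than one .safetensors file at the root, and it uses the `weight_map` field of the index file to find shard names.

```python
from typing import Optional


def check_model_files(
    filenames: list[str],
    index: Optional[dict] = None,
    tokenizer_config: Optional[dict] = None,
) -> tuple[list[str], list[str]]:
    """Check a model repository's file list against the format requirements.

    filenames: repository-relative paths, e.g. "model.safetensors" or "sub/config.json".
    index: parsed model.safetensors.index.json, if present.
    tokenizer_config: parsed tokenizer_config.json, if present.
    Returns (errors, warnings); no errors means the layout looks usable.
    """
    errors: list[str] = []
    warnings: list[str] = []
    # Only root-level files count, so a model nested in a subdirectory fails both checks below.
    root = {name for name in filenames if "/" not in name}

    if "config.json" not in root:
        errors.append("config.json is missing from the repository root")

    shards = sorted(name for name in root if name.endswith(".safetensors"))
    if not shards:
        errors.append("no .safetensors weights at the repository root")
    elif len(shards) > 1 and "model.safetensors.index.json" not in root:
        errors.append("split safetensors files need model.safetensors.index.json")

    if index is not None:
        # Every shard referenced by the index must actually be present.
        missing = sorted(set(index.get("weight_map", {}).values()) - root)
        if missing:
            errors.append("index references missing shards: " + ", ".join(missing))

    # Chat APIs need tokenizer_config.json with a chat_template field.
    if "tokenizer_config.json" not in root:
        warnings.append("no tokenizer_config.json: chat APIs will not be available")
    elif tokenizer_config is not None and "chat_template" not in tokenizer_config:
        warnings.append("tokenizer_config.json has no chat_template: chat APIs will not be available")

    return errors, warnings
```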

What are the format requirements for a dataset?

The dataset should satisfy the following conditions:

The dataset must contain a column named “messages”.
Each row in the “messages” column should be compatible with the chat template of the base model. For example, the chat template in the tokenizer_config.json of mistralai/Mistral-7B-Instruct-v0.2 expects user and assistant messages to alternate, so each row in the “messages” column should follow a format like: [{"role": "user", "content": "The 1st user's message"}, {"role": "assistant", "content": "The 1st assistant's message"}]. HuggingFaceH4/ultrachat_200k is an example of a dataset compatible with this chat template.
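A sketch of a row-level check for the conditions above, assuming a template like Mistral-7B-Instruct-v0.2's that expects strict user/assistant alternation starting with the user. Templates that accept a system message or other roles would need a different rule. `check_chat_rows` is a hypothetical name; rows are plain dicts, which is also what iterating over a Hugging Face `datasets.Dataset` yields.

```python
# Assumed alternation rule: user, assistant, user, assistant, ...
EXPECTED_ROLES = ("user", "assistant")


def check_chat_rows(rows: list[dict]) -> list[str]:
    """Return a list of problems found in the "messages" column of each row."""
    problems = []
    for i, row in enumerate(rows):
        messages = row.get("messages")
        if not isinstance(messages, list) or not messages:
            problems.append(f"row {i}: 'messages' is missing or empty")
            continue
        for j, message in enumerate(messages):
            if not isinstance(message, dict) or "role" not in message or "content" not in message:
                problems.append(f"row {i}, message {j}: needs 'role' and 'content'")
                continue
            expected = EXPECTED_ROLES[j % 2]
            if message["role"] != expected:
                problems.append(f"row {i}, message {j}: expected role {expected!r}, got {message['role']!r}")
    return problems
```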

Troubleshooting

Inference Request Errors

Common error codes for inference requests

Below is a table of common error codes you might encounter when making inference-related API requests.

Code	Name	Cause	Suggested Solution
`400`	Bad Request	The request is malformed or missing required fields.	Check your request payload. Ensure it is valid JSON with all required fields.
`401`	Unauthorized	Missing or invalid API key. The request lacks proper authentication.	Include a valid Personal API key in the `Authorization` header. Verify the key is active and correct.
`403`	Forbidden	The API key is valid but does not have permission to access the endpoint.	Ensure your Personal API key has access rights to the endpoint. Use the correct team key or add the `X-Friendli-Team` header if needed.
`404`	Not Found	The specified endpoint or resource does not exist. This typically occurs when the `endpoint_id` or `team_id` is invalid.	Verify the `endpoint_id` and model name in your request. Ensure they match an existing, non-deleted deployment. Also check for typos in your endpoint ID or team ID.
`422`	Unprocessable Entity	The request is syntactically correct but semantically invalid (e.g. exceeding token limits, invalid parameter values).	Adjust your request (e.g. reduce `max_tokens`, correct parameter values) and try again.
`429`	Too Many Requests	You have exceeded rate limits for your plan.	Reduce request frequency or upgrade your plan for higher limits. Wait before retrying after a 429 error.
`500`	Internal Server Error	A server-side error occurred while processing the request.	Retry the request after a short delay. If the error persists, check endpoint health in the overview dashboard or contact FriendliAI support.
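The table can also be encoded directly in client code. This is a sketch under assumptions: the `Bearer` scheme in the `Authorization` header and the `Content-Type` header are assumed rather than stated on this page, and `build_headers`, `next_step`, `NEXT_STEP`, and `RETRYABLE` are hypothetical names. Only 429 and 500 are treated as worth retrying unchanged; the other codes point at something in the request or credentials that needs fixing first.

```python
from typing import Optional

# Suggested next step for each status code in the table above.
NEXT_STEP = {
    400: "fix the request payload (valid JSON, required fields)",
    401: "send a valid Personal API key in the Authorization header",
    403: "use a key with access to the endpoint, or set X-Friendli-Team",
    404: "check endpoint_id, team ID, and model name for typos",
    422: "adjust parameter values, e.g. lower max_tokens",
    429: "retry later with backoff, or upgrade the plan",
    500: "retry after a short delay; contact support if it persists",
}

# Only these codes are worth retrying with an unchanged request.
RETRYABLE = {429, 500}


def build_headers(api_key: str, team_id: Optional[str] = None) -> dict[str, str]:
    """Build request headers (the Bearer scheme is an assumption)."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if team_id:
        # Sent only when requests must target a specific team (see the 403 row).
        headers["X-Friendli-Team"] = team_id
    return headers


def next_step(status: int) -> str:
    """Map an HTTP status code to a suggested next step."""
    if 200 <= status < 300:
        return "success"
    return NEXT_STEP.get(status, "unlisted status code: check the API reference")
```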

Quick Checklist Before Retrying

Verify the endpoint URL, `endpoint_id`, and the `X-Friendli-Team` header (if applicable)
Include the `Authorization` header with a valid key
Confirm the target deployment exists, is healthy, and is not deleted
Validate the request JSON and required fields; reduce `max_tokens` if needed
Check rate limits; add retry with backoff when receiving 429
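For the last checklist item, a minimal retry loop with exponential backoff and full jitter could look like the sketch below. It assumes a hypothetical `send` callable that performs one request and returns the HTTP status code; retrying 500 as well as 429 follows the table's suggestion to retry server errors after a short delay. The `sleep` parameter is injectable so the loop can be tested without waiting.

```python
import random
import time
from typing import Callable

# Status codes from the error table that are worth retrying unchanged.
RETRYABLE = {429, 500}


def call_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it returns a non-retryable status or attempts run out.

    Exponential backoff with full jitter: after failed attempt n (1-based),
    wait a random time in [0, min(max_delay, base_delay * 2 ** (n - 1))] seconds.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status = send()
        attempt += 1
        if status not in RETRYABLE or attempt >= max_attempts:
            return status
        sleep(random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1))))
```

Random jitter keeps many clients that hit the limit at the same moment from retrying in lockstep. If a response carries a `Retry-After` header, waiting at least that long is a reasonable refinement.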

Model Selection Errors

You don't have access to this gated model

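The repository is gated. Request access on the model's Hugging Face Hub page and wait for approval from the repository owner. The Hugging Face token integrated with Friendli Suite must belong to the account that was granted access.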

The repository / artifact is invalid

The model does not meet the format requirements. Check that the weights are in safetensors format and that the files are not nested inside another directory. See the format requirements for details.

The architecture is not supported

The model architecture is not supported. Please refer to the Supported Models page.
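To see which architecture a model declares, read the `architectures` list in its config.json. The sketch below compares it against a set of supported names that you would copy from the Supported Models page; the sets used with it here are placeholders, not the actual supported list, and `unsupported_architectures` is a hypothetical helper.

```python
import json


def unsupported_architectures(config_text: str, supported: set[str]) -> list[str]:
    """Return the architectures declared in config.json that are not in `supported`."""
    config = json.loads(config_text)
    architectures = config.get("architectures")
    if not architectures:
        raise ValueError("config.json does not declare any 'architectures'")
    return [name for name in architectures if name not in supported]
```

For example, with `config_text = '{"architectures": ["MistralForCausalLM"]}'` and a placeholder `supported = {"LlamaForCausalLM"}`, the function returns `["MistralForCausalLM"]`.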

Endpoint Lifecycle

Why was my endpoint suddenly terminated?

Endpoints that remain in a sleep state for 48 hours are automatically terminated.

When min_replicas = 0, the endpoint enters a sleep state after the cooldown period if no requests are received.
A notification is sent after 24 hours of sleep, and the endpoint is terminated after another 24 hours if not reactivated.
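The timeline above as arithmetic: a sketch assuming `min_replicas = 0` and that no request arrives after the last one. The cooldown length is whatever the endpoint is configured with; the 10-minute value below is only an illustration, and `sleep_timeline` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Fixed intervals from this page; the cooldown is configured per endpoint.
NOTIFY_AFTER = timedelta(hours=24)     # notification after 24 hours of sleep
TERMINATE_AFTER = timedelta(hours=48)  # termination after 48 hours of sleep in total


def sleep_timeline(last_request_at: datetime, cooldown: timedelta) -> dict[str, datetime]:
    """Return when an endpoint with min_replicas = 0 sleeps, is notified about, and is
    terminated, assuming no further requests arrive after `last_request_at`."""
    sleep_at = last_request_at + cooldown
    return {
        "sleep_at": sleep_at,
        "notify_at": sleep_at + NOTIFY_AFTER,
        "terminate_at": sleep_at + TERMINATE_AFTER,
    }


if __name__ == "__main__":
    timeline = sleep_timeline(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), timedelta(minutes=10))
    for event, when in timeline.items():
        print(event, when.isoformat())
```

With a last request at 12:00 UTC and a 10-minute cooldown, the endpoint sleeps at 12:10, the notification is due at 12:10 the next day, and termination follows at 12:10 the day after, unless a request reactivates the endpoint first.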

This page may not cover all cases. If your issue persists, contact support.
