POST /serverless/v1/messages
Messages
curl --request POST \
  --url https://api.friendli.ai/serverless/v1/messages \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "max_tokens": 128,
  "messages": [
    {
      "content": "Hello, summarize what you can do in one sentence.",
      "role": "user"
    }
  ],
  "model": "meta-llama-3.1-8b-instruct"
}
'
{
  "id": "msg_4b71d12c86d94e719c7e3984a7bb7941",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "I can answer questions, generate text, and help with coding tasks."
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 17,
    "output_tokens": 14,
    "cache_read_input_tokens": 0
  },
  "model": "meta-llama-3.1-8b-instruct"
}
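As a sketch, the non-streaming response shown above can be consumed like this in Python. The sample body is copied from the response example; field names follow the schema described below on this page.

```python
import json

# Sample response body, as returned by the non-streaming Messages API call above.
response_body = """
{
  "id": "msg_4b71d12c86d94e719c7e3984a7bb7941",
  "type": "message",
  "role": "assistant",
  "content": [
    {"type": "text", "text": "I can answer questions, generate text, and help with coding tasks."}
  ],
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 17, "output_tokens": 14, "cache_read_input_tokens": 0},
  "model": "meta-llama-3.1-8b-instruct"
}
"""

message = json.loads(response_body)

# Concatenate all text blocks; content may also hold thinking or tool_use blocks.
text = "".join(block["text"] for block in message["content"] if block["type"] == "text")
total_tokens = message["usage"]["input_tokens"] + message["usage"]["output_tokens"]

print(text)          # the assistant's reply
print(total_tokens)  # 31
```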
Send Anthropic Messages-style JSON. Detailed descriptions of the request and response fields are provided in the OpenAPI schema below on this page. See available models in the pricing table.
The Messages API may not be supported by all models available on Serverless Endpoints.
To make a successful request, you must provide a Friendli Token (e.g., flp_XXX) in the Bearer Token field. Refer to the authentication section of our introduction page to learn how to acquire this value, and visit here to generate your token. When streaming mode is used (i.e., the stream option is set to true), the response has MIME type text/event-stream; otherwise, the content type is application/json. You can view the schema of the streamed sequence of chunk objects in streaming mode here.
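When stream is true, the body arrives as server-sent events. The per-chunk JSON schema is documented on the linked Messages chunk object page; the sketch below only shows the generic SSE framing (payloads carried on `data:` lines), with hypothetical chunk contents for illustration.

```python
def parse_sse_data_lines(raw_stream: str) -> list[str]:
    """Collect the payloads of `data:` lines from a text/event-stream body.

    Generic SSE sketch: the actual per-chunk JSON schema is documented
    in the Messages chunk object page linked above.
    """
    payloads = []
    for line in raw_stream.splitlines():
        if line.startswith("data:"):
            payloads.append(line[len("data:"):].strip())
    return payloads

# Hypothetical two-chunk stream for illustration only.
sample = 'data: {"type": "message_start"}\n\ndata: {"type": "message_stop"}\n'
print(parse_sse_data_lines(sample))
```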
Server-side tools are not supported in the Messages API. In tools, only custom/client function tools are used; non-custom tool types are ignored.
This API is currently in Beta. While we strive to provide a stable and reliable experience, this feature is still under active development. As a result, you may encounter unexpected behavior or limitations. We encourage you to provide feedback to help us improve the feature before its official release.

Authorizations

Authorization
string
header
required

When using the Friendli Suite API for inference requests, you need to provide a Friendli Token for authentication and authorization purposes.

For more detailed information, please refer here.

Headers

X-Friendli-Team
string | null

ID of the team to run requests as (optional).

Body

application/json
messages
MessagesInputMessage · object[]
required

A list of conversation messages ordered from oldest to newest. Must contain at least one item.

Minimum array length: 1
Example:
[
  {
    "content": "Explain top_p in one sentence.",
    "role": "user"
  }
]
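A minimal request body using these fields might look like the following sketch: messages are ordered oldest to newest, and the array must contain at least one item.

```python
# Minimal request body sketch for the Messages API.
payload = {
    "model": "meta-llama-3.1-8b-instruct",
    "max_tokens": 128,
    # Ordered oldest -> newest; the schema requires at least one item.
    "messages": [
        {"role": "user", "content": "Explain top_p in one sentence."},
        {"role": "assistant", "content": "top_p keeps only the smallest token set whose probabilities sum to p."},
        {"role": "user", "content": "And top_k?"},
    ],
}

assert len(payload["messages"]) >= 1  # schema: minimum array length 1
```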
model
string
required

Code of the model to use. See the available model list.

Example:

"meta-llama-3.1-8b-instruct"

max_tokens
integer | null

Maximum number of tokens to generate for the assistant response. Must be greater than 0 when provided.

Required range: x >= 1
system

Payload format for the top-level system instruction.

stream
boolean | null

Whether to stream output as server-sent events (text/event-stream). When false or omitted, returns a single JSON response.

temperature
number | null

Sampling temperature. Lower values make outputs more deterministic; higher values increase diversity.

top_p
number | null

Nucleus sampling parameter. The model samples from the smallest token set whose cumulative probability reaches top_p.

top_k
integer | null

Limits sampling to the k most likely tokens at each decoding step.
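The three sampling controls above are typically set together. A sketch of a request combining them (the specific values here are illustrative, not recommendations):

```python
payload = {
    "model": "meta-llama-3.1-8b-instruct",
    "max_tokens": 64,
    "messages": [{"role": "user", "content": "Name three sorting algorithms."}],
    # Sampling controls: lower temperature -> more deterministic output;
    # top_p and top_k each restrict the candidate token set per decoding step.
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
}
```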

stop_sequences
string[] | null

Stop strings that terminate generation when matched in output. The matched value is returned in stop_sequence when applicable.
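For example, a request that cuts generation short on either of two stop strings might look like this sketch (the stop strings are illustrative):

```python
payload = {
    "model": "meta-llama-3.1-8b-instruct",
    "max_tokens": 128,
    "messages": [{"role": "user", "content": "Count from 1 to 10, one number per line."}],
    # Generation halts as soon as any of these strings appears in the output;
    # the matched string is then reported in the response's stop_sequence field.
    "stop_sequences": ["5", "END"],
}
```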

tools
MessagesToolDefinition · object[] | null

Tool definitions available to the model. Use this to allow tool calls with structured arguments.

tool_choice
MessagesToolChoice · object

Controls tool-calling behavior (auto, any, tool, none) and optional parallel-call behavior.
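A sketch of a request defining one custom (client) function tool, assuming Anthropic-style tool definitions with name, description, and input_schema fields; the tool name and schema are hypothetical.

```python
payload = {
    "model": "meta-llama-3.1-8b-instruct",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "name": "get_weather",  # hypothetical tool name
            "description": "Get current weather for a city.",
            "input_schema": {  # JSON Schema for the tool's arguments
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
    # "auto" lets the model decide; "any" forces some tool call;
    # {"type": "tool", "name": ...} forces a specific tool; "none" disables calls.
    "tool_choice": {"type": "auto"},
}
```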

thinking
MessagesThinkingConfig · object

Controls reasoning behavior with mode (enabled, disabled, adaptive). enabled requires budget_tokens; disabled and adaptive must not include it.

output_config
MessagesOutputConfig · object

Output generation options including effort level and structured output format settings.

cache_control
Cache Control · object

Compatibility field accepted for request portability. Parsed but not used for generation.

container
Container · object

Compatibility field accepted for request portability. Parsed but not used for generation.

context_manager
Context Manager · object

Compatibility field accepted for request portability. Parsed but not used for generation.

inference_geo
Inference Geo · object

Compatibility field accepted for request portability. Parsed but not used for generation.

metadata
Metadata · object

Compatibility field accepted for request portability. Parsed but not used for generation.

service_tier

Compatibility field accepted for request portability. Parsed but not used for generation.

Response

Successfully generated a Messages-style response. For streaming (text/event-stream) event and chunk details, see Messages chunk object.

id
string
required

Unique identifier for this message response.

type
string
required

Response object type (message).

Allowed value: "message"
role
string
required

Role of the output message author (assistant).

Allowed value: "assistant"
content
(Thinking · object | Text · object | Tool Use · object)[]
required

Assistant output blocks in generation order.

usage
MessagesUsage · object
required

Token usage details for this response.

stop_reason
enum<string> | null

Why generation stopped (end_turn, max_tokens, tool_use, stop_sequence).

Available options:
end_turn,
max_tokens,
tool_use,
stop_sequence
stop_sequence
string | null

Matched stop string when stop_reason is stop_sequence; otherwise null.
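Callers usually branch on stop_reason after each response. A sketch covering the four values listed above:

```python
def describe_stop(message: dict) -> str:
    """Explain why generation stopped, per the stop_reason values above."""
    reason = message.get("stop_reason")
    if reason == "end_turn":
        return "model finished its turn"
    if reason == "max_tokens":
        return "hit the max_tokens limit"
    if reason == "tool_use":
        return "model requested a tool call"
    if reason == "stop_sequence":
        # stop_sequence holds the matched stop string in this case.
        return f"matched stop string {message.get('stop_sequence')!r}"
    return "unknown stop reason"

print(describe_stop({"stop_reason": "stop_sequence", "stop_sequence": "END"}))
# -> matched stop string 'END'
```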

model
string | null

Model used to generate the response.