Use the Anthropic Messages-style API on Friendli Serverless Endpoints. Send structured message payloads and receive assistant responses.

Example request:

curl --request POST \
  --url https://api.friendli.ai/serverless/v1/messages \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "messages": [
    {
      "content": "Hello, summarize what you can do in one sentence.",
      "role": "user"
    }
  ],
  "model": "meta-llama/Llama-3.1-8B-Instruct"
}
'

Example response:

{
  "id": "msg_4b71d12c86d94e719c7e3984a7bb7941",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "I can answer questions, generate text, and help with coding tasks."
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 17,
    "output_tokens": 14,
    "cache_read_input_tokens": 0
  },
  "model": "meta-llama/Llama-3.1-8B-Instruct"
}
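The same request can be issued from Python. Below is a minimal sketch using only the standard library's urllib; the URL and payload mirror the curl example above, and FRIENDLI_TOKEN is a placeholder you must replace with your own token.

```python
import json
import urllib.request

# Placeholder: replace with your Friendli API token.
FRIENDLI_TOKEN = "<token>"

payload = {
    "messages": [
        {"role": "user", "content": "Hello, summarize what you can do in one sentence."}
    ],
    "model": "meta-llama/Llama-3.1-8B-Instruct",
}

req = urllib.request.Request(
    "https://api.friendli.ai/serverless/v1/messages",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {FRIENDLI_TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Sending requires a valid token:
# with urllib.request.urlopen(req) as resp:
#     message = json.load(resp)
#     print(message["content"][0]["text"])
```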
Send Anthropic Messages-style JSON. Detailed request/response field descriptions are provided in the OpenAPI schema below on this page. See available models in the pricing table.

Documentation Index
Fetch the complete documentation index at: https://friendli.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
When streaming is enabled (the stream option is set to true), the response is delivered as MIME type text/event-stream. Otherwise, the content type is application/json.
You can view the schema of the streamed sequence of chunk objects in streaming mode here.
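In streaming mode, each server-sent event carries a JSON chunk on a `data:` line. The following is a minimal sketch of extracting those payloads from an SSE stream; the chunk shapes in the sample are illustrative assumptions, not the exact schema (see the chunk-object reference linked above).

```python
import json

def iter_sse_data(lines):
    """Yield parsed JSON payloads from 'data: ...' lines of an SSE stream."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            data = line[len("data:"):].strip()
            if data and data != "[DONE]":
                yield json.loads(data)

# Illustrative stream; the chunk fields here are assumptions, not the exact schema.
sample = [
    'data: {"type": "content_block_delta", "delta": {"text": "Hel"}}',
    '',
    'data: {"type": "content_block_delta", "delta": {"text": "lo"}}',
]
text = "".join(c["delta"]["text"] for c in iter_sse_data(sample))
print(text)  # Hello
```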
Of the provided tools, only custom/client function tools are used; non-custom tool types are ignored.
ID of team to run requests as (optional parameter).
A list of conversation messages ordered from oldest to newest. Must contain at least one item.
Author role for this message turn. Use user for user input and assistant for prior assistant turns.
Allowed values: user, assistant
Message payload. Supports plain string shorthand or a typed block array.
[
  {
    "content": "Explain top_p in one sentence.",
    "role": "user"
  }
]
Code of the model to use. See the available model list.
"meta-llama/Llama-3.1-8B-Instruct"
Maximum number of tokens to generate for the assistant response. Must be greater than 0 when provided.
x >= 1
Payload format for the top-level system instruction.
Whether to stream output as server-sent events (text/event-stream). When false or omitted, returns a single JSON response.
Sampling temperature. Lower values make outputs more deterministic; higher values increase diversity.
Nucleus sampling parameter. The model samples from the smallest token set whose cumulative probability reaches top_p.
Limits sampling to the k most likely tokens at each decoding step.
Stop strings that terminate generation when matched in output. The matched value is returned in stop_sequence when applicable.
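The sampling controls described above can be combined in a single request body. A sketch of such a payload follows; field names track the Messages-style schema, and `stop_sequences` is assumed to be the stop-string parameter's name under that convention.

```python
import json

# A request combining the sampling controls described above.
# "stop_sequences" is assumed to be the stop-string field's name
# per the Messages-style convention.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "List three prime numbers."}],
    "max_tokens": 128,       # must be >= 1 when provided
    "temperature": 0.2,      # lower = more deterministic output
    "top_p": 0.9,            # nucleus sampling cutoff
    "top_k": 40,             # sample only from the 40 most likely tokens
    "stop_sequences": ["\n\n"],
}
print(json.dumps(payload))
```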
Tool definitions available to the model. Use this to allow tool calls with structured arguments.
Tool identifier used in model tool-call outputs. Keep this name stable and unique within the request.
Tool category. If missing, empty, or custom, it is converted to a function tool. Any non-empty value other than custom is treated as a server-side tool and skipped.
Natural-language description of what the tool does and when to call it.
JSON Schema describing tool arguments. The model uses this schema to construct the tool input object.
If true, applies stricter schema adherence when generating tool arguments.
Controls tool-calling behavior (auto, any, tool, none) and optional parallel-call behavior.
Tool-calling mode. auto lets the model decide, any requires at least one tool call, tool forces a named tool, and none disables tool calls.
Allowed values: auto, any, tool, none
Tool name to force when type=tool. Ignored for other type values.
If true, restricts the model to at most one tool call at a time. Must not be provided when type=none.
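A request carrying one custom function tool and a forced tool call can be sketched as below. The get_weather tool, its description, and its schema are hypothetical, and the input_schema field name follows the Anthropic Messages convention; they are illustrations, not part of the Friendli docs.

```python
import json

# Hypothetical tool definition; name, description, and schema are illustrative.
weather_tool = {
    "type": "custom",        # missing/empty/"custom" all become a function tool
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {        # field name per the Anthropic Messages convention
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [weather_tool],
    # Force the named tool; mode "tool" requires a tool name.
    "tool_choice": {"type": "tool", "name": "get_weather"},
}
print(json.dumps(payload))
```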
Controls reasoning behavior with mode (enabled, disabled, adaptive). enabled requires budget_tokens; disabled and adaptive must not include it.
Reasoning mode. enabled forces reasoning, disabled suppresses it, and adaptive lets the system decide.
Allowed values: enabled, disabled, adaptive
Reasoning token budget. Required when type=enabled; must not be provided when type=disabled or type=adaptive. If max_tokens is set, this must be smaller than max_tokens.
x >= 1
Output generation options including effort level and structured output format settings.
Relative generation effort level (low, medium, high, max). Higher effort can improve quality on harder prompts at the cost of additional compute.
Allowed values: low, medium, high, max
Structured output settings. Currently supports only json_schema.
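The reasoning constraints described above can be checked client-side before sending a request. Below is a minimal validator sketch; the type and budget_tokens field names come from the schema above, while the function itself is illustrative.

```python
def validate_reasoning(reasoning, max_tokens=None):
    """Check the reasoning-config rules described above:

    - type must be one of enabled / disabled / adaptive
    - enabled requires budget_tokens (>= 1, and < max_tokens when set)
    - disabled and adaptive must not include budget_tokens
    """
    rtype = reasoning.get("type")
    budget = reasoning.get("budget_tokens")
    if rtype not in ("enabled", "disabled", "adaptive"):
        return False
    if rtype == "enabled":
        if budget is None or budget < 1:
            return False
        if max_tokens is not None and budget >= max_tokens:
            return False
        return True
    return budget is None  # disabled / adaptive: budget must be absent

print(validate_reasoning({"type": "enabled", "budget_tokens": 512}, max_tokens=1024))  # True
print(validate_reasoning({"type": "adaptive", "budget_tokens": 512}))                  # False
```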
Several additional compatibility fields are accepted for request portability; they are parsed but not used for generation.
Successfully generated a Messages-style response. For streaming (text/event-stream) event and chunk details, see Messages chunk object.
Unique identifier for this message response.
Response object type (message).
"message"Role of the output message author (assistant).
"assistant"Assistant output blocks in generation order.
Token usage details for this response.
Number of billed input tokens for this request.
x >= 0
Number of billed output tokens generated by the model.
x >= 0
Number of input tokens served from cache, when cache reads are applicable.
x >= 0
Why generation stopped (end_turn, max_tokens, tool_use, stop_sequence).
Allowed values: end_turn, max_tokens, tool_use, stop_sequence
Matched stop string when stop_reason is stop_sequence; otherwise null.
Model used to generate the response.
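Putting the response fields together, the sketch below extracts the text blocks and usage counters from a response object shaped like the example at the top of the page.

```python
# Sample response shaped like the example earlier on this page.
response = {
    "id": "msg_4b71d12c86d94e719c7e3984a7bb7941",
    "type": "message",
    "role": "assistant",
    "content": [
        {
            "type": "text",
            "text": "I can answer questions, generate text, and help with coding tasks.",
        }
    ],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 17, "output_tokens": 14, "cache_read_input_tokens": 0},
    "model": "meta-llama/Llama-3.1-8B-Instruct",
}

# Concatenate all text blocks in generation order.
text = "".join(b["text"] for b in response["content"] if b["type"] == "text")
total_tokens = response["usage"]["input_tokens"] + response["usage"]["output_tokens"]

print(text)
print(response["stop_reason"], total_tokens)  # end_turn 31
```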