Use the Anthropic Messages-style API on Friendli Serverless Endpoints. Send structured message payloads and receive assistant responses.

Example request:

curl --request POST \
  --url https://api.friendli.ai/serverless/v1/messages \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "messages": [
    {
      "content": "Hello, summarize what you can do in one sentence.",
      "role": "user"
    }
  ],
  "model": "meta-llama/Llama-3.1-8B-Instruct"
}
'

Example response:

{
  "id": "msg_4b71d12c86d94e719c7e3984a7bb7941",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "I can answer questions, generate text, and help with coding tasks."
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 17,
    "output_tokens": 14,
    "cache_read_input_tokens": 0
  },
  "model": "meta-llama/Llama-3.1-8B-Instruct"
}
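The same request can be issued from Python. Below is a minimal sketch using only the standard library's urllib; the URL and payload mirror the curl example above, and FRIENDLI_TOKEN is a placeholder you must replace with your own token.

```python
import json
import urllib.request

# Placeholder: replace with your Friendli API token.
FRIENDLI_TOKEN = "<token>"

payload = {
    "messages": [
        {"role": "user", "content": "Hello, summarize what you can do in one sentence."}
    ],
    "model": "meta-llama/Llama-3.1-8B-Instruct",
}

req = urllib.request.Request(
    "https://api.friendli.ai/serverless/v1/messages",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {FRIENDLI_TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Sending requires a valid token:
# with urllib.request.urlopen(req) as resp:
#     message = json.load(resp)
#     print(message["content"][0]["text"])
```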
Send Anthropic Messages-style JSON. Detailed request/response field descriptions are provided in the OpenAPI schema below on this page. See available models in the pricing table.

Documentation Index
Fetch the complete documentation index at: https://friendli.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
When streaming is enabled (the stream option is set to true), the response is delivered as MIME type text/event-stream. Otherwise, the content type is application/json.
You can view the schema of the streamed sequence of chunk objects in streaming mode here.
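In streaming mode, each server-sent event carries a JSON chunk on a `data:` line. The following is a minimal sketch of extracting those payloads from an SSE stream; the chunk shapes in the sample are illustrative assumptions, not the exact schema (see the chunk-object reference linked above).

```python
import json

def iter_sse_data(lines):
    """Yield parsed JSON payloads from 'data: ...' lines of an SSE stream."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            data = line[len("data:"):].strip()
            if data and data != "[DONE]":
                yield json.loads(data)

# Illustrative stream; the chunk fields here are assumptions, not the exact schema.
sample = [
    'data: {"type": "content_block_delta", "delta": {"text": "Hel"}}',
    '',
    'data: {"type": "content_block_delta", "delta": {"text": "lo"}}',
]
text = "".join(c["delta"]["text"] for c in iter_sse_data(sample))
print(text)  # Hello
```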
Of the provided tools, only custom/client function tools are used; non-custom tool types are ignored.
ID of team to run requests as (optional parameter).
A list of conversation messages ordered from oldest to newest. Must contain at least one item.
Author role for this message turn. Use user for user input and assistant for prior assistant turns.
Allowed values: user, assistant
Message payload. Supports plain string shorthand or a typed block array.
[
  {
    "content": "Explain top_p in one sentence.",
    "role": "user"
  }
]
Code of the model to use. See the available model list.
"meta-llama/Llama-3.1-8B-Instruct"
Maximum number of tokens to generate for the assistant response. Must be greater than 0 when provided.
x >= 1
Payload format for the top-level system instruction.
Whether to stream output as server-sent events (text/event-stream). When false or omitted, returns a single JSON response.
Sampling temperature. Lower values make outputs more deterministic; higher values increase diversity.
Nucleus sampling parameter. The model samples from the smallest token set whose cumulative probability reaches top_p.
Limits sampling to the k most likely tokens at each decoding step.
Stop strings that terminate generation when matched in output. The matched value is returned in stop_sequence when applicable.
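The sampling controls described above can be combined in a single request body. A sketch of such a payload follows; field names track the Messages-style schema, and `stop_sequences` is assumed to be the stop-string parameter's name under that convention.

```python
import json

# A request combining the sampling controls described above.
# "stop_sequences" is assumed to be the stop-string field's name
# per the Messages-style convention.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "List three prime numbers."}],
    "max_tokens": 128,       # must be >= 1 when provided
    "temperature": 0.2,      # lower = more deterministic output
    "top_p": 0.9,            # nucleus sampling cutoff
    "top_k": 40,             # sample only from the 40 most likely tokens
    "stop_sequences": ["\n\n"],
}
print(json.dumps(payload))
```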
Tool definitions available to the model. Use this to allow tool calls with structured arguments.
Tool identifier used in model tool-call outputs. Keep this name stable and unique within the request.
Tool category. If missing, empty, or custom, it is converted to a function tool. Any non-empty value other than custom is treated as a server-side tool and skipped.
Natural-language description of what the tool does and when to call it.
JSON Schema describing tool arguments. The model uses this schema to construct the tool input object.
If true, applies stricter schema adherence when generating tool arguments.
Controls tool-calling behavior (auto, any, tool, none) and optional parallel-call behavior.
Tool-calling mode. auto lets the model decide, any requires at least one tool call, tool forces a named tool, and none disables tool calls.
Allowed values: auto, any, tool, none
Tool name to force when type=tool. Ignored for other type values.
If true, restricts the model to at most one tool call at a time. Must not be provided when type=none.
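A request carrying one custom function tool and a forced tool call can be sketched as below. The get_weather tool, its description, and its schema are hypothetical, and the input_schema field name follows the Anthropic Messages convention; they are illustrations, not part of the Friendli docs.

```python
import json

# Hypothetical tool definition; name, description, and schema are illustrative.
weather_tool = {
    "type": "custom",        # missing/empty/"custom" all become a function tool
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {        # field name per the Anthropic Messages convention
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [weather_tool],
    # Force the named tool; mode "tool" requires a tool name.
    "tool_choice": {"type": "tool", "name": "get_weather"},
}
print(json.dumps(payload))
```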
Controls reasoning behavior with mode (enabled, disabled, adaptive). enabled requires budget_tokens; disabled and adaptive must not include it.
Reasoning mode. enabled forces reasoning, disabled suppresses it, and adaptive lets the system decide.
Allowed values: enabled, disabled, adaptive
Reasoning token budget. Required when type=enabled; must not be provided when type=disabled or type=adaptive. If max_tokens is set, this must be smaller than max_tokens.
x >= 1
Output generation options including effort level and structured output format settings.
Relative generation effort level (low, medium, high, max). Higher effort can improve quality on harder prompts at the cost of additional compute.
Allowed values: low, medium, high, max
Structured output settings. Currently supports only json_schema.
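The reasoning constraints described above can be checked client-side before sending a request. Below is a minimal validator sketch; the type and budget_tokens field names come from the schema above, while the function itself is illustrative.

```python
def validate_reasoning(reasoning, max_tokens=None):
    """Check the reasoning-config rules described above:

    - type must be one of enabled / disabled / adaptive
    - enabled requires budget_tokens (>= 1, and < max_tokens when set)
    - disabled and adaptive must not include budget_tokens
    """
    rtype = reasoning.get("type")
    budget = reasoning.get("budget_tokens")
    if rtype not in ("enabled", "disabled", "adaptive"):
        return False
    if rtype == "enabled":
        if budget is None or budget < 1:
            return False
        if max_tokens is not None and budget >= max_tokens:
            return False
        return True
    return budget is None  # disabled / adaptive: budget must be absent

print(validate_reasoning({"type": "enabled", "budget_tokens": 512}, max_tokens=1024))  # True
print(validate_reasoning({"type": "adaptive", "budget_tokens": 512}))                  # False
```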
Several additional compatibility fields are accepted for request portability; they are parsed but not used for generation.
Successfully generated a Messages-style response. For streaming (text/event-stream) event and chunk details, see Messages chunk object.
Unique identifier for this message response.
Response object type (message).
"message"Role of the output message author (assistant).
"assistant"Assistant output blocks in generation order.
Token usage details for this response.
Number of billed input tokens for this request.
x >= 0
Number of billed output tokens generated by the model.
x >= 0
Number of input tokens served from cache, when cache reads are applicable.
x >= 0
Why generation stopped (end_turn, max_tokens, tool_use, stop_sequence).
Allowed values: end_turn, max_tokens, tool_use, stop_sequence
Matched stop string when stop_reason is stop_sequence; otherwise null.
Model used to generate the response.
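Putting the response fields together, the sketch below extracts the text blocks and usage counters from a response object shaped like the example at the top of the page.

```python
# Sample response shaped like the example earlier on this page.
response = {
    "id": "msg_4b71d12c86d94e719c7e3984a7bb7941",
    "type": "message",
    "role": "assistant",
    "content": [
        {
            "type": "text",
            "text": "I can answer questions, generate text, and help with coding tasks.",
        }
    ],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 17, "output_tokens": 14, "cache_read_input_tokens": 0},
    "model": "meta-llama/Llama-3.1-8B-Instruct",
}

# Concatenate all text blocks in generation order.
text = "".join(b["text"] for b in response["content"] if b["type"] == "text")
total_tokens = response["usage"]["input_tokens"] + response["usage"]["output_tokens"]

print(text)
print(response["stop_reason"], total_tokens)  # end_turn 31
```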