Friendli offers comprehensive, model-agnostic reasoning parsing, so there is no need to write custom parsers. Leverage reasoning to build great AI products and let Friendli handle the complexity.

What is Reasoning?

Reasoning models are LLMs trained to “think” before answering, improving the precision of their answers. This enables LLMs to excel at complex problem solving and multi-step planning in agentic workflows. When a model performs reasoning, the reasoning content is included in its response.
Reasoning example

What makes reasoning parsing tedious?

Different models handle reasoning in different ways. Some models always generate reasoning, while others expose it as an optional feature. The format also varies. The reasoning content may be wrapped in <think> tags or model-specific tokens. As a result, separating reasoning content from the response can be non-trivial.
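For instance, a hand-rolled parser for the common <think> tag format might look like the sketch below. The `split_reasoning` helper is purely illustrative (not part of any Friendli API), and it already embeds assumptions that many models violate: a single well-formed tag pair, no model-specific wrapper tokens, and no truncated output.

```python
import re

# Illustrative sketch: a naive parser for the common <think>...</think> format.
# It assumes exactly one well-formed tag pair, which many models violate
# (different wrapper tokens, missing closing tag, reasoning disabled, etc.).
def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning_content, answer_content)."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text  # no reasoning found; treat everything as the answer
    reasoning = match.group(1).strip()
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer

reasoning, answer = split_reasoning("<think>The user greeted me.</think>Hello!")
# reasoning == "The user greeted me.", answer == "Hello!"
```

Keeping a parser like this correct across every model family is exactly the maintenance burden that Friendli's built-in parsing removes.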

Reasoning Model Types

  • Always Reasoning Models: Reasoning is enabled by default. (e.g., DeepSeek-R1)
  • Controllable Reasoning Models: Reasoning can be toggled on or off. (e.g., Qwen3-32B)

Usage: Always Reasoning Models

curl -X POST https://api.friendli.ai/serverless/v1/chat/completions \
  -H "Authorization: Bearer $FRIENDLI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1-0528",
    "messages": [
      {
        "role": "user",
        "content": "Does technology expand or limit human freedom?"
      }
    ]
  }'

Usage: Controllable Reasoning Models

These models let you control reasoning via the enable_thinking parameter.
Setting it to true enables reasoning, while setting it to false makes the model skip reasoning and return empty <think></think> tags.
Important: Support for the enable_thinking parameter is model-specific, even among controllable reasoning models. Refer to the model card or release notes for details.
curl -X POST https://api.friendli.ai/serverless/v1/chat/completions \
  -H "Authorization: Bearer $FRIENDLI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-32B",
    "messages": [
      {
        "role": "user",
        "content": "Does technology expand or limit human freedom?"
      }
    ],
    "chat_template_kwargs": {
      "enable_thinking": true
    }
  }'

Reasoning Parsing with Friendli

Friendli deterministically separates reasoning content from the model response. Enable parsing with the following two parameters in the Chat Completions API:
  • parse_reasoning (boolean): Enables reasoning parsing.
  • include_reasoning (boolean): Only effective when parse_reasoning is enabled. Determines whether the parsed reasoning content is included in the response.
When using Dedicated Endpoints, you can set a default value for parse_reasoning at the endpoint level.
For the OpenAI SDK, place the parameters inside extra_body.
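The sketch below shows where the Friendli-specific parameters go in an OpenAI SDK call. The build_chat_kwargs helper is hypothetical, used here only to make the extra_body placement explicit; the commented-out call requires the openai package and a valid FRIENDLI_TOKEN.

```python
# Sketch: placing Friendli-specific parameters when using the OpenAI SDK.
# build_chat_kwargs is a hypothetical helper, shown only to make the
# extra_body placement explicit.
def build_chat_kwargs(model: str, messages: list[dict],
                      parse_reasoning: bool = True,
                      include_reasoning: bool = True) -> dict:
    return {
        "model": model,
        "messages": messages,
        # parse_reasoning / include_reasoning are not part of the OpenAI
        # SDK's method signature, so they must go inside extra_body:
        "extra_body": {
            "parse_reasoning": parse_reasoning,
            "include_reasoning": include_reasoning,
        },
    }

kwargs = build_chat_kwargs(
    "deepseek-ai/DeepSeek-R1-0528",
    [{"role": "user", "content": "Explain why the sky is blue."}],
)
# from openai import OpenAI
# client = OpenAI(base_url="https://api.friendli.ai/serverless/v1",
#                 api_key=os.environ["FRIENDLI_TOKEN"])
# response = client.chat.completions.create(**kwargs)
# print(response.choices[0].message.reasoning_content)
```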
Reasoning content tokens are counted in token usage and billing even when include_reasoning is false. For more details, please refer to the Chat Completions API documentation.

Parse Reasoning: On vs Off

The following shows a response with parse_reasoning enabled: the reasoning text is moved into message.reasoning_content. With parsing off, the same text would remain inline in message.content.
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Hello! How can I assist you today? 😊",
        "reasoning_content": "Okay, the user just said \"hello.\" I need to respond appropriately. Let's keep it simple and welcoming. Let's make sure there are no typos and the tone is warm.\n",
        "role": "assistant"
      }
    }
  ],
  // ...
}

Response Schema

  • parse_reasoning = false: Reasoning text remains inline in choices[].message.content.
  • parse_reasoning = true:
    • include_reasoning = true: Reasoning text moves to choices[].message.reasoning_content.
    • include_reasoning = false: Reasoning text is removed from choices[].message.content.

Streaming Response Schema

When parse_reasoning is true and stream is true, reasoning tokens are streamed in delta.reasoning_content and answer tokens in delta.content:
  • If include_reasoning is false, no delta.reasoning_content is sent.
  • If include_reasoning is true, both delta.reasoning_content and delta.content are sent.
data: {
  "choices": [
    { "index": 0, "delta": { "reasoning_content": "Let's break the problem down..." } }
  ]
}

data: {
  "choices": [
    { "index": 0, "delta": { "content": "The result is 1554." } }
  ]
}

data: [DONE]

Examples

Usage: Always Reasoning Models

curl -X POST https://api.friendli.ai/serverless/v1/chat/completions \
  -H "Authorization: Bearer $FRIENDLI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1-0528",
    "messages": [
      { "role": "user", "content": "Explain why the sky is blue." }
    ],
    "parse_reasoning": true
  }'

Usage: Controllable Reasoning Models

curl -X POST https://api.friendli.ai/serverless/v1/chat/completions \
  -H "Authorization: Bearer $FRIENDLI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-32B",
    "messages": [
      { "role": "user", "content": "Solve 37 * 42." }
    ],
    "chat_template_kwargs": { "enable_thinking": true },
    "parse_reasoning": true,
    "include_reasoning": true
  }'