
# Reasoning

> Enable model-agnostic reasoning on Friendli endpoints. Extract chain-of-thought traces from any supported model without writing custom parsers.

export const RoundedBorderBox = ({children, caption}) => <div className="rounded-border-box">
    {children}
    {caption && <p className="text-sm text-gray-700 dark:text-gray-400">{caption}</p>}
  </div>;

Friendli provides **model-agnostic reasoning parsing**, so you do not need to write custom parsers.
Use reasoning in your AI products and let Friendli handle the differences in how each model emits it.

## What Is Reasoning

Reasoning models are LLMs trained to "think" before answering, which improves the accuracy of their answers.
This helps them with complex problem solving and with multi-step planning in agentic workflows.
When a model reasons, the reasoning content is included in its response.

<RoundedBorderBox>
  <img alt="Reasoning example" src="https://mintcdn.com/friendliai/OmGyXdVnt91Gfn8z/static/images/reasoning-response.png?fit=max&auto=format&n=OmGyXdVnt91Gfn8z&q=85&s=58287be9b97de4efe19b8afe0c52f7b4" width="2246" height="934" data-path="static/images/reasoning-response.png" />
</RoundedBorderBox>

### What Makes Reasoning Parsing Tedious?

Different models handle reasoning in different ways.
Some models always generate reasoning, while others expose it as an optional feature.
The format also varies: the reasoning content may be wrapped in `<think>` tags or in model-specific tokens.
As a result, separating reasoning content from the response can be non-trivial.
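
To illustrate what a hand-written parser involves, here is a minimal sketch that splits a `<think>`-wrapped response into reasoning and answer. It assumes a single `<think>...</think>` block at the start of the text. The helper name `split_think_tags` is hypothetical, and this is not a description of how Friendli's parser works internally. Models that use other delimiters would each need their own handling.

```python
import re

# Hypothetical helper: handles only the `<think>...</think>` convention.
_THINK_RE = re.compile(r"^\s*<think>(.*?)</think>\s*", re.DOTALL)


def split_think_tags(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) for a response that may start with a think block."""
    match = _THINK_RE.match(text)
    if match is None:
        # No leading think block: treat the whole text as the answer.
        return "", text
    return match.group(1).strip(), text[match.end():]


raw = "<think>The user greeted me. Reply warmly.</think>\nHello! How can I help?"
reasoning, answer = split_think_tags(raw)
print(reasoning)  # The user greeted me. Reply warmly.
print(answer)     # Hello! How can I help?
```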

## Reasoning Model Types

* **Always Reasoning Models**: Reasoning is enabled by default. (e.g., MiniMax-M2.5)
* **Controllable Reasoning Models**: Reasoning can be toggled on or off. (e.g., GLM-5.2)

### Usage: Always Reasoning Models

<CodeGroup>
  ```bash curl theme={null}
  curl -X POST https://api.friendli.ai/serverless/v1/chat/completions \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "MiniMaxAI/MiniMax-M2.5",
      "messages": [
        {
          "role": "user",
          "content": "Does technology expand or limit human freedom?"
        }
      ]
    }'
  ```

  ```python Friendli Python SDK theme={null}
  # pip install friendli

  import os
  from friendli import SyncFriendli

  client = SyncFriendli(token=os.getenv("API_KEY"))

  completion = client.serverless.chat.complete(
      model="MiniMaxAI/MiniMax-M2.5",
      messages=[
          {
              "role": "user",
              "content": "Tell me how to make a delicious pancake"
          }
      ]
  )

  print(completion.choices[0].message)
  ```

  ```python OpenAI Python SDK theme={null}
  import os
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.friendli.ai/serverless/v1",
      api_key=os.environ.get("API_KEY")
  )

  completion = client.chat.completions.create(
      model="MiniMaxAI/MiniMax-M2.5",
      messages=[
          {"role": "user", "content": "Does technology expand or limit human freedom?"}
      ]
  )

  print(completion.choices[0].message)
  ```
</CodeGroup>

### Usage: Controllable Reasoning Models

These models let you control reasoning via the `enable_thinking` parameter, passed inside `chat_template_kwargs`. \
Setting it to `true` enables reasoning, while `false` disables it and returns empty `<think></think>` tags.

<Note>
  <strong>Important:</strong> Support for the `enable_thinking` parameter is model-specific, even among controllable reasoning models. Refer to the model card or release notes for details.
</Note>

<CodeGroup>
  ```bash {12-14} curl theme={null}
  curl -X POST https://api.friendli.ai/serverless/v1/chat/completions \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "zai-org/GLM-5.2",
      "messages": [
        {
          "role": "user",
          "content": "Does technology expand or limit human freedom?"
        }
      ],
      "chat_template_kwargs": {
        "enable_thinking": true
      }
    }'
  ```

  ```python {16-18} Friendli Python SDK theme={null}
  # pip install friendli

  import os
  from friendli import SyncFriendli

  client = SyncFriendli(token=os.getenv("API_KEY"))

  completion = client.serverless.chat.complete(
      model="zai-org/GLM-5.2",
      messages=[
          {
              "role": "user",
              "content": "Tell me how to make a delicious pancake"
          }
      ],
      chat_template_kwargs={
          "enable_thinking": True
      }
  )

  print(completion.choices[0].message)
  ```

  ```python {14-18} OpenAI Python SDK theme={null}
  import os
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.friendli.ai/serverless/v1",
      api_key=os.environ.get("API_KEY")
  )

  completion = client.chat.completions.create(
      model="zai-org/GLM-5.2",
      messages=[
          {"role": "user", "content": "Does technology expand or limit human freedom?"}
      ],
      extra_body={
          "chat_template_kwargs": {
              "enable_thinking": True
          }
      }
  )

  print(completion.choices[0].message)
  ```
</CodeGroup>

## Reasoning Parser

Friendli deterministically separates reasoning content from the model response.
Enable parsing with the following two parameters in the Chat Completions API:

* `parse_reasoning` (boolean): Enables reasoning parsing.
* `include_reasoning` (boolean): Takes effect only when `parse_reasoning` is enabled. Controls whether the parsed reasoning content is included in the response.

When these parameters are omitted from a request, the default behavior may vary between endpoints.

<Note>
  For Model APIs, `parse_reasoning` defaults to `true` when it is not specified. However, older endpoints might default to `false`, so specifying the parameter explicitly is recommended.
</Note>

<Note>
  For Dedicated Endpoints, the default is configurable at the endpoint level and can be set when creating or updating the endpoint. The current setting is also shown on the endpoint overview page. If there is no **Reasoning parser** field, the selected model may not support reasoning parsing.
</Note>

<Note>
  For the OpenAI SDK, place the parameters inside `extra_body`.
</Note>

Reasoning tokens count toward token usage and billing, even when `include_reasoning` is `false`.
For details, refer to the [Chat Completions API](/openapi/dedicated/inference/chat-completions) documentation.
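
Because defaults can differ between endpoints, one way to keep requests predictable is to set both flags explicitly on every call. The sketch below builds the extra request fields as a plain dictionary. `reasoning_options` is a hypothetical helper name, not part of any SDK. The result is intended to be passed as `extra_body` to the OpenAI SDK or merged into a raw JSON payload.

```python
from typing import Any, Optional


def reasoning_options(
    parse_reasoning: bool = True,
    include_reasoning: bool = True,
    enable_thinking: Optional[bool] = None,
) -> dict[str, Any]:
    """Build explicit reasoning fields for a Chat Completions request."""
    options: dict[str, Any] = {
        "parse_reasoning": parse_reasoning,
        "include_reasoning": include_reasoning,
    }
    # `enable_thinking` support is model-specific, so it is added
    # only when the caller sets it.
    if enable_thinking is not None:
        options["chat_template_kwargs"] = {"enable_thinking": enable_thinking}
    return options


print(reasoning_options(enable_thinking=True))
```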

### Parse Reasoning: On vs Off

The following examples show how the response differs with `parse_reasoning` on and off.

<CodeGroup>
  ```json Parse On theme={null}
  {
    "choices": [
      {
        "finish_reason": "stop",
        "index": 0,
        "logprobs": null,
        "message": {
          "content": "Hello! How can I assist you today? 😊",
          "reasoning_content": "Okay, the user just said \"hello.\" I need to respond appropriately. Let's keep it simple and welcoming. Let's make sure there are no typos and the tone is warm.\n",
          "role": "assistant"
        }
      }
    ],
    // ...
  }
  ```

  ```json Parse Off theme={null}
  {
    "choices": [
      {
        "finish_reason": "stop",
        "index": 0,
        "logprobs": null,
        "message": {
          "content": "<think>Okay, the user said \"hello.\" I need to respond appropriately. Let's keep it simple and welcoming. Let's make sure there are no typos and the tone is warm.</think>\nHello! How can I assist you today? 😊",
          "role": "assistant"
        }
      }
    ],
    // ...
  }
  ```
</CodeGroup>

### Response Schema

* `parse_reasoning = false`: Reasoning text remains inline in `choices[].message.content`.
* `parse_reasoning = true`:
  * `include_reasoning = true`: Reasoning text moves to `choices[].message.reasoning_content`, and `choices[].message.content` contains only the answer.
  * `include_reasoning = false`: Reasoning text is removed from the response, and `choices[].message.content` contains only the answer.
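
On the client side, the schema above means the reasoning may or may not arrive in a separate field. A minimal sketch of reading a message defensively, assuming the message has already been converted to a plain dictionary (the helper name `read_message` is hypothetical):

```python
from typing import Optional


def read_message(message: dict) -> tuple[Optional[str], str]:
    """Return (reasoning, content) from a chat completion message dict."""
    # Per the schema above, `reasoning_content` is expected only when
    # parsing is on and reasoning is included; otherwise this is None.
    reasoning = message.get("reasoning_content")
    content = message.get("content") or ""
    return reasoning, content


parsed = {
    "role": "assistant",
    "content": "Hello! How can I assist you today?",
    "reasoning_content": "The user said hello. Keep it simple.",
}
reasoning, content = read_message(parsed)
print(reasoning)  # The user said hello. Keep it simple.
print(content)    # Hello! How can I assist you today?
```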

### Streaming Response Schema

In streaming mode, `delta.reasoning_content` carries reasoning tokens and `delta.content` carries answer tokens.

When `parse_reasoning` is `true` and `stream` is `true`:

* If `include_reasoning` is `false`, no `delta.reasoning_content` is sent.
* If `include_reasoning` is `true`, both `delta.reasoning_content` and `delta.content` are sent.

```json theme={null}
data: {
  "choices": [
    { "index": 0, "delta": { "reasoning_content": "Let's break the problem down..." } }
  ]
}

data: {
  "choices": [
    { "index": 0, "delta": { "content": "The result is 1554." } }
  ]
}

data: [DONE]
```
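
A minimal sketch of consuming such a stream: the function below accumulates reasoning and answer text from chunk dictionaries shaped like the events above. It assumes each `data:` payload has already been decoded into a dictionary and that the `[DONE]` sentinel has been filtered out. `collect_stream` is a hypothetical name.

```python
from typing import Iterable


def collect_stream(chunks: Iterable[dict]) -> tuple[str, str]:
    """Concatenate streamed deltas into (reasoning, answer)."""
    reasoning_parts: list[str] = []
    answer_parts: list[str] = []
    for chunk in chunks:
        # Text from all choices is merged, which suits single-choice requests.
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # A delta may carry reasoning text, answer text, or neither.
            if delta.get("reasoning_content"):
                reasoning_parts.append(delta["reasoning_content"])
            if delta.get("content"):
                answer_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(answer_parts)


events = [
    {"choices": [{"index": 0, "delta": {"reasoning_content": "Let's break the problem down..."}}]},
    {"choices": [{"index": 0, "delta": {"content": "The result is 1554."}}]},
]
reasoning, answer = collect_stream(events)
print(reasoning)  # Let's break the problem down...
print(answer)     # The result is 1554.
```

Keeping the two buffers separate lets a UI render the reasoning trace in a collapsible panel while the answer streams into the main view.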

## Examples

### Usage: Always Reasoning Models

<CodeGroup>
  ```bash curl theme={null}
  curl -X POST https://api.friendli.ai/serverless/v1/chat/completions \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "MiniMaxAI/MiniMax-M2.5",
      "messages": [
        { "role": "user", "content": "Explain why the sky is blue." }
      ],
      "parse_reasoning": true
    }'
  ```

  ```python Friendli Python SDK theme={null}
  import os
  from friendli import SyncFriendli

  client = SyncFriendli(token=os.getenv("API_KEY"))

  completion = client.serverless.chat.complete(
      model="MiniMaxAI/MiniMax-M2.5",
      messages=[
          {"role": "user", "content": "Explain why the sky is blue."}
      ],
      parse_reasoning=True,
  )

  print(completion.choices[0].message)
  ```

  ```python OpenAI Python SDK theme={null}
  import os
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.friendli.ai/serverless/v1",
      api_key=os.environ.get("API_KEY")
  )

  completion = client.chat.completions.create(
      model="MiniMaxAI/MiniMax-M2.5",
      messages=[
          {"role": "user", "content": "Explain why the sky is blue."}
      ],
      extra_body={
          "parse_reasoning": True
      }
  )

  print(completion.choices[0].message)
  ```
</CodeGroup>

### Usage: Controllable Reasoning Models

<CodeGroup>
  ```bash curl theme={null}
  curl -X POST https://api.friendli.ai/serverless/v1/chat/completions \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "zai-org/GLM-5.2",
      "messages": [
        { "role": "user", "content": "Solve 37 * 42." }
      ],
      "chat_template_kwargs": { "enable_thinking": true },
      "parse_reasoning": true,
      "include_reasoning": true
    }'
  ```

  ```python Friendli Python SDK theme={null}
  import os
  from friendli import SyncFriendli

  client = SyncFriendli(token=os.getenv("API_KEY"))

  completion = client.serverless.chat.complete(
      model="zai-org/GLM-5.2",
      messages=[
          {"role": "user", "content": "Solve 37 * 42."}
      ],
      chat_template_kwargs={"enable_thinking": True},
      parse_reasoning=True,
      include_reasoning=True,
  )

  print(completion.choices[0].message)
  ```

  ```python OpenAI Python SDK theme={null}
  import os
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.friendli.ai/serverless/v1",
      api_key=os.environ.get("API_KEY")
  )

  completion = client.chat.completions.create(
      model="zai-org/GLM-5.2",
      messages=[
          {"role": "user", "content": "Solve 37 * 42."}
      ],
      extra_body={
          "chat_template_kwargs": {"enable_thinking": True},
          "parse_reasoning": True,
          "include_reasoning": True,
      }
  )

  print(completion.choices[0].message)
  ```
</CodeGroup>
