Friendli offers comprehensive, model-agnostic reasoning parsing, so there is no need to write custom parsers. Leverage reasoning to build great AI products and let Friendli handle the complexity.

What is Reasoning?

Reasoning models are LLMs trained to “think” before answering, improving the precision of their answers. This enables LLMs to excel at complex problem solving and multi-step planning in agentic workflows. When a model performs reasoning, the reasoning content is included in its response.
Reasoning example

What makes reasoning parsing tedious?

Different models handle reasoning in different ways. Some models always generate reasoning, while others expose it as an optional feature. The format also varies. The reasoning content may be wrapped in <think> tags or model-specific tokens. As a result, separating reasoning content from the response can be non-trivial.
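For instance, a hand-rolled parser for the common <think> tag format might look like the sketch below. The `split_reasoning` helper is purely illustrative (not part of any Friendli API), and it already embeds assumptions that many models violate: a single well-formed tag pair, no model-specific wrapper tokens, and no truncated output.

```python
import re

# Illustrative sketch: a naive parser for the common <think>...</think> format.
# It assumes exactly one well-formed tag pair, which many models violate
# (different wrapper tokens, missing closing tag, reasoning disabled, etc.).
def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning_content, answer_content)."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text  # no reasoning found; treat everything as the answer
    reasoning = match.group(1).strip()
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer

reasoning, answer = split_reasoning("<think>The user greeted me.</think>Hello!")
# reasoning == "The user greeted me.", answer == "Hello!"
```

Keeping a parser like this correct across every model family is exactly the maintenance burden that Friendli's built-in parsing removes.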

Reasoning Model Types

  • Always Reasoning Models: Reasoning is enabled by default. (e.g., DeepSeek-R1)
  • Controllable Reasoning Models: Reasoning can be toggled on or off. (e.g., Qwen3-32B)

Usage: Always Reasoning Models

curl -X POST https://api.friendli.ai/serverless/v1/chat/completions \
  -H "Authorization: Bearer $FRIENDLI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1-0528",
    "messages": [
      {
        "role": "user",
        "content": "Does technology expand or limit human freedom?"
      }
    ]
  }'

Usage: Controllable Reasoning Models

These models let you control reasoning via the enable_thinking parameter.
Setting it to true enables reasoning, while setting it to false makes the model skip reasoning and return empty <think></think> tags.
Important: Support for the enable_thinking parameter is model-specific, even among controllable reasoning models. Refer to the model card or release notes for details.
curl -X POST https://api.friendli.ai/serverless/v1/chat/completions \
  -H "Authorization: Bearer $FRIENDLI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-32B",
    "messages": [
      {
        "role": "user",
        "content": "Does technology expand or limit human freedom?"
      }
    ],
    "chat_template_kwargs": {
      "enable_thinking": true
    }
  }'

Reasoning Parsing with Friendli

Friendli deterministically separates reasoning content from the model response. Enable parsing with the following two parameters in the Chat Completions API:
  • parse_reasoning (boolean): Enables reasoning parsing.
  • include_reasoning (boolean): Only effective when parse_reasoning is enabled. Determines whether the parsed reasoning content is included in the response.
When using Dedicated Endpoints, you can set a default value for parse_reasoning at the endpoint level.
For the OpenAI SDK, place the parameters inside extra_body.
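The sketch below shows where the Friendli-specific parameters go in an OpenAI SDK call. The build_chat_kwargs helper is hypothetical, used here only to make the extra_body placement explicit; the commented-out call requires the openai package and a valid FRIENDLI_TOKEN.

```python
# Sketch: placing Friendli-specific parameters when using the OpenAI SDK.
# build_chat_kwargs is a hypothetical helper, shown only to make the
# extra_body placement explicit.
def build_chat_kwargs(model: str, messages: list[dict],
                      parse_reasoning: bool = True,
                      include_reasoning: bool = True) -> dict:
    return {
        "model": model,
        "messages": messages,
        # parse_reasoning / include_reasoning are not part of the OpenAI
        # SDK's method signature, so they must go inside extra_body:
        "extra_body": {
            "parse_reasoning": parse_reasoning,
            "include_reasoning": include_reasoning,
        },
    }

kwargs = build_chat_kwargs(
    "deepseek-ai/DeepSeek-R1-0528",
    [{"role": "user", "content": "Explain why the sky is blue."}],
)
# from openai import OpenAI
# client = OpenAI(base_url="https://api.friendli.ai/serverless/v1",
#                 api_key=os.environ["FRIENDLI_TOKEN"])
# response = client.chat.completions.create(**kwargs)
# print(response.choices[0].message.reasoning_content)
```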
Reasoning content tokens are counted in token usage and billing even when include_reasoning is false. For more details, please refer to the Chat Completions API documentation.

Parse Reasoning: On vs Off

The following shows a response with parse_reasoning enabled: the reasoning text is moved into message.reasoning_content. With parsing off, the same text would remain inline in message.content.
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Hello! How can I assist you today? 😊",
        "reasoning_content": "Okay, the user just said \"hello.\" I need to respond appropriately. Let's keep it simple and welcoming. Let's make sure there are no typos and the tone is warm.\n",
        "role": "assistant"
      }
    }
  ],
  // ...
}

Response Schema

  • parse_reasoning = false: Reasoning text remains inline in choices[].message.content.
  • parse_reasoning = true:
    • include_reasoning = true: Reasoning text moves to choices[].message.reasoning_content.
    • include_reasoning = false: Reasoning text is removed from choices[].message.content.

Streaming Response Schema

When parse_reasoning is true and stream is true, reasoning tokens are streamed in delta.reasoning_content and answer tokens in delta.content:
  • If include_reasoning is false, no delta.reasoning_content is sent.
  • If include_reasoning is true, both delta.reasoning_content and delta.content are sent.
data: {
  "choices": [
    { "index": 0, "delta": { "reasoning_content": "Let's break the problem down..." } }
  ]
}

data: {
  "choices": [
    { "index": 0, "delta": { "content": "The result is 1554." } }
  ]
}

data: [DONE]

Examples

Usage: Always Reasoning Models

curl -X POST https://api.friendli.ai/serverless/v1/chat/completions \
  -H "Authorization: Bearer $FRIENDLI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1-0528",
    "messages": [
      { "role": "user", "content": "Explain why the sky is blue." }
    ],
    "parse_reasoning": true
  }'

Usage: Controllable Reasoning Models

curl -X POST https://api.friendli.ai/serverless/v1/chat/completions \
  -H "Authorization: Bearer $FRIENDLI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-32B",
    "messages": [
      { "role": "user", "content": "Solve 37 * 42." }
    ],
    "chat_template_kwargs": { "enable_thinking": true },
    "parse_reasoning": true,
    "include_reasoning": true
  }'