What is Reasoning?
Reasoning models are LLMs trained to “think” before answering, enhancing precision of answers. This enables LLMs excel in complex problem solving and multi-step planning for agentic workflows. When a model performs reasoning, the reasoning content is included in its response.
What makes reasoning parsing tedious?
Different models handle reasoning in different ways. Some models always generate reasoning, while others expose it as an optional feature. The format also varies. The reasoning content may be wrapped in<think>
tags or model-specific tokens.
As a result, separating reasoning content from the response can be non-trivial.
Reasoning Model Types
- Always Reasoning Models: Reasoning is enabled by default. (e.g., DeepSeek-R1)
- Controllable Reasoning Models: Reasoning can be toggled on or off. (e.g., Qwen3-32B)
Usage: Always Reasoning Models
Usage: Controllable Reasoning Models
These models let you control reasoning via theenable_thinking
parameter. Setting it to
true
enables reasoning, while false
returns empty <think></think>
tags.
Important: Support for
enable_thinking
parameter is model-specific—even among controllable reasoning models. Refer to the model card or release notes for details.Reasoning Parsing with Friendli
Friendli deterministically separates reasoning content from the model response. Enable parsing with the following two parameters in the Chat Completions API:parse_reasoning
(boolean): Enables reasoning parsing.include_reasoning
(boolean): Effective when reasoning parsing is enabled. Decides whether the parsed reasoning content is included in the response.
When using Dedicated Endpoints, you can set default value for
parse_reasoning
at the endpoint level.For the OpenAI SDK, place the parameters inside
extra_body
.include_reasoning
is false
.
For more detailed information, please refer to the Chat Completions API documentation.
Parse Reasoning: On vs Off
The following shows how responses differ whenparse_reasoning
is on vs off.
Response Schema
parse_reasoning = false
: Reasoning text remains inline inchoices[].message.content
.parse_reasoning = true
:include_reasoning = true
: Reasoning text moves tochoices[].message.reasoning_content
.include_reasoning = false
: Reasoning text is removed fromchoices[].message.content
.
Streaming Response Schema
delta.reasoning_content
streams reasoning tokens. delta.content
streams answer tokens.
When parse_reasoning
is true
and stream
is true
:
- If
include_reasoning
isfalse
, nodelta.reasoning_content
is sent. - If
include_reasoning
istrue
, bothdelta.reasoning_content
anddelta.content
are sent.