Reasoning
Learn how to use reasoning-capable models that show their step-by-step thought process.
LLMGateway supports reasoning-capable models that can show their step-by-step thought process before providing a final answer. This feature is particularly useful for complex problem-solving tasks, mathematical calculations, and logical reasoning.
Reasoning-Enabled Models
You can find all reasoning-enabled models on our models page using the reasoning filter. These models include:
- OpenAI's GPT-5 series (e.g., gpt-5, gpt-5-mini). Note: GPT-5 models use reasoning but currently do not return the reasoning content in the response.
- Anthropic's Claude 3.7 Sonnet
- Google's Gemini 2.0 Flash Thinking and Gemini 2.5 Pro
- GPT OSS models such as gpt-oss-120b and gpt-oss-20b
- Z.AI's reasoning models
Some models may reason internally even if the reasoning_effort parameter is not specified.
Using the Reasoning Parameter
To enable reasoning output, add the reasoning_effort parameter to your request. This parameter accepts the following values:
- minimal - Fastest reasoning with minimal thought process (only for GPT-5 models)
- low - Light reasoning for simpler tasks
- medium - Balanced reasoning for most tasks
- high - Deep reasoning for complex problems
Example Request
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5-preview",
"messages": [
{
"role": "user",
"content": "What is 2/3 + 1/4 + 5/6?"
}
],
"reasoning_effort": "medium"
}'

Example Response
The response will include a reasoning field in the message object containing the model's step-by-step thought process:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1234567890,
"model": "gpt-5-preview",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The answer is 1.75 or 7/4.",
"reasoning": "First, I need to find a common denominator for 2/3, 1/4, and 5/6. The LCD is 12. Converting: 2/3 = 8/12, 1/4 = 3/12, 5/6 = 10/12. Adding: 8/12 + 3/12 + 10/12 = 21/12 = 1.75 or 7/4."
},
"finish_reason": "completed"
}
],
"usage": {
"prompt_tokens": 20,
"completion_tokens": 45,
"reasoning_tokens": 35,
"total_tokens": 65
}
}

Streaming Reasoning Content
When streaming is enabled, reasoning content will be streamed as part of the response chunks:
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5-preview",
"messages": [
{
"role": "user",
"content": "Solve this logic puzzle: If all roses are flowers and some flowers fade quickly, can we conclude that some roses fade quickly?"
}
],
"reasoning_effort": "high",
"stream": true
}'

The reasoning content will appear in the stream chunks before the final answer, allowing you to display the model's thought process in real time.
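If you consume the stream from your own code, you can route the reasoning deltas and the answer deltas into separate buffers as they arrive. The following is a minimal Python sketch using the requests library; it assumes OpenAI-style SSE framing (lines prefixed with "data: " plus a "[DONE]" sentinel) and the chunk shape shown in the example below, so adapt it to what your provider actually returns.

import json
import os

import requests

API_URL = "https://api.llmgateway.io/v1/chat/completions"
HEADERS = {
    "Authorization": f"Bearer {os.environ['LLM_GATEWAY_API_KEY']}",
    "Content-Type": "application/json",
}
payload = {
    "model": "gpt-5-preview",
    "messages": [
        {
            "role": "user",
            "content": "If all roses are flowers and some flowers fade quickly, "
                       "can we conclude that some roses fade quickly?",
        }
    ],
    "reasoning_effort": "high",
    "stream": True,
}

reasoning_parts: list[str] = []
answer_parts: list[str] = []

with requests.post(API_URL, headers=HEADERS, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for raw_line in resp.iter_lines():
        if not raw_line.startswith(b"data: "):
            continue  # skip blank keep-alive lines between events
        data = raw_line[len(b"data: "):]
        if data == b"[DONE]":  # assumed OpenAI-style end-of-stream sentinel
            break
        delta = json.loads(data)["choices"][0]["delta"]
        # Reasoning deltas and answer deltas arrive interleaved in the same
        # stream; keep them in separate buffers so the UI can render them apart.
        if delta.get("reasoning"):
            reasoning_parts.append(delta["reasoning"])
        if delta.get("content"):
            answer_parts.append(delta["content"])

print("--- reasoning ---\n" + "".join(reasoning_parts))
print("--- answer ---\n" + "".join(answer_parts))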
Example:
data: {
"id": "chatcmpl-fb266880-1016-4797-9a70-f21a538edaf6",
"object": "chat.completion.chunk",
"created": 1761048126,
"model": "openai/gpt-oss-20b",
"choices": [
{
"index": 0,
"delta": {
"reasoning": "It's ",
"role": "assistant"
},
"finish_reason": null
}
]
}

Usage Tracking
Response Payload
The usage object in the response includes reasoning-specific token counts:
- reasoning_tokens - Number of tokens used for the reasoning process
- completion_tokens - Number of tokens in the final answer
- prompt_tokens - Number of tokens in the input
- total_tokens - Sum of all token counts
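If you track spend programmatically, these counts can be read straight off the response body. Below is a minimal, non-streaming Python sketch using the requests library; it assumes the response shape shown in the earlier example and treats the reasoning field as optional, since not every model returns it.

import os

import requests

API_URL = "https://api.llmgateway.io/v1/chat/completions"

resp = requests.post(
    API_URL,
    headers={
        "Authorization": f"Bearer {os.environ['LLM_GATEWAY_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-5-preview",
        "messages": [{"role": "user", "content": "What is 2/3 + 1/4 + 5/6?"}],
        "reasoning_effort": "medium",
    },
    timeout=120,
)
resp.raise_for_status()
body = resp.json()

message = body["choices"][0]["message"]
usage = body["usage"]

print("Answer:   ", message["content"])
print("Reasoning:", message.get("reasoning", "<not returned by this model>"))
print(
    f"Tokens: prompt={usage['prompt_tokens']} "
    f"completion={usage['completion_tokens']} "
    f"reasoning={usage.get('reasoning_tokens', 0)} "
    f"total={usage['total_tokens']}"
)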
Logs and Analytics
All requests using the reasoning_effort parameter are tracked in your dashboard logs with:
- The reasoningContent field containing the full reasoning text
- Separate token counts for reasoning vs. completion
- Performance metrics for reasoning-enabled requests
You can view detailed logs for each request in the dashboard to analyze how models are reasoning through problems.
Auto-Routing with Reasoning
When using auto-routing (specifying a model like gpt-5 without a specific version), LLMGateway will:
- Automatically set reasoning_effort to minimal for GPT-5 models
- Set reasoning_effort to low for other auto-routed reasoning models
- Only route to providers that support reasoning when reasoning_effort is specified
This ensures optimal performance and cost when using auto-routing with reasoning-capable models.
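As a concrete illustration (a sketch of a request payload, not additional API surface), an auto-routed request can simply omit the parameter and rely on the defaults listed above:

# Auto-routed request: a version-less model ID such as "gpt-5" lets LLMGateway
# pick the provider. Per the defaults above you can omit reasoning_effort and
# the gateway applies "minimal" (GPT-5) or "low" (other reasoning models);
# set it explicitly only when you want to override that default.
auto_routed_payload = {
    "model": "gpt-5",  # no specific version
    "messages": [
        {"role": "user", "content": "Compare quicksort and mergesort in two sentences."}
    ],
    # "reasoning_effort": "high",  # optional override of the auto-routing default
}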
Model-Specific Behavior
Not all reasoning models return reasoning content in the same way. Some models (like OpenAI's GPT-5 models) may reason internally but not expose the reasoning content in the response. LLMGateway normalizes the response format across providers, but the depth and format of reasoning content may vary.
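Client code should therefore treat the field as optional. Below is a minimal Python guard, assuming a parsed response body like the one in the earlier example (display_reasoning is a hypothetical helper in your application):

message = body["choices"][0]["message"]

# Some providers (e.g. GPT-5 models, as noted above) reason internally without
# returning the text, so the field may be missing or empty.
reasoning = message.get("reasoning")
if reasoning:
    display_reasoning(reasoning)  # hypothetical UI helper
else:
    print("No reasoning content returned for this model.")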
Best Practices
- Choose appropriate reasoning effort: Use low or minimal for simple tasks, medium for most tasks, and high only for complex problems that require deep reasoning
- Monitor token usage: Reasoning can significantly increase token consumption; monitor your reasoning_tokens in the usage object (see the sketch after this list)
- Stream for better UX: When building user-facing applications, enable streaming to show the reasoning process in real time
- Check logs: Review the reasoningContent in your dashboard logs to understand how models are solving problems
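For the token-monitoring point above, a small helper like the sketch below can log how much of each request goes to reasoning. The exact accounting of reasoning_tokens relative to the other counts may vary by provider, so treat the ratio as a rough signal.

def log_reasoning_usage(usage: dict) -> None:
    """Log reasoning token usage from a chat completion's usage object."""
    reasoning = usage.get("reasoning_tokens", 0)
    total = usage.get("total_tokens", 0)
    share = reasoning / total if total else 0.0
    print(f"reasoning_tokens={reasoning} ({share:.0%} of total_tokens)")

# Example, using the usage values from the response shown earlier:
log_reasoning_usage({"prompt_tokens": 20, "completion_tokens": 45,
                     "reasoning_tokens": 35, "total_tokens": 65})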
Error Handling
If you specify reasoning_effort for a model that doesn't support reasoning, you'll receive an error:
{
"error": {
"message": "Model gpt-4o does not support reasoning. Remove the reasoning_effort parameter or use a reasoning-capable model.",
"type": "invalid_request_error",
"code": "model_not_supported"
}
}

To avoid this error, only use the reasoning_effort parameter with reasoning-enabled models.
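In client code you can detect this case and retry without the parameter. The sketch below assumes the JSON error shape shown above and that the error comes back with a non-2xx status; the retry check keys off the documented code field, so adjust it if your error payloads differ.

import os

import requests

API_URL = "https://api.llmgateway.io/v1/chat/completions"
HEADERS = {
    "Authorization": f"Bearer {os.environ['LLM_GATEWAY_API_KEY']}",
    "Content-Type": "application/json",
}


def chat(payload: dict) -> dict:
    """Send a chat completion, retrying once without reasoning_effort if rejected."""
    resp = requests.post(API_URL, headers=HEADERS, json=payload, timeout=120)
    if not resp.ok:
        error = resp.json().get("error", {})  # assumes the JSON error shape above
        unsupported = error.get("code") == "model_not_supported"
        if unsupported and "reasoning_effort" in payload:
            retry = {k: v for k, v in payload.items() if k != "reasoning_effort"}
            return chat(retry)
        resp.raise_for_status()
    return resp.json()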