Merge pull request #22823 from Chesars/docs/thinking-summary-field

Chesars · web-flow · commit b0e644b01792 · 2026-03-04T22:12:43.000-03:00
docs: add thinking.summary field to /v1/messages and reasoning docs
diff --git a/docs/my-website/docs/anthropic_unified/index.md b/docs/my-website/docs/anthropic_unified/index.md
@@ -506,12 +506,15 @@ Request body will be in the Anthropic messages API format. **litellm follows the
   A system prompt providing context or specific instructions to the model.
 - **temperature** (number):  
   Controls randomness in the model's responses. Valid range: `0 < temperature < 1`.
-- **thinking** (object):  
+- **thinking** (object):
   Configuration for enabling extended thinking. If enabled, it includes:
-  - **budget_tokens** (integer):  
+  - **budget_tokens** (integer):
     Minimum of 1024 tokens (and less than `max_tokens`).
-  - **type** (enum):  
+  - **type** (enum):
     E.g., `"enabled"`.
+  - **summary** (string, optional):
+    Enables the summary style for thinking blocks. Possible values: `"auto"`, `"concise"`, `"detailed"`, `"disabled"`.
+    When routing to non-Anthropic providers (e.g., `openai/gpt-5.1`), the `summary` value is preserved and forwarded to the downstream API.
 - **tool_choice** (object):  
   Instructs how the model should utilize any provided tools.
 - **tools** (array of objects):  
diff --git a/docs/my-website/docs/reasoning_content.md b/docs/my-website/docs/reasoning_content.md
@@ -675,3 +675,19 @@ response = litellm.completion(
     reasoning_effort={"effort": "low", "summary": "detailed"},  # Explicit control
 )
 ```
+
+### Summary Preservation via `/v1/messages` Adapter
+
+When using the Anthropic `/v1/messages` adapter to route non-Claude models (e.g., `openai/gpt-5.1`), the `thinking.summary` value is preserved and forwarded to the downstream provider. For example:
+
+```python
+import litellm
+
+response = await litellm.anthropic.messages.acreate(
+    model="openai/gpt-5.1",
+    messages=[{"role": "user", "content": "Hello"}],
+    max_tokens=8096,
+    thinking={"type": "enabled", "budget_tokens": 5000, "summary": "concise"},
+)
+# The summary="concise" is preserved when routing to OpenAI's Responses API
+```