Merged
Commits
15 commits
23e20fa
feat(advisor): add ADVISOR_MAX_USES, ADVISOR_NATIVE_PROVIDERS, ADVISO…
ishaan-berri Apr 12, 2026
a89b067
feat(advisor): add replace_with_text param to strip_advisor_blocks_fr…
ishaan-berri Apr 12, 2026
ebc57a1
feat(advisor): wire MessagesInterceptor registry into anthropic_messa…
ishaan-berri Apr 12, 2026
ea765f7
feat(advisor): add MessagesInterceptor ABC and registry
ishaan-berri Apr 12, 2026
c12ebdb
feat(advisor): add AdvisorOrchestrationHandler for non-Anthropic prov…
ishaan-berri Apr 12, 2026
b92e4c5
docs(advisor): add interceptors README explaining when to use vs pre-…
ishaan-berri Apr 12, 2026
ce3d039
test(advisor): add unit tests for orchestration loop (mocked backends…
ishaan-berri Apr 12, 2026
742e2fe
test(advisor): add live e2e tests for advisor orchestration against r…
ishaan-berri Apr 12, 2026
844e34b
test(advisor): remove live e2e test file (tests run locally via script)
ishaan-berri Apr 12, 2026
d29f40d
fix(advisor): inject max_uses_exceeded error result instead of raisin…
ishaan-berri Apr 12, 2026
22f45c6
fix(advisor): restore AdvisorMaxIterationsError, raise on cap, fix ma…
ishaan-berri Apr 12, 2026
fa52584
test(advisor): add unit tests for max_uses=0, missing model, default …
ishaan-berri Apr 12, 2026
9be7b4c
test(advisor): add integration tests for full dispatch path, max_uses…
ishaan-berri Apr 12, 2026
a8bc7bf
docs(advisor): add how it works section with mermaid diagram + non-na…
ishaan-berri Apr 12, 2026
dd87f3b
docs(advisor): move supported providers to top, focus how it works on…
ishaan-berri Apr 12, 2026
79 changes: 73 additions & 6 deletions docs/my-website/docs/completion/anthropic_advisor_tool.md
@@ -14,12 +14,48 @@ The advisor tool is in beta. Include `anthropic-beta: advisor-tool-2026-03-01` i

## Supported Providers

| Provider | Chat Completions API | Messages API |
|----------|---------------------|--------------|
| **Anthropic API** | ✅ | ✅ |
| **Azure Anthropic** | ❌ (coming soon) | ❌ (coming soon) |
| **Google Cloud Vertex AI** | ❌ (coming soon) | ❌ (coming soon) |
| **Amazon Bedrock** | ❌ (coming soon) | ❌ (coming soon) |
| Provider | Chat Completions API | Messages API | Notes |
|----------|---------------------|--------------|-------|
| **Anthropic API** | ✅ | ✅ | Native — runs server-side |
| **OpenAI / Azure OpenAI** | ✅ | ✅ | LiteLLM orchestration loop |
| **Amazon Bedrock** | ✅ | ✅ | LiteLLM orchestration loop |
| **Google Vertex AI** | ✅ | ✅ | LiteLLM orchestration loop |
| **Groq / Mistral / others** | ✅ | ✅ | LiteLLM orchestration loop |

## How it works (LiteLLM native orchestration)

For non-Anthropic providers, LiteLLM implements the advisor loop itself. The API you call is identical — LiteLLM handles everything transparently.

When a request arrives with an `advisor_20260301` tool and a non-Anthropic provider, `AdvisorOrchestrationHandler` intercepts it. It translates the advisor tool into a regular function tool the provider understands, then runs an orchestration loop:

```mermaid
flowchart TD
A["Your request\ntools: advisor_20260301\nmodel: e.g. openai/gpt-4.1-mini"] --> B["AdvisorOrchestrationHandler\ntranslates advisor → regular fn tool"]

B --> C["EXECUTOR CALL\nopenai / bedrock / vertex / etc."]

C --> D{"executor calls\nadvisor tool?"}

D -->|"yes — tool_use\nname=advisor"| E{"max_uses\nexceeded?"}

E -->|no| F["ADVISOR SUB-CALL\nclaude-opus-4-6\nfull transcript forwarded\nno tools"]

F --> G["Inject advice as\ntool_result into history"]

G --> C

E -->|yes| H["AdvisorMaxIterationsError"]

D -->|"no — end_turn\nor other stop reason"| I["Clean final response\nno advisor blocks in output"]
```

**What LiteLLM does for you:**

- Strips `advisor_20260301` from the outgoing request — the provider only sees a standard function tool named `advisor`
- When the executor calls it, intercepts before the result reaches you, runs the advisor sub-call, and injects the advice
- Strips any `advisor_tool_result` / `server_tool_use` blocks from message history on re-send so non-Anthropic providers never see Anthropic-specific types
- Wraps the final response in an SSE stream if you requested `stream=True`
- Enforces `max_uses` as a hard cap — `AdvisorMaxIterationsError` is raised if exceeded; `max_uses=0` disables the advisor entirely
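The loop above can be sketched in miniature. This is an illustrative simplification, not LiteLLM's implementation: `run_advisor_loop` and the stub backends are hypothetical, and the real handler also translates tool schemas and strips Anthropic-specific blocks.

```python
# Illustrative sketch of the orchestration loop (hypothetical helper, not
# LiteLLM's actual code). call_executor / call_advisor stand in for real
# provider calls; message/block shapes are simplified.
def run_advisor_loop(messages, max_uses, call_executor, call_advisor):
    uses = 0
    while True:
        response = call_executor(messages)  # executor sees a plain fn tool
        tool_call = next(
            (
                b
                for b in response["content"]
                if b.get("type") == "tool_use" and b.get("name") == "advisor"
            ),
            None,
        )
        if tool_call is None:
            return response  # end_turn: clean final response
        uses += 1
        if uses > max_uses:
            raise RuntimeError("AdvisorMaxIterationsError: max_uses exceeded")
        advice = call_advisor(messages, tool_call["input"]["question"])
        # Inject the advice as a tool_result and re-send the transcript.
        messages.append({"role": "assistant", "content": response["content"]})
        messages.append(
            {
                "role": "user",
                "content": [
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_call["id"],
                        "content": advice,
                    }
                ],
            }
        )

# Demo with stub backends: one advisor round, then a final answer.
_calls = {"n": 0}

def _stub_executor(msgs):
    if _calls["n"] == 0:
        _calls["n"] += 1
        return {
            "content": [
                {
                    "type": "tool_use",
                    "name": "advisor",
                    "id": "t1",
                    "input": {"question": "check my plan"},
                }
            ]
        }
    return {"content": [{"type": "text", "text": "final answer"}]}

history = [{"role": "user", "content": "hi"}]
final = run_advisor_loop(
    history, max_uses=3, call_executor=_stub_executor,
    call_advisor=lambda msgs, q: "looks good",
)
```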

## Model Compatibility

@@ -305,6 +341,37 @@ response = client.beta.messages.create(
print(response)
```

#### Non-Anthropic Provider (LiteLLM orchestration loop)

```python showLineNumbers title="Advisor Tool with OpenAI executor"
import asyncio

import litellm

async def main():
    # executor: openai/gpt-4.1-mini | advisor: claude-opus-4-6
    # LiteLLM runs the orchestration loop automatically
    response = await litellm.anthropic.messages.acreate(
        model="openai/gpt-4.1-mini",
        messages=[
            {"role": "user", "content": "Implement a Python LRU cache with O(1) get and put."}
        ],
        tools=[
            {
                "type": "advisor_20260301",
                "name": "advisor",
                "model": "claude-opus-4-6",
                "max_uses": 3,
            }
        ],
        max_tokens=1024,
        custom_llm_provider="openai",
    )
    # Final response is clean — no advisor tool_use blocks
    print(response["content"][0]["text"])

asyncio.run(main())
```

---

## Response Structure
15 changes: 14 additions & 1 deletion litellm/constants.py
@@ -1421,7 +1421,7 @@
"1",
] # always replace existing jobs

# The number of tag entries is higher than the number of user and team entries. This leads to a higher QPS.
# This will run tag specific tasks at a later time to smooth QPS
DAILY_TAG_SPEND_BATCH_MULTIPLIER = 2.3

@@ -1581,3 +1581,16 @@
MAX_COMPETITOR_NAMES = int(os.getenv("MAX_COMPETITOR_NAMES", 100))
COMPETITOR_LLM_TEMPERATURE = float(os.getenv("COMPETITOR_LLM_TEMPERATURE", 0.3))
DEFAULT_COMPETITOR_DISCOVERY_MODEL = "gpt-4o-mini"

# Advisor tool orchestration
# Providers that support advisor_20260301 natively (no LiteLLM orchestration needed).
# Add vertex_ai here once verified.
ADVISOR_NATIVE_PROVIDERS: frozenset = frozenset({"anthropic"})
# Hard cap on advisor iterations per request to prevent runaway loops.
ADVISOR_MAX_USES: int = 5
# Description injected into the synthetic advisor tool definition sent to non-native providers.
ADVISOR_TOOL_DESCRIPTION: str = (
    "Consult a highly intelligent advisor model when you need expert guidance, "
    "want to verify your reasoning, or face a complex decision. "
    "Describe your question or challenge clearly in the 'question' field."
)
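For illustration, the synthetic function tool built from these constants might look like the following. This is an assumption: `build_synthetic_advisor_tool` and the exact input schema are hypothetical, not taken from the PR.

```python
# Hypothetical sketch of the synthetic advisor tool a non-native provider
# receives in place of advisor_20260301; the exact schema is an assumption.
ADVISOR_TOOL_DESCRIPTION = (
    "Consult a highly intelligent advisor model when you need expert guidance, "
    "want to verify your reasoning, or face a complex decision. "
    "Describe your question or challenge clearly in the 'question' field."
)

def build_synthetic_advisor_tool(description: str = ADVISOR_TOOL_DESCRIPTION) -> dict:
    return {
        "name": "advisor",
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": {
                "question": {
                    "type": "string",
                    "description": "The question to ask the advisor.",
                }
            },
            "required": ["question"],
        },
    }

tool = build_synthetic_advisor_tool()
```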
91 changes: 70 additions & 21 deletions litellm/llms/anthropic/common_utils.py
@@ -639,23 +639,33 @@ def get_token_counter(self) -> Optional[BaseTokenCounter]:
return AnthropicTokenCounter()


def strip_advisor_blocks_from_messages(
    messages: List[Any], replace_with_text: bool = False
) -> List[Any]:
    """
    Remove (or replace) server_tool_use (name='advisor') and advisor_tool_result blocks
    from assistant message content.

    Prevents Anthropic 400 invalid_request_error: if advisor_tool_result blocks
    exist in history but the advisor tool is not in the tools array, the API rejects
    the request. This happens when the user has removed the advisor tool for cost
    control or on a follow-up turn.

    Args:
        messages: Conversation history to process (mutated in-place).
        replace_with_text: When True, replace the advisor exchange with an
            <advisor_feedback> text block so the executor retains the semantic
            context of what the advisor said. When False (default), strip silently.
    """
    for message in messages:
        if not isinstance(message, dict) or message.get("role") != "assistant":
            continue
        content = message.get("content")
        if not isinstance(content, list):
            continue

        # Collect advisor server_tool_use ids and their advice text (for replace mode).
        advisor_id_to_text: dict = {}
        for block in content:
            if (
                isinstance(block, dict)
                and block.get("type") == "server_tool_use"
                and block.get("name") == "advisor"
            ):
                bid = block.get("id")
                if bid:
                    advisor_id_to_text[bid] = None  # text filled in below

        if not advisor_id_to_text:
            continue

        # If replacing, collect the advisor response text from advisor_tool_result blocks.
        if replace_with_text:
            for block in content:
                if (
                    isinstance(block, dict)
                    and block.get("type") == "advisor_tool_result"
                    and block.get("tool_use_id") in advisor_id_to_text
                ):
                    raw = block.get("content") or ""
                    text = (
                        raw
                        if isinstance(raw, str)
                        else next(
                            (
                                b.get("text", "")
                                for b in raw
                                if isinstance(b, dict) and b.get("type") == "text"
                            ),
                            "",
                        )
                    )
                    advisor_id_to_text[block["tool_use_id"]] = text

        new_content = []
        for block in content:
            if not isinstance(block, dict):
                new_content.append(block)
                continue
            is_advisor_use = (
                block.get("type") == "server_tool_use"
                and block.get("name") == "advisor"
                and block.get("id") in advisor_id_to_text
            )
            is_advisor_result = (
                block.get("type") == "advisor_tool_result"
                and block.get("tool_use_id") in advisor_id_to_text
            )
            if is_advisor_use:
                if replace_with_text:
                    advice = advisor_id_to_text.get(block.get("id")) or ""
                    if advice:
                        new_content.append(
                            {
                                "type": "text",
                                "text": f"<advisor_feedback>\n{advice}\n</advisor_feedback>",
                            }
                        )
                # else: drop silently
            elif is_advisor_result:
                pass  # always drop — replaced above (or stripped)
            else:
                new_content.append(block)

        message["content"] = new_content
    return messages
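The helper's contract can be shown with a condensed standalone version. This simplified `strip_advisor_blocks` is for illustration only: it covers strip mode and omits the replace mode and type checks of the real function.

```python
# Condensed illustration of the strip contract (strip mode only); not the
# production function above.
def strip_advisor_blocks(messages):
    for message in messages:
        if message.get("role") != "assistant" or not isinstance(message.get("content"), list):
            continue
        ids = {
            b["id"]
            for b in message["content"]
            if b.get("type") == "server_tool_use" and b.get("name") == "advisor"
        }
        message["content"] = [
            b
            for b in message["content"]
            if not (
                (b.get("type") == "server_tool_use" and b.get("name") == "advisor")
                or (b.get("type") == "advisor_tool_result" and b.get("tool_use_id") in ids)
            )
        ]
    return messages

history = [
    {"role": "user", "content": "Plan a migration."},
    {
        "role": "assistant",
        "content": [
            {
                "type": "server_tool_use",
                "name": "advisor",
                "id": "adv_1",
                "input": {"question": "Is this plan sound?"},
            },
            {
                "type": "advisor_tool_result",
                "tool_use_id": "adv_1",
                "content": [{"type": "text", "text": "Add a rollback step."}],
            },
            {"type": "text", "text": "Here is the migration plan..."},
        ],
    },
]
stripped = strip_advisor_blocks(history)
# Only the plain text block survives in the assistant turn.
```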


@@ -26,6 +26,7 @@

from ..adapters.handler import LiteLLMMessagesToCompletionTransformationHandler
from ..responses_adapters.handler import LiteLLMMessagesToResponsesAPIHandler
from .interceptors import get_messages_interceptors
from .utils import AnthropicMessagesRequestUtils, mock_response

# Providers that are routed directly to the OpenAI Responses API instead of
@@ -236,6 +237,23 @@ async def anthropic_messages(
    if short_circuit_response is not None:
        return short_circuit_response

    # Run registered MessagesInterceptors (e.g. advisor orchestration loop).
    # api_key and api_base are explicit params (not in **kwargs), so pass them
    # explicitly so interceptor sub-calls can route to the same backend.
    for interceptor in get_messages_interceptors():
        if interceptor.can_handle(tools, custom_llm_provider):
            return await interceptor.handle(
                model=model,
                messages=messages,
                tools=tools,
                stream=original_stream,
                max_tokens=max_tokens,
                custom_llm_provider=custom_llm_provider,
                api_key=api_key,
                api_base=api_base,
                **kwargs,
            )

    loop = asyncio.get_event_loop()
    kwargs["is_async"] = True

@@ -0,0 +1,62 @@
# Messages Interceptors

Interceptors are short-circuit handlers for the `/v1/messages` path. They run **before** the normal backend call and can fully replace it with their own response.

## When to add an interceptor

Use an interceptor when you need to **replace the backend call entirely** with your own logic — for example, running an orchestration loop, synthesizing a response from multiple sub-calls, or short-circuiting to a non-LLM backend.

Use a **pre-request hook** (`_execute_pre_request_hooks` / `CustomLogger.async_pre_request_hook`) instead when you only need to **mutate request parameters** (tools, stream flag, metadata) before the normal call proceeds.

| Scenario | Use |
|---|---|
| Replace the backend call with a loop or synthetic response | Interceptor |
| Translate or strip tools before the call | Pre-request hook |
| Feature that is always active (built-in LiteLLM behavior) | Interceptor |
| Optional integration that operators register | `CustomLogger` callback |

## How to add a new interceptor

1. Create `your_feature.py` in this directory.
2. Implement `MessagesInterceptor` from `base.py`:
   - `can_handle(tools, custom_llm_provider) -> bool` — return True when your interceptor owns this request.
   - `async handle(...) -> Union[AnthropicMessagesResponse, AsyncIterator]` — do your work and return the response.
3. Register it in `__init__.py` by appending to `_interceptors`.

```python
# your_feature.py
from .base import MessagesInterceptor

class MyFeatureHandler(MessagesInterceptor):
    def can_handle(self, tools, custom_llm_provider):
        return some_condition(tools, custom_llm_provider)

    async def handle(self, *, model, messages, tools, stream, max_tokens,
                     custom_llm_provider, **kwargs):
        ...
        return response
```

```python
# __init__.py
from .your_feature import MyFeatureHandler

_interceptors = [
    AdvisorOrchestrationHandler(),
    MyFeatureHandler(),  # add here
]
```

## Existing interceptors

### `AdvisorOrchestrationHandler`

Handles `advisor_20260301` tool for providers that don't support it natively (all non-Anthropic providers for now).

**Triggers when:** `advisor_20260301` is in `tools` AND `custom_llm_provider` is not in `ADVISOR_NATIVE_PROVIDERS`.

**What it does:**
- Translates the advisor tool to a regular function tool the provider understands.
- Runs the executor model; when it calls the `advisor` tool, runs the advisor model and injects the result as a `tool_result`.
- Loops until the executor produces a final text response or `max_uses` is exceeded.
- Wraps the final response in `FakeAnthropicMessagesStreamIterator` if the caller requested streaming.
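The trigger condition described above can be sketched as follows. This is an assumed shape for illustration; the handler's actual `can_handle` may differ.

```python
# Assumed sketch of the trigger check; the real can_handle may differ.
ADVISOR_NATIVE_PROVIDERS = frozenset({"anthropic"})

def advisor_can_handle(tools, custom_llm_provider):
    has_advisor = any(
        isinstance(t, dict) and t.get("type") == "advisor_20260301"
        for t in (tools or [])
    )
    return has_advisor and custom_llm_provider not in ADVISOR_NATIVE_PROVIDERS
```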
@@ -0,0 +1,17 @@
from typing import List

from .advisor import AdvisorOrchestrationHandler
from .base import MessagesInterceptor

_interceptors: List[MessagesInterceptor] = [
    AdvisorOrchestrationHandler(),
]


def get_messages_interceptors() -> List[MessagesInterceptor]:
    """Return the list of active MessagesInterceptors.

    Order matters: interceptors are tried in list order; the first one whose
    ``can_handle()`` returns True wins.
    """
    return _interceptors