Merged
Commits
15 commits
23e20fa
feat(advisor): add ADVISOR_MAX_USES, ADVISOR_NATIVE_PROVIDERS, ADVISO…
ishaan-berri Apr 12, 2026
a89b067
feat(advisor): add replace_with_text param to strip_advisor_blocks_fr…
ishaan-berri Apr 12, 2026
ebc57a1
feat(advisor): wire MessagesInterceptor registry into anthropic_messa…
ishaan-berri Apr 12, 2026
ea765f7
feat(advisor): add MessagesInterceptor ABC and registry
ishaan-berri Apr 12, 2026
c12ebdb
feat(advisor): add AdvisorOrchestrationHandler for non-Anthropic prov…
ishaan-berri Apr 12, 2026
b92e4c5
docs(advisor): add interceptors README explaining when to use vs pre-…
ishaan-berri Apr 12, 2026
ce3d039
test(advisor): add unit tests for orchestration loop (mocked backends…
ishaan-berri Apr 12, 2026
742e2fe
test(advisor): add live e2e tests for advisor orchestration against r…
ishaan-berri Apr 12, 2026
844e34b
test(advisor): remove live e2e test file (tests run locally via script)
ishaan-berri Apr 12, 2026
d29f40d
fix(advisor): inject max_uses_exceeded error result instead of raisin…
ishaan-berri Apr 12, 2026
22f45c6
fix(advisor): restore AdvisorMaxIterationsError, raise on cap, fix ma…
ishaan-berri Apr 12, 2026
fa52584
test(advisor): add unit tests for max_uses=0, missing model, default …
ishaan-berri Apr 12, 2026
9be7b4c
test(advisor): add integration tests for full dispatch path, max_uses…
ishaan-berri Apr 12, 2026
a8bc7bf
docs(advisor): add how it works section with mermaid diagram + non-na…
ishaan-berri Apr 12, 2026
dd87f3b
docs(advisor): move supported providers to top, focus how it works on…
ishaan-berri Apr 12, 2026
79 changes: 73 additions & 6 deletions docs/my-website/docs/completion/anthropic_advisor_tool.md
@@ -14,12 +14,48 @@ The advisor tool is in beta. Include `anthropic-beta: advisor-tool-2026-03-01` i

## Supported Providers

| Provider | Chat Completions API | Messages API |
|----------|---------------------|--------------|
| **Anthropic API** | ✅ | ✅ |
| **Azure Anthropic** | ❌ (coming soon) | ❌ (coming soon) |
| **Google Cloud Vertex AI** | ❌ (coming soon) | ❌ (coming soon) |
| **Amazon Bedrock** | ❌ (coming soon) | ❌ (coming soon) |
| Provider | Chat Completions API | Messages API | Notes |
|----------|---------------------|--------------|-------|
| **Anthropic API** | ✅ | ✅ | Native — runs server-side |
| **OpenAI / Azure OpenAI** | ✅ | ✅ | LiteLLM orchestration loop |
| **Amazon Bedrock** | ✅ | ✅ | LiteLLM orchestration loop |
| **Google Vertex AI** | ✅ | ✅ | LiteLLM orchestration loop |
| **Groq / Mistral / others** | ✅ | ✅ | LiteLLM orchestration loop |

## How it works (LiteLLM native orchestration)

For non-Anthropic providers, LiteLLM implements the advisor loop itself. The API you call is identical — LiteLLM handles everything transparently.

When a request arrives with an `advisor_20260301` tool and a non-Anthropic provider, `AdvisorOrchestrationHandler` intercepts it. It translates the advisor tool into a regular function tool the provider understands, then runs an orchestration loop:

```mermaid
flowchart TD
A["Your request\ntools: advisor_20260301\nmodel: e.g. openai/gpt-4.1-mini"] --> B["AdvisorOrchestrationHandler\ntranslates advisor → regular fn tool"]

B --> C["EXECUTOR CALL\nopenai / bedrock / vertex / etc."]

C --> D{"executor calls\nadvisor tool?"}

D -->|"yes — tool_use\nname=advisor"| E{"max_uses\nexceeded?"}

E -->|no| F["ADVISOR SUB-CALL\nclaude-opus-4-6\nfull transcript forwarded\nno tools"]

F --> G["Inject advice as\ntool_result into history"]

G --> C

E -->|yes| H["AdvisorMaxIterationsError"]

D -->|"no — end_turn\nor other stop reason"| I["Clean final response\nno advisor blocks in output"]
```

**What LiteLLM does for you:**

- Strips `advisor_20260301` from the outgoing request — the provider only sees a standard function tool named `advisor`
- When the executor calls it, intercepts before the result reaches you, runs the advisor sub-call, and injects the advice
- Strips any `advisor_tool_result` / `server_tool_use` blocks from message history on re-send so non-Anthropic providers never see Anthropic-specific types
- Wraps the final response in an SSE stream if you requested `stream=True`
- Enforces `max_uses` as a hard cap — `AdvisorMaxIterationsError` is raised if exceeded; `max_uses=0` disables the advisor entirely
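The loop above can be sketched in miniature. This is an illustrative simplification, not LiteLLM's implementation: `run_advisor_loop` and the stub backends are hypothetical, and the real handler also translates tool schemas and strips Anthropic-specific blocks.

```python
# Illustrative sketch of the orchestration loop (hypothetical helper, not
# LiteLLM's actual code). call_executor / call_advisor stand in for real
# provider calls; message/block shapes are simplified.
def run_advisor_loop(messages, max_uses, call_executor, call_advisor):
    uses = 0
    while True:
        response = call_executor(messages)  # executor sees a plain fn tool
        tool_call = next(
            (
                b
                for b in response["content"]
                if b.get("type") == "tool_use" and b.get("name") == "advisor"
            ),
            None,
        )
        if tool_call is None:
            return response  # end_turn: clean final response
        uses += 1
        if uses > max_uses:
            raise RuntimeError("AdvisorMaxIterationsError: max_uses exceeded")
        advice = call_advisor(messages, tool_call["input"]["question"])
        # Inject the advice as a tool_result and re-send the transcript.
        messages.append({"role": "assistant", "content": response["content"]})
        messages.append(
            {
                "role": "user",
                "content": [
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_call["id"],
                        "content": advice,
                    }
                ],
            }
        )

# Demo with stub backends: one advisor round, then a final answer.
_calls = {"n": 0}

def _stub_executor(msgs):
    if _calls["n"] == 0:
        _calls["n"] += 1
        return {
            "content": [
                {
                    "type": "tool_use",
                    "name": "advisor",
                    "id": "t1",
                    "input": {"question": "check my plan"},
                }
            ]
        }
    return {"content": [{"type": "text", "text": "final answer"}]}

history = [{"role": "user", "content": "hi"}]
final = run_advisor_loop(
    history, max_uses=3, call_executor=_stub_executor,
    call_advisor=lambda msgs, q: "looks good",
)
```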

## Model Compatibility

@@ -305,6 +341,37 @@ response = client.beta.messages.create(
print(response)
```

#### Non-Anthropic Provider (LiteLLM orchestration loop)

```python showLineNumbers title="Advisor Tool with OpenAI executor"
import asyncio

import litellm

async def main():
    # executor: openai/gpt-4.1-mini | advisor: claude-opus-4-6
    # LiteLLM runs the orchestration loop automatically
    response = await litellm.anthropic.messages.acreate(
        model="openai/gpt-4.1-mini",
        messages=[
            {"role": "user", "content": "Implement a Python LRU cache with O(1) get and put."}
        ],
        tools=[
            {
                "type": "advisor_20260301",
                "name": "advisor",
                "model": "claude-opus-4-6",
                "max_uses": 3,
            }
        ],
        max_tokens=1024,
        custom_llm_provider="openai",
    )
    # Final response is clean — no advisor tool_use blocks
    print(response["content"][0]["text"])

asyncio.run(main())
```

---

## Response Structure
15 changes: 14 additions & 1 deletion litellm/constants.py
@@ -1421,7 +1421,7 @@
"1",
] # always replace existing jobs

# The number of tag entries is higher than the number of user and team entries. This leads to a higher QPS.
# This will run tag specific tasks at a later time to smooth QPS
DAILY_TAG_SPEND_BATCH_MULTIPLIER = 2.3

@@ -1581,3 +1581,16 @@
MAX_COMPETITOR_NAMES = int(os.getenv("MAX_COMPETITOR_NAMES", 100))
COMPETITOR_LLM_TEMPERATURE = float(os.getenv("COMPETITOR_LLM_TEMPERATURE", 0.3))
DEFAULT_COMPETITOR_DISCOVERY_MODEL = "gpt-4o-mini"

# Advisor tool orchestration
# Providers that support advisor_20260301 natively (no LiteLLM orchestration needed).
# Add vertex_ai here once verified.
ADVISOR_NATIVE_PROVIDERS: frozenset = frozenset({"anthropic"})
# Hard cap on advisor iterations per request to prevent runaway loops.
ADVISOR_MAX_USES: int = 5
# Description injected into the synthetic advisor tool definition sent to non-native providers.
ADVISOR_TOOL_DESCRIPTION: str = (
    "Consult a highly intelligent advisor model when you need expert guidance, "
    "want to verify your reasoning, or face a complex decision. "
    "Describe your question or challenge clearly in the 'question' field."
)
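For illustration, the synthetic function tool built from these constants might look like the following. This is an assumption: `build_synthetic_advisor_tool` and the exact input schema are hypothetical, not taken from the PR.

```python
# Hypothetical sketch of the synthetic advisor tool a non-native provider
# receives in place of advisor_20260301; the exact schema is an assumption.
ADVISOR_TOOL_DESCRIPTION = (
    "Consult a highly intelligent advisor model when you need expert guidance, "
    "want to verify your reasoning, or face a complex decision. "
    "Describe your question or challenge clearly in the 'question' field."
)

def build_synthetic_advisor_tool(description: str = ADVISOR_TOOL_DESCRIPTION) -> dict:
    return {
        "name": "advisor",
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": {
                "question": {
                    "type": "string",
                    "description": "The question to ask the advisor.",
                }
            },
            "required": ["question"],
        },
    }

tool = build_synthetic_advisor_tool()
```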
91 changes: 70 additions & 21 deletions litellm/llms/anthropic/common_utils.py
@@ -639,23 +639,33 @@ def get_token_counter(self) -> Optional[BaseTokenCounter]:
return AnthropicTokenCounter()


def strip_advisor_blocks_from_messages(
    messages: List[Any], replace_with_text: bool = False
) -> List[Any]:
    """
    Remove (or replace) server_tool_use (name='advisor') and advisor_tool_result blocks
    from assistant message content.

    Prevents Anthropic 400 invalid_request_error: if advisor_tool_result blocks
    exist in history but the advisor tool is not in the tools array, the API rejects
    the request. This happens when the user has removed the advisor tool for cost
    control or on a follow-up turn.

    Args:
        messages: Conversation history to process (mutated in-place).
        replace_with_text: When True, replace the advisor exchange with an
            <advisor_feedback> text block so the executor retains the semantic
            context of what the advisor said. When False (default), strip silently.
    """
    for message in messages:
        if not isinstance(message, dict) or message.get("role") != "assistant":
            continue
        content = message.get("content")
        if not isinstance(content, list):
            continue

        # Collect advisor server_tool_use ids and their advice text (for replace mode).
        advisor_id_to_text: dict = {}
        for block in content:
            if (
                isinstance(block, dict)
                and block.get("type") == "server_tool_use"
                and block.get("name") == "advisor"
            ):
                bid = block.get("id")
                if bid:
                    advisor_id_to_text[bid] = None  # text filled in below

        if not advisor_id_to_text:
            continue

        # If replacing, collect the advisor response text from advisor_tool_result blocks.
        if replace_with_text:
            for block in content:
                if (
                    isinstance(block, dict)
                    and block.get("type") == "advisor_tool_result"
                    and block.get("tool_use_id") in advisor_id_to_text
                ):
                    raw = block.get("content") or ""
                    text = (
                        raw
                        if isinstance(raw, str)
                        else next(
                            (
                                b.get("text", "")
                                for b in raw
                                if isinstance(b, dict) and b.get("type") == "text"
                            ),
                            "",
                        )
                    )
                    advisor_id_to_text[block["tool_use_id"]] = text

        new_content = []
        for block in content:
            if not isinstance(block, dict):
                new_content.append(block)
                continue
            is_advisor_use = (
                block.get("type") == "server_tool_use"
                and block.get("name") == "advisor"
                and block.get("id") in advisor_id_to_text
            )
            is_advisor_result = (
                block.get("type") == "advisor_tool_result"
                and block.get("tool_use_id") in advisor_id_to_text
            )
            if is_advisor_use:
                if replace_with_text:
                    advice = advisor_id_to_text.get(block.get("id")) or ""
                    if advice:
                        new_content.append(
                            {
                                "type": "text",
                                "text": f"<advisor_feedback>\n{advice}\n</advisor_feedback>",
                            }
                        )
                # else: drop silently
            elif is_advisor_result:
                pass  # always drop — replaced above (or stripped)
            else:
                new_content.append(block)

        message["content"] = new_content
    return messages
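The helper's contract can be shown with a condensed standalone version. This simplified `strip_advisor_blocks` is for illustration only: it covers strip mode and omits the replace mode and type checks of the real function.

```python
# Condensed illustration of the strip contract (strip mode only); not the
# production function above.
def strip_advisor_blocks(messages):
    for message in messages:
        if message.get("role") != "assistant" or not isinstance(message.get("content"), list):
            continue
        ids = {
            b["id"]
            for b in message["content"]
            if b.get("type") == "server_tool_use" and b.get("name") == "advisor"
        }
        message["content"] = [
            b
            for b in message["content"]
            if not (
                (b.get("type") == "server_tool_use" and b.get("name") == "advisor")
                or (b.get("type") == "advisor_tool_result" and b.get("tool_use_id") in ids)
            )
        ]
    return messages

history = [
    {"role": "user", "content": "Plan a migration."},
    {
        "role": "assistant",
        "content": [
            {
                "type": "server_tool_use",
                "name": "advisor",
                "id": "adv_1",
                "input": {"question": "Is this plan sound?"},
            },
            {
                "type": "advisor_tool_result",
                "tool_use_id": "adv_1",
                "content": [{"type": "text", "text": "Add a rollback step."}],
            },
            {"type": "text", "text": "Here is the migration plan..."},
        ],
    },
]
stripped = strip_advisor_blocks(history)
# Only the plain text block survives in the assistant turn.
```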


@@ -26,6 +26,7 @@

from ..adapters.handler import LiteLLMMessagesToCompletionTransformationHandler
from ..responses_adapters.handler import LiteLLMMessagesToResponsesAPIHandler
from .interceptors import get_messages_interceptors
from .utils import AnthropicMessagesRequestUtils, mock_response

# Providers that are routed directly to the OpenAI Responses API instead of
@@ -236,6 +237,23 @@ async def anthropic_messages(
    if short_circuit_response is not None:
        return short_circuit_response

    # Run registered MessagesInterceptors (e.g. advisor orchestration loop).
    # api_key and api_base are explicit params (not in **kwargs), so pass them
    # explicitly so interceptor sub-calls can route to the same backend.
    for interceptor in get_messages_interceptors():
        if interceptor.can_handle(tools, custom_llm_provider):
            return await interceptor.handle(
                model=model,
                messages=messages,
                tools=tools,
                stream=original_stream,
                max_tokens=max_tokens,
                custom_llm_provider=custom_llm_provider,
                api_key=api_key,
                api_base=api_base,
                **kwargs,
            )

    loop = asyncio.get_event_loop()
    kwargs["is_async"] = True

@@ -0,0 +1,62 @@
# Messages Interceptors

Interceptors are short-circuit handlers for the `/v1/messages` path. They run **before** the normal backend call and can fully replace it with their own response.

## When to add an interceptor

Use an interceptor when you need to **replace the backend call entirely** with your own logic — for example, running an orchestration loop, synthesizing a response from multiple sub-calls, or short-circuiting to a non-LLM backend.

Use a **pre-request hook** (`_execute_pre_request_hooks` / `CustomLogger.async_pre_request_hook`) instead when you only need to **mutate request parameters** (tools, stream flag, metadata) before the normal call proceeds.

| Scenario | Use |
|---|---|
| Replace the backend call with a loop or synthetic response | Interceptor |
| Translate or strip tools before the call | Pre-request hook |
| Feature that is always active (built-in LiteLLM behavior) | Interceptor |
| Optional integration that operators register | `CustomLogger` callback |

## How to add a new interceptor

1. Create `your_feature.py` in this directory.
2. Implement `MessagesInterceptor` from `base.py`:
   - `can_handle(tools, custom_llm_provider) -> bool` — return True when your interceptor owns this request.
   - `async handle(...) -> Union[AnthropicMessagesResponse, AsyncIterator]` — do your work and return the response.
3. Register it in `__init__.py` by appending to `_interceptors`.

```python
# your_feature.py
from .base import MessagesInterceptor

class MyFeatureHandler(MessagesInterceptor):
    def can_handle(self, tools, custom_llm_provider):
        return some_condition(tools, custom_llm_provider)

    async def handle(self, *, model, messages, tools, stream, max_tokens,
                     custom_llm_provider, **kwargs):
        ...
        return response
```

```python
# __init__.py
from .your_feature import MyFeatureHandler

_interceptors = [
    AdvisorOrchestrationHandler(),
    MyFeatureHandler(),  # add here
]
```

## Existing interceptors

### `AdvisorOrchestrationHandler`

Handles `advisor_20260301` tool for providers that don't support it natively (all non-Anthropic providers for now).

**Triggers when:** `advisor_20260301` is in `tools` AND `custom_llm_provider` is not in `ADVISOR_NATIVE_PROVIDERS`.

**What it does:**
- Translates the advisor tool to a regular function tool the provider understands.
- Runs the executor model; when it calls the `advisor` tool, runs the advisor model and injects the result as a `tool_result`.
- Loops until the executor produces a final text response or `max_uses` is exceeded.
- Wraps the final response in `FakeAnthropicMessagesStreamIterator` if the caller requested streaming.
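The trigger condition described above can be sketched as follows. This is an assumed shape for illustration; the handler's actual `can_handle` may differ.

```python
# Assumed sketch of the trigger check; the real can_handle may differ.
ADVISOR_NATIVE_PROVIDERS = frozenset({"anthropic"})

def advisor_can_handle(tools, custom_llm_provider):
    has_advisor = any(
        isinstance(t, dict) and t.get("type") == "advisor_20260301"
        for t in (tools or [])
    )
    return has_advisor and custom_llm_provider not in ADVISOR_NATIVE_PROVIDERS
```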
@@ -0,0 +1,17 @@
from typing import List

from .advisor import AdvisorOrchestrationHandler
from .base import MessagesInterceptor

_interceptors: List[MessagesInterceptor] = [
    AdvisorOrchestrationHandler(),
]


def get_messages_interceptors() -> List[MessagesInterceptor]:
    """Return the list of active MessagesInterceptors.

    Order matters: interceptors are tried in list order; the first one whose
    ``can_handle()`` returns True wins.
    """
    return _interceptors