
feat(advisor): advisor tool orchestration loop for non-Anthropic providers#25579

Merged
ishaan-berri merged 15 commits into litellm_ishaan_april11 from feat/anthropic-advisor-tool on Apr 12, 2026

Conversation

@ishaan-berri (Contributor) commented Apr 12, 2026

Relevant issues

Fixes #25516

Pre-Submission checklist

  • I have added tests in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement)
  • My PR passes all unit tests via make test-unit
  • My PR's scope is as isolated as possible: it only solves 1 specific problem

CI (LiteLLM team)

  • Branch creation CI run
    Link:
  • CI run for the last commit
    Link:
  • Merge / cherry-pick CI run
    Links:

Type

  • Bug fix
  • New feature

Changes

Implements the advisor tool loop for non-Anthropic providers. The advisor_20260301 tool lets a fast executor model (Sonnet/Haiku) consult a high-intelligence advisor (Opus) mid-generation. Anthropic handles this server-side natively; for all other providers LiteLLM now runs the orchestration loop itself.

How it works

User /messages request (OpenAI/Bedrock/Vertex/any non-Anthropic provider)
        │
        ▼
MessagesInterceptor registry
  └── AdvisorOrchestrationHandler.can_handle()
        ├── tools contains advisor_20260301?  YES
        └── provider is non-native?           YES  ──► intercept
                                               NO   ──► pass through to provider
        │
        ▼  (intercept path)
Strip advisor_20260301 from tools,
replace with synthetic function tool
        │
        ▼
┌─────────────────────────────────────────────────────────┐
│  Orchestration loop                                     │
│                                                         │
│   ┌─────────────────────────────────────────┐          │
│   │  EXECUTOR CALL  (non-streaming)         │          │
│   │  model: e.g. openai/gpt-4.1-mini        │          │
│   │  tools: [synthetic advisor fn tool, ...]│          │
│   └─────────────────┬───────────────────────┘          │
│                     │                                   │
│          ┌──────────┴──────────┐                       │
│          │ stop_reason?        │                       │
│          │                     │                       │
│    tool_use(advisor)      end_turn / other             │
│          │                     │                       │
│          ▼                     ▼                       │
│   ┌─────────────┐      ◄── EXIT LOOP                  │
│   │ ADVISOR     │          return final response        │
│   │ SUB-CALL    │          (or wrap in FakeStream)      │
│   │ model:      │                                       │
│   │  opus-4-6   │                                       │
│   │ no tools    │                                       │
│   └──────┬──────┘                                       │
│          │ advice text                                  │
│          ▼                                              │
│   inject tool_result into messages                      │
│   (or max_uses_exceeded error if cap hit)               │
│          │                                              │
│          └──────────────────────► repeat               │
└─────────────────────────────────────────────────────────┘
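In plain Python, the loop above can be sketched roughly as follows. This is a minimal illustration with stubbed backend calls; `run_advisor_loop`, `call_executor`, and `call_advisor` are hypothetical names for this sketch, not LiteLLM's actual internals:

```python
# Minimal sketch of the orchestration loop. call_executor / call_advisor are
# hypothetical stand-ins for the real executor and advisor sub-calls.

def run_advisor_loop(messages, advisor_tool, call_executor, call_advisor):
    raw = advisor_tool.get("max_uses")
    max_uses = raw if raw is not None else 5
    uses = 0
    while True:
        response = call_executor(messages)
        if response["stop_reason"] != "tool_use":
            # end_turn / max_tokens / other: exit the loop, return final response
            return response
        tool_use = next(b for b in response["content"] if b["type"] == "tool_use")
        messages = messages + [{"role": "assistant", "content": response["content"]}]
        if uses >= max_uses:
            # Cap hit: inject an error tool_result instead of making a sub-call.
            result = ("Advisor unavailable: max_uses limit reached. "
                      "Continue without advisor.")
        else:
            uses += 1
            result = call_advisor(tool_use["input"]["question"])
        messages = messages + [{
            "role": "user",
            "content": [{"type": "tool_result",
                         "tool_use_id": tool_use["id"],
                         "content": result}],
        }]
```

Each iteration either returns the executor's final answer or feeds the advisor's advice (or the cap error) back as a tool_result and repeats.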

Advisor tool definition (caller sets this):

{
  "type": "advisor_20260301",
  "name": "advisor",
  "model": "claude-opus-4-6",
  "max_uses": 5,
  "api_base": "optional-proxy-url",
  "api_key":  "optional-key"
}
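Validation of such a definition might look like this. This is an illustrative helper, not the PR's code; note the explicit None check so that "max_uses": 0 disables the advisor instead of silently falling back to the default:

```python
DEFAULT_MAX_USES = 5  # assumed default; the PR defines ADVISOR_MAX_USES in litellm/constants.py

def parse_advisor_tool(tool: dict) -> tuple:
    """Validate an advisor_20260301 tool definition and return (model, max_uses)."""
    if tool.get("type") != "advisor_20260301":
        raise ValueError("not an advisor_20260301 tool")
    model = tool.get("model")
    if not model:
        raise ValueError(
            "advisor tool definition must include a 'model' field specifying the advisor model"
        )
    # Explicit None check: max_uses=0 must disable the advisor, not fall
    # back to the default limit.
    raw = tool.get("max_uses")
    max_uses = raw if raw is not None else DEFAULT_MAX_USES
    return model, max_uses
```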

Live E2E results (gpt-4.1-mini executor + claude-opus-4-6 advisor)

Scenario 1 — Complex coding task (LRU cache)

USER: Implement a Python LRU Cache class with O(1) get and put.
      Use the advisor tool before you start writing code.

[EXECUTOR → LiteLLM]
  stop_reason: tool_use  (1.0s)
  TOOL_USE advisor — "What is the best way to implement a Python LRU Cache
  class that supports O(1) get and put? Outline the data structures involved."

[ADVISOR SUB-CALL → claude-opus-4-6]  (15.2s)
  Advice: Use a hash map + doubly linked list with dummy head/tail nodes.
  dict gives O(1) lookup; linked list gives O(1) move-to-front and eviction.

[EXECUTOR → LiteLLM]  (7.0s)
  stop_reason: max_tokens
  TEXT: "I have gathered the recommended approach... Now I will implement..."
  → class Node with __slots__; class LRUCache with get/put in O(1)

FINAL: 23.1s total | 1 advisor call | 0 advisor blocks in output ✓

Scenario 2 — Tricky concurrency (async bounded semaphore)

USER: Write a thread-safe Python bounded semaphore supporting async context
      managers and a non-blocking try_acquire(). Ask the advisor first.

[EXECUTOR → LiteLLM]
  stop_reason: tool_use  (0.9s)
  TOOL_USE advisor — "How to implement a thread-safe Python bounded semaphore
  with async context managers and non-blocking try_acquire()?"

[ADVISOR SUB-CALL → claude-opus-4-6]  (16.8s)
  Advice: Use asyncio.Lock for internal state; threading.Lock for cross-thread
  safety; FIFO deque for waiters; loop.call_soon_threadsafe for thread→async bridge.

[EXECUTOR → LiteLLM]  (6.5s)
  stop_reason: max_tokens
  TEXT: "I have consulted an advisor... Here is the complete implementation:"
  → class AsyncBoundedSemaphore with acquire/release/__aenter__/__aexit__/try_acquire

FINAL: 24.3s total | 1 advisor call | 0 advisor blocks in output ✓

Scenario 3 — max_uses=1 cap enforced

USER: Design a Python priority queue backed by a Fibonacci heap.
      Use the advisor as many times as you want.  (max_uses=1 set by caller)

[EXECUTOR → LiteLLM]
  stop_reason: tool_use  (0.8s)  → calls advisor (use 1/1)

[ADVISOR SUB-CALL → claude-opus-4-6]  (11.9s)
  Advice: Fibonacci heap with circular doubly-linked root list;
  O(1) insert/find-min, O(log n) amortized extract-min, O(1) decrease-key.

[EXECUTOR → LiteLLM]
  stop_reason: tool_use  (0.6s)  → tries to call advisor again

[LiteLLM injects error tool_result — no sub-call made]
  content: "Advisor unavailable: max_uses limit reached. Continue without advisor."

[EXECUTOR → LiteLLM]  (6.1s)
  stop_reason: max_tokens
  TEXT: "Here is the continuation... class FibonacciNode / class FibonacciHeap..."

FINAL: 19.4s total | 1 advisor call (cap respected) | 0 advisor blocks ✓

Scenario 4 — Trivial question (advisor not called)

USER: What does list.append() do in Python? One sentence.

[EXECUTOR → LiteLLM]
  stop_reason: end_turn  (0.5s)
  TEXT: "list.append() in Python adds a single element to the end of a list."

FINAL: 0.5s total | 0 advisor calls | clean passthrough ✓

Tests (11 unit tests, all passing)

  • can_handle routing (anthropic skips, non-anthropic fires)
  • Orchestration loop with mocked backends (3 iterations, advisor injected correctly)
  • max_uses cap — injects max_uses_exceeded error result, executor continues
  • Streaming — FakeAnthropicMessagesStreamIterator wraps final response
  • History stripping — prior advisor_tool_result blocks cleaned before re-send
  • Synthetic tool translation — advisor_20260301 → regular function tool
========================= 11 passed in 13.34s =========================
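As a rough illustration of the synthetic-tool translation the tests cover (the field names and description below are assumptions for this sketch, not the PR's exact schema):

```python
def to_synthetic_function_tool(advisor_tool: dict) -> dict:
    """Translate an advisor_20260301 tool into a plain function tool the
    executor model understands. Description and schema here are illustrative
    assumptions, not the PR's literal definition."""
    return {
        "name": advisor_tool.get("name", "advisor"),
        "description": (
            "Consult a high-intelligence advisor model for guidance on hard "
            "problems. Provide a clear, self-contained question."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "question": {
                    "type": "string",
                    "description": "Question to ask the advisor model.",
                }
            },
            "required": ["question"],
        },
    }
```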


gitguardian bot commented Apr 12, 2026

✅ There are no secrets present in this pull request anymore.

If these secrets were true positives and are still valid, we highly recommend that you revoke them. While these secrets were previously flagged, we no longer have a reference to the specific commits where they were detected. Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately.



@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@codspeed-hq

codspeed-hq bot commented Apr 12, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing feat/anthropic-advisor-tool (dd87f3b) with main (f74d626)


@ishaan-berri force-pushed the feat/anthropic-advisor-tool branch from 6b397f8 to 742e2fe on April 12, 2026 00:48
@codecov

codecov bot commented Apr 12, 2026

Codecov Report

❌ Patch coverage is 91.60305% with 11 lines in your changes missing coverage. Please review.

File | Patch % | Missing lines
...ntal_pass_through/messages/interceptors/advisor.py | 89.65% | 9 ⚠️
litellm/llms/anthropic/common_utils.py | 92.30% | 2 ⚠️


…g exception

When the advisor loop hits max_uses, inject a tool_result error so the executor
sees the cap and continues without further advice — matches Anthropic server-side
behaviour (error_code: max_uses_exceeded).
@greptile-apps

greptile-apps bot commented Apr 12, 2026

Greptile Summary

This PR adds an orchestration loop for the advisor_20260301 tool on non-Anthropic providers (OpenAI, Bedrock, Vertex, etc.), intercepting the /messages path, translating the advisor tool to a regular function tool, and running an executor→advisor→executor loop until the model finishes or max_uses is exceeded. Previously flagged issues (AdvisorMaxIterationsError not defined, max_uses=0 falsy coercion) are fully resolved in this revision.

Confidence Score: 5/5

Safe to merge; only one minor dead-code cleanup remains.

All prior P0/P1 concerns are resolved. The one remaining finding is a dead helper function that has no effect on correctness. All 11 unit tests and integration tests are mocked and pass; the implementation logic is sound.

advisor.py — the unused _inject_max_uses_error function should be removed.

Important Files Changed

Filename | Overview
litellm/llms/anthropic/experimental_pass_through/messages/interceptors/advisor.py | Core orchestration loop — AdvisorMaxIterationsError is defined and tests pass, but _inject_max_uses_error is dead code never called anywhere in the loop.
litellm/llms/anthropic/experimental_pass_through/messages/handler.py | Interceptor dispatch injected cleanly before the normal backend path; api_key/api_base forwarded explicitly as required.
litellm/llms/anthropic/common_utils.py | strip_advisor_blocks_from_messages gains a replace_with_text mode; backward-compatible default, and the shallow copy in handle() ensures the original messages are not mutated.
litellm/constants.py | Adds ADVISOR_NATIVE_PROVIDERS, ADVISOR_MAX_USES, ADVISOR_TOOL_DESCRIPTION constants; clean additions with no side effects.
tests/test_litellm/llms/anthropic/messages/test_advisor_orchestration.py | 11 unit tests, all paths mocked — no real network calls. AdvisorMaxIterationsError import resolved. Covers can_handle, loop, max_uses, streaming, history stripping, tool translation.
tests/test_litellm/llms/anthropic/experimental_pass_through/messages/test_advisor_integration.py | Integration tests exercise the full anthropic_messages() dispatch path with mocked LLM sub-calls; covers native bypass and max_uses propagation.
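The history-stripping step mentioned for common_utils.py might be sketched like this (a hypothetical illustration; the PR's strip_advisor_blocks_from_messages may differ in block names and details):

```python
def strip_advisor_blocks(messages, replace_with_text=False):
    """Remove prior advisor tool_result blocks before re-sending history.
    Sketch only: the block type name 'advisor_tool_result' is an assumption."""
    cleaned = []
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            cleaned.append(msg)  # plain-string content: keep untouched
            continue
        new_content = []
        for block in content:
            if block.get("type") == "advisor_tool_result":
                if replace_with_text:
                    # replace_with_text mode: keep the advice as a plain text block
                    new_content.append(
                        {"type": "text", "text": str(block.get("content", ""))}
                    )
                continue  # default mode: drop the advisor block entirely
            new_content.append(block)
        if new_content:
            cleaned.append({**msg, "content": new_content})
    return cleaned
```

The original message list is never mutated; each cleaned message is a shallow copy, mirroring the non-mutation guarantee noted in the review.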

Sequence Diagram

sequenceDiagram
    participant Caller
    participant anthropic_messages
    participant AdvisorOrchestrationHandler
    participant Executor as Executor (e.g. openai/gpt-4.1-mini)
    participant Advisor as Advisor (e.g. claude-opus-4-6)

    Caller->>anthropic_messages: request(tools=[advisor_20260301], model=openai/...)
    anthropic_messages->>AdvisorOrchestrationHandler: can_handle? → True
    anthropic_messages->>AdvisorOrchestrationHandler: handle(...)

    loop Until end_turn or max_uses exceeded
        AdvisorOrchestrationHandler->>Executor: call(synthetic_advisor_tool)
        alt stop_reason = tool_use (advisor called)
            Executor-->>AdvisorOrchestrationHandler: tool_use(name=advisor, question=...)
            AdvisorOrchestrationHandler->>Advisor: sub-call(no tools, question)
            Advisor-->>AdvisorOrchestrationHandler: advice text
            AdvisorOrchestrationHandler->>AdvisorOrchestrationHandler: inject tool_result into messages
        else stop_reason = end_turn
            Executor-->>AdvisorOrchestrationHandler: final text response
            AdvisorOrchestrationHandler-->>Caller: response (or FakeStream if stream=True)
        end
    end

    Note over AdvisorOrchestrationHandler: If iteration > max_uses → raise AdvisorMaxIterationsError

Reviews (4): Last reviewed commit: "docs(advisor): move supported providers ..."

Comment on lines +132 to +139
iteration += 1
if iteration > max_uses:
    # Per Anthropic spec: inject max_uses_exceeded error result so the
    # executor sees the cap and continues without further advice.
    current_messages = _inject_max_uses_error(
        current_messages, executor_response, advisor_use_block
    )
    continue

P0 Infinite loop + missing AdvisorMaxIterationsError class

After iteration > max_uses, the code injects the error message and continues — but there is no exit condition. If the executor keeps calling the advisor tool on the next iteration, iteration is incremented again, iteration > max_uses is still true, another error is injected, and the loop runs forever. Additionally, the test test_loop_max_uses_raises imports AdvisorMaxIterationsError from this module, but the class is never defined here (or anywhere in the codebase), so that test will fail with ImportError at runtime.

The fix is to define the exception class and raise it instead of continue-ing after the cap is exceeded:

class AdvisorMaxIterationsError(Exception):
    """Raised when the advisor orchestration loop exceeds max_uses."""
    pass

iteration += 1
if iteration > max_uses:
    raise AdvisorMaxIterationsError(
        f"Advisor tool called more than max_uses={max_uses} times. "
        "Increase max_uses in the advisor tool definition to allow more iterations."
    )

raise ValueError(
    "advisor tool definition must include a 'model' field specifying the advisor model"
)
max_uses: int = advisor_tool.get("max_uses") or ADVISOR_MAX_USES

P1 max_uses=0 silently ignored

advisor_tool.get("max_uses") or ADVISOR_MAX_USES evaluates 0 as falsy, so a caller setting "max_uses": 0 to disable advisor calls gets the default limit (5) instead of an immediate cap. Use an explicit None check to preserve intent:

Suggested change
max_uses: int = advisor_tool.get("max_uses") or ADVISOR_MAX_USES
max_uses: int = advisor_tool.get("max_uses") if advisor_tool.get("max_uses") is not None else ADVISOR_MAX_USES

Comment on lines +221 to +224
from litellm.llms.anthropic.experimental_pass_through.messages.interceptors.advisor import (
AdvisorMaxIterationsError,
AdvisorOrchestrationHandler,
)

P0 Import of non-existent AdvisorMaxIterationsError will fail

AdvisorMaxIterationsError is not defined anywhere in advisor.py or any other module in this PR. This import will raise ImportError at test runtime, making test_loop_max_uses_raises always fail. Once the class is defined in advisor.py (see the companion comment on the loop), this import will work correctly.

@ishaan-berri (Contributor, Author)

@greptile review

@vercel

vercel bot commented Apr 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
litellm | Ready | Preview, Comment | Apr 12, 2026 1:29am


@ishaan-berri ishaan-berri changed the base branch from main to litellm_ishaan_april11 April 12, 2026 01:32
@ishaan-berri ishaan-berri merged commit 329a526 into litellm_ishaan_april11 Apr 12, 2026
47 of 48 checks passed
@ishaan-berri ishaan-berri deleted the feat/anthropic-advisor-tool branch April 12, 2026 01:32


Development

Successfully merging this pull request may close these issues.

feat(advisor): Anthropic Claude Code /advisor rollout

2 participants