Conversation

Commits:
- …R_TOOL_DESCRIPTION constants
- …g exception: when the advisor loop hits max_uses, inject a tool_result error so the executor sees the cap and continues without further advice; matches Anthropic server-side behaviour (error_code: max_uses_exceeded)
- …, provider bypass
- …tive provider table
- … litellm native loop
- feat(advisor): advisor tool orchestration loop for non-Anthropic providers
Greptile Summary

This PR adds a native orchestration loop for Anthropic's advisor tool for non-Anthropic providers.

Confidence Score: 4/5. Mostly safe to merge; one P1 in advisor.py where dead code diverges from the documented native-Anthropic max_uses behavior. The interceptor infrastructure and orchestration loop are well-designed and well-tested. The P1 finding is in litellm/llms/anthropic/experimental_pass_through/messages/interceptors/advisor.py, where _inject_max_uses_error is dead code (see the review comment below).
| Filename | Overview |
|---|---|
| litellm/llms/anthropic/experimental_pass_through/messages/interceptors/advisor.py | Core orchestration handler: implements the executor↔advisor loop. Two issues: (1) _inject_max_uses_error is dead code while the loop raises an exception instead, diverging from native Anthropic behavior; (2) advisor sub-call doesn't forward **kwargs. |
| litellm/llms/anthropic/experimental_pass_through/messages/interceptors/base.py | Clean ABC definition for the interceptor plugin system; no issues found. |
| litellm/llms/anthropic/experimental_pass_through/messages/interceptors/__init__.py | Registers interceptors; simple and correct. |
| litellm/llms/anthropic/experimental_pass_through/messages/handler.py | Wires interceptor dispatch into the messages handler; correctly placed before the normal backend path and passes api_key/api_base explicitly. |
| litellm/llms/anthropic/common_utils.py | Extends strip_advisor_blocks_from_messages with replace_with_text mode; logic is correct and well-tested. |
| litellm/constants.py | Adds three advisor-related constants; no issues. |
| tests/test_litellm/llms/anthropic/experimental_pass_through/messages/test_advisor_integration.py | Integration tests for the full dispatch path; all LLM calls are mocked via patch, no real network calls. |
| tests/test_litellm/llms/anthropic/messages/test_advisor_orchestration.py | Unit tests for handler logic; comprehensive coverage of edge cases (can_handle, streaming, multi-turn, max_uses cap). All mocked. |
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant anthropic_messages
    participant InterceptorRegistry
    participant AdvisorOrchestrationHandler
    participant ExecutorLLM as Executor LLM
    participant AdvisorLLM as Advisor LLM
    Caller->>anthropic_messages: request with advisor_20260301 tool
    anthropic_messages->>InterceptorRegistry: get_messages_interceptors()
    anthropic_messages->>AdvisorOrchestrationHandler: can_handle(tools, provider)?
    AdvisorOrchestrationHandler-->>anthropic_messages: True
    anthropic_messages->>AdvisorOrchestrationHandler: handle(...)
    loop Until no advisor tool_use or max_uses exceeded
        AdvisorOrchestrationHandler->>ExecutorLLM: _call_messages_handler(synthetic tool)
        ExecutorLLM-->>AdvisorOrchestrationHandler: tool_use(name=advisor)
        AdvisorOrchestrationHandler->>AdvisorLLM: _call_messages_handler(tools=None)
        AdvisorLLM-->>AdvisorOrchestrationHandler: advice text
        AdvisorOrchestrationHandler->>AdvisorOrchestrationHandler: _inject_advisor_turn
    end
    AdvisorOrchestrationHandler->>ExecutorLLM: final executor call
    ExecutorLLM-->>AdvisorOrchestrationHandler: final text response
    AdvisorOrchestrationHandler-->>Caller: clean response
```
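The loop in the diagram can be sketched roughly as follows. This is a minimal illustration, not litellm's actual handler: the function names, message shapes, and the `advisor` tool name are assumptions for the sketch.

```python
# Hypothetical sketch of the executor<->advisor orchestration loop shown
# in the sequence diagram above. Names are illustrative only.

def orchestrate(call_executor, call_advisor, messages, max_uses=3):
    """Run the executor until it stops requesting the advisor tool."""
    uses = 0
    while True:
        response = call_executor(messages)
        # Find an advisor tool_use block in the executor's response, if any.
        advisor_use = next(
            (b for b in response["content"]
             if b.get("type") == "tool_use" and b.get("name") == "advisor"),
            None,
        )
        if advisor_use is None:
            return response  # final text response; no further advice requested
        uses += 1
        if uses > max_uses:
            return response  # cap reached; stop consulting the advisor
        advice = call_advisor(advisor_use["input"])
        # Inject the advisor turn: assistant tool_use + user tool_result.
        messages = [
            *messages,
            {"role": "assistant", "content": response["content"]},
            {"role": "user", "content": [{
                "type": "tool_result",
                "tool_use_id": advisor_use["id"],
                "content": advice,
            }]},
        ]
```

Each advisor round-trip grows the transcript by one assistant turn and one user turn, so the executor sees its own tool call and the advice on the next iteration.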
Reviews (1): Last reviewed commit: "Merge pull request #25579 from BerriAI/f..."
```python
def _inject_max_uses_error(
    messages: List[Dict],
    executor_response: Any,
    advisor_use_block: Dict,
) -> List[Dict]:
    """
    Inject a max_uses_exceeded error tool_result so the executor continues
    without further advisor calls (mirrors Anthropic's server-side behaviour).
    """
    executor_content = (
        executor_response.get("content") if isinstance(executor_response, dict) else []
    ) or []
    tool_use_id = advisor_use_block.get("id", "")
    return [
        *messages,
        {"role": "assistant", "content": executor_content},
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": tool_use_id,
                    "content": "Advisor unavailable: max_uses limit reached. Continue without advisor guidance.",
                }
            ],
        },
    ]
```
Dead code diverges from documented native-Anthropic behavior
_inject_max_uses_error is never called — confirmed with a global search. Its docstring says it "mirrors Anthropic's server-side behaviour" (inject an error tool_result so the executor gracefully continues), but the actual loop raises AdvisorMaxIterationsError instead, which propagates as an unhandled exception to the caller. On a production proxy this lands as a 500 error, whereas Anthropic native returns a structured response. Either wire up _inject_max_uses_error in place of the raise, or remove it and update the orchestration docs to clearly state the behavior diverges from native.
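As a sketch of the first option, the cap branch could return the injected conversation and signal the loop to make one final executor call instead of raising. The helper is restated compactly from the snippet above; the `advance_or_cap` call-site shape is an assumption about the handler's internals, not litellm's actual code.

```python
from typing import Any, Dict, List

def _inject_max_uses_error(
    messages: List[Dict], executor_response: Any, advisor_use_block: Dict
) -> List[Dict]:
    # Compact restatement of the quoted helper.
    executor_content = (
        executor_response.get("content") if isinstance(executor_response, dict) else []
    ) or []
    return [
        *messages,
        {"role": "assistant", "content": executor_content},
        {
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": advisor_use_block.get("id", ""),
                "content": "Advisor unavailable: max_uses limit reached. "
                           "Continue without advisor guidance.",
            }],
        },
    ]

def advance_or_cap(messages, executor_response, advisor_use_block, uses, max_uses):
    """At the cap, inject the error tool_result instead of raising.

    Returns (messages, capped). When capped is True, the loop should make
    one final executor call on these messages and return its response,
    mirroring Anthropic's server-side max_uses_exceeded behaviour.
    """
    if uses >= max_uses:
        return _inject_max_uses_error(messages, executor_response, advisor_use_block), True
    return messages, False
```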
```python
advisor_response: AnthropicMessagesResponse = await _call_messages_handler(
    model=advisor_model,
    messages=advisor_messages,
    tools=None,
    stream=False,
    max_tokens=max_tokens,
    custom_llm_provider=None,  # let litellm resolve from model name
    metadata={
        **metadata_base,
        "advisor_sub_call": True,
        "parent_request_id": parent_request_id,
    },
    api_key=advisor_api_key,
    api_base=advisor_api_base,
)
```
Advisor sub-call silently drops request-level kwargs
The executor call at line 114 correctly forwards **kwargs (carrying timeout, custom headers, tracing spans, etc.), but this advisor sub-call omits **kwargs entirely and only threads the two explicit routing overrides. Any caller-set timeout, OpenTelemetry parent span, or proxy-routing config won't reach the advisor model, potentially causing silent mis-routing or different retry behaviour between the two legs. Consider forwarding **kwargs here with the already-popped keys excluded (metadata, litellm_call_id are already handled above).
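The suggested fix could look like the sketch below: filter the request-level kwargs down to those not already consumed or overridden by the sub-call, then splat the result into `_call_messages_handler`. The exact set of already-handled key names is an assumption based on the explicit arguments visible in the snippet above.

```python
# Sketch: forward request-level kwargs to the advisor sub-call, excluding
# keys the sub-call already sets explicitly. The ALREADY_HANDLED set is an
# assumption inferred from the explicit arguments in the snippet above.
ALREADY_HANDLED = {
    "model", "messages", "tools", "stream", "max_tokens",
    "custom_llm_provider", "metadata", "litellm_call_id",
    "api_key", "api_base",
}

def build_advisor_kwargs(kwargs: dict) -> dict:
    """Return request-level kwargs safe to forward to the advisor leg."""
    return {k: v for k, v in kwargs.items() if k not in ALREADY_HANDLED}
```

Usage would then be `await _call_messages_handler(..., **build_advisor_kwargs(kwargs))`, so caller-set timeouts, headers, and tracing context reach both legs of the orchestration.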
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added at least 1 test in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement - see details)
- My PR passes all unit tests via make test-unit
- I have tagged @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Screenshots / Proof of Fix
Type
🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test
Changes