
Litellm ishaan april11 #25586

Merged
ishaan-berri merged 16 commits into main from litellm_ishaan_april11
Apr 13, 2026

Conversation

@ishaan-berri
Contributor

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement (see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Screenshots / Proof of Fix

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

ishaan-berri and others added 16 commits April 11, 2026 17:43
…g exception

When the advisor loop hits max_uses, inject a tool_result error so the executor
sees the cap and continues without further advice — matches Anthropic server-side
behaviour (error_code: max_uses_exceeded).
feat(advisor): advisor tool orchestration loop for non-Anthropic providers
@codecov

codecov bot commented Apr 12, 2026

@vercel

vercel bot commented Apr 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Apr 12, 2026 1:39am

Request Review

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@codspeed-hq
Contributor

codspeed-hq bot commented Apr 12, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing litellm_ishaan_april11 (329a526) with main (12c1467)

Open in CodSpeed

@greptile-apps
Contributor

greptile-apps bot commented Apr 12, 2026

Greptile Summary

This PR adds a native orchestration loop for Anthropic's advisor_20260301 tool on non-Anthropic providers (OpenAI, Bedrock, Vertex, etc.). It introduces an interceptors/ plugin system for the /messages path, a new AdvisorOrchestrationHandler that translates the advisor tool and drives the executor↔advisor loop, and extends strip_advisor_blocks_from_messages to replace past advisor blocks with inline <advisor_feedback> text so history stays clean for non-Anthropic providers.

Confidence Score: 4/5

Mostly safe to merge; one P1 in advisor.py where dead code diverges from the documented native-Anthropic max_uses behavior.

The interceptor infrastructure and orchestration loop are well-designed and well-tested. The P1 finding is that _inject_max_uses_error is never called: the loop raises AdvisorMaxIterationsError (an uncaught exception reaching the caller as a 500) while the dead function's docstring says it should mirror Anthropic's graceful server-side continuation. This behavioral gap needs resolving before merge. The P2 finding (missing **kwargs on the advisor sub-call) is a quality gap but not immediately breaking.

litellm/llms/anthropic/experimental_pass_through/messages/interceptors/advisor.py — dead _inject_max_uses_error and missing kwargs forwarding on the advisor sub-call.
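The interceptor pattern the summary describes can be sketched roughly as follows. This is a hedged illustration, not litellm's actual API: the class names mirror the PR's description (can_handle/handle dispatch), but the signatures, the `dispatch` helper, and the request shape are assumptions for illustration only.

```python
# Illustrative sketch of a can_handle/handle interceptor plugin system.
# Names and signatures are assumptions, not litellm's real classes.
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class BaseMessagesInterceptor(ABC):
    @abstractmethod
    def can_handle(self, tools: Optional[List[Dict]], provider: str) -> bool:
        """Return True if this interceptor should take over the request."""

    @abstractmethod
    def handle(self, request: Dict[str, Any]) -> Dict[str, Any]:
        """Run the custom orchestration and return the final response."""


class AdvisorOrchestrationHandler(BaseMessagesInterceptor):
    def can_handle(self, tools, provider):
        # Intercept only when the advisor tool is requested on a
        # non-Anthropic provider (Anthropic handles it server-side).
        has_advisor = any(
            t.get("type") == "advisor_20260301" for t in tools or []
        )
        return has_advisor and provider != "anthropic"

    def handle(self, request):
        # The real handler would drive the executor<->advisor loop here;
        # this stub just records that the interceptor took the request.
        return {"handled_by": type(self).__name__}


def dispatch(request: Dict[str, Any], interceptors: List[BaseMessagesInterceptor]):
    """First interceptor whose can_handle returns True wins; else fall through."""
    for interceptor in interceptors:
        if interceptor.can_handle(request.get("tools"), request.get("provider", "")):
            return interceptor.handle(request)
    return None  # normal backend path
```

Under this sketch, a request carrying the advisor tool on OpenAI would be routed to the handler, while the same request on Anthropic falls through to the normal backend path.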

Important Files Changed

Filename Overview
litellm/llms/anthropic/experimental_pass_through/messages/interceptors/advisor.py Core orchestration handler: implements the executor↔advisor loop. Two issues: (1) _inject_max_uses_error is dead code while the loop raises an exception instead, diverging from native Anthropic behavior; (2) advisor sub-call doesn't forward **kwargs.
litellm/llms/anthropic/experimental_pass_through/messages/interceptors/base.py Clean ABC definition for the interceptor plugin system; no issues found.
litellm/llms/anthropic/experimental_pass_through/messages/interceptors/__init__.py Registers interceptors; simple and correct.
litellm/llms/anthropic/experimental_pass_through/messages/handler.py Wires interceptor dispatch into the messages handler; correctly placed before the normal backend path and passes api_key/api_base explicitly.
litellm/llms/anthropic/common_utils.py Extends strip_advisor_blocks_from_messages with replace_with_text mode; logic is correct and well-tested.
litellm/constants.py Adds three advisor-related constants; no issues.
tests/test_litellm/llms/anthropic/experimental_pass_through/messages/test_advisor_integration.py Integration tests for the full dispatch path; all LLM calls are mocked via patch, no real network calls.
tests/test_litellm/llms/anthropic/messages/test_advisor_orchestration.py Unit tests for handler logic; comprehensive coverage of edge cases (can_handle, streaming, multi-turn, max_uses cap). All mocked.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant anthropic_messages
    participant InterceptorRegistry
    participant AdvisorOrchestrationHandler
    participant ExecutorLLM as Executor LLM
    participant AdvisorLLM as Advisor LLM

    Caller->>anthropic_messages: request with advisor_20260301 tool
    anthropic_messages->>InterceptorRegistry: get_messages_interceptors()
    anthropic_messages->>AdvisorOrchestrationHandler: can_handle(tools, provider)?
    AdvisorOrchestrationHandler-->>anthropic_messages: True
    anthropic_messages->>AdvisorOrchestrationHandler: handle(...)

    loop Until no advisor tool_use or max_uses exceeded
        AdvisorOrchestrationHandler->>ExecutorLLM: _call_messages_handler(synthetic tool)
        ExecutorLLM-->>AdvisorOrchestrationHandler: tool_use(name=advisor)
        AdvisorOrchestrationHandler->>AdvisorLLM: _call_messages_handler(tools=None)
        AdvisorLLM-->>AdvisorOrchestrationHandler: advice text
        AdvisorOrchestrationHandler->>AdvisorOrchestrationHandler: _inject_advisor_turn
    end

    AdvisorOrchestrationHandler->>ExecutorLLM: final executor call
    ExecutorLLM-->>AdvisorOrchestrationHandler: final text response
    AdvisorOrchestrationHandler-->>Caller: clean response

Reviews (1): Last reviewed commit: "Merge pull request #25579 from BerriAI/f..."

Comment on lines +294 to +320
def _inject_max_uses_error(
    messages: List[Dict],
    executor_response: Any,
    advisor_use_block: Dict,
) -> List[Dict]:
    """
    Inject a max_uses_exceeded error tool_result so the executor continues
    without further advisor calls (mirrors Anthropic's server-side behaviour).
    """
    executor_content = (
        executor_response.get("content") if isinstance(executor_response, dict) else []
    ) or []
    tool_use_id = advisor_use_block.get("id", "")
    return [
        *messages,
        {"role": "assistant", "content": executor_content},
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": tool_use_id,
                    "content": "Advisor unavailable: max_uses limit reached. Continue without advisor guidance.",
                }
            ],
        },
    ]
Contributor


P1 Dead code diverges from documented native-Anthropic behavior

_inject_max_uses_error is never called — confirmed with a global search. Its docstring says it "mirrors Anthropic's server-side behaviour" (inject an error tool_result so the executor gracefully continues), but the actual loop raises AdvisorMaxIterationsError instead, which propagates as an unhandled exception to the caller. On a production proxy this lands as a 500 error, whereas Anthropic native returns a structured response. Either wire up _inject_max_uses_error in place of the raise, or remove it and update the orchestration docs to clearly state the behavior diverges from native.
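The "wire it up" option could look like the sketch below: at the point where the loop currently raises, call `_inject_max_uses_error` and hand the amended messages back to the executor. The helper is copied from the diff above; `MAX_ADVISOR_USES` and `run_loop_iteration` are hypothetical stand-ins for the real constant and loop body, shown only to illustrate the graceful-continuation path.

```python
from typing import Any, Dict, List, Tuple

MAX_ADVISOR_USES = 3  # stand-in for the real constant in litellm/constants.py


def _inject_max_uses_error(
    messages: List[Dict],
    executor_response: Any,
    advisor_use_block: Dict,
) -> List[Dict]:
    """Inject a max_uses_exceeded tool_result (as in the PR's helper)."""
    executor_content = (
        executor_response.get("content") if isinstance(executor_response, dict) else []
    ) or []
    tool_use_id = advisor_use_block.get("id", "")
    return [
        *messages,
        {"role": "assistant", "content": executor_content},
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": tool_use_id,
                    "content": "Advisor unavailable: max_uses limit reached. "
                    "Continue without advisor guidance.",
                }
            ],
        },
    ]


def run_loop_iteration(
    messages: List[Dict],
    executor_response: Dict,
    advisor_use_block: Dict,
    uses_so_far: int,
) -> Tuple[List[Dict], bool]:
    """One decision point of the loop: consult the advisor, or cap out gracefully."""
    if uses_so_far >= MAX_ADVISOR_USES:
        # Instead of `raise AdvisorMaxIterationsError(...)`, feed the cap back
        # to the executor so it can finish without further advice.
        return (
            _inject_max_uses_error(messages, executor_response, advisor_use_block),
            True,
        )
    # ...otherwise the real handler would call the advisor model here...
    return messages, False
```

With this shape, the executor's next call sees a structured `tool_result` error rather than the proxy surfacing a 500, matching the documented native-Anthropic behaviour.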

Comment on lines +150 to +164
advisor_response: AnthropicMessagesResponse = await _call_messages_handler(
    model=advisor_model,
    messages=advisor_messages,
    tools=None,
    stream=False,
    max_tokens=max_tokens,
    custom_llm_provider=None,  # let litellm resolve from model name
    metadata={
        **metadata_base,
        "advisor_sub_call": True,
        "parent_request_id": parent_request_id,
    },
    api_key=advisor_api_key,
    api_base=advisor_api_base,
)
Contributor


P2 Advisor sub-call silently drops request-level kwargs

The executor call at line 114 correctly forwards **kwargs (carrying timeout, custom headers, tracing spans, etc.), but this advisor sub-call omits **kwargs entirely and only threads the two explicit routing overrides. Any caller-set timeout, OpenTelemetry parent span, or proxy-routing config won't reach the advisor model, potentially causing silent mis-routing or different retry behaviour between the two legs. Consider forwarding **kwargs here with the already-popped keys excluded (metadata, litellm_call_id are already handled above).
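The suggested fix could be as small as a filtered forward. A minimal sketch, assuming illustrative names (`build_advisor_kwargs` and the exact set of already-consumed keys are hypothetical, not the handler's real code):

```python
# Hedged sketch: forward the caller's remaining kwargs to the advisor leg,
# excluding keys the handler has already consumed or sets explicitly.
# The key set below is illustrative, not the handler's actual exclusion list.
CONSUMED_KEYS = {"metadata", "litellm_call_id", "tools", "stream", "model", "messages"}


def build_advisor_kwargs(request_kwargs: dict) -> dict:
    """Return request-level kwargs safe to forward to the advisor sub-call."""
    return {k: v for k, v in request_kwargs.items() if k not in CONSUMED_KEYS}
```

The advisor call would then add `**build_advisor_kwargs(kwargs)` alongside its explicit arguments, so a caller-set `timeout` or tracing header reaches both legs identically.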

@ishaan-berri ishaan-berri merged commit 548225e into main Apr 13, 2026
102 of 108 checks passed
@ishaan-berri ishaan-berri deleted the litellm_ishaan_april11 branch April 13, 2026 21:55
