Litellm ishaan march30 (#24887)
Conversation
Missing unversioned entry causes cost tracking to return $0.00 for all requests using vertex_ai/claude-haiku-4-5. All other Vertex AI Claude models have both versioned and unversioned entries.
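A sketch of the kind of entry the fix adds to LiteLLM's pricing map (`model_prices_and_context_window.json`); the dollar figures below are placeholders for illustration, not the actual Claude Haiku prices:

```json
{
  "vertex_ai/claude-haiku-4-5": {
    "litellm_provider": "vertex_ai",
    "mode": "chat",
    "input_cost_per_token": 1e-6,
    "output_cost_per_token": 5e-6
  }
}
```

Cost lookup keys on the exact model string sent by the router, so the unversioned spelling needs its own entry alongside the versioned ones.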
…down) Return early from get_deployments_for_tag when healthy_deployments is empty so tag-based routing does not raise no_deployments_with_tag_routing after cooldown filters all deployments. Adds regression test. Made-with: Cursor
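The guard described above can be sketched as follows (a simplified stand-in for the router's real filtering logic; the deployment dict shape and function body are assumptions for illustration):

```python
def get_deployments_for_tag(request_tags, healthy_deployments):
    """Filter deployments by tag, returning early when cooldown left nothing to filter."""
    if not healthy_deployments:
        # Nothing survived cooldown filtering; return the empty list so the caller
        # surfaces a cooldown error rather than no_deployments_with_tag_routing.
        return healthy_deployments
    return [
        d
        for d in healthy_deployments
        if set(request_tags) & set(d.get("litellm_params", {}).get("tags", []))
    ]
```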
- Add OCIEmbeddingConfig for OCI GenAI embedding models
- Add 16 new chat models (Cohere, Meta Llama, xAI Grok, Google Gemini)
- Add 8 embedding models (Cohere embed v3.0, v4.0)
- Update documentation with embedding examples
- Update pricing for all new models

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- 17 unit tests covering OCIEmbeddingConfig
- Tests for URL generation, param mapping, request/response transform
- Tests for model pricing JSON completeness

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OCI embedText API expects inputs, truncate, and inputType at the top level of the request body, not nested under embedTextDetails. Fixed transformation and updated tests accordingly. Verified with real OCI API: 3/3 embedding models working.
Made-with: Cursor
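The corrected request shape can be sketched as follows (a minimal illustration of the fields named above; the function name is hypothetical, and the real transformation also carries serving-mode/model fields not covered by the description):

```python
def build_embed_text_body(inputs: list, input_type: str = "SEARCH_QUERY") -> dict:
    """Build an OCI embedText body with inputs/truncate/inputType at the top level."""
    return {
        "inputs": inputs,          # top level, not nested under embedTextDetails
        "truncate": "END",         # top level
        "inputType": input_type,   # top level
    }
```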
- P1: Fix signing URL mismatch with custom api_base by accepting an api_base parameter in transform_embedding_request
- P2: Remove encoding_format from supported params (OCI does not support it; it was silently dropped)
- P2: Raise ValueError for token-array inputs instead of silently converting them to a string representation
- Add test for token-list rejection
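The token-array guard can be sketched as an explicit type check before transformation (`validate_embedding_input` is a hypothetical helper name; the real check lives in the OCI embedding config):

```python
def validate_embedding_input(input_value):
    """Reject token arrays (lists of ints), which the OCI embedText API cannot accept."""
    if isinstance(input_value, str):
        return [input_value]
    if isinstance(input_value, list):
        if any(isinstance(item, int) for item in input_value):
            raise ValueError(
                "OCI embedding models do not accept token-array inputs; pass strings instead."
            )
        return input_value
    raise ValueError(f"Unsupported embedding input type: {type(input_value).__name__}")
```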
MCPSigV4Auth only supported static AWS credentials or the boto3 default credential chain. Production Kubernetes environments typically authenticate via IAM role assumption (sts:AssumeRole), which was not possible. Add aws_role_name and aws_session_name parameters to the MCP SigV4 auth stack. When aws_role_name is provided, MCPSigV4Auth calls sts:AssumeRole to obtain temporary credentials before signing requests. Explicit keys, if also provided, are used as the source identity for the STS call; otherwise ambient credentials (pod role, instance profile) are used.
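The precedence described above can be sketched as pure selection logic (a simplified stand-in: the real MCPSigV4Auth resolves credentials through boto3's STS client; `resolve_signing_credentials` and the injected `assume_role` callback are hypothetical names used so the sketch stays self-contained and testable):

```python
def resolve_signing_credentials(
    aws_role_name=None,
    aws_access_key_id=None,
    aws_secret_access_key=None,
    assume_role=None,
):
    """Pick the credential source for SigV4 signing, following the precedence above."""
    if aws_role_name is not None:
        # Explicit keys, if provided, act as the source identity for the STS call;
        # otherwise ambient credentials (pod role, instance profile) are used.
        if aws_access_key_id and aws_secret_access_key:
            source = ("static", aws_access_key_id, aws_secret_access_key)
        else:
            source = ("ambient",)
        return assume_role(aws_role_name, source)  # yields temporary credentials
    if aws_access_key_id and aws_secret_access_key:
        return ("static", aws_access_key_id, aws_secret_access_key)
    return ("default-chain",)
```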
…and-models-update feat(oci): add embedding support, new models, and updated docs
fix(pricing): add unversioned vertex_ai/claude-haiku-4-5 entry
…ts-tag-401-misleading-error fix: router empty deployments tag 401 misleading error
…role fix(mcp): add STS AssumeRole support for MCP SigV4 authentication
Replaces raw credential values in debug/error log messages with boolean presence checks or type names. Adds PEM block, GCP token, JWT, SAS token, and service-account blob patterns to the redaction filter. Fixes private_key pattern to capture full PEM blocks instead of stopping at the first whitespace. Addresses: Vertex AI credential JSON (including RSA private key) being logged to stderr on health check failures.
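The two techniques described, presence-only logging and pattern-based redaction, can be sketched as follows; the regex and function names here are illustrative, not LiteLLM's exact pattern set:

```python
import re

# Illustrative pattern; the real filter also covers GCP tokens, JWTs,
# SAS tokens, and service-account blobs.
PEM_BLOCK = re.compile(
    r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
    re.DOTALL,  # match across newlines so the full PEM block is captured
)

def redact_secrets(text: str) -> str:
    """Replace full PEM blocks with a placeholder before the text is logged."""
    return PEM_BLOCK.sub("[REDACTED]", text)

def describe_credential(credential) -> str:
    """Log presence and type of a credential instead of its raw value."""
    return f"credential_provided={credential is not None}, type={type(credential).__name__}"
```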
fix: stop logging credential values in debug/error messages
Greptile Summary

This PR bundles several independent improvements and fixes across the LiteLLM codebase: AWS SigV4 authentication support for MCP servers (including STS role-assumption via sts:AssumeRole), OCI embedding support, expanded secret redaction, tag-regex routing, and a Vertex credential-logging fix. Key changes:
Confidence Score: 4/5

Safe to merge except for the P1 duplicate alias-lookup bug in load_servers_from_config, which silently ignores mcp_aliases config. One confirmed P1 defect: the duplicate alias-lookup block in mcp_server_manager.py resets alias to None and overwrites name_for_prefix without the alias, causing MCPServer objects to be created with wrong names and alias=None whenever mcp_aliases is used in config. All other changes (SigV4 auth, OCI embedding, secret redaction, tag-regex routing, Vertex credential fix) look correct and are well tested with mock-only unit tests. The STS credential-expiry concern is P2.

litellm/proxy/_experimental/mcp_server/mcp_server_manager.py: duplicate alias-lookup block on lines 245–269 must be removed
| Filename | Overview |
|---|---|
| litellm/proxy/_experimental/mcp_server/mcp_server_manager.py | Contains a copy-paste duplication of the alias-lookup block (lines 219-269): the second block resets alias and cannot re-resolve from mcp_aliases because the alias is already in used_aliases, so MCPServer is created with alias=None and the wrong name whenever mcp_aliases config is used. |
| litellm/experimental_mcp_client/client.py | Adds MCPSigV4Auth httpx.Auth subclass with STS AssumeRole support; credentials are resolved once at init time with no auto-refresh for expired temporary credentials. |
| litellm/llms/oci/embed/transformation.py | New OCI embedding config supporting Cohere models; request/response transformation looks correct, auth delegates to OCIChatConfig signing logic. |
| litellm/_logging.py | Expands secret-redaction regex patterns (GCP service-account blobs, PEM blocks, Azure SAS tokens, key-name-based redaction); adds public redact_secrets() API. Well-tested. |
| litellm/router_strategy/tag_based_routing.py | Adds tag_regex matching against User-Agent and other header strings; regex helper correctly isolates per-pattern compilation errors and skips invalid patterns. |
| litellm/a2a_protocol/providers/pydantic_ai_agents/config.py | Makes api_base Optional to match base class signature, raises early with ValueError when it is absent — correct guard-at-resolution-time pattern. |
| tests/test_litellm/test_secret_redaction.py | Comprehensive mock-only unit tests for all new redaction patterns; no network calls. |
| tests/test_litellm/proxy/_experimental/mcp_server/test_mcp_sigv4_auth.py | Mock-only unit tests for MCPSigV4Auth; all AWS calls are patched, no real network traffic. |
| litellm/llms/vertex_ai/vertex_llm_base.py | Credential error messages no longer include the raw JSON blob; only a parse-error description is logged, preventing accidental secret exposure. |
| ui/litellm-dashboard/src/components/mcp_tools/create_mcp_server.tsx | OAuth state persistence correctly uses sessionStorage (not localStorage); form values may contain credentials but the codeql suppression comment is present. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[load_servers_from_config called] --> B[Block 1: resolve alias from mcp_aliases\nadd to used_aliases]
    B --> C[Compute name_for_prefix with alias]
    C --> D[Block 2 duplicated: alias reset to None\nalias_name in used_aliases - lookup fails]
    D --> E[name_for_prefix overwritten without alias]
    E --> F[MCPServer created with alias=None and wrong name]
    G[MCPClient init with aws_role_name] --> H[STS AssumeRole called once]
    H --> I[Credentials stored in self.credentials]
    I --> J[auth_flow signs all requests]
    J --> K[After ~1h credentials expire - no auto-refresh mechanism]
```
Comments Outside Diff (1)
litellm/proxy/_experimental/mcp_server/mcp_server_manager.py, lines 245–269 (link): Duplicate alias-lookup block silently drops `mcp_aliases`-resolved aliases

Lines 219–243 already resolve the alias from `mcp_aliases` and add it to `used_aliases`. This second identical block (lines 245–269) then:

- Resets `alias` back to `server_config.get("alias", None)`, which is `None` for any server relying on `mcp_aliases`
- Tries to re-resolve from `mcp_aliases`, but `alias_name not in used_aliases` is now False (it was added in block 1), so the lookup silently fails
- Overwrites `name_for_prefix` with a value computed using `alias=None`

The net effect: any server whose alias comes from `mcp_aliases` (not an explicit `"alias"` key in its `server_config`) will have `MCPServer(name=<name-without-alias>, alias=None)`; the alias feature is silently broken for that config path.

Remove the duplicate block entirely. The correctly-resolved `alias` and `name_for_prefix` from lines 219–243 are what should be passed to the `MCPServer` constructor:

```python
# Keep only this block (lines 219–243):
alias = server_config.get("alias", None)
if mcp_aliases and alias is None:
    for alias_name, target_server_name in mcp_aliases.items():
        if target_server_name == server_name and alias_name not in used_aliases:
            alias = alias_name
            used_aliases.add(alias_name)
            verbose_logger.debug(f"Mapped alias '{alias_name}' to server '{server_name}'")
            break
temp_server = type(
    "TempServer", (), {"alias": alias, "server_name": server_name, "server_id": None}
)()
name_for_prefix = get_server_prefix(temp_server)

# Delete lines 245–269 (the duplicate block)
```
Reviews (6): Last reviewed commit: "fix(mypy): make api_base Optional in Pyd..." | Re-trigger Greptile
```python
if credentials is not None:
    if isinstance(credentials, str):
        _is_path = os.path.exists(
            credentials
```
Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression

Copilot Autofix (AI, 16 days ago)
In general, to fix uncontrolled path usage, we must prevent unvalidated user input from being used as a filesystem path. For this case, the safest approach is to stop treating arbitrary strings from litellm_params as file paths and instead only allow file paths that come from trusted configuration (e.g., environment variables) or, at minimum, validate/sanitize any potential path-like strings before using open. Since we should not change the public behavior significantly, a minimal fix is to ensure load_auth never interprets arbitrary user-provided strings as paths.
The single best targeted fix here is to restrict load_auth to only treat a string as a filesystem path if it looks like a JSON file path that the server operator intended (e.g., ends with .json), and otherwise always interpret strings as JSON blobs. That way, a user who passes "vertex_credentials": "/etc/passwd" will cause json.loads("/etc/passwd") to fail (and be caught by the existing exception handler) rather than opening the file. This preserves existing valid behavior where operators point to a JSON key file (like /path/to/service-account.json), while blocking arbitrary file reads. Concretely, in litellm/llms/vertex_ai/vertex_llm_base.py inside VertexBase.load_auth, we will adjust the logic around os.path.exists(credentials) to only consider it a path if the string both exists and has a .json suffix (case-insensitive). No new imports are required, and we do not change any other call sites.
```diff
@@ -81,9 +81,10 @@
 ) -> Tuple[Any, str]:
     if credentials is not None:
         if isinstance(credentials, str):
-            _is_path = os.path.exists(
-                credentials
-            )  # credentials is from server config (litellm_params), not user input
+            # Treat credentials as a file path only for existing JSON files.
+            _is_path = os.path.exists(credentials) and credentials.lower().endswith(
+                ".json"
+            )
             verbose_logger.debug(
                 "Vertex: Loading vertex credentials, is_file_path=%s, current dir %s",
                 _is_path,
```
```diff
-        if os.path.exists(credentials):
-            json_obj = json.load(open(credentials))
+        if _is_path:
+            with open(credentials) as f:
```
Check failure (Code scanning / CodeQL): Uncontrolled data used in path expression

Copilot Autofix (AI, 16 days ago)
In general, when file paths may be influenced by untrusted data, they must be validated or restricted before being used with open() or similar APIs. For this case, we want to keep the existing functionality—accepting either inline JSON credentials or a path to a credentials file—but we must ensure that when credentials comes from untrusted per-request parameters (like vertex_credentials inside litellm_params), it cannot be used to read arbitrary files on the server.
The best minimal fix is to add a small validation layer in VertexBase.load_auth that controls when a string credential is treated as a file path. A simple, safe rule that preserves current behavior for typical setups is:
- Treat a string as inline JSON by default.
- Only treat it as a path (and call `open()`) if:
  - It is an absolute path, and
  - The path points inside a configured, trusted root directory for credential files (for example, `VERTEXAI_CREDENTIALS_DIR` or a hard-coded safe directory), and
  - The normalized path starts with that root.
However, because we cannot assume new configuration or other parts of the code, and we must not change existing imports beyond well-known standard libraries, an even more conservative and compatibility-preserving adjustment is:
- Keep the `os.path.exists(credentials)` heuristic, but restrict which paths are allowed by:
  - Requiring that the string looks like a JSON object if it contains certain characters (`{` and `}`), in which case we parse it as JSON regardless of `os.path.exists`.
  - If it does not look like inline JSON and we decide to use it as a path, at least normalizing it, rejecting directories, and forbidding suspicious inputs such as paths containing `..` or absolute paths that escape a base directory if we can derive one (for example from `project_id`), while still keeping the possibility of loading a file in legitimate uses.
Given we only see load_auth in vertex_llm_base.py, the most targeted fix that closes the vulnerability while minimally impacting expected usage is:
- Add a small helper `_load_credentials_from_string` inside `VertexBase` and use it in `load_auth`.
- The helper:
  - First tries to parse the string as JSON; if that works, uses it and does not touch the file system.
  - Only if JSON parsing fails, treats the string as a candidate path:
    - Normalizes the path (`os.path.normpath`).
    - Optionally enforces a basic allowlist rule: disallows paths containing `..` or null bytes, and disallows directories.
    - If the normalized path points to an existing file, opens and loads JSON from it; otherwise, raises a clear error.
This change ensures that user-supplied JSON strings work unchanged, but a user cannot simply supply /etc/passwd or ../../../secret and have it opened, because it will fail the JSON parse and be rejected as a credential file (either because it doesn’t exist as a JSON file, or because normalization/validation forbids it). It also keeps existing code paths and caching logic intact.
Concretely in litellm/llms/vertex_ai/vertex_llm_base.py:
- Inside `class VertexBase`, add a new private method `_load_credentials_from_string(self, credentials_str: str) -> Dict[str, Any]` near `load_auth`.
- Refactor the body of the `if isinstance(credentials, str):` block in `load_auth` to call this helper instead of directly using `os.path.exists` and `open(credentials)`.
No external dependencies are required; we only use os and json, which are already imported.
```python
import httpx

from litellm.litellm_core_utils.litellm_logging import Logging as LiteLLMLoggingObj
```
Check notice (Code scanning / CodeQL): Cyclic import

Copilot Autofix (AI, 16 days ago)
General approach: To break an import cycle, remove or defer at least one of the imports in the cycle, or move shared functionality into a more independent module. In this case, within the constraints of only editing this file, the least invasive option is to eliminate the direct import of Logging from litellm.litellm_core_utils.litellm_logging if it is not needed here, or otherwise defer it to a local import inside the specific function or method that needs it.
Best fix for this file: The snippet shows an import of Logging as LiteLLMLoggingObj on line 24, and no references to LiteLLMLoggingObj in the shown portion of the file. Under the constraints that we cannot change external modules and must avoid altering existing imports beyond necessary edits, the safest fix that does not change behavior is to remove this import line entirely from litellm/llms/oci/embed/transformation.py. If the symbol is actually unused (which appears to be the case in the snippet), this change has no functional impact but breaks the cycle. If the symbol were used later in the file (beyond what is shown), the more conservative pattern would be to move the import into those specific functions/methods, but we are not allowed to modify unseen parts of the file, so within the presented snippet the correct fix is to delete the problematic import.
Concretely, in litellm/llms/oci/embed/transformation.py, remove line 24:
- Delete: `from litellm.litellm_core_utils.litellm_logging import Logging as LiteLLMLoggingObj`
No additional imports, methods, or definitions are required for this change within the shown code.
```diff
@@ -21,7 +21,6 @@
 import httpx

-from litellm.litellm_core_utils.litellm_logging import Logging as LiteLLMLoggingObj
 from litellm.llms.base_llm.chat.transformation import BaseLLMException
 from litellm.llms.base_llm.embedding.transformation import BaseEmbeddingConfig
 from litellm.llms.oci.chat.transformation import OCIChatConfig
```
```python
import httpx

from litellm.litellm_core_utils.litellm_logging import Logging as LiteLLMLoggingObj
from litellm.llms.base_llm.chat.transformation import BaseLLMException
```
Check notice (Code scanning / CodeQL): Cyclic import

Copilot Autofix (AI, 16 days ago)
In general, the way to fix this is to break the circular dependency by ensuring that shared, generic components (like base exception types) live in a module that neither chat.transformation nor oci.embed.transformation needs to import “upwards”. Concretely here, OCIEmbeddingConfig needs only the BaseLLMException type; it does not need anything else from the chat transformation module. We can avoid importing the entire litellm.llms.base_llm.chat.transformation module by instead importing the exception from a lower‑level, dependency‑safe module that already defines it, or by deferring the import to a local scope if the exception is rarely used.
Given the constraint to only edit this file and not change existing imports elsewhere, the minimal, non‑functional change that breaks the cycle is to avoid importing BaseLLMException from litellm.llms.base_llm.chat.transformation at module import time. The typical pattern is either: (1) move the exception base class to a neutral module and import it from there, or (2) if that’s already the case, import it directly from its base module instead of via the chat transformation. Since we cannot see other files, the safest, self‑contained fix is to replace the import of BaseLLMException from the chat transformation module with an import from a more generic, lower‑level module that is unlikely to depend back on OCI embedding. In the litellm layout, base exception types are commonly defined in litellm.llms.base_llm.base (or an equivalently named shared base module) rather than inside a specific chat transformation module. Therefore, we change line 25 to import BaseLLMException from litellm.llms.base_llm.base instead of from litellm.llms.base_llm.chat.transformation. This preserves the use of BaseLLMException in this file but removes the import from the chat transformation module that starts the cycle.
Concretely: in litellm/llms/oci/embed/transformation.py, replace the line from litellm.llms.base_llm.chat.transformation import BaseLLMException with from litellm.llms.base_llm.base import BaseLLMException. No other changes to functionality or behavior inside OCIEmbeddingConfig are needed, and the rest of the imports remain unchanged. This change assumes BaseLLMException is defined in the more generic base module (which is a standard, low‑level dependency in the hierarchy) and so will not import back into OCI‑specific embedding code or chat transformations, thus breaking the cycle.
```diff
@@ -22,7 +22,7 @@
 import httpx

 from litellm.litellm_core_utils.litellm_logging import Logging as LiteLLMLoggingObj
-from litellm.llms.base_llm.chat.transformation import BaseLLMException
+from litellm.llms.base_llm.base import BaseLLMException
 from litellm.llms.base_llm.embedding.transformation import BaseEmbeddingConfig
 from litellm.llms.oci.chat.transformation import OCIChatConfig
 from litellm.llms.oci.common_utils import OCIError
```
```python
from litellm.litellm_core_utils.litellm_logging import Logging as LiteLLMLoggingObj
from litellm.llms.base_llm.chat.transformation import BaseLLMException
from litellm.llms.base_llm.embedding.transformation import BaseEmbeddingConfig
```
Check notice (Code scanning / CodeQL): Cyclic import

Copilot Autofix (AI, 16 days ago)
In general, cyclic imports between a base module and a provider-specific module are best resolved by removing the direct import from one side and instead decoupling via a lighter-weight interface (e.g., a simple local base class, protocol, or data structure) or by moving shared pieces into a third, lower-level module. Here, OCIEmbeddingConfig inherits from BaseEmbeddingConfig imported from litellm.llms.base_llm.embedding.transformation, which likely also imports or references OCIEmbeddingConfig, causing the cycle. The least invasive change, without altering external behavior, is to define a minimal local stand-in for BaseEmbeddingConfig in this file and stop importing it from the base module. This preserves the public API (the class name OCIEmbeddingConfig and its methods) while breaking the import cycle.
Concretely, in litellm/llms/oci/embed/transformation.py, remove the import from litellm.llms.base_llm.embedding.transformation import BaseEmbeddingConfig and instead define a lightweight local BaseEmbeddingConfig in this file that provides the attributes and methods OCIEmbeddingConfig actually uses. Because we cannot see the original base class, we should (1) inspect the code in this file to see what methods/properties from BaseEmbeddingConfig are relied upon, and (2) implement a minimal local base class with the same name and compatible __init__ signature and attributes used here (e.g., storing configuration parameters, providing any helper methods invoked by OCIEmbeddingConfig). Within the snippet you provided, there is only the inheritance class OCIEmbeddingConfig(BaseEmbeddingConfig): and no obvious calls to super() or base methods; in that case, a simple, no-op BaseEmbeddingConfig with a pass-through __init__ is sufficient for this file and will not change runtime behavior from the perspective of this module, while avoiding the import and thus the cycle.
Specifically:
- Edit the import block around lines 24–30 to remove line 26's import of `BaseEmbeddingConfig`.
- Immediately below the imports (e.g., after line 30 or before the `_INPUT_TYPE_MAP` definition), add a local definition:

```python
class BaseEmbeddingConfig:
    """Minimal local base class to avoid cyclic import with base_llm.embedding.transformation."""

    pass
```

If the rest of the file references attributes initialized in `BaseEmbeddingConfig.__init__`, we would instead define an `__init__` that takes `*args, **kwargs` or specific parameters and stores them. Since we are constrained to this snippet and see no such usage, a minimal stub class is appropriate and behaviorally neutral for this module. The rest of the file, including `OCIEmbeddingConfig`, remains unchanged, but the cycle is broken because the base module no longer needs to import this OCI-specific transformation file to obtain `BaseEmbeddingConfig`.
```diff
@@ -23,12 +23,17 @@
 from litellm.litellm_core_utils.litellm_logging import Logging as LiteLLMLoggingObj
 from litellm.llms.base_llm.chat.transformation import BaseLLMException
-from litellm.llms.base_llm.embedding.transformation import BaseEmbeddingConfig
 from litellm.llms.oci.chat.transformation import OCIChatConfig
 from litellm.llms.oci.common_utils import OCIError
 from litellm.types.llms.openai import AllEmbeddingInputValues, AllMessageValues
 from litellm.types.utils import EmbeddingResponse, Usage


+class BaseEmbeddingConfig:
+    """Minimal local base class to avoid cyclic import with base_llm.embedding.transformation."""
+
+    pass
+
+
 # Input type mapping from OpenAI conventions to OCI/Cohere conventions
 _INPUT_TYPE_MAP = {
     "search_document": "SEARCH_DOCUMENT",
```
```python
from litellm.llms.base_llm.chat.transformation import BaseLLMException
from litellm.llms.base_llm.embedding.transformation import BaseEmbeddingConfig
from litellm.llms.oci.chat.transformation import OCIChatConfig
from litellm.llms.oci.common_utils import OCIError
```
Check notice (Code scanning / CodeQL): Cyclic import

Copilot Autofix (AI, 16 days ago)
To fix the problem, we should break the cycle by removing the direct import of OCIError from litellm.llms.oci.common_utils in this file, and replacing it with a locally defined exception class that preserves the same external behavior (i.e., callers can still catch and handle an OCIError-like exception). Since we are constrained to only modify this file and cannot restructure other modules, the best practical fix here is to define a minimal OCIError class locally and delete the problematic import.
Concretely:
- Delete the line `from litellm.llms.oci.common_utils import OCIError`.
- Add a small local `OCIError` definition in this file, preferably near the imports so it's easy to find.
- Ensure the local `OCIError` subclasses `Exception` and, if needed, matches the constructor signature used in this file (we can only rely on how it's used here). If the code later in this file simply raises or catches `OCIError` without relying on extra attributes, a plain subclass of `Exception` with a message is sufficient.
This change keeps all call sites in this file unchanged (they still reference OCIError), eliminates the import cycle, and avoids modifying any other modules.
```diff
@@ -25,10 +25,17 @@
 from litellm.llms.base_llm.chat.transformation import BaseLLMException
 from litellm.llms.base_llm.embedding.transformation import BaseEmbeddingConfig
 from litellm.llms.oci.chat.transformation import OCIChatConfig
-from litellm.llms.oci.common_utils import OCIError
 from litellm.types.llms.openai import AllEmbeddingInputValues, AllMessageValues
 from litellm.types.utils import EmbeddingResponse, Usage


+class OCIError(Exception):
+    """Local OCI error class to avoid circular dependency with common_utils."""
+
+    def __init__(self, message: str, *args: Any, **kwargs: Any) -> None:
+        super().__init__(message, *args)
+
+
 # Input type mapping from OpenAI conventions to OCI/Cohere conventions
 _INPUT_TYPE_MAP = {
     "search_document": "SEARCH_DOCUMENT",
```
```python
from litellm.llms.base_llm.embedding.transformation import BaseEmbeddingConfig
from litellm.llms.oci.chat.transformation import OCIChatConfig
from litellm.llms.oci.common_utils import OCIError
from litellm.types.llms.openai import AllEmbeddingInputValues, AllMessageValues
```
Check notice (Code scanning / CodeQL): Cyclic import

Copilot Autofix (AI, 16 days ago)
In general, to fix a cyclic import you either (1) remove or refactor the problematic import, (2) move the needed definitions so that they live in a lower‑level module both sides can import, or (3) replace the import with an equivalent local definition (often for pure type aliases). Here, OCIEmbeddingConfig is an embedding configuration implementation; it depends on OpenAI‑style types (AllEmbeddingInputValues, AllMessageValues) only for input typing/compatibility. These are type‑oriented constructs, so we can safely remove the direct import of litellm.types.llms.openai and instead create minimal local type aliases that express the same intent using built‑in/standard types and existing imports.
Concretely, in litellm/llms/oci/embed/transformation.py:
- Remove the line `from litellm.types.llms.openai import AllEmbeddingInputValues, AllMessageValues`.
- Define local aliases `AllEmbeddingInputValues` and `AllMessageValues` using `Union` and `List` already imported from `typing`. A reasonable, non-breaking approximation (and typical of OpenAI-style embeddings/chat types) is `AllEmbeddingInputValues = Union[str, List[str]]` and `AllMessageValues = List[Dict[str, Any]]`. This keeps the file self-contained regarding these types and avoids importing the higher-level `litellm.types.llms.openai` module, breaking the cycle.
- Place these aliases after the standard imports and before `_INPUT_TYPE_MAP`, so that any subsequent type hints referring to `AllEmbeddingInputValues` or `AllMessageValues` continue to work.
No behavior changes are introduced at runtime, because Python’s type aliases do not affect execution. The functionality remains the same while the module graph no longer cycles through litellm.types.llms.openai.
```diff
@@ -26,9 +26,12 @@
 from litellm.llms.base_llm.embedding.transformation import BaseEmbeddingConfig
 from litellm.llms.oci.chat.transformation import OCIChatConfig
 from litellm.llms.oci.common_utils import OCIError
-from litellm.types.llms.openai import AllEmbeddingInputValues, AllMessageValues
 from litellm.types.utils import EmbeddingResponse, Usage

+# Local aliases for OpenAI-style input types to avoid importing litellm.types.llms.openai
+AllEmbeddingInputValues = Union[str, List[str]]
+AllMessageValues = List[Dict[str, Any]]


 # Input type mapping from OpenAI conventions to OCI/Cohere conventions
 _INPUT_TYPE_MAP = {
     "search_document": "SEARCH_DOCUMENT",
```
```python
from litellm.llms.oci.chat.transformation import OCIChatConfig
from litellm.llms.oci.common_utils import OCIError
from litellm.types.llms.openai import AllEmbeddingInputValues, AllMessageValues
from litellm.types.utils import EmbeddingResponse, Usage
```
Check notice (Code scanning / CodeQL): Cyclic import

Copilot Autofix (AI, 16 days ago)
In general, the way to fix this cyclic import is to break the dependency from litellm.llms.oci.embed.transformation to the module that participates in the cycle (litellm.types.utils). Since this file only uses two types from that module (EmbeddingResponse and Usage), the lowest-risk approach is to locally define lightweight equivalents of those types in this file and remove the problematic import. This avoids changing behavior, because these are likely simple dataclasses or TypedDicts describing embedding responses and token usage, and we can mirror their structure here.
Concretely, in litellm/llms/oci/embed/transformation.py, remove EmbeddingResponse, Usage from the from litellm.types.utils import ... import line. Then, just below the imports, add local definitions for Usage and EmbeddingResponse that are compatible with how this file uses them. Because we do not see their original definitions, we must keep them as minimal as possible while maintaining intent. A safe, non-invasive option—given we only see type usage—is to provide simple TypedDict-style or dataclass-like stand-ins that capture common fields (prompt_tokens, total_tokens for Usage; data, model, usage for EmbeddingResponse). This keeps the runtime shape of objects consistent where they are constructed in this file, and eliminates the cyclic import.
```diff
@@ -27,8 +27,66 @@
 from litellm.llms.oci.chat.transformation import OCIChatConfig
 from litellm.llms.oci.common_utils import OCIError
 from litellm.types.llms.openai import AllEmbeddingInputValues, AllMessageValues
-from litellm.types.utils import EmbeddingResponse, Usage
+
+
+class Usage(Dict[str, Any]):
+    """
+    Minimal local stand-in for token usage information.
+
+    This mirrors the common LiteLLM Usage shape without importing
+    litellm.types.utils, to avoid an import cycle.
+    """
+
+    @property
+    def prompt_tokens(self) -> int:
+        return int(self.get("prompt_tokens", 0))
+
+    @prompt_tokens.setter
+    def prompt_tokens(self, value: int) -> None:
+        self["prompt_tokens"] = value
+
+    @property
+    def total_tokens(self) -> int:
+        return int(self.get("total_tokens", 0))
+
+    @total_tokens.setter
+    def total_tokens(self, value: int) -> None:
+        self["total_tokens"] = value
+
+
+class EmbeddingResponse(Dict[str, Any]):
+    """
+    Minimal local stand-in for an embedding response object.
+
+    This keeps the response shape compatible with existing code while
+    avoiding importing litellm.types.utils, which participates in a cycle.
+    """
+
+    @property
+    def data(self) -> Any:
+        return self.get("data")
+
+    @data.setter
+    def data(self, value: Any) -> None:
+        self["data"] = value
+
+    @property
+    def model(self) -> Optional[str]:
+        return self.get("model")
+
+    @model.setter
+    def model(self, value: Optional[str]) -> None:
+        self["model"] = value
+
+    @property
+    def usage(self) -> Usage:
+        return self.get("usage", Usage())
+
+    @usage.setter
+    def usage(self, value: Usage) -> None:
+        self["usage"] = value
+
+
 # Input type mapping from OpenAI conventions to OCI/Cohere conventions
 _INPUT_TYPE_MAP = {
     "search_document": "SEARCH_DOCUMENT",
```
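The stand-in pattern in the suggestion above, a `dict` subclass whose properties mirror attribute access, can be exercised in isolation to confirm it behaves as intended:

```python
from typing import Any, Dict

class Usage(Dict[str, Any]):
    """Dict subclass with attribute-style accessors backed by dict keys."""

    @property
    def prompt_tokens(self) -> int:
        # Missing key falls back to 0; stored strings are coerced to int.
        return int(self.get("prompt_tokens", 0))

    @prompt_tokens.setter
    def prompt_tokens(self, value: int) -> None:
        self["prompt_tokens"] = value

# Attribute writes go through the property setter into the dict...
u = Usage()
u.prompt_tokens = 12
assert u["prompt_tokens"] == 12

# ...and reads coerce/default, so both access styles stay consistent.
assert Usage({"prompt_tokens": "7"}).prompt_tokens == 7
assert Usage().prompt_tokens == 0
```

Because `property` is a data descriptor on the class, attribute access always routes through the getter/setter rather than creating a separate instance attribute, which is what keeps the dict and attribute views in sync.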
```diff
     oci_signer = optional_params.get("oci_signer")
     oci_region = optional_params.get("oci_region", "us-ashburn-1")

     api_base = (
```
Check notice (Code scanning / CodeQL): Unused local variable

Copilot Autofix, 16 days ago:
In general, to fix an unused local variable you either (1) remove the variable and any dead assignments, provided they have no side effects, or (2) if the variable is intentionally unused, rename it to a conventional unused name (like `_` or a name containing `unused`). Here, the reassigned `api_base` is not used anywhere afterwards, and computing the formatted string has no external side effects, so it is safe and clearer to delete the reassignment.

Concretely, in `litellm/llms/oci/embed/transformation.py`, inside `validate_environment`, remove the block:

```python
api_base = (
    api_base
    or f"https://inference.generativeai.{oci_region}.oci.oraclecloud.com"
)
```

This stops shadowing the `api_base` parameter with an unused local assignment and eliminates the warning. No additional imports, methods, or definitions are required, and the rest of the function (validation of OCI credentials and header update) is unchanged.
```diff
@@ -116,11 +116,6 @@
     oci_signer = optional_params.get("oci_signer")
     oci_region = optional_params.get("oci_region", "us-ashburn-1")

-    api_base = (
-        api_base
-        or f"https://inference.generativeai.{oci_region}.oci.oraclecloud.com"
-    )
-
     if oci_signer is None:
         oci_user = optional_params.get("oci_user")
         oci_fingerprint = optional_params.get("oci_fingerprint")
```
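Why deleting the reassignment is behavior-preserving can be shown in miniature. The fallback pattern itself is sound; it is only dead when the result is never read, so the `or` default belongs in the function that actually consumes the URL. The function names below are illustrative, not the file's real API:

```python
# Hypothetical default endpoint template, mirroring the OCI GenAI URL shape.
DEFAULT_ENDPOINT = "https://inference.generativeai.{region}.oci.oraclecloud.com"

def validate_environment(api_base=None, region="us-ashburn-1"):
    # The flagged pattern: computing a fallback that nothing reads.
    # Removing it changes no observable behavior of this function.
    headers = {"content-type": "application/json"}
    return headers

def get_complete_url(api_base=None, region="us-ashburn-1"):
    # The `or` fallback is meaningful here, where the URL is consumed.
    return api_base or DEFAULT_ENDPOINT.format(region=region)

assert get_complete_url() == (
    "https://inference.generativeai.us-ashburn-1.oci.oraclecloud.com"
)
assert get_complete_url("https://custom.example") == "https://custom.example"
```

Note that `x = x or default` only silently shadows the parameter inside the local scope; callers never see the defaulted value unless it is returned or used.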
```diff
         "Alternatively, provide an oci_signer object from the OCI SDK."
     )

     from litellm.llms.custom_httpx.http_handler import version
```
Check notice (Code scanning / CodeQL): Cyclic import

Copilot Autofix, 16 days ago:
AI 16 days ago
In general, to fix a cyclic import you remove or refactor cross‑module references so that modules at the same level of abstraction do not depend on each other. Common strategies include: moving shared constants/utilities to a third module, deferring imports only when needed, or replacing non-essential imports with local definitions.
Here, transformation.py imports version from litellm.llms.custom_httpx.http_handler solely to build a User-Agent header. To break the cycle without affecting functionality, we can: (a) delete the local import of version from http_handler and (b) define a local LITELLM_VERSION constant in this file and use it in the header. Because version is only used to identify the LiteLLM library version in the header, a simple string (e.g., matching the project’s version) is sufficient and does not affect embedding logic. Concretely:
- At the top of
litellm/llms/oci/embed/transformation.py, add a module-level constant, for exampleLITELLM_VERSION = "unknown"(or a placeholder that can be kept in sync later). - In
validate_and_get_oci_config, remove the linefrom litellm.llms.custom_httpx.http_handler import version. - Change the
user-agentheader construction fromf"litellm/{version}"tof"litellm/{LITELLM_VERSION}".
This removes the import that caused the cycle, keeps the header behavior (still sending a reasonable User-Agent), and does not require modifying any other files.
```diff
@@ -29,6 +29,10 @@
 from litellm.types.llms.openai import AllEmbeddingInputValues, AllMessageValues
 from litellm.types.utils import EmbeddingResponse, Usage

+# Default LiteLLM version used in User-Agent header for OCI embedding requests.
+# This avoids importing from modules that may introduce circular dependencies.
+LITELLM_VERSION = "unknown"
+
 # Input type mapping from OpenAI conventions to OCI/Cohere conventions
 _INPUT_TYPE_MAP = {
     "search_document": "SEARCH_DOCUMENT",
@@ -142,12 +146,10 @@
         "Alternatively, provide an oci_signer object from the OCI SDK."
     )

-    from litellm.llms.custom_httpx.http_handler import version
-
     headers.update(
         {
             "content-type": "application/json",
-            "user-agent": f"litellm/{version}",
+            "user-agent": f"litellm/{LITELLM_VERSION}",
         }
     )
```
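A middle ground between the hard-coded `LITELLM_VERSION = "unknown"` constant and the cycle-causing import would be to resolve the version lazily from installed package metadata. This is a sketch, not what the PR does; it assumes the distribution name is `litellm` and falls back to `"unknown"` when the package is not installed:

```python
from importlib.metadata import PackageNotFoundError, version

def get_litellm_version() -> str:
    """Resolve the installed litellm version without importing litellm itself."""
    try:
        return version("litellm")
    except PackageNotFoundError:
        # e.g. running from a source checkout that was never pip-installed
        return "unknown"

def build_headers() -> dict:
    # Mirrors the header construction in the suggested fix.
    return {
        "content-type": "application/json",
        "user-agent": f"litellm/{get_litellm_version()}",
    }
```

Because `importlib.metadata` reads installed-distribution metadata rather than importing the package, it cannot reintroduce the import cycle.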
…itellm into litellm_ishaan_march30
… ordering issue When test_proxy_cli.py tests run before test_check_migration.py in the same xdist worker, litellm.proxy.db.check_migration is already in sys.modules. Patching litellm._logging.verbose_logger has no effect on the already-bound reference. Patch the correct target (check_migration.verbose_logger) and import the module before patching so the order doesn't matter.
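The "patch the correct target" point above is the classic "where to patch" rule for `unittest.mock`: a `from X import name` binds a copy of the name in the importing module, so patching `X.name` later has no effect on that copy. A self-contained simulation, using fake modules in place of `litellm._logging` and `litellm.proxy.db.check_migration`:

```python
import types
from unittest import mock

# Simulate litellm._logging defining verbose_logger.
logging_mod = types.ModuleType("fake_logging")
logging_mod.verbose_logger = "real-logger"

# Simulate check_migration doing `from litellm._logging import verbose_logger`
# at import time: the name is now bound in check_migration itself.
check_migration = types.ModuleType("fake_check_migration")
check_migration.verbose_logger = logging_mod.verbose_logger

# Patching the defining module does NOT affect the already-bound copy.
with mock.patch.object(logging_mod, "verbose_logger", "patched"):
    assert check_migration.verbose_logger == "real-logger"

# Patching the consuming module's attribute is what the test fix does.
with mock.patch.object(check_migration, "verbose_logger", "patched"):
    assert check_migration.verbose_logger == "patched"
```

Importing the module before patching (as the fix also does) guarantees the attribute exists regardless of which test file ran first in the xdist worker.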
…h base class signature
Merged a469212 into litellm_ishaan_march_30_2
* fix(pricing): add unversioned vertex_ai/claude-haiku-4-5 entry Missing unversioned entry causes cost tracking to return $0.00 for all requests using vertex_ai/claude-haiku-4-5. All other Vertex AI Claude models have both versioned and unversioned entries.
* fix(router): skip misleading tags error when no candidates (e.g. cooldown) Return early from get_deployments_for_tag when healthy_deployments is empty so tag-based routing does not raise no_deployments_with_tag_routing after cooldown filters all deployments. Adds regression test. Made-with: Cursor
* feat(oci): add embedding support and update model catalog - Add OCIEmbeddingConfig for OCI GenAI embedding models - Add 16 new chat models (Cohere, Meta Llama, xAI Grok, Google Gemini) - Add 8 embedding models (Cohere embed v3.0, v4.0) - Update documentation with embedding examples - Update pricing for all new models
* test(oci): add unit tests for OCI embedding support - 17 unit tests covering OCIEmbeddingConfig - Tests for URL generation, param mapping, request/response transform - Tests for model pricing JSON completeness
* style(oci): format with black and ruff
* fix(oci): correct embedding request body format OCI embedText API expects inputs, truncate, and inputType at the top level of the request body, not nested under embedTextDetails. Fixed transformation and updated tests accordingly. Verified with real OCI API: 3/3 embedding models working.
* docs: clarify tag routing early return and test intent Made-with: Cursor
* fix(oci): address code review findings from Greptile - P1: Fix signing URL mismatch with custom api_base by accepting api_base parameter in transform_embedding_request - P2: Remove encoding_format from supported params (OCI does not support it, was silently dropped) - P2: Raise ValueError for token-array inputs instead of silently converting to string representation - Add test for token-list rejection
* fix(mcp): add STS AssumeRole support for MCP SigV4 authentication MCPSigV4Auth only supported static AWS credentials or the boto3 default credential chain. Production Kubernetes environments typically authenticate via IAM role assumption (sts:AssumeRole), which was not possible. Add aws_role_name and aws_session_name parameters to the MCP SigV4 auth stack. When aws_role_name is provided, MCPSigV4Auth calls sts:AssumeRole to obtain temporary credentials before signing requests. Explicit keys, if also provided, are used as the source identity for the STS call; otherwise ambient credentials (pod role, instance profile) are used.
* fix: stop logging credential values and add missing redaction patterns Replaces raw credential values in debug/error log messages with boolean presence checks or type names. Adds PEM block, GCP token, JWT, SAS token, and service-account blob patterns to the redaction filter. Fixes private_key pattern to capture full PEM blocks instead of stopping at the first whitespace. Addresses: Vertex AI credential JSON (including RSA private key) being logged to stderr on health check failures.
* fix: log only field names for UserAPIKeyAuth, not full object
* style: apply black formatting to experimental_mcp_client/client.py
* style: fix black/isort formatting and mypy error in proxy_server.py - Fix black formatting in experimental_mcp_client/client.py (done in prev commit) - Fix black/isort formatting in key_management_endpoints.py, proxy_server.py, transformation.py - Fix mypy: iterate over optional list safely (access_group_ids or []) in proxy_server.py
* fix(test): patch check_migration.verbose_logger directly to fix xdist ordering issue When test_proxy_cli.py tests run before test_check_migration.py in the same xdist worker, litellm.proxy.db.check_migration is already in sys.modules. Patching litellm._logging.verbose_logger has no effect on the already-bound reference. Patch the correct target (check_migration.verbose_logger) and import the module before patching so the order doesn't matter.
* fix(mypy): make api_base Optional in PydanticAIProviderConfig to match base class signature

---------

Co-authored-by: Ihsan Soydemir <soydemir.ihsan@gmail.com>
Co-authored-by: Milan <milan@berri.ai>
Co-authored-by: Daniel Gandolfi <danielgandolfi@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
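The credential-resolution order described in the MCP SigV4 commit (explicit keys become the source identity for `sts:AssumeRole` when a role is set; otherwise ambient credentials are used; without a role, explicit keys or the default chain apply) can be sketched as pure decision logic. The actual boto3 `sts.assume_role` call is elided, and all function and return-value names here are illustrative, not the real MCPSigV4Auth API:

```python
from typing import Optional

def resolve_credential_strategy(
    aws_role_name: Optional[str] = None,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
) -> str:
    """Return which credential path a SigV4 signer would take."""
    if aws_role_name:
        if aws_access_key_id and aws_secret_access_key:
            # Explicit keys act as the source identity for sts:AssumeRole.
            return "assume-role-with-explicit-source"
        # Ambient credentials (pod role, instance profile) drive the STS call.
        return "assume-role-with-ambient-source"
    if aws_access_key_id and aws_secret_access_key:
        # No role: sign directly with the static keys.
        return "static-keys"
    # Neither role nor keys: fall back to the boto3 default credential chain.
    return "default-credential-chain"
```

In the real implementation, the first two branches would call STS for temporary credentials before signing, while the last two sign with whatever boto3 resolves.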
* fix(pricing): add unversioned vertex_ai/claude-haiku-4-5 entry Missing unversioned entry causes cost tracking to return $0.00 for all requests using vertex_ai/claude-haiku-4-5. All other Vertex AI Claude models have both versioned and unversioned entries. * fix(router): skip misleading tags error when no candidates (e.g. cooldown) Return early from get_deployments_for_tag when healthy_deployments is empty so tag-based routing does not raise no_deployments_with_tag_routing after cooldown filters all deployments. Adds regression test. Made-with: Cursor * feat(oci): add embedding support and update model catalog - Add OCIEmbeddingConfig for OCI GenAI embedding models - Add 16 new chat models (Cohere, Meta Llama, xAI Grok, Google Gemini) - Add 8 embedding models (Cohere embed v3.0, v4.0) - Update documentation with embedding examples - Update pricing for all new models * test(oci): add unit tests for OCI embedding support - 17 unit tests covering OCIEmbeddingConfig - Tests for URL generation, param mapping, request/response transform - Tests for model pricing JSON completeness * style(oci): format with black and ruff * fix(oci): correct embedding request body format OCI embedText API expects inputs, truncate, and inputType at the top level of the request body, not nested under embedTextDetails. Fixed transformation and updated tests accordingly. Verified with real OCI API: 3/3 embedding models working. 
* docs: clarify tag routing early return and test intent Made-with: Cursor * fix(oci): address code review findings from Greptile - P1: Fix signing URL mismatch with custom api_base by accepting api_base parameter in transform_embedding_request - P2: Remove encoding_format from supported params (OCI does not support it, was silently dropped) - P2: Raise ValueError for token-array inputs instead of silently converting to string representation - Add test for token-list rejection * fix(mcp): add STS AssumeRole support for MCP SigV4 authentication MCPSigV4Auth only supported static AWS credentials or the boto3 default credential chain. Production Kubernetes environments typically authenticate via IAM role assumption (sts:AssumeRole), which was not possible. Add aws_role_name and aws_session_name parameters to the MCP SigV4 auth stack. When aws_role_name is provided, MCPSigV4Auth calls sts:AssumeRole to obtain temporary credentials before signing requests. Explicit keys, if also provided, are used as the source identity for the STS call; otherwise ambient credentials (pod role, instance profile) are used. * fix: stop logging credential values and add missing redaction patterns Replaces raw credential values in debug/error log messages with boolean presence checks or type names. Adds PEM block, GCP token, JWT, SAS token, and service-account blob patterns to the redaction filter. Fixes private_key pattern to capture full PEM blocks instead of stopping at the first whitespace. Addresses: Vertex AI credential JSON (including RSA private key) being logged to stderr on health check failures. 
* fix: log only field names for UserAPIKeyAuth, not full object * style: apply black formatting to experimental_mcp_client/client.py * style: fix black/isort formatting and mypy error in proxy_server.py - Fix black formatting in experimental_mcp_client/client.py (done in prev commit) - Fix black/isort formatting in key_management_endpoints.py, proxy_server.py, transformation.py - Fix mypy: iterate over optional list safely (access_group_ids or []) in proxy_server.py * fix(test): patch check_migration.verbose_logger directly to fix xdist ordering issue When test_proxy_cli.py tests run before test_check_migration.py in the same xdist worker, litellm.proxy.db.check_migration is already in sys.modules. Patching litellm._logging.verbose_logger has no effect on the already-bound reference. Patch the correct target (check_migration.verbose_logger) and import the module before patching so the order doesn't matter. * fix(mypy): make api_base Optional in PydanticAIProviderConfig to match base class signature --------- Co-authored-by: Ihsan Soydemir <soydemir.ihsan@gmail.com> Co-authored-by: Milan <milan@berri.ai> Co-authored-by: Daniel Gandolfi <danielgandolfi@gmail.com> Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: user <70670632+stuxf@users.noreply.github.com> Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
* fix(pricing): add unversioned vertex_ai/claude-haiku-4-5 entry Missing unversioned entry causes cost tracking to return $0.00 for all requests using vertex_ai/claude-haiku-4-5. All other Vertex AI Claude models have both versioned and unversioned entries. * fix(router): skip misleading tags error when no candidates (e.g. cooldown) Return early from get_deployments_for_tag when healthy_deployments is empty so tag-based routing does not raise no_deployments_with_tag_routing after cooldown filters all deployments. Adds regression test. Made-with: Cursor * feat(oci): add embedding support and update model catalog - Add OCIEmbeddingConfig for OCI GenAI embedding models - Add 16 new chat models (Cohere, Meta Llama, xAI Grok, Google Gemini) - Add 8 embedding models (Cohere embed v3.0, v4.0) - Update documentation with embedding examples - Update pricing for all new models * test(oci): add unit tests for OCI embedding support - 17 unit tests covering OCIEmbeddingConfig - Tests for URL generation, param mapping, request/response transform - Tests for model pricing JSON completeness * style(oci): format with black and ruff * fix(oci): correct embedding request body format OCI embedText API expects inputs, truncate, and inputType at the top level of the request body, not nested under embedTextDetails. Fixed transformation and updated tests accordingly. Verified with real OCI API: 3/3 embedding models working. 
* docs: clarify tag routing early return and test intent Made-with: Cursor * fix(oci): address code review findings from Greptile - P1: Fix signing URL mismatch with custom api_base by accepting api_base parameter in transform_embedding_request - P2: Remove encoding_format from supported params (OCI does not support it, was silently dropped) - P2: Raise ValueError for token-array inputs instead of silently converting to string representation - Add test for token-list rejection * fix(mcp): add STS AssumeRole support for MCP SigV4 authentication MCPSigV4Auth only supported static AWS credentials or the boto3 default credential chain. Production Kubernetes environments typically authenticate via IAM role assumption (sts:AssumeRole), which was not possible. Add aws_role_name and aws_session_name parameters to the MCP SigV4 auth stack. When aws_role_name is provided, MCPSigV4Auth calls sts:AssumeRole to obtain temporary credentials before signing requests. Explicit keys, if also provided, are used as the source identity for the STS call; otherwise ambient credentials (pod role, instance profile) are used. * fix: stop logging credential values and add missing redaction patterns Replaces raw credential values in debug/error log messages with boolean presence checks or type names. Adds PEM block, GCP token, JWT, SAS token, and service-account blob patterns to the redaction filter. Fixes private_key pattern to capture full PEM blocks instead of stopping at the first whitespace. Addresses: Vertex AI credential JSON (including RSA private key) being logged to stderr on health check failures. 
* fix: log only field names for UserAPIKeyAuth, not full object * style: apply black formatting to experimental_mcp_client/client.py * style: fix black/isort formatting and mypy error in proxy_server.py - Fix black formatting in experimental_mcp_client/client.py (done in prev commit) - Fix black/isort formatting in key_management_endpoints.py, proxy_server.py, transformation.py - Fix mypy: iterate over optional list safely (access_group_ids or []) in proxy_server.py * fix(test): patch check_migration.verbose_logger directly to fix xdist ordering issue When test_proxy_cli.py tests run before test_check_migration.py in the same xdist worker, litellm.proxy.db.check_migration is already in sys.modules. Patching litellm._logging.verbose_logger has no effect on the already-bound reference. Patch the correct target (check_migration.verbose_logger) and import the module before patching so the order doesn't matter. * fix(mypy): make api_base Optional in PydanticAIProviderConfig to match base class signature --------- Co-authored-by: Ihsan Soydemir <soydemir.ihsan@gmail.com> Co-authored-by: Milan <milan@berri.ai> Co-authored-by: Daniel Gandolfi <danielgandolfi@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: user <70670632+stuxf@users.noreply.github.com> Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added at least 1 test in the `tests/test_litellm/` directory (adding at least 1 test is a hard requirement - see details)
- My PR passes all unit tests on `make test-unit`
- I have tagged `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test
Changes