Fix langfuse otel traceparent propagation#24048
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Greptile SummaryThis PR restores OpenTelemetry-standard
Confidence Score: 4/5
|
| Filename | Overview |
|---|---|
| litellm/integrations/opentelemetry.py | Core logic change: replaces hardcoded langfuse_otel callback check with configurable ignore_context_propagation field on OpenTelemetryConfig. The field is read from OTEL_IGNORE_CONTEXT_PROPAGATION env var in __post_init__, but str_to_bool(None) leaves it as None rather than False when the env var is absent, which is a minor type-consistency issue. |
| tests/test_litellm/integrations/test_opentelemetry.py | Tests correctly use if/elif (not match/case), parameterized.expand, and timezone-aware datetimes. However, test_handle_success_failure_default_preserves_parent_span instantiates OpenTelemetry without an explicit config, making it sensitive to OTEL_IGNORE_CONTEXT_PROPAGATION being set in the test runner environment, which could cause unexpected failures. |
| docs/my-website/docs/observability/langfuse_otel_integration.md | Documentation correctly documents OTEL_IGNORE_CONTEXT_PROPAGATION as an optional opt-in flag for both SDK and proxy usage, with appropriate comments indicating it is disabled by default. |
| pyproject.toml | Adds parameterized = "^0.9.0" to the dev dependency group, which is the correct placement for a test utility library. |
| poetry.lock | Lock file regenerated by Poetry 2.3.2, adding parameterized 0.9.0 and updating marker expressions for various packages. Changes appear to be auto-generated and consistent with the pyproject.toml updates. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["LangfuseOtelLogger / OpenTelemetry\n_handle_success / _handle_failure"] --> B["_get_span_context(kwargs)\nExtract parent_span & ctx"]
B --> C{"config.ignore_context_propagation\n== True?"}
C -- "Yes\n(OTEL_IGNORE_CONTEXT_PROPAGATION=true)" --> D["Set parent_span = None\nSet ctx = None\nCreate root-level span"]
C -- "No (default)\n(traceparent respected)" --> E{"parent_span is None\nOR USE_OTEL_LITELLM_REQUEST_SPAN?"}
E -- "Yes" --> F["Create primary litellm_request span\nparented to extracted ctx"]
E -- "No" --> G["Add attributes to existing\nparent_span directly"]
D --> H["Finish & export span"]
F --> H
G --> H
style D fill:#f9c,stroke:#c66
style C fill:#ff9,stroke:#990
Comments Outside Diff (1)
-
tests/test_litellm/integrations/test_opentelemetry.py, line 2244 (link)Test is susceptible to environment contamination
OpenTelemetry(tracer_provider=tracer_provider)with noconfigargument causesOpenTelemetryConfig.from_env()to be called, which readsOTEL_IGNORE_CONTEXT_PROPAGATIONfrom the environment in__post_init__. If this env var is set to"true"in the test runner (e.g. in CI or a developer's local environment),ignore_context_propagationwill beTrue, which means parent spans will be ignored — causing this test to fail despite the assertion that parent spans are preserved.The parallel test
test_handle_success_failure_with_context_propagation_preserves_parent_spanalready covers the explicit-Falsecase; this test should mirror that safety by also patching or isolating the env:Alternatively, wrap the setup with
patch.dict(os.environ, {}, clear=False)and removeOTEL_IGNORE_CONTEXT_PROPAGATIONfrom the env.
Last reviewed commit: "Code review"
| if self.config.ignore_context_propagation: | ||
| parent_span = None # Ignore parent spans from other providers | ||
| ctx = None |
There was a problem hiding this comment.
Backwards-incompatible change breaks existing
langfuse_otel users
The previous code unconditionally nulled parent spans for all langfuse_otel callbacks (the "CRITICAL FIX" comment even noted "Don't use parent spans from other providers as they cause trace corruption"). This PR replaces it with an opt-in ignore_context_propagation flag, but the LangfuseOtelLogger initialization path does not set this flag to True by default.
Looking at litellm/litellm_core_utils/litellm_logging.py, LangfuseOtelLogger is created as:
_otel_logger = LangfuseOtelLogger(config=None, callback_name="langfuse_otel")And LangfuseOtelLogger._create_open_telemetry_config_from_langfuse_env() creates an OpenTelemetryConfig(...) without setting ignore_context_propagation=True, so it falls back to reading OTEL_IGNORE_CONTEXT_PROPAGATION from the environment (which is unset for existing users).
The net effect is that all existing langfuse_otel users who did not set OTEL_IGNORE_CONTEXT_PROPAGATION=true will silently lose the parent span isolation that was previously automatic. Their traces may start showing parent-child relationships with spans from other providers, causing the "trace corruption" the original fix was designed to prevent.
To fix this while keeping the code configurable, LangfuseOtelLogger should default to ignore_context_propagation=True in its config construction, or the __post_init__ logic should preserve the previous automatic behavior for langfuse_otel.
There was a problem hiding this comment.
The traceparent header propagation is the OpenTelemetry standard, and it is being ignored after this recent PR #21309. That PR was a breaking change for existing users because the traces were no longer grouped in langfuse, and they were sent as separate traces. Ignoring the traceparent may be useful in some cases, but that should be an opt-in feature and not a default behavior since the OpenTelemetry standard should follow the propagation header.
There was a problem hiding this comment.
You're absolutely right. The OpenTelemetry standard dictates that traceparent headers should be propagated by default, and this PR correctly restores that behavior.
Looking at the code changes:
Before (PR #21309): Hardcoded to always ignore parent spans for langfuse_otel
is_langfuse_otel = (
hasattr(self, "callback_name") and self.callback_name == "langfuse_otel"
)
if is_langfuse_otel:
parent_span = None # Ignore parent spans from other providers
ctx = NoneAfter (this PR): Configurable via ignore_context_propagation flag, defaults to False
if self.config.ignore_context_propagation:
parent_span = None
ctx = NoneAnalysis:
- PR Litellm fix langfuse otel trace v2 #21309 was the breaking change that violated OpenTelemetry standards
- This PR correctly restores standard-compliant behavior
- Users who need trace isolation can opt-in via
OTEL_IGNORE_CONTEXT_PROPAGATION=true - The documentation updates clearly explain the optional flag
Recommendation: The only consideration is ensuring users who adapted to the post-#21309 behavior are aware of this change. The migration path is straightforward—they can set OTEL_IGNORE_CONTEXT_PROPAGATION=true to maintain isolated traces. The PR description could explicitly mention this is reverting the behavior from PR #21309 and note the migration path for affected users.
This change aligns with the custom rule about backwards compatibility—the opt-in flag provides a user-controlled way to preserve the old behavior without forcing non-standard behavior on everyone.
b1fd371 to
dbbec40
Compare
dbbec40 to
02ebd13
Compare
| if self.ignore_context_propagation is None: | ||
| self.ignore_context_propagation = str_to_bool( | ||
| os.getenv("OTEL_IGNORE_CONTEXT_PROPAGATION") | ||
| ) |
There was a problem hiding this comment.
ignore_context_propagation stays None when env var is unset
str_to_bool(None) returns None (not False) when OTEL_IGNORE_CONTEXT_PROPAGATION is absent. While if self.config.ignore_context_propagation: correctly treats None as falsy, keeping the field typed as Optional[bool] with a None sentinel after construction is slightly unexpected — callers who inspect the config object (e.g. for logging or assertions) cannot easily distinguish "explicitly set to False" from "env var not set".
Additionally, str_to_bool only recognises the literal strings "true" and "false". Common alternatives like "1", "yes", or "True" (capital T) all return None, which silently behaves as False. This may surprise users who set OTEL_IGNORE_CONTEXT_PROPAGATION=1 expecting it to take effect.
Consider normalising to False when the env var is not set:
| if self.ignore_context_propagation is None: | |
| self.ignore_context_propagation = str_to_bool( | |
| os.getenv("OTEL_IGNORE_CONTEXT_PROPAGATION") | |
| ) | |
| if self.ignore_context_propagation is None: | |
| self.ignore_context_propagation = ( | |
| str_to_bool(os.getenv("OTEL_IGNORE_CONTEXT_PROPAGATION")) or False | |
| ) |
There was a problem hiding this comment.
@mubashir1osmani this seems like a patch. Can you look into a better long-term fix here?
Relevant issues
Fixes #24027
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
tests/test_litellm/directory, Adding at least 1 test is a hard requirement - see detailsmake test-unit@greptileaiand received a Confidence Score of at least 4/5 before requesting a maintainer reviewDelays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test
Changes