Skip to content

Fix langfuse otel traceparent propagation#24048

Merged
4 commits merged intoBerriAI:mainfrom
jyeros:fix-langfuse-otel-traceparent-propagation
Mar 18, 2026
Merged

Fix langfuse otel traceparent propagation#24048
4 commits merged intoBerriAI:mainfrom
jyeros:fix-langfuse-otel-traceparent-propagation

Conversation

@jyeros
Copy link
Copy Markdown
Contributor

@jyeros jyeros commented Mar 18, 2026

Relevant issues

Fixes #24027

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

@vercel
Copy link
Copy Markdown

vercel bot commented Mar 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Mar 18, 2026 9:27pm

Request Review

@jyeros
Copy link
Copy Markdown
Contributor Author

jyeros commented Mar 18, 2026

@greptileai

@codspeed-hq
Copy link
Copy Markdown
Contributor

codspeed-hq bot commented Mar 18, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing jyeros:fix-langfuse-otel-traceparent-propagation (02ebd13) with main (af8363a)

Open in CodSpeed

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 18, 2026

Greptile Summary

This PR restores OpenTelemetry-standard traceparent propagation for langfuse_otel by replacing the hardcoded callback_name == "langfuse_otel" guard (introduced in PR #21309) with a configurable ignore_context_propagation flag on OpenTelemetryConfig. Users who relied on the old isolation behaviour can opt back in via OTEL_IGNORE_CONTEXT_PROPAGATION=true. The core logic change is minimal and correct; the main concerns are in the test layer.

  • OpenTelemetryConfig.ignore_context_propagation is a new Optional[bool] field that defaults to None after __post_init__ when OTEL_IGNORE_CONTEXT_PROPAGATION is not set. It is treated as falsy (False) in the if self.config.ignore_context_propagation: guard, which is functionally correct but leaves a type-consistency gap (callers cannot distinguish "explicitly False" from "unset").
  • str_to_bool recognises only "true" / "false" — common alternatives such as "1", "yes", or mixed-case "True" silently fall back to None (i.e. propagation enabled), which may surprise users.
  • test_handle_success_failure_default_preserves_parent_span creates OpenTelemetry with no config, causing from_env() to read OTEL_IGNORE_CONTEXT_PROPAGATION from the runtime environment. If that variable is set in the test runner, the test will unexpectedly fail; the sibling test test_handle_success_failure_with_context_propagation_preserves_parent_span already covers the explicit-False path and is environment-independent.
  • Documentation is updated clearly for both SDK and proxy usage.

Confidence Score: 4/5

  • The core fix is correct and restores standard OpenTelemetry behaviour; minor test-reliability and type-consistency nits remain.
  • The main code change in opentelemetry.py is clean, well-scoped, and correctly replaces a hardcoded provider check with a configurable flag. The functional logic is sound: ignore_context_propagation defaults to falsy (None) when the env var is absent, so traceparent propagation is on by default as the OTel spec requires. One test (test_handle_success_failure_default_preserves_parent_span) is environment-sensitive and could be flaky if OTEL_IGNORE_CONTEXT_PROPAGATION is set in the runner. The Optional[bool] staying None instead of resolving to False is a minor type-consistency issue, not a runtime bug.
  • tests/test_litellm/integrations/test_opentelemetry.pytest_handle_success_failure_default_preserves_parent_span is susceptible to environment contamination; and litellm/integrations/opentelemetry.pyignore_context_propagation remains None (not False) when the env var is absent.

Important Files Changed

Filename Overview
litellm/integrations/opentelemetry.py Core logic change: replaces hardcoded langfuse_otel callback check with configurable ignore_context_propagation field on OpenTelemetryConfig. The field is read from OTEL_IGNORE_CONTEXT_PROPAGATION env var in __post_init__, but str_to_bool(None) leaves it as None rather than False when the env var is absent, which is a minor type-consistency issue.
tests/test_litellm/integrations/test_opentelemetry.py Tests correctly use if/elif (not match/case), parameterized.expand, and timezone-aware datetimes. However, test_handle_success_failure_default_preserves_parent_span instantiates OpenTelemetry without an explicit config, making it sensitive to OTEL_IGNORE_CONTEXT_PROPAGATION being set in the test runner environment, which could cause unexpected failures.
docs/my-website/docs/observability/langfuse_otel_integration.md Documentation correctly documents OTEL_IGNORE_CONTEXT_PROPAGATION as an optional opt-in flag for both SDK and proxy usage, with appropriate comments indicating it is disabled by default.
pyproject.toml Adds parameterized = "^0.9.0" to the dev dependency group, which is the correct placement for a test utility library.
poetry.lock Lock file regenerated by Poetry 2.3.2, adding parameterized 0.9.0 and updating marker expressions for various packages. Changes appear to be auto-generated and consistent with the pyproject.toml updates.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["LangfuseOtelLogger / OpenTelemetry\n_handle_success / _handle_failure"] --> B["_get_span_context(kwargs)\nExtract parent_span & ctx"]
    B --> C{"config.ignore_context_propagation\n== True?"}
    C -- "Yes\n(OTEL_IGNORE_CONTEXT_PROPAGATION=true)" --> D["Set parent_span = None\nSet ctx = None\nCreate root-level span"]
    C -- "No (default)\n(traceparent respected)" --> E{"parent_span is None\nOR USE_OTEL_LITELLM_REQUEST_SPAN?"}
    E -- "Yes" --> F["Create primary litellm_request span\nparented to extracted ctx"]
    E -- "No" --> G["Add attributes to existing\nparent_span directly"]
    D --> H["Finish & export span"]
    F --> H
    G --> H

    style D fill:#f9c,stroke:#c66
    style C fill:#ff9,stroke:#990
Loading

Comments Outside Diff (1)

  1. tests/test_litellm/integrations/test_opentelemetry.py, line 2244 (link)

    P2 Test is susceptible to environment contamination

    OpenTelemetry(tracer_provider=tracer_provider) with no config argument causes OpenTelemetryConfig.from_env() to be called, which reads OTEL_IGNORE_CONTEXT_PROPAGATION from the environment in __post_init__. If this env var is set to "true" in the test runner (e.g. in CI or a developer's local environment), ignore_context_propagation will be True, which means parent spans will be ignored — causing this test to fail despite the assertion that parent spans are preserved.

    The parallel test test_handle_success_failure_with_context_propagation_preserves_parent_span already covers the explicit-False case; this test should mirror that safety by also patching or isolating the env:

    Alternatively, wrap the setup with patch.dict(os.environ, {}, clear=False) and remove OTEL_IGNORE_CONTEXT_PROPAGATION from the env.

Last reviewed commit: "Code review"

Comment on lines +718 to 720
if self.config.ignore_context_propagation:
parent_span = None # Ignore parent spans from other providers
ctx = None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Backwards-incompatible change breaks existing langfuse_otel users

The previous code unconditionally nulled parent spans for all langfuse_otel callbacks (the "CRITICAL FIX" comment even noted "Don't use parent spans from other providers as they cause trace corruption"). This PR replaces it with an opt-in ignore_context_propagation flag, but the LangfuseOtelLogger initialization path does not set this flag to True by default.

Looking at litellm/litellm_core_utils/litellm_logging.py, LangfuseOtelLogger is created as:

_otel_logger = LangfuseOtelLogger(config=None, callback_name="langfuse_otel")

And LangfuseOtelLogger._create_open_telemetry_config_from_langfuse_env() creates an OpenTelemetryConfig(...) without setting ignore_context_propagation=True, so it falls back to reading OTEL_IGNORE_CONTEXT_PROPAGATION from the environment (which is unset for existing users).

The net effect is that all existing langfuse_otel users who did not set OTEL_IGNORE_CONTEXT_PROPAGATION=true will silently lose the parent span isolation that was previously automatic. Their traces may start showing parent-child relationships with spans from other providers, causing the "trace corruption" the original fix was designed to prevent.

To fix this while keeping the code configurable, LangfuseOtelLogger should default to ignore_context_propagation=True in its config construction, or the __post_init__ logic should preserve the previous automatic behavior for langfuse_otel.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The traceparent header propagation is the OpenTelemetry standard, and it is being ignored after this recent PR #21309. That PR was a breaking change for existing users because the traces were no longer grouped in langfuse, and they were sent as separate traces. Ignoring the traceparent may be useful in some cases, but that should be an opt-in feature and not a default behavior since the OpenTelemetry standard should follow the propagation header.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right. The OpenTelemetry standard dictates that traceparent headers should be propagated by default, and this PR correctly restores that behavior.

Looking at the code changes:

Before (PR #21309): Hardcoded to always ignore parent spans for langfuse_otel

is_langfuse_otel = (
    hasattr(self, "callback_name") and self.callback_name == "langfuse_otel"
)
if is_langfuse_otel:
    parent_span = None  # Ignore parent spans from other providers
    ctx = None

After (this PR): Configurable via ignore_context_propagation flag, defaults to False

if self.config.ignore_context_propagation:
    parent_span = None
    ctx = None

Analysis:

  • PR Litellm fix langfuse otel trace v2 #21309 was the breaking change that violated OpenTelemetry standards
  • This PR correctly restores standard-compliant behavior
  • Users who need trace isolation can opt-in via OTEL_IGNORE_CONTEXT_PROPAGATION=true
  • The documentation updates clearly explain the optional flag

Recommendation: The only consideration is ensuring users who adapted to the post-#21309 behavior are aware of this change. The migration path is straightforward—they can set OTEL_IGNORE_CONTEXT_PROPAGATION=true to maintain isolated traces. The PR description could explicitly mention this is reverting the behavior from PR #21309 and note the migration path for affected users.

This change aligns with the custom rule about backwards compatibility—the opt-in flag provides a user-controlled way to preserve the old behavior without forcing non-standard behavior on everyone.

Comment thread docs/my-website/docs/proxy/config_settings.md Outdated
Comment thread tests/test_litellm/integrations/test_opentelemetry.py
Comment thread tests/test_litellm/integrations/test_opentelemetry.py Outdated
Comment thread tests/test_litellm/integrations/test_opentelemetry.py Outdated
@jyeros jyeros force-pushed the fix-langfuse-otel-traceparent-propagation branch from b1fd371 to dbbec40 Compare March 18, 2026 21:19
Comment thread tests/test_litellm/integrations/test_opentelemetry.py Outdated
@jyeros jyeros force-pushed the fix-langfuse-otel-traceparent-propagation branch from dbbec40 to 02ebd13 Compare March 18, 2026 21:25
Comment on lines +93 to +96
if self.ignore_context_propagation is None:
self.ignore_context_propagation = str_to_bool(
os.getenv("OTEL_IGNORE_CONTEXT_PROPAGATION")
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 ignore_context_propagation stays None when env var is unset

str_to_bool(None) returns None (not False) when OTEL_IGNORE_CONTEXT_PROPAGATION is absent. While if self.config.ignore_context_propagation: correctly treats None as falsy, keeping the field typed as Optional[bool] with a None sentinel after construction is slightly unexpected — callers who inspect the config object (e.g. for logging or assertions) cannot easily distinguish "explicitly set to False" from "env var not set".

Additionally, str_to_bool only recognises the literal strings "true" and "false". Common alternatives like "1", "yes", or "True" (capital T) all return None, which silently behaves as False. This may surprise users who set OTEL_IGNORE_CONTEXT_PROPAGATION=1 expecting it to take effect.

Consider normalising to False when the env var is not set:

Suggested change
if self.ignore_context_propagation is None:
self.ignore_context_propagation = str_to_bool(
os.getenv("OTEL_IGNORE_CONTEXT_PROPAGATION")
)
if self.ignore_context_propagation is None:
self.ignore_context_propagation = (
str_to_bool(os.getenv("OTEL_IGNORE_CONTEXT_PROPAGATION")) or False
)

@ghost ghost merged commit cbb4c2c into BerriAI:main Mar 18, 2026
37 of 40 checks passed
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mubashir1osmani this seems like a patch. Can you look into a better long-term fix here?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: langfuse_otel discards traceparent context, breaking trace linking (v1.82.x)

2 participants