Skip to content

fix(langsmith): avoid no running event loop during sync init#23727

Merged
5 commits merged intoBerriAI:litellm_oss_staging_03_17_2026from
pandego:fix/langsmith-sync-no-event-loop
Mar 17, 2026
Merged

fix(langsmith): avoid no running event loop during sync init#23727
5 commits merged intoBerriAI:litellm_oss_staging_03_17_2026from
pandego:fix/langsmith-sync-no-event-loop

Conversation

@pandego
Copy link
Copy Markdown
Contributor

@pandego pandego commented Mar 16, 2026

Summary

Fixes #23733.

Fix the LangSmith logger init path for synchronous callers.

Today LangsmithLogger.__init__() always calls asyncio.create_task(self.periodic_flush()). When LiteLLM is initialized from a synchronous path, there is no running event loop yet, so logger startup raises RuntimeError: no running event loop.

This change starts the periodic flush task only when an event loop is already running. In sync contexts, logger initialization now stays safe and non-blocking instead of raising during setup.

Root cause

asyncio.create_task() requires an active running loop in the current thread. LangsmithLogger was calling it unconditionally during object construction.

Fix

  • add a small _start_periodic_flush_task() helper
  • use asyncio.get_running_loop() to detect whether task startup is safe
  • lazily start periodic flushing from async logging paths when needed
  • keep the behavior unchanged when LiteLLM is initialized inside an async context
  • keep the task startup control internal instead of exposing a public test-only constructor flag

Test plan

  • added regression coverage for:
    • init without a running loop
    • init with a running loop
    • lazy startup from async_log_success_event
    • lazy startup from async_log_failure_event
  • uv run pytest tests/test_litellm/integrations/test_langsmith_init.py -q
  • uv run pytest tests/logging_callback_tests/test_langsmith_unit_test.py -q

Context

This is related to earlier reports around LangSmith sync initialization and event-loop handling:

@vercel
Copy link
Copy Markdown

vercel bot commented Mar 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Mar 16, 2026 11:32am

Request Review

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 16, 2026

CLA assistant check
All committers have signed the CLA.

@codspeed-hq
Copy link
Copy Markdown
Contributor

codspeed-hq bot commented Mar 16, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing pandego:fix/langsmith-sync-no-event-loop (59caea4) with main (58e74a6)

Open in CodSpeed

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 16, 2026

Greptile Summary

This PR fixes a RuntimeError: no running event loop crash that occurred when LangsmithLogger was instantiated from a synchronous context. The unconditional asyncio.create_task(self.periodic_flush()) call in __init__ is replaced with a guarded _start_periodic_flush_task() helper that uses asyncio.get_running_loop(), returning None (and logging a debug message) when no loop is present. To ensure the flush task still starts when the logger is later used asynchronously, _ensure_periodic_flush_task() is called at the top of both async_log_success_event and async_log_failure_event, lazily scheduling the task on the first async invocation after a sync-context init.

Key observations:

  • The fix correctly handles the sync-init → async-use production pattern that was previously broken.
  • _ensure_periodic_flush_task is synchronous with no await between its guard check and assignment, which correctly prevents duplicate task creation under asyncio's cooperative scheduling model.
  • The synchronous log_success_event path intentionally does not call _ensure_periodic_flush_task; it relies on batch-size flushing via _send_batch(), which is the correct design since periodic flush is an async-only concept here.
  • All stale @patch("asyncio.create_task") decorators (which patched the wrong symbol) have been removed and replaced with focused, accurate test coverage for the guard paths and lazy startup.
  • Previous review feedback (silent queue drops on sync-init, placement of _ensure_periodic_flush_task inside the try block for async_log_failure_event, loose call-count assertions, and missing failure-event test) has been fully addressed in this iteration.

Confidence Score: 4/5

  • This PR is safe to merge; it fixes a real crash path without breaking existing async behaviour.
  • The core logic change is minimal and well-targeted. All previously raised review threads have been addressed. Test coverage now spans the no-loop, with-loop, and lazy-startup paths for both async log methods. The only reason for a non-perfect score is a small untested edge case: _ensure_periodic_flush_task will re-invoke _start_periodic_flush_task on every async log call when the prior task's done() is True (e.g., after a periodic_flush coroutine dies due to an unhandled exception) — but this is intentional recovery behaviour, is safe under asyncio's cooperative model, and is documented in a comment.
  • No files require special attention; both changed files are straightforward.

Important Files Changed

Filename Overview
litellm/integrations/langsmith.py Core fix: replaces unconditional asyncio.create_task in __init__ with a guarded _start_periodic_flush_task helper, adds lazy _ensure_periodic_flush_task called on every async log event. Logic is sound; log_success_event (sync path) intentionally does not call _ensure_periodic_flush_task since sync logging relies on batch-size-based flushing via _send_batch().
tests/test_litellm/integrations/test_langsmith_init.py Stale @patch("asyncio.create_task") decorators correctly removed; five new tests cover no-loop init, loop-present init, and lazy startup for both async log paths. One subtle gap: test_langsmith_init_starts_periodic_flush_with_running_loop asserts mock_loop.create_task.assert_called_once() but does not verify the coroutine passed is periodic_flush; however the test does close the scheduled coroutine to avoid ResourceWarning.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[LangsmithLogger.__init__] --> B[self._flush_task = _start_periodic_flush_task]
    B --> C{asyncio.get_running_loop?}
    C -- RuntimeError: No Loop --> D[return None\n_flush_task = None]
    C -- Loop exists --> E[loop.create_task periodic_flush\n_flush_task = Task]

    F[async_log_success_event] --> G[_ensure_periodic_flush_task]
    H[async_log_failure_event] --> G
    G --> I{_flush_task is None\nor .done?}
    I -- Yes --> J[_start_periodic_flush_task]
    J --> C
    I -- No --> K[Task already running - skip]

    E --> L[periodic_flush loop\n while True: sleep → flush_queue]
    D -.->|lazy start on first async call| F
    D -.->|lazy start on first async call| H
Loading

Last reviewed commit: 59caea4

Comment thread litellm/integrations/langsmith.py Outdated
Comment thread tests/test_litellm/integrations/test_langsmith_init.py Outdated
@pandego
Copy link
Copy Markdown
Contributor Author

pandego commented Mar 16, 2026

Addressed the review feedback in the latest push.

What changed:

  • lazily start the periodic flush task on async log calls if the logger was created before an event loop existed
  • keep the original init-time guard so sync construction no longer raises
  • strengthened the tests:
    • verify the no-loop init path leaves _flush_task unset
    • verify the happy path schedules loop.create_task(...)
    • verify async logging lazily starts the periodic flush task after sync init
  • removed the stale @patch("asyncio.create_task") decorators from the init tests

I also checked branch state against main before pushing - it is current, so the earlier failing checks were not caused by the branch being behind.

Local validation run:

  • uv run pytest tests/test_litellm/integrations/test_langsmith_init.py -q
  • uv run pytest tests/logging_callback_tests/test_langsmith_unit_test.py -q

Comment thread litellm/integrations/langsmith.py Outdated
Comment thread tests/test_litellm/integrations/test_langsmith_init.py Outdated
Comment thread tests/test_litellm/integrations/test_langsmith_init.py
Comment thread litellm/integrations/langsmith.py Outdated
Comment thread litellm/integrations/langsmith.py
@pandego
Copy link
Copy Markdown
Contributor Author

pandego commented Mar 16, 2026

Quick note: the LangSmith sync-init fix and regression coverage are updated per review and passing locally.

The remaining red checks appear unrelated to this change:

  • lint
  • test (proxy-misc)

I did not change the proxy startup path touched by the proxy-misc failure, and the lint failure looks repo-wide rather than specific to the LangSmith files in this PR.

@ghost ghost changed the base branch from main to litellm_oss_staging_03_17_2026 March 17, 2026 05:34
@ghost ghost merged commit e9291a9 into BerriAI:litellm_oss_staging_03_17_2026 Mar 17, 2026
37 of 39 checks passed
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(langsmith): sync init can raise no running event loop

2 participants