Skip to content

fix: batch-limit stale managed object cleanup to prevent 300K row UPD…#25258

Merged
krrish-berri-2 merged 3 commits intomainfrom
litellm_april_6_3
Apr 7, 2026
Merged

fix: batch-limit stale managed object cleanup to prevent 300K row UPD…#25258
krrish-berri-2 merged 3 commits intomainfrom
litellm_april_6_3

Conversation

@ishaan-berri
Copy link
Copy Markdown
Contributor

…ATE (#25257)

  • Add STALE_OBJECT_CLEANUP_BATCH_SIZE constant

Configurable batch limit (default 1000) for stale managed object cleanup, preventing unbounded UPDATE queries from hitting 300K+ rows at once.

  • Batch-limit stale managed object cleanup with single bounded SQL query

Two fixes to _cleanup_stale_managed_objects:

  1. Replace unbounded update_many with a single execute_raw using a subquery LIMIT, capping each poll cycle to STALE_OBJECT_CLEANUP_BATCH_SIZE rows. Zero rows loaded into Python memory — everything stays in Postgres. Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py (the proxy requires PostgreSQL per schema.prisma).

  2. Extract _expire_stale_rows as a separate method for testability.

Keeps the file_purpose='response' filter to avoid incorrectly expiring long-running batch or fine-tune jobs that legitimately exceed the staleness cutoff.

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

…ATE (#25257)

* Add STALE_OBJECT_CLEANUP_BATCH_SIZE constant

Configurable batch limit (default 1000) for stale managed object cleanup,
preventing unbounded UPDATE queries from hitting 300K+ rows at once.

* Batch-limit stale managed object cleanup with single bounded SQL query

Two fixes to _cleanup_stale_managed_objects:

1. Replace unbounded update_many with a single execute_raw using a
   subquery LIMIT, capping each poll cycle to STALE_OBJECT_CLEANUP_BATCH_SIZE
   rows. Zero rows loaded into Python memory — everything stays in Postgres.
   Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py
   (the proxy requires PostgreSQL per schema.prisma).

2. Extract _expire_stale_rows as a separate method for testability.

Keeps the file_purpose='response' filter to avoid incorrectly expiring
long-running batch or fine-tune jobs that legitimately exceed the
staleness cutoff.
@vercel
Copy link
Copy Markdown

vercel bot commented Apr 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Apr 7, 2026 2:02am

Request Review

@codspeed-hq
Copy link
Copy Markdown
Contributor

codspeed-hq bot commented Apr 7, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing litellm_april_6_3 (73c09fb) with main (3ac61a5)

Open in CodSpeed

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 7, 2026

Greptile Summary

This PR fixes a real production problem: _cleanup_stale_managed_objects previously issued an unbounded UPDATE that could hit 300K+ rows in a single query. The fix introduces _expire_stale_rows, which uses execute_raw with a WHERE id IN (SELECT ... LIMIT n) subquery to cap each poll cycle at STALE_OBJECT_CLEANUP_BATCH_SIZE (default 1000) rows — the same bounded-SQL pattern already used by spend_log_cleanup.py. A new STALE_OBJECT_CLEANUP_BATCH_SIZE constant is added to litellm/constants.py with an env-var override and a max(1,...) guard, and the environment variable is documented in config_settings.md.

  • enterprise/.../check_responses_cost.py: extracts _expire_stale_rows for testability; replaces update_many with a bounded execute_raw call — production logic is correct
  • litellm/constants.py: adds STALE_OBJECT_CLEANUP_BATCH_SIZE — correct and consistent with existing patterns
  • docs/: documents the new env var — accurate
  • tests/proxy_unit_tests/test_check_responses_cost.py (not in the diff, but impacted): the shared mock_prisma_client fixture uses client.db = MagicMock(), which makes await prisma_client.db.execute_raw(...) raise TypeError in all 10 tests; additionally, test_cleanup_stale_managed_objects still asserts that update_many was called for stale cleanup, which is now a dead assertion
  • tests/local_testing/test_embedding.py: removes test_cohere_embedding (Cohere v2 coverage) without explanation — unrelated to this fix

Confidence Score: 4/5

The production fix is logically correct but the PR breaks all existing unit tests — the execute_raw call is not mocked as AsyncMock in the shared test fixture.

Score is 4/5 because two P1 findings remain: the missing AsyncMock for execute_raw will cause TypeError in all 10 tests in test_check_responses_cost.py on CI, and the stale assertions in test_cleanup_stale_managed_objects no longer verify the new cleanup path. The core production logic is sound.

tests/proxy_unit_tests/test_check_responses_cost.py needs client.db.execute_raw = AsyncMock(return_value=0) added to the shared fixture, and test_cleanup_stale_managed_objects must be updated to assert on execute_raw rather than update_many.

Important Files Changed

Filename Overview
enterprise/litellm_enterprise/proxy/common_utils/check_responses_cost.py Replaces unbounded update_many with batched execute_raw subquery LIMIT — logically correct, but breaks all existing unit tests because execute_raw is not mocked as AsyncMock in the shared test fixture
litellm/constants.py Adds STALE_OBJECT_CLEANUP_BATCH_SIZE constant with env-var override and max(1,...) guard — correct and consistent with existing patterns
docs/my-website/docs/proxy/config_settings.md Documents new STALE_OBJECT_CLEANUP_BATCH_SIZE env var — complete and accurate
tests/local_testing/test_embedding.py Removes test_cohere_embedding (Cohere v2 coverage) without explanation — unrelated to this PR's stated scope

Sequence Diagram

sequenceDiagram
    participant Caller
    participant CRC as CheckResponsesCost
    participant DB as prisma_client.db
    participant PG as PostgreSQL
    participant Logger as verbose_proxy_logger
    participant LiteLLM as litellm

    Caller->>CRC: check_responses_cost()
    CRC->>CRC: _cleanup_stale_managed_objects()
    CRC->>CRC: _expire_stale_rows(cutoff, STALE_OBJECT_CLEANUP_BATCH_SIZE)
    CRC->>DB: execute_raw(UPDATE ... WHERE id IN (SELECT ... LIMIT n))
    DB->>PG: bounded UPDATE
    PG-->>DB: affected row count
    DB-->>CRC: result
    alt result > 0
        CRC->>Logger: warning(marked N stale objects)
    end
    CRC->>DB: find_many(status in [queued, in_progress], take=MAX_OBJECTS_PER_POLL_CYCLE)
    DB-->>CRC: jobs[]
    loop for each job
        CRC->>LiteLLM: aget_responses(response_id, metadata)
        LiteLLM-->>CRC: response
    end
    CRC->>DB: update_many(id in completed_ids, status=completed)
    DB-->>CRC: updated count
    CRC-->>Caller: done
Loading

Comments Outside Diff (2)

  1. tests/proxy_unit_tests/test_check_responses_cost.py, line 19-25 (link)

    P1 execute_raw not mocked as AsyncMock — all 10 tests will fail

    The mock_prisma_client fixture sets client.db = MagicMock(). The implementation change now calls await self.prisma_client.db.execute_raw(...) inside _expire_stale_rows. In Python 3.8+, awaiting a plain MagicMock raises TypeError: object MagicMock can't be used in 'await' expression, which propagates to every test that exercises check_responses_cost() (all 10 tests in this file share this fixture).

    Fix: add client.db.execute_raw = AsyncMock(return_value=0) to the shared fixture:

  2. tests/proxy_unit_tests/test_check_responses_cost.py, line 86-106 (link)

    P1 Stale assertions check the superseded update_many path

    This test was written against the old _cleanup_stale_managed_objects that called litellm_managedobjecttable.update_many(...) for stale cleanup. The new implementation routes stale-row expiry through _expire_stale_rows → execute_raw. The assertions on litellm_managedobjecttable.update_many.call_args_list at lines 100–106 now target a call that is never made for stale cleanup, so the test no longer verifies the new batched behavior.

    The test should be rewritten to:

    1. Assert mock_prisma_client.db.execute_raw was called (after fixing the AsyncMock issue above).
    2. Verify the SQL string contains LIMIT and the batch_size argument matches STALE_OBJECT_CLEANUP_BATCH_SIZE.
    3. Remove or reframe the update_many stale-cleanup assertions — update_many is now used only for marking polled jobs as completed, not for stale expiry.

Reviews (3): Last reviewed commit: "test: remove deprecated embed-english-v2..." | Re-trigger Greptile

@ishaan-berri ishaan-berri temporarily deployed to integration-redis-postgres April 7, 2026 01:50 — with GitHub Actions Inactive
@ishaan-berri ishaan-berri temporarily deployed to integration-postgres April 7, 2026 01:50 — with GitHub Actions Inactive
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@krrish-berri-2 krrish-berri-2 merged commit 7a9a9f0 into main Apr 7, 2026
97 of 105 checks passed
@krrish-berri-2 krrish-berri-2 deleted the litellm_april_6_3 branch April 7, 2026 02:11
yuneng-berri added a commit that referenced this pull request Apr 7, 2026
PR #25258 changed _cleanup_stale_managed_objects from update_many to
execute_raw via _expire_stale_rows, but the tests were not updated.
The tests now mock _expire_stale_rows on the instance and assert
update_many calls only for job completion, not stale cleanup.
harish876 pushed a commit to harish876/litellm that referenced this pull request Apr 8, 2026
BerriAI#25258)

* fix: batch-limit stale managed object cleanup to prevent 300K row UPDATE (BerriAI#25257)

* Add STALE_OBJECT_CLEANUP_BATCH_SIZE constant

Configurable batch limit (default 1000) for stale managed object cleanup,
preventing unbounded UPDATE queries from hitting 300K+ rows at once.

* Batch-limit stale managed object cleanup with single bounded SQL query

Two fixes to _cleanup_stale_managed_objects:

1. Replace unbounded update_many with a single execute_raw using a
   subquery LIMIT, capping each poll cycle to STALE_OBJECT_CLEANUP_BATCH_SIZE
   rows. Zero rows loaded into Python memory — everything stays in Postgres.
   Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py
   (the proxy requires PostgreSQL per schema.prisma).

2. Extract _expire_stale_rows as a separate method for testability.

Keeps the file_purpose='response' filter to avoid incorrectly expiring
long-running batch or fine-tune jobs that legitimately exceed the
staleness cutoff.

* docs: add STALE_OBJECT_CLEANUP_BATCH_SIZE to env vars reference

* test: remove deprecated embed-english-v2.0 cohere embedding tests
harish876 pushed a commit to harish876/litellm that referenced this pull request Apr 8, 2026
PR BerriAI#25258 changed _cleanup_stale_managed_objects from update_many to
execute_raw via _expire_stale_rows, but the tests were not updated.
The tests now mock _expire_stale_rows on the instance and assert
update_many calls only for job completion, not stale cleanup.
yuneng-berri added a commit that referenced this pull request Apr 14, 2026
- Add 8 content PRs that merged directly to the release branch outside the listed staging PRs: #23769 (Ramp callback), #25252 (JWT OAuth2 override), #25254 (AWS GovCloud mode), #25258 (batch-limit cleanup), #25334 (router custom_llm_provider), #25345 (Triton embeddings), #25347 (tag-based routing), #25358 (Baseten pricing attribution)
- Add @kedarthakkar to new contributors (first-ever PR via #23769)
- Update RELEASE_NOTES_GENERATION_INSTRUCTIONS: require walking git log range between release tags in addition to staging PRs, and verify new-contributor status per author rather than trusting the GH release body floor
Benniphx pushed a commit to Benniphx/litellm that referenced this pull request Apr 15, 2026
- Add 8 content PRs that merged directly to the release branch outside the listed staging PRs: BerriAI#23769 (Ramp callback), BerriAI#25252 (JWT OAuth2 override), BerriAI#25254 (AWS GovCloud mode), BerriAI#25258 (batch-limit cleanup), BerriAI#25334 (router custom_llm_provider), BerriAI#25345 (Triton embeddings), BerriAI#25347 (tag-based routing), BerriAI#25358 (Baseten pricing attribution)
- Add @kedarthakkar to new contributors (first-ever PR via BerriAI#23769)
- Update RELEASE_NOTES_GENERATION_INSTRUCTIONS: require walking git log range between release tags in addition to staging PRs, and verify new-contributor status per author rather than trusting the GH release body floor
ishaan-berri pushed a commit that referenced this pull request Apr 15, 2026
- Add 8 content PRs that merged directly to the release branch outside the listed staging PRs: #23769 (Ramp callback), #25252 (JWT OAuth2 override), #25254 (AWS GovCloud mode), #25258 (batch-limit cleanup), #25334 (router custom_llm_provider), #25345 (Triton embeddings), #25347 (tag-based routing), #25358 (Baseten pricing attribution)
- Add @kedarthakkar to new contributors (first-ever PR via #23769)
- Update RELEASE_NOTES_GENERATION_INSTRUCTIONS: require walking git log range between release tags in addition to staging PRs, and verify new-contributor status per author rather than trusting the GH release body floor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants