fix: batch-limit stale managed object cleanup to prevent 300K row UPDATE #25258

krrish-berri-2 merged 3 commits into main
Conversation
fix: batch-limit stale managed object cleanup to prevent 300K row UPDATE (#25257)

* Add STALE_OBJECT_CLEANUP_BATCH_SIZE constant

  Configurable batch limit (default 1000) for stale managed object cleanup, preventing unbounded UPDATE queries from hitting 300K+ rows at once.

* Batch-limit stale managed object cleanup with single bounded SQL query

  Two fixes to _cleanup_stale_managed_objects:

  1. Replace unbounded update_many with a single execute_raw using a subquery LIMIT, capping each poll cycle at STALE_OBJECT_CLEANUP_BATCH_SIZE rows. Zero rows loaded into Python memory — everything stays in Postgres. Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py (the proxy requires PostgreSQL per schema.prisma).
  2. Extract _expire_stale_rows as a separate method for testability.

  Keeps the file_purpose='response' filter to avoid incorrectly expiring long-running batch or fine-tune jobs that legitimately exceed the staleness cutoff.
Greptile Summary

This PR fixes a real production problem:
Confidence Score: 4/5

The production fix is logically correct, but the PR breaks all existing unit tests — the execute_raw call is not mocked as AsyncMock in the shared test fixture. The score is 4/5 because two P1 findings remain: the missing AsyncMock for execute_raw will cause a TypeError in all 10 tests in test_check_responses_cost.py on CI, and the stale assertions in test_cleanup_stale_managed_objects no longer verify the new cleanup path. The core production logic is sound. tests/proxy_unit_tests/test_check_responses_cost.py needs `client.db.execute_raw = AsyncMock(return_value=0)` added to the shared fixture, and test_cleanup_stale_managed_objects must be updated to assert on execute_raw rather than update_many.
| Filename | Overview |
|---|---|
| enterprise/litellm_enterprise/proxy/common_utils/check_responses_cost.py | Replaces unbounded update_many with batched execute_raw subquery LIMIT — logically correct, but breaks all existing unit tests because execute_raw is not mocked as AsyncMock in the shared test fixture |
| litellm/constants.py | Adds STALE_OBJECT_CLEANUP_BATCH_SIZE constant with env-var override and max(1,...) guard — correct and consistent with existing patterns |
| docs/my-website/docs/proxy/config_settings.md | Documents new STALE_OBJECT_CLEANUP_BATCH_SIZE env var — complete and accurate |
| tests/local_testing/test_embedding.py | Removes test_cohere_embedding (Cohere v2 coverage) without explanation — unrelated to this PR's stated scope |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Caller
    participant CRC as CheckResponsesCost
    participant DB as prisma_client.db
    participant PG as PostgreSQL
    participant Logger as verbose_proxy_logger
    participant LiteLLM as litellm
    Caller->>CRC: check_responses_cost()
    CRC->>CRC: _cleanup_stale_managed_objects()
    CRC->>CRC: _expire_stale_rows(cutoff, STALE_OBJECT_CLEANUP_BATCH_SIZE)
    CRC->>DB: execute_raw(UPDATE ... WHERE id IN (SELECT ... LIMIT n))
    DB->>PG: bounded UPDATE
    PG-->>DB: affected row count
    DB-->>CRC: result
    alt result > 0
        CRC->>Logger: warning(marked N stale objects)
    end
    CRC->>DB: find_many(status in [queued, in_progress], take=MAX_OBJECTS_PER_POLL_CYCLE)
    DB-->>CRC: jobs[]
    loop for each job
        CRC->>LiteLLM: aget_responses(response_id, metadata)
        LiteLLM-->>CRC: response
    end
    CRC->>DB: update_many(id in completed_ids, status=completed)
    DB-->>CRC: updated count
    CRC-->>Caller: done
```
Comments Outside Diff (2)
- tests/proxy_unit_tests/test_check_responses_cost.py, lines 19-25: `execute_raw` not mocked as `AsyncMock` — all 10 tests will fail

  The `mock_prisma_client` fixture sets `client.db = MagicMock()`. The implementation change now calls `await self.prisma_client.db.execute_raw(...)` inside `_expire_stale_rows`. In Python 3.8+, awaiting a plain `MagicMock` raises `TypeError: object MagicMock can't be used in 'await' expression`, which propagates to every test that exercises `check_responses_cost()` (all 10 tests in this file share this fixture).

  Fix: add `client.db.execute_raw = AsyncMock(return_value=0)` to the shared fixture.
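The suggested fixture change can be sketched as follows; the helper name is illustrative, but the `AsyncMock(return_value=0)` line is exactly what the review asks to add:

```python
from unittest.mock import AsyncMock, MagicMock

def make_mock_prisma_client() -> MagicMock:
    """Fixture-style helper: a MagicMock db whose execute_raw is awaitable.

    Awaiting a plain MagicMock raises TypeError, so the new raw-SQL cleanup
    path needs an AsyncMock stand-in (returning 0 affected rows by default).
    """
    client = MagicMock()
    client.db = MagicMock()
    client.db.execute_raw = AsyncMock(return_value=0)
    return client
```

Because AsyncMock's call returns a coroutine, `await client.db.execute_raw(...)` now resolves to 0 instead of raising.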
- tests/proxy_unit_tests/test_check_responses_cost.py, lines 86-106: stale assertions check the superseded `update_many` path

  This test was written against the old `_cleanup_stale_managed_objects` that called `litellm_managedobjecttable.update_many(...)` for stale cleanup. The new implementation routes stale-row expiry through `_expire_stale_rows` → `execute_raw`. The assertions on `litellm_managedobjecttable.update_many.call_args_list` at lines 100–106 now target a call that is never made for stale cleanup, so the test no longer verifies the new batched behavior.

  The test should be rewritten to:

  - Assert `mock_prisma_client.db.execute_raw` was called (after fixing the `AsyncMock` issue above).
  - Verify the SQL string contains `LIMIT` and the `batch_size` argument matches `STALE_OBJECT_CLEANUP_BATCH_SIZE`.
  - Remove or reframe the `update_many` stale-cleanup assertions — `update_many` is now used only for marking polled jobs as completed, not for stale expiry.
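Those assertions might take the following shape. This is a sketch: `_run_cleanup` stands in for the real `check_responses_cost()` path, and the SQL text is hypothetical; only the assertion targets (`execute_raw`, `LIMIT`, the constant) come from the review.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

STALE_OBJECT_CLEANUP_BATCH_SIZE = 1000  # stands in for the real constant

async def _run_cleanup(client) -> None:
    """Stand-in for the stale-cleanup path under test."""
    await client.db.execute_raw(
        "UPDATE litellm_managedobjecttable SET status = 'expired' "
        "WHERE id IN (SELECT id FROM litellm_managedobjecttable LIMIT $1)",
        STALE_OBJECT_CLEANUP_BATCH_SIZE,
    )

def test_stale_cleanup_goes_through_execute_raw():
    client = MagicMock()
    client.db.execute_raw = AsyncMock(return_value=0)
    asyncio.run(_run_cleanup(client))
    # Assert the new path: execute_raw, bounded by LIMIT and the constant.
    client.db.execute_raw.assert_awaited_once()
    sql, *params = client.db.execute_raw.call_args.args
    assert "LIMIT" in sql
    assert STALE_OBJECT_CLEANUP_BATCH_SIZE in params
```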
Reviews (3): Last reviewed commit: "test: remove deprecated embed-english-v2..."
PR #25258 changed _cleanup_stale_managed_objects from update_many to execute_raw via _expire_stale_rows, but the tests were not updated. The tests now mock _expire_stale_rows on the instance and assert update_many calls only for job completion, not stale cleanup.
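A minimal sketch of that test pattern, under stated assumptions (the class below is a stub, since only the mocking approach, patching `_expire_stale_rows` on the instance, is described):

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

class _CheckResponsesCostStub:
    """Stand-in for the real CheckResponsesCost class."""

    def __init__(self, prisma_client):
        self.prisma_client = prisma_client

    async def _expire_stale_rows(self, cutoff, batch_size) -> int:
        raise NotImplementedError("real raw-SQL path; patched out in tests")

    async def _cleanup_stale_managed_objects(self) -> None:
        await self._expire_stale_rows(cutoff="cutoff", batch_size=1000)

def test_cleanup_calls_expire_stale_rows():
    checker = _CheckResponsesCostStub(MagicMock())
    # Patch on the instance so no SQL is issued and the call can be asserted.
    checker._expire_stale_rows = AsyncMock(return_value=0)
    asyncio.run(checker._cleanup_stale_managed_objects())
    checker._expire_stale_rows.assert_awaited_once()
```

Patching the instance attribute (rather than the class) keeps the stub local to one test and leaves `update_many` assertions free to cover only job completion.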
- Add 8 content PRs that merged directly to the release branch outside the listed staging PRs: #23769 (Ramp callback), #25252 (JWT OAuth2 override), #25254 (AWS GovCloud mode), #25258 (batch-limit cleanup), #25334 (router custom_llm_provider), #25345 (Triton embeddings), #25347 (tag-based routing), #25358 (Baseten pricing attribution)
- Add @kedarthakkar to new contributors (first-ever PR via #23769)
- Update RELEASE_NOTES_GENERATION_INSTRUCTIONS: require walking the git log range between release tags in addition to staging PRs, and verify new-contributor status per author rather than trusting the GH release body floor
fix: batch-limit stale managed object cleanup to prevent 300K row UPDATE (#25257)
Configurable batch limit (default 1000) for stale managed object cleanup, preventing unbounded UPDATE queries from hitting 300K+ rows at once.
Two fixes to _cleanup_stale_managed_objects:
1. Replace unbounded update_many with a single execute_raw using a subquery LIMIT, capping each poll cycle at STALE_OBJECT_CLEANUP_BATCH_SIZE rows. Zero rows loaded into Python memory — everything stays in Postgres. Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py (the proxy requires PostgreSQL per schema.prisma).
2. Extract _expire_stale_rows as a separate method for testability.
Keeps the file_purpose='response' filter to avoid incorrectly expiring long-running batch or fine-tune jobs that legitimately exceed the staleness cutoff.
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- Added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement - see details)
- PR passes all unit tests on make test-unit
- Tagged @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test
Changes