fix: batch-limit stale managed object cleanup to prevent 300K row UPDATE #25258

krrish-berri-2 merged 3 commits into main
Conversation
fix: batch-limit stale managed object cleanup to prevent 300K row UPDATE (#25257)

* Add STALE_OBJECT_CLEANUP_BATCH_SIZE constant

  Configurable batch limit (default 1000) for stale managed object cleanup, preventing unbounded UPDATE queries from hitting 300K+ rows at once.

* Batch-limit stale managed object cleanup with single bounded SQL query

  Two fixes to _cleanup_stale_managed_objects:

  1. Replace unbounded update_many with a single execute_raw using a subquery LIMIT, capping each poll cycle at STALE_OBJECT_CLEANUP_BATCH_SIZE rows. Zero rows loaded into Python memory — everything stays in Postgres. Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py (the proxy requires PostgreSQL per schema.prisma).
  2. Extract _expire_stale_rows as a separate method for testability.

  Keeps the file_purpose='response' filter to avoid incorrectly expiring long-running batch or fine-tune jobs that legitimately exceed the staleness cutoff.
Greptile Summary

This PR fixes a real production problem:
Confidence Score: 4/5

The production fix is logically correct, but the PR breaks all existing unit tests — the execute_raw call is not mocked as AsyncMock in the shared test fixture. The score is 4/5 because two P1 findings remain: the missing AsyncMock for execute_raw will cause a TypeError in all 10 tests in test_check_responses_cost.py on CI, and the stale assertions in test_cleanup_stale_managed_objects no longer verify the new cleanup path. The core production logic is sound. tests/proxy_unit_tests/test_check_responses_cost.py needs `client.db.execute_raw = AsyncMock(return_value=0)` added to the shared fixture, and test_cleanup_stale_managed_objects must be updated to assert on execute_raw rather than update_many.
| Filename | Overview |
|---|---|
| enterprise/litellm_enterprise/proxy/common_utils/check_responses_cost.py | Replaces unbounded update_many with batched execute_raw subquery LIMIT — logically correct, but breaks all existing unit tests because execute_raw is not mocked as AsyncMock in the shared test fixture |
| litellm/constants.py | Adds STALE_OBJECT_CLEANUP_BATCH_SIZE constant with env-var override and max(1,...) guard — correct and consistent with existing patterns |
| docs/my-website/docs/proxy/config_settings.md | Documents new STALE_OBJECT_CLEANUP_BATCH_SIZE env var — complete and accurate |
| tests/local_testing/test_embedding.py | Removes test_cohere_embedding (Cohere v2 coverage) without explanation — unrelated to this PR's stated scope |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Caller
    participant CRC as CheckResponsesCost
    participant DB as prisma_client.db
    participant PG as PostgreSQL
    participant Logger as verbose_proxy_logger
    participant LiteLLM as litellm
    Caller->>CRC: check_responses_cost()
    CRC->>CRC: _cleanup_stale_managed_objects()
    CRC->>CRC: _expire_stale_rows(cutoff, STALE_OBJECT_CLEANUP_BATCH_SIZE)
    CRC->>DB: execute_raw(UPDATE ... WHERE id IN (SELECT ... LIMIT n))
    DB->>PG: bounded UPDATE
    PG-->>DB: affected row count
    DB-->>CRC: result
    alt result > 0
        CRC->>Logger: warning(marked N stale objects)
    end
    CRC->>DB: find_many(status in [queued, in_progress], take=MAX_OBJECTS_PER_POLL_CYCLE)
    DB-->>CRC: jobs[]
    loop for each job
        CRC->>LiteLLM: aget_responses(response_id, metadata)
        LiteLLM-->>CRC: response
    end
    CRC->>DB: update_many(id in completed_ids, status=completed)
    DB-->>CRC: updated count
    CRC-->>Caller: done
```
Comments Outside Diff (2)
- tests/proxy_unit_tests/test_check_responses_cost.py, lines 19-25: `execute_raw` not mocked as `AsyncMock` — all 10 tests will fail

  The `mock_prisma_client` fixture sets `client.db = MagicMock()`. The implementation change now calls `await self.prisma_client.db.execute_raw(...)` inside `_expire_stale_rows`. In Python 3.8+, awaiting a plain `MagicMock` raises `TypeError: object MagicMock can't be used in 'await' expression`, which propagates to every test that exercises `check_responses_cost()` (all 10 tests in this file share this fixture).

  Fix: add `client.db.execute_raw = AsyncMock(return_value=0)` to the shared fixture.
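The suggested fixture change can be sketched as follows; the helper name is illustrative, but the `AsyncMock(return_value=0)` line is exactly what the review asks to add:

```python
from unittest.mock import AsyncMock, MagicMock

def make_mock_prisma_client() -> MagicMock:
    """Fixture-style helper: a MagicMock db whose execute_raw is awaitable.

    Awaiting a plain MagicMock raises TypeError, so the new raw-SQL cleanup
    path needs an AsyncMock stand-in (returning 0 affected rows by default).
    """
    client = MagicMock()
    client.db = MagicMock()
    client.db.execute_raw = AsyncMock(return_value=0)
    return client
```

Because AsyncMock's call returns a coroutine, `await client.db.execute_raw(...)` now resolves to 0 instead of raising.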
- tests/proxy_unit_tests/test_check_responses_cost.py, lines 86-106: stale assertions check the superseded `update_many` path

  This test was written against the old `_cleanup_stale_managed_objects` that called `litellm_managedobjecttable.update_many(...)` for stale cleanup. The new implementation routes stale-row expiry through `_expire_stale_rows` → `execute_raw`. The assertions on `litellm_managedobjecttable.update_many.call_args_list` at lines 100–106 now target a call that is never made for stale cleanup, so the test no longer verifies the new batched behavior.

  The test should be rewritten to:

  - Assert `mock_prisma_client.db.execute_raw` was called (after fixing the `AsyncMock` issue above).
  - Verify the SQL string contains `LIMIT` and the `batch_size` argument matches `STALE_OBJECT_CLEANUP_BATCH_SIZE`.
  - Remove or reframe the `update_many` stale-cleanup assertions — `update_many` is now used only for marking polled jobs as completed, not for stale expiry.
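Those assertions might take the following shape. This is a sketch: `_run_cleanup` stands in for the real `check_responses_cost()` path, and the SQL text is hypothetical; only the assertion targets (`execute_raw`, `LIMIT`, the constant) come from the review.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

STALE_OBJECT_CLEANUP_BATCH_SIZE = 1000  # stands in for the real constant

async def _run_cleanup(client) -> None:
    """Stand-in for the stale-cleanup path under test."""
    await client.db.execute_raw(
        "UPDATE litellm_managedobjecttable SET status = 'expired' "
        "WHERE id IN (SELECT id FROM litellm_managedobjecttable LIMIT $1)",
        STALE_OBJECT_CLEANUP_BATCH_SIZE,
    )

def test_stale_cleanup_goes_through_execute_raw():
    client = MagicMock()
    client.db.execute_raw = AsyncMock(return_value=0)
    asyncio.run(_run_cleanup(client))
    # Assert the new path: execute_raw, bounded by LIMIT and the constant.
    client.db.execute_raw.assert_awaited_once()
    sql, *params = client.db.execute_raw.call_args.args
    assert "LIMIT" in sql
    assert STALE_OBJECT_CLEANUP_BATCH_SIZE in params
```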
Reviews (3): Last reviewed commit: "test: remove deprecated embed-english-v2..."
PR #25258 changed _cleanup_stale_managed_objects from update_many to execute_raw via _expire_stale_rows, but the tests were not updated. The tests now mock _expire_stale_rows on the instance and assert update_many calls only for job completion, not stale cleanup.
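A minimal sketch of that test pattern, under stated assumptions (the class below is a stub, since only the mocking approach, patching `_expire_stale_rows` on the instance, is described):

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

class _CheckResponsesCostStub:
    """Stand-in for the real CheckResponsesCost class."""

    def __init__(self, prisma_client):
        self.prisma_client = prisma_client

    async def _expire_stale_rows(self, cutoff, batch_size) -> int:
        raise NotImplementedError("real raw-SQL path; patched out in tests")

    async def _cleanup_stale_managed_objects(self) -> None:
        await self._expire_stale_rows(cutoff="cutoff", batch_size=1000)

def test_cleanup_calls_expire_stale_rows():
    checker = _CheckResponsesCostStub(MagicMock())
    # Patch on the instance so no SQL is issued and the call can be asserted.
    checker._expire_stale_rows = AsyncMock(return_value=0)
    asyncio.run(checker._cleanup_stale_managed_objects())
    checker._expire_stale_rows.assert_awaited_once()
```

Patching the instance attribute (rather than the class) keeps the stub local to one test and leaves `update_many` assertions free to cover only job completion.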
- Add 8 content PRs that merged directly to the release branch outside the listed staging PRs: #23769 (Ramp callback), #25252 (JWT OAuth2 override), #25254 (AWS GovCloud mode), #25258 (batch-limit cleanup), #25334 (router custom_llm_provider), #25345 (Triton embeddings), #25347 (tag-based routing), #25358 (Baseten pricing attribution)
- Add @kedarthakkar to new contributors (first-ever PR via #23769)
- Update RELEASE_NOTES_GENERATION_INSTRUCTIONS: require walking the git log range between release tags in addition to staging PRs, and verify new-contributor status per author rather than trusting the GH release body floor
fix: batch-limit stale managed object cleanup to prevent 300K row UPDATE (#25257)
Configurable batch limit (default 1000) for stale managed object cleanup, preventing unbounded UPDATE queries from hitting 300K+ rows at once.
Two fixes to _cleanup_stale_managed_objects:
1. Replace unbounded update_many with a single execute_raw using a subquery LIMIT, capping each poll cycle at STALE_OBJECT_CLEANUP_BATCH_SIZE rows. Zero rows loaded into Python memory — everything stays in Postgres. Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py (the proxy requires PostgreSQL per schema.prisma).
2. Extract _expire_stale_rows as a separate method for testability.
Keeps the file_purpose='response' filter to avoid incorrectly expiring long-running batch or fine-tune jobs that legitimately exceed the staleness cutoff.
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- Added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement - see details)
- PR passes all unit tests on make test-unit
- Tagged @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test
Changes