fix: batch-limit stale managed object cleanup to prevent 300K row UPDATE#25257
Conversation
Configurable batch limit (default 1000) for stale managed object cleanup, preventing unbounded UPDATE queries from hitting 300K+ rows at once.
Two fixes to _cleanup_stale_managed_objects: 1. Replace unbounded update_many with a single execute_raw using a subquery LIMIT, capping each poll cycle to STALE_OBJECT_CLEANUP_BATCH_SIZE rows. Zero rows loaded into Python memory — everything stays in Postgres. Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py (the proxy requires PostgreSQL per schema.prisma). 2. Extract _expire_stale_rows as a separate method for testability. Keeps the file_purpose='response' filter to avoid incorrectly expiring long-running batch or fine-tune jobs that legitimately exceed the staleness cutoff.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Greptile SummaryThis PR adds a
Confidence Score: 3/5Not safe to merge — violates explicit CLAUDE.md raw SQL prohibition and leaves all stale-cleanup tests broken Two P1 issues: (1) execute_raw directly contradicts the project's CLAUDE.md rule against raw SQL in proxy DB operations; (2) all stale-cleanup-related tests fail with IndexError/AssertionError when run because execute_raw is not mocked as AsyncMock in the fixture, and these failures are hidden from make test-unit enterprise/litellm_enterprise/proxy/common_utils/check_responses_cost.py (raw SQL usage) and tests/proxy_unit_tests/test_check_responses_cost.py (stale mocks)
|
| Filename | Overview |
|---|---|
| enterprise/litellm_enterprise/proxy/common_utils/check_responses_cost.py | Adds _expire_stale_rows using execute_raw (violates CLAUDE.md raw SQL prohibition); corresponding tests are broken due to missing AsyncMock setup for execute_raw in the fixture |
| litellm/constants.py | Correctly adds STALE_OBJECT_CLEANUP_BATCH_SIZE constant with env override and max(1,...) guard — no issues |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[check_responses_cost] --> B[_cleanup_stale_managed_objects]
B --> C[cutoff = now - STALENESS_CUTOFF_DAYS]
C --> D[_expire_stale_rows\ncutoff, BATCH_SIZE]
D --> E[execute_raw UPDATE\nvia subquery LIMIT]
E --> F{rows_updated > 0?}
F -- Yes --> G[log warning]
F -- No --> H[continue to job poll]
G --> H
H --> I[find_many queued/in_progress jobs]
I --> J[process each job]
J --> K[update_many completed jobs]
Comments Outside Diff (1)
-
tests/proxy_unit_tests/test_check_responses_cost.py, line 19-24 (link)Tests are stale and will fail against the new implementation
The fixture sets
client.db = MagicMock(), which meansclient.db.execute_rawis a plainMagicMock— not anAsyncMock. In Python 3.8+,await MagicMock()raisesTypeError: object MagicMock can't be used in 'await' expression.When
_expire_stale_rowsis called, thatTypeErroris silently swallowed by theexcept Exceptionhandler incheck_responses_cost. As a result every test that asserts on stale-cleanup behaviour fails:test_cleanup_stale_managed_objects(line 86):update_manyis never called for stale cleanup, socalls[0]raisesIndexErrortest_check_responses_cost_with_completed_response(line 108),test_check_responses_cost_with_failed_response(line 155),test_check_responses_cost_with_cancelled_response(line 196),test_check_responses_cost_multiple_jobs(line 353): each expectslen(calls) == 2(stale + completion) but only sees 1 →AssertionErrortest_check_responses_cost_with_in_progress_response(line 238),test_check_responses_cost_with_queued_response(line 279),test_check_responses_cost_with_exception(line 319): each checkscalls[0][1]["data"] == {"status": "stale_expired"}butcallsis empty →IndexError
These failures are invisible to
make test-unitbecause that command only runstests/test_litellm/, nottests/proxy_unit_tests/.Regardless of whether
execute_rawis kept or replaced, the fixture must be updated:client.db.execute_raw = AsyncMock(return_value=0)
and assertions should verify
execute_rawcall args instead ofupdate_many.
Reviews (1): Last reviewed commit: "Batch-limit stale managed object cleanup..." | Re-trigger Greptile
| return await self.prisma_client.db.execute_raw( | ||
| """ | ||
| UPDATE "LiteLLM_ManagedObjectTable" | ||
| SET "status" = 'stale_expired' | ||
| WHERE "id" IN ( | ||
| SELECT "id" FROM "LiteLLM_ManagedObjectTable" | ||
| WHERE "file_purpose" = 'response' | ||
| AND "status" NOT IN ('completed', 'complete', 'failed', 'expired', 'cancelled', 'stale_expired') | ||
| AND "created_at" < $1::timestamptz | ||
| ORDER BY "created_at" ASC | ||
| LIMIT $2 | ||
| ) | ||
| """, | ||
| cutoff, | ||
| batch_size, | ||
| ) |
There was a problem hiding this comment.
Raw SQL violates project coding standard
The project's CLAUDE.md explicitly prohibits this:
"Do not write raw SQL for proxy DB operations. Use Prisma model methods instead of
execute_raw/query_raw."
The _expire_stale_rows method calls execute_raw with PostgreSQL-specific syntax ($1::timestamptz, double-quoted identifiers, LIMIT in a subquery) — exactly the pattern the rule was written to prevent.
The compliant alternative is a two-step Prisma approach that still keeps the UPDATE bounded to batch_size rows with only IDs flowing through Python:
async def _expire_stale_rows(self, cutoff: datetime, batch_size: int) -> int:
stale_rows = await self.prisma_client.db.litellm_managedobjecttable.find_many(
where={
"file_purpose": "response",
"status": {"not_in": ["completed", "complete", "failed", "expired", "cancelled", "stale_expired"]},
"created_at": {"lt": cutoff},
},
take=batch_size,
order={"created_at": "asc"},
select={"id": True},
)
if not stale_rows:
return 0
return await self.prisma_client.db.litellm_managedobjecttable.update_many(
where={"id": {"in": [row.id for row in stale_rows]}},
data={"status": "stale_expired"},
)This keeps everything in Prisma ORM, matches the project's existing patterns, and still bounds the UPDATE to batch_size rows.
Context Used: CLAUDE.md (source)
#25258) * fix: batch-limit stale managed object cleanup to prevent 300K row UPDATE (#25257) * Add STALE_OBJECT_CLEANUP_BATCH_SIZE constant Configurable batch limit (default 1000) for stale managed object cleanup, preventing unbounded UPDATE queries from hitting 300K+ rows at once. * Batch-limit stale managed object cleanup with single bounded SQL query Two fixes to _cleanup_stale_managed_objects: 1. Replace unbounded update_many with a single execute_raw using a subquery LIMIT, capping each poll cycle to STALE_OBJECT_CLEANUP_BATCH_SIZE rows. Zero rows loaded into Python memory — everything stays in Postgres. Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py (the proxy requires PostgreSQL per schema.prisma). 2. Extract _expire_stale_rows as a separate method for testability. Keeps the file_purpose='response' filter to avoid incorrectly expiring long-running batch or fine-tune jobs that legitimately exceed the staleness cutoff. * docs: add STALE_OBJECT_CLEANUP_BATCH_SIZE to env vars reference * test: remove deprecated embed-english-v2.0 cohere embedding tests
BerriAI#25258) * fix: batch-limit stale managed object cleanup to prevent 300K row UPDATE (BerriAI#25257) * Add STALE_OBJECT_CLEANUP_BATCH_SIZE constant Configurable batch limit (default 1000) for stale managed object cleanup, preventing unbounded UPDATE queries from hitting 300K+ rows at once. * Batch-limit stale managed object cleanup with single bounded SQL query Two fixes to _cleanup_stale_managed_objects: 1. Replace unbounded update_many with a single execute_raw using a subquery LIMIT, capping each poll cycle to STALE_OBJECT_CLEANUP_BATCH_SIZE rows. Zero rows loaded into Python memory — everything stays in Postgres. Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py (the proxy requires PostgreSQL per schema.prisma). 2. Extract _expire_stale_rows as a separate method for testability. Keeps the file_purpose='response' filter to avoid incorrectly expiring long-running batch or fine-tune jobs that legitimately exceed the staleness cutoff. * docs: add STALE_OBJECT_CLEANUP_BATCH_SIZE to env vars reference * test: remove deprecated embed-english-v2.0 cohere embedding tests
Fixes unbounded UPDATE queries in
_cleanup_stale_managed_objectsthat could hit 300K+ rows at once.Two changes:
STALE_OBJECT_CLEANUP_BATCH_SIZEconstant (default 1000) tolitellm/constants.pyupdate_manywith a singleexecute_rawusing a subquery LIMIT — zero rows loaded into Python memory, everything stays in Postgres. Same pattern asspend_log_cleanup.py.Also extracts
_expire_stale_rowsas a separate method for testability.Relevant issues
Pre-Submission checklist
tests/test_litellm/make test-unitCI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link: