[OPIK-5599] [BE] Fix experiment items not displayed after evaluation suite deletion#6053
Conversation
…suite deletion When an evaluation suite (dataset) was deleted, subsequent requests for dataset items with experiment items returned 404 because DatasetItemService unconditionally called datasetService.findById(), which throws when the dataset no longer exists. Changes: - DatasetService: add verifyVisibilityIfExists() — only throws 404 when the dataset exists but fails the visibility check; silently succeeds when deleted - DatasetItemService.getItems(): replace the blocking findById() guard with a reactive Mono.fromCallable wrapping verifyVisibilityIfExists(), allowing requests to proceed after deletion - DatasetItemService.getItemsWithExperimentItems(): when getFallbackVersionId() returns empty (all versions deleted), fall back to the legacy dao.getItems() so experiment items are still returned from ClickHouse - DatasetsResourceTest: add regression test covering the deleted-suite scenario
|
🔄 Test environment deployment process has started Phase 1: Deploying base version You can monitor the progress here. |
Backend Tests - Integration Group 15 25 files 25 suites 4m 53s ⏱️ For more details on these errors, see this check. Results for commit b37bb47. ♻️ This comment has been updated with latest results. |
| return Mono.deferContextual(ctx -> { | ||
| String workspaceId = ctx.get(RequestContext.WORKSPACE_ID); | ||
| Visibility visibility = ctx.get(RequestContext.VISIBILITY); |
There was a problem hiding this comment.
getItems reads RequestContext.VISIBILITY from the Reactor context but callers like DatasetExpansionService don't set it — should we default to a visibility (e.g. PRIVATE) when missing or ensure non-HTTP callers set VISIBILITY before calling getItems?
Finding type: Logical Bugs | Severity: 🔴 High
Want Baz to fix this for you? Activate Fixer
Other fix methods
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
apps/opik-backend/src/main/java/com/comet/opik/domain/DatasetItemService.java around
lines 1164-1169, the getItems method unconditionally calls
ctx.get(RequestContext.VISIBILITY) which throws when VISIBILITY is not present in the
Reactor context. Change this to safely obtain a visibility with a default (for example,
Visibility.PRIVATE) when the key is missing (e.g., use
ctx.getOrDefault(RequestContext.VISIBILITY, Visibility.PRIVATE) or check containsKey and
default). Ensure the rest of the method uses this defaulted visibility. Optionally,
add/update unit tests and document that non-HTTP callers should also set VISIBILITY if a
non-default behavior is required.
There was a problem hiding this comment.
I guess our tests are just mocking this service, this wasn't noticed in the local testing. Worth to check, Thiago.
There was a problem hiding this comment.
Commit aefebe6 addressed this comment by updating DatasetExpansionService to write the entire RequestContext (via AsyncUtils.setRequestContext) before calling DatasetItemService.getItems. That helper copies VISIBILITY along with workspace/user info, so non-HTTP callers now supply the visibility required by getItems.
There was a problem hiding this comment.
Good catch! Fixed in commit aefebe6 — DatasetExpansionService now uses AsyncUtils.setRequestContext(ctx, requestContext) which copies all context keys including VISIBILITY, so non-HTTP callers always supply the required visibility before calling getItems.
There was a problem hiding this comment.
Thanks for the update—per DatasetExpansionService (around the new .contextWrite(ctx -> AsyncUtils.setRequestContext(ctx, requestContext)) in getItems), the full RequestContext, including VISIBILITY, is now propagated before calling DatasetItemService#getItems, so non-HTTP callers no longer miss that value.
Backend Tests - Integration Group 1411 tests 411 ✅ 13m 40s ⏱️ Results for commit f9f70fb. ♻️ This comment has been updated with latest results. |
|
✅ Test environment is now available! To configure additional Environment variables for your environment, run [Deploy Opik AdHoc Environment workflow] (https://github.com/comet-ml/comet-deployment/actions/workflows/deploy_opik_adhoc_env.yaml) Access Information
The deployment has completed successfully and the version has been verified. |
ldaugusto
left a comment
There was a problem hiding this comment.
Some points to check before merge
| return Mono.deferContextual(ctx -> { | ||
| String workspaceId = ctx.get(RequestContext.WORKSPACE_ID); | ||
| Visibility visibility = ctx.get(RequestContext.VISIBILITY); |
There was a problem hiding this comment.
I guess our tests are just mocking this service, this wasn't noticed in the local testing. Worth to check, Thiago.
- DatasetExpansionService: use AsyncUtils.setRequestContext to ensure VISIBILITY is set in Reactor context - DatasetService.verifyVisibilityIfExists: add debug log documenting the silent-pass when dataset not found - DatasetsResourceTest: fix pre-existing typos; use assertPageAndContent to also assert traceId on returned experiment items
ldaugusto
left a comment
There was a problem hiding this comment.
Thanks for addressing those points!
…ing to null When dataset items are queried via LEFT JOIN after the dataset is deleted, ClickHouse returns the epoch (1970-01-01T00:00:00Z) for DateTime fields where no row matched. Map epoch instants to null in DatasetItemResultMapper so callers receive null instead of the sentinel epoch value. Update assertPageAndContent to handle null timestamps in expected items, branching between asserting null vs asserting isAfter.
Make nullIfEpoch package-private in DatasetItemResultMapper and use it in DatasetItemVersionDAO, which has the same LEFT JOIN queries against deleted dataset tables that can return epoch timestamps.
Details
When an evaluation suite (dataset with
type = 'evaluation_suite') was deleted, any subsequent request to retrieve dataset items with experiment items for that suite returned 404. This happened becauseDatasetItemService.getItems()unconditionally calleddatasetService.findById(), which throwsNotFoundExceptionwhen the dataset no longer exists.Root causes fixed:
DatasetItemService.getItems(): Removed the blockingfindById()guard and replaced it with a reactiveMono.fromCallablewrapping a newverifyVisibilityIfExists()method. The new method only throws 404 if the dataset exists but fails the visibility check — it silently succeeds when the dataset is already deleted, allowing the request to proceed.DatasetItemService.getItemsWithExperimentItems(): WhengetFallbackVersionId()returns empty (because all MySQL dataset versions were deleted alongside the suite), the code now falls back to the legacydao.getItems()query so that experiment items stored in ClickHouse are still returned.DatasetService: AddedverifyVisibilityIfExists(UUID id, String workspaceId, Visibility visibility)— a visibility-only check that skips deleted datasets.Change checklist
Issues
AI-WATERMARK
AI-WATERMARK: yes
Testing
Regression test added:
DatasetsResourceTest#find__whenEvaluationSuiteIsDeleted__thenExperimentItemsShouldStillBeRetrievableTest scenario:
GET /datasets/{id}/items/experimentsstill returns the experiment item (HTTP 200)All 6 tests in
FindDatasetItemsWithExperimentItemspass.Note on returned data: After deletion the
dataset_itemsClickHouse data is gone (items were stored indataset_item_versionsvia the versioning feature). The legacy fallback query returns synthetic items built fromexperiment_itemsonly, sotraceId,spanId,source,description,evaluators, andexecutionPolicyarenullon the returned dataset item. The experiment items themselves remain fully populated.Documentation
No documentation changes required — this is a bug fix restoring previously expected behaviour.