[OPIK-5247] [BE] fix: preserve assertion results and status after experiment item aggregation #5999
Conversation
…eriment item aggregation

When experiment items are aggregated into `experiment_item_aggregates`, the `has_aggregated` branch in `ExperimentItemDAO` was returning an empty string for `assertions_array` instead of reading the stored value from the aggregates table. Similarly, `DatasetItemVersionDAO` was live-joining `assertion_results` instead of reading from `experiment_item_aggregates`.

Changes:

- `ExperimentItemDAO`: the `has_aggregated` branch now reads `ei.assertions_array` from `experiment_item_aggregates` instead of returning `'' AS assertions_array`
- `DatasetItemVersionDAO`: the `has_aggregated` branch now reads `eia.assertions_array` from `experiment_item_aggregates` instead of live-joining `assertion_results`
- `ExperimentAggregatesDAO`: added a `GET_ASSERTIONS_ARRAY` aggregation step that collects assertion names, pass/fail values, and reasons into a JSON array stored in `experiment_item_aggregates.assertions_array`
- `ExperimentSourceData`: added an `assertionsArray` field to carry aggregated data
- Migration `000076`: adds an `assertions_array String DEFAULT '[]'` column to `experiment_item_aggregates`
- `ExperimentAggregatesIntegrationTest`: two new integration tests verifying that `assertionResults` and `status` are preserved before and after aggregation in both the stream endpoint and the dataset items view

Implements OPIK-5247
🔄 Test environment deployment process has started. Phase 1: Deploying base version. You can monitor the progress here.
Backend Tests - Integration Group 15: 26 files, 26 suites, 3m 18s ⏱️. For more details on these errors, see this check. Results for commit 0232a93.
✅ Test environment is now available! To configure additional environment variables for your environment, run the [Deploy Opik AdHoc Environment workflow](https://github.com/comet-ml/comet-deployment/actions/workflows/deploy_opik_adhoc_env.yaml).

The deployment has completed successfully and the version has been verified.
Details
After experiment items are aggregated into `experiment_item_aggregates`, two DAO branches responsible for reading pre-computed data were silently discarding `assertionResults` and `status`:

- `ExperimentItemDAO` (`has_aggregated` branch): was returning `'' AS assertions_array` (hardcoded empty string) instead of reading `ei.assertions_array` from the aggregates table.
- `DatasetItemVersionDAO` (`has_aggregated` branch): was live-joining `assertion_results` at query time instead of reading the already-aggregated `eia.assertions_array` column.

This caused `assertionResults` and `status` to disappear from experiment items after aggregation runs.

Fix:

- `ExperimentItemDAO`: read `ei.assertions_array` from `experiment_item_aggregates`
- `DatasetItemVersionDAO`: read `eia.assertions_array` from `experiment_item_aggregates` instead of joining `assertion_results`
- `ExperimentAggregatesDAO`: added a `GET_ASSERTIONS_ARRAY` aggregation step that serializes assertion results (name, passed, reason) into a JSON array and stores it in `assertions_array`
- `ExperimentSourceData`: added an `assertionsArray` field to carry the aggregated data through the pipeline
- Migration `000076`: adds an `assertions_array String DEFAULT '[]'` column to `experiment_item_aggregates`
- `ExperimentAggregatesIntegrationTest`: two new integration tests covering both the stream endpoint and the dataset items view, verifying that `assertionResults` and `status` are identical before and after aggregation

Change checklist
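The shape of the aggregated data can be illustrated with a minimal sketch (Python for illustration only; the field names `name`, `passed`, and `reason` follow the PR description, while the sample data, function name, and `passed`/`failed` status values are assumptions, not the actual `GET_ASSERTIONS_ARRAY` implementation):

```python
import json

# Hypothetical raw rows from assertion_results for one experiment item,
# before aggregation. Only the field names mirror the PR description.
assertion_results = [
    {"name": "contains_answer", "passed": True, "reason": "answer found"},
    {"name": "no_hallucination", "passed": False, "reason": "unsupported claim"},
]

def aggregate_assertions(results):
    """Sketch of the aggregation step: serialize assertion results into
    the JSON array stored in experiment_item_aggregates.assertions_array,
    and derive an overall status that passes only when every assertion
    passed."""
    assertions_array = json.dumps(
        [{"name": r["name"], "passed": r["passed"], "reason": r["reason"]}
         for r in results]
    )
    status = "passed" if all(r["passed"] for r in results) else "failed"
    return assertions_array, status

assertions_array, status = aggregate_assertions(assertion_results)
print(status)                             # failed
print(len(json.loads(assertions_array)))  # 2
```

Once the JSON array is stored in the aggregates table, the read paths only need to select the pre-computed column rather than re-joining `assertion_results` on every query, which is the behavior this PR restores.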
Issues
Testing
Two new integration tests added to `ExperimentAggregatesIntegrationTest`:

- `assertionResultsArePreservedAfterExperimentItemAggregation`: creates an evaluation suite experiment with assertion scores on one trace, verifies `assertionResults` and `status` are present in the raw path, runs aggregation, then verifies the aggregated path returns identical data via whole-object recursive comparison.
- `assertionResultsInDatasetItemsArePreservedAfterAggregation`: same scenario via the dataset items endpoint; verifies before/after parity using `assertDatasetItemsWithExperimentItems` (which now includes order-insensitive `assertionResults` comparison).

Documentation
No documentation changes required. Internal fix to aggregation pipeline.
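The order-insensitive `assertionResults` comparison mentioned under Testing can be sketched as follows (a minimal Python illustration; the helper name, sort key, and sample data are assumptions, not the actual `assertDatasetItemsWithExperimentItems` implementation):

```python
def assertions_equal_ignoring_order(expected, actual):
    """Compare two lists of assertion-result dicts without regard to
    order, by sorting both sides on a stable key before comparing.
    Illustrative only; the real check lives in the Java test helper."""
    key = lambda r: (r["name"], r["passed"], r.get("reason") or "")
    return sorted(expected, key=key) == sorted(actual, key=key)

before = [
    {"name": "a", "passed": True, "reason": "ok"},
    {"name": "b", "passed": False, "reason": "missing"},
]
after = list(reversed(before))  # same results, different order

print(assertions_equal_ignoring_order(before, after))  # True
```

Sorting on a composite key avoids false failures when the aggregated path returns the same assertion results in a different order than the raw path.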