
[CM-7071] llm sdk allow users to logs their chain execution #7

Merged
alexkuzmik merged 54 commits into feature/chains from
CM-7071-llm-sdk-allow-users-to-logs-their-chain-execution
Jun 1, 2023

Conversation

@alexkuzmik
Collaborator

No description provided.

alexkuzmik self-assigned this May 24, 2023
thiagohora added a commit that referenced this pull request Nov 12, 2025
…trics (#3969)

* [NA] [BE] Upgrade MySQL container from Testcontainers

* Fix imports order

* [OPIK-2856] [BE] Implement UUIDv7 time-based filtering for traces

- Add InstantToUUIDMapper to convert Instant timestamps to UUIDv7 bounds
- Add InstantParamConverter to parse ISO-8601 and epoch millisecond time parameters
- Update TracesResource to accept from_time and to_time parameters on /traces and /traces/stats endpoints
- Update TraceDAO to apply UUID-based time filtering using BETWEEN clause on id column
- Update TraceSearchCriteria to include uuidFromTime and uuidToTime fields
- Add comprehensive integration tests for time filtering with boundary conditions
- All tests passing: 10/10 time filtering tests + validation tests
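The BETWEEN-on-id filtering described above can be sketched as follows. This is a hypothetical illustration of the mapper's idea (the class and method names here are invented, not the repository's code), based on the standard UUIDv7 bit layout: the unix epoch-millis timestamp lives in the top 48 bits, so a time window maps to an id range.

```java
import java.time.Instant;
import java.util.UUID;

// Sketch: turn an Instant window into UUIDv7 bounds usable as
// `id BETWEEN lower AND upper`. Lower bound zeroes all random bits,
// upper bound fills them, so every UUIDv7 generated inside the window
// sorts between the two.
final class UuidTimeBounds {

    static UUID lowerBound(Instant from) {
        long msb = (from.toEpochMilli() << 16) | 0x7000L; // 48-bit ts | version 7
        return new UUID(msb, 0x8000000000000000L);        // variant "10", rand_b = 0
    }

    static UUID upperBound(Instant to) {
        long msb = (to.toEpochMilli() << 16) | 0x7000L | 0x0FFFL;         // rand_a = max
        return new UUID(msb, 0x8000000000000000L | 0x3FFFFFFFFFFFFFFFL); // rand_b = max
    }
}
```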

* [OPIK-2856] Address PR review comments: improve exception handling and type safety

- Fix InstantParamConverter to catch specific DateTimeParseException instead of generic Exception
- Add debug logging when falling back to epoch milliseconds parsing
- Refactor anonymous ParamConverter class to named InstantConverter inner class for clarity
- Suppress unchecked cast with @SuppressWarnings annotation
- Fix MySQLContainerUtils return type to use MySQLContainer<?> for type safety
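The exception-handling fix above amounts to catching only the specific parse failure before falling back. A minimal sketch of that pattern (not the repository's actual converter class):

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

// Sketch: parse ISO-8601 first; only on DateTimeParseException (not a
// blanket Exception) fall back to interpreting the value as epoch millis.
final class InstantParsing {
    static Instant parse(String value) {
        try {
            return Instant.parse(value); // ISO-8601, e.g. "2023-06-01T00:00:00Z"
        } catch (DateTimeParseException e) {
            // The real converter logs this fallback at debug level.
            return Instant.ofEpochMilli(Long.parseLong(value));
        }
    }
}
```

Catching the narrow exception type means genuinely unexpected failures (a null value, say) still surface instead of being silently routed through the epoch-millis path.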

* [OPIK-2856] Fix InstantToUUIDMapperTest to match implementation

- Update tests to reflect that toUpperBound uses next millisecond (+1ms) for inclusive BETWEEN queries
- Remove outdated assertions expecting same timestamp in both bounds
- Verify upper bound is lexicographically greater than lower bound
- All 13 tests now passing

* Remove setup duplicated code

* Revision 2: Address PR review comments - LOW priority fixes

- #7: Add INSTANCE singleton pattern for InstantConverter
- #8: Use StringUtils.isEmpty() for null-safe empty check
- Note: #2 and #3 already addressed in previous commit

* Revision 3: Use IdGenerator.getTimeOrderedEpoch() for UUID bounds

- Simplified InstantToUUIDMapper to use IdGenerator.getTimeOrderedEpoch() instead of convertOtelIdToUUIDv7
- Per UUIDv7 RFC, sub-millisecond 12 bits are optional with millisecond granularity
- Start/end interval semantics with ±1ms ensures correct BETWEEN query results
- This approach has been battle-tested for months without issues per reviewer recommendation
- Converted InstantToUUIDMapper to @Singleton service for proper DI integration
- Updated TracesResource to inject InstantToUUIDMapper dependency
- Updated tests to properly mock IdGenerator dependency

* Fix tests

* [OPIK-2856] [BE] Split Get spans Tests

* Fix format

* Revision 2: Extract duplicated span creation logic into createSpanWithTimestamp helper method

* [OPIK-2856] [BE] Implement UUIDv7 time-based filtering for spans

* Revision 3: Extract workspace setup duplication into setupTestWorkspace helper method and fix transformTestParams call

* [OPIK-2856] Refactor TracesResourceTest to use TraceResourceClient instead of direct URL_TEMPLATE calls

- Replace all direct client.target(URL_TEMPLATE) calls with TraceResourceClient methods
- Add callFeedbackScoresWithCookie method to TraceResourceClient for session token authentication
- Add callRetrieveThreadResponseWithCookie method to TraceResourceClient for session token authentication
- Fix feedback batch endpoint by using callFeedbackScores and callFeedbackScoresWithCookie
- Add null checks for query parameters to prevent NPE errors
- Fix API key vs session token usage in authentication tests
- Rename get__whenApiKeyIsPresent__thenReturnTraceThread to get__whenSessionTokenIsPresent__thenReturnTraceThread in SessionTokenCookie class
- Add mockGetWorkspaceIdByName() calls for proper workspace mocking
- Preserve original test assertions and behavior
- All tests properly refactored to use resource client methods instead of direct HTTP calls

* [OPIK-2856] Remove duplicate methods from TraceResourceClient

- Remove callGetTraces() - duplicate of callGetTracesWithQueryParams()
- Remove callSearchTraces() - duplicate of callSearchTracesStream()
- Reduced code duplication and maintenance burden

* Fix tests

* Revision 2: Address Copilot review comments - remove redundant wrapper method and add clarifying comment

* Revision 3: Extract duplicated path splitting logic into helper method addPathSegments()

* Revision 4: Make getWebTarget() private and add callGetTraceThreadsWithSorting() public method

* Revision 7: Move addPathSegments() and addQueryParameters() helper methods to BaseCommentResourceClient

* [OPIK-2856] [BE] Add UUIDv7 time-based filtering for trace threads

* Revision 2: Address GitHub Copilot PR review comments

- Extract conditional UUID generation into generateThreadModelId() method for better readability
- Rename minTraceTimestamp to earliestTraceTimestamp for clarity
- Add explanatory comment about UUIDv7 lexicographic ordering in compareTo()

* Fix

* Revision 3: Add UUID time filter to SELECT_TRACES_STATS query

* Revision 4: Fix generateUUIDForTimestamp to manually construct UUIDv7
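"Manually construct UUIDv7" can look like the sketch below; the role matches what generateUUIDForTimestamp plays in the tests, but this code is an assumption, not the repository's implementation: epoch-millis in the top 48 bits, version and variant bits set, everything else random.

```java
import java.time.Instant;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

// Sketch: build a UUIDv7 whose embedded timestamp is a chosen Instant,
// so test fixtures sort into the expected time buckets.
final class UuidV7 {
    static UUID forTimestamp(Instant ts) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        long msb = (ts.toEpochMilli() << 16)           // unix_ts_ms, 48 bits
                | 0x7000L                              // version 7 nibble
                | (rnd.nextLong() & 0x0FFFL);          // rand_a, 12 bits
        long lsb = 0x8000000000000000L                 // RFC 4122 variant "10"
                | (rnd.nextLong() & 0x3FFFFFFFFFFFFFFFL); // rand_b, 62 bits
        return new UUID(msb, lsb);
    }
}
```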

* [OPIK-2856] [BE] Implement UUIDv7-based time filtering for project metrics

- Add uuidFromTime and uuidToTime fields to ProjectMetricRequest
- Update ProjectMetricsService to enrich requests with UUID bounds using InstantToUUIDMapper
- Refactor ProjectMetricsDAO SQL queries to use UUID-based filtering (id BETWEEN uuid_from_time AND uuid_to_time)
- Extract timestamps from UUIDs using UUIDv7ToDateTime for bucketing and WITH FILL clauses
- Update TraceService to generate UUIDs based on trace startTime when ID is not provided
- Fix ProjectMetricsResourceTest to generate UUIDs with correct timestamps using TimeBasedEpochGenerator
- Remove explicit openTraceThread calls in tests to allow traces to create thread metadata with correct timestamps

All 206 ProjectMetricsResourceTest tests now passing (1 skipped).
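The bucketing step above needs the reverse mapping: recovering the embedded epoch-millis from a UUIDv7's top 48 bits, which is what a helper like UUIDv7ToDateTime would do (the exact API here is assumed for illustration):

```java
import java.time.Instant;
import java.util.UUID;

// Sketch: extract the 48-bit unix_ts_ms field from a UUIDv7 id,
// discarding the version nibble and rand_a bits below it.
final class UuidV7Timestamps {
    static Instant timestampOf(UUID id) {
        return Instant.ofEpochMilli(id.getMostSignificantBits() >>> 16);
    }
}
```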

* [OPIK-2856] Fix flaky MultiValueFeedbackScoresE2ETest by ensuring UUID bounds are min/max for timestamp

* [OPIK-2856] Update InstantToUUIDMapper tests to match new min/max UUID implementation

* [OPIK-2856] Address Copilot PR review comments: clarify 62-bit constant and update validateProject comment

* [OPIK-2856] [BE] Extract UUID utility for test reuse

- Create UUIDTestUtils with generateUUIDForTimestamp method
- Replace local implementations in ProjectMetricsResourceTest
- Replace local implementations in FindSpansResourceTest
- Remove duplicate method definitions and unused imports
- Centralize UUID generation logic for time-based testing

Tests verified: ✅ ProjectMetricsResourceTest (206 tests passed, 1 skipped)

* Revert id changes
thiagohora added a commit that referenced this pull request Nov 12, 2025
…bs (#3977)

* [OPIK-2856] [FE] Add datetime picker to traces, spans, and threads tabs

* Revision 2: Synchronize date range across all tabs using shared 'range' key

* Revision 3: Add refetchOnMount to ensure data refreshes when switching tabs

* Revision 4: Fix TypeScript error - change refetchOnMount from 'stale' to 'always'

* Fix date range

* Update SpanService.java

* Update TraceService.java

* Update ProjectMetricsResourceTest.java
awkoy pushed a commit that referenced this pull request Nov 12, 2025
awkoy pushed a commit that referenced this pull request Nov 12, 2025
awkoy pushed a commit that referenced this pull request Nov 12, 2025
awkoy pushed a commit that referenced this pull request Nov 12, 2025
…d of direct URL_TEMPLATE calls (#3947)

awkoy pushed a commit that referenced this pull request Nov 12, 2025
…3953)

thiagohora added a commit that referenced this pull request Nov 12, 2025
* [NA] [BE] Upgrade MySQL container from Testcontainers

* Fix imports order

* [OPIK-2856] [BE] Implement UUIDv7 time-based filtering for traces

- Add InstantToUUIDMapper to convert Instant timestamps to UUIDv7 bounds
- Add InstantParamConverter to parse ISO-8601 and epoch millisecond time parameters
- Update TracesResource to accept from_time and to_time parameters on /traces and /traces/stats endpoints
- Update TraceDAO to apply UUID-based time filtering using BETWEEN clause on id column
- Update TraceSearchCriteria to include uuidFromTime and uuidToTime fields
- Add comprehensive integration tests for time filtering with boundary conditions
- All tests passing: 10/10 time filtering tests + validation tests

* [OPIK-2856] Address PR review comments: improve exception handling and type safety

- Fix InstantParamConverter to catch specific DateTimeParseException instead of generic Exception
- Add debug logging when falling back to epoch milliseconds parsing
- Refactor anonymous ParamConverter class to named InstantConverter inner class for clarity
- Suppress unchecked cast with @SuppressWarnings annotation
- Fix MySQLContainerUtils return type to use MySQLContainer<?> for type safety
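
The dual-format parsing described above (ISO-8601 first, epoch milliseconds as a fallback) can be sketched in Python; `parse_instant` is a hypothetical stand-in for the Java `InstantParamConverter`, not the actual implementation:

```python
from datetime import datetime, timezone

def parse_instant(value: str) -> datetime:
    """Try ISO-8601 first; on failure, fall back to epoch milliseconds.

    Hypothetical sketch of the InstantParamConverter behavior described
    above -- the real converter is Java and catches DateTimeParseException.
    """
    try:
        # fromisoformat() on Python < 3.11 rejects a trailing "Z", so map it
        # to an explicit UTC offset first.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        # Not ISO-8601: interpret the value as epoch milliseconds.
        return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)
```

Catching only the specific parse failure (here `ValueError`, there `DateTimeParseException`) keeps unrelated errors from being silently swallowed by the fallback path.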

* [OPIK-2856] Fix InstantToUUIDMapperTest to match implementation

- Update tests to reflect that toUpperBound uses next millisecond (+1ms) for inclusive BETWEEN queries
- Remove outdated assertions expecting same timestamp in both bounds
- Verify upper bound is lexicographically greater than lower bound
- All 13 tests now passing

* Remove setup duplicated code

* Revision 2: Address PR review comments - LOW priority fixes

- #7: Add INSTANCE singleton pattern for InstantConverter
- #8: Use StringUtils.isEmpty() for null-safe empty check
- Note: #2 and #3 already addressed in previous commit

* Revision 3: Use IdGenerator.getTimeOrderedEpoch() for UUID bounds

- Simplified InstantToUUIDMapper to use IdGenerator.getTimeOrderedEpoch() instead of convertOtelIdToUUIDv7
- Per the UUIDv7 RFC, the 12 sub-millisecond bits are optional; millisecond granularity is sufficient
- Start/end interval semantics with ±1ms ensure correct BETWEEN query results
- This approach has been battle-tested for months without issues, per the reviewer's recommendation
- Converted InstantToUUIDMapper to a @Singleton service for proper DI integration
- Updated TracesResource to inject InstantToUUIDMapper dependency
- Updated tests to properly mock IdGenerator dependency

* Fix tests

* [OPIK-2856] [BE] Split Get spans Tests

* Fix format

* Revision 2: Extract duplicated span creation logic into createSpanWithTimestamp helper method

* [OPIK-2856] [BE] Implement UUIDv7 time-based filtering for spans

* Revision 3: Extract workspace setup duplication into setupTestWorkspace helper method and fix transformTestParams call

* [OPIK-2856] Refactor TracesResourceTest to use TraceResourceClient instead of direct URL_TEMPLATE calls

- Replace all direct client.target(URL_TEMPLATE) calls with TraceResourceClient methods
- Add callFeedbackScoresWithCookie method to TraceResourceClient for session token authentication
- Add callRetrieveThreadResponseWithCookie method to TraceResourceClient for session token authentication
- Fix feedback batch endpoint by using callFeedbackScores and callFeedbackScoresWithCookie
- Add null checks for query parameters to prevent NPE errors
- Fix API key vs session token usage in authentication tests
- Rename get__whenApiKeyIsPresent__thenReturnTraceThread to get__whenSessionTokenIsPresent__thenReturnTraceThread in SessionTokenCookie class
- Add mockGetWorkspaceIdByName() calls for proper workspace mocking
- Preserve original test assertions and behavior
- All tests properly refactored to use resource client methods instead of direct HTTP calls

* [OPIK-2856] Remove duplicate methods from TraceResourceClient

- Remove callGetTraces() - duplicate of callGetTracesWithQueryParams()
- Remove callSearchTraces() - duplicate of callSearchTracesStream()
- Reduced code duplication and maintenance burden

* Fix tests

* Revision 2: Address Copilot review comments - remove redundant wrapper method and add clarifying comment

* Revision 3: Extract duplicated path splitting logic into helper method addPathSegments()

* Revision 4: Make getWebTarget() private and add callGetTraceThreadsWithSorting() public method

* Revision 7: Move addPathSegments() and addQueryParameters() helper methods to BaseCommentResourceClient

* [OPIK-2856] [BE] Add UUIDv7 time-based filtering for trace threads

* Revision 2: Address GitHub Copilot PR review comments

- Extract conditional UUID generation into generateThreadModelId() method for better readability
- Rename minTraceTimestamp to earliestTraceTimestamp for clarity
- Add explanatory comment about UUIDv7 lexicographic ordering in compareTo()

* Fix

* Revision 3: Add UUID time filter to SELECT_TRACES_STATS query

* Revision 4: Fix generateUUIDForTimestamp to manually construct UUIDv7

* [OPIK-2856] [BE] Implement UUIDv7-based time filtering for project metrics

- Add uuidFromTime and uuidToTime fields to ProjectMetricRequest
- Update ProjectMetricsService to enrich requests with UUID bounds using InstantToUUIDMapper
- Refactor ProjectMetricsDAO SQL queries to use UUID-based filtering (id BETWEEN uuid_from_time AND uuid_to_time)
- Extract timestamps from UUIDs using UUIDv7ToDateTime for bucketing and WITH FILL clauses
- Update TraceService to generate UUIDs based on trace startTime when ID is not provided
- Fix ProjectMetricsResourceTest to generate UUIDs with correct timestamps using TimeBasedEpochGenerator
- Remove explicit openTraceThread calls in tests to allow traces to create thread metadata with correct timestamps

All 206 ProjectMetricsResourceTest tests now passing (1 skipped).
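
Extracting timestamps back out of the IDs (the `UUIDv7ToDateTime` step mentioned above) is the inverse of the bound construction: the top 48 bits of a UUIDv7 are the unix epoch in milliseconds. A hypothetical Python analogue:

```python
import uuid
from datetime import datetime, timezone

def uuid7_to_datetime(u: uuid.UUID) -> datetime:
    """Recover the creation time embedded in a UUIDv7.

    Illustrative sketch of the UUIDv7ToDateTime helper mentioned above:
    the top 48 bits hold the unix timestamp in milliseconds.
    """
    ms = u.int >> 80
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```

This is what lets bucketing and WITH FILL clauses operate on real timestamps while the filtering itself stays on the `id` column.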

* [OPIK-2856] Fix flaky MultiValueFeedbackScoresE2ETest by ensuring UUID bounds are min/max for timestamp

* [OPIK-2856] Update InstantToUUIDMapper tests to match new min/max UUID implementation

* [OPIK-2856] Address Copilot PR review comments: clarify 62-bit constant and update validateProject comment

* [OPIK-2856] [BE] Extract UUID utility for test reuse

- Create UUIDTestUtils with generateUUIDForTimestamp method
- Replace local implementations in ProjectMetricsResourceTest
- Replace local implementations in FindSpansResourceTest
- Remove duplicate method definitions and unused imports
- Centralize UUID generation logic for time-based testing

Tests verified: ✅ ProjectMetricsResourceTest (206 tests passed, 1 skipped)

* Revert id changes

* [OPIK-2856] [BE] Use batch calls to reduce test duration
JetoPistola added a commit that referenced this pull request Dec 1, 2025
Python SDK improvements:
- Import module instead of name for ExperimentScore (comment #2)
- Allow ExperimentScoreFunction to return single or List of ScoreResults (comment #3)
- Move experiment score verification to verify_experiment utility (comment #4)

Backend code quality:
- Simplify TypeReference diamond operator in ExperimentScore.java (comment #5)
- Remove overloaded constructor in FeedbackScoreNames.ScoreName (comment #6)
- Reuse ScoreName instead of ScoreNameWithType in DAO (comment #7)
- Add TODO for full primary key in ORDER BY (comment #8)
- Revert flakiness fix in TemplateUtilsTest.java (comment #9)
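
The single-or-list return contract from comment #3 amounts to a small normalization step before aggregation. A hedged Python sketch (names are illustrative, not the SDK's actual API):

```python
def normalize_score_results(result):
    """Accept either a single score result or a list of them (comment #3)."""
    return result if isinstance(result, list) else [result]

def run_scoring_functions(functions, test_results):
    """Apply each scoring function and flatten the normalized results.

    Hypothetical sketch: the real ExperimentScoreFunction protocol lives in
    the Python SDK; here scores are plain dicts for illustration.
    """
    scores = []
    for fn in functions:
        scores.extend(normalize_score_results(fn(test_results)))
    return scores
```

Normalizing at the call boundary keeps every downstream consumer working with a flat list, regardless of which shape the user's function returned.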
JetoPistola added a commit that referenced this pull request Dec 2, 2025
…nctions (#3989)

* Hide experiment_scores columns in the single experiment table

* Add SDK support for experiment_scores

* Add SDK support for experiment_scores

* Add BE functionality

* Typescript autogenerated code

* Documentation and FE update

* Address PR comments

* Address PR comments

* Fix PR comments

* Address PR comments

* Fix merge conflicts

* Fix tests

* Fix failing tests

* Fix failing tests

* Fix UI colors and column names

* Refactor: Extract common score averaging logic to eliminate duplication

* Harmonize experiment scores sorting to use map access from CTE

- Add experiment_scores_agg LEFT JOIN to non-grouped queries
- Simplify SortingQueryBuilder to use coalesce(map[key]) instead of complex JSON extraction
- Remove special case handling for experiment_scores in null direction logic
- Addresses PR review comments about query harmonization
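
The `coalesce(map[key])` sorting described above has a direct analogue in any language: look the score up in the map and fall back to a sentinel when the key is absent, so rows without that score sort last. A hypothetical Python sketch:

```python
rows = [
    {"id": 1, "experiment_scores": {"accuracy": 0.9}},
    {"id": 2, "experiment_scores": {}},            # no score recorded
    {"id": 3, "experiment_scores": {"accuracy": 0.4}},
]

def score_key(row, name="accuracy", missing=float("-inf")):
    # coalesce(map[key], default): absent keys fall back to the sentinel,
    # so they land at the end of a descending sort.
    return row["experiment_scores"].get(name, missing)

ranked = sorted(rows, key=score_key, reverse=True)
```

Map access with a coalesced default replaces the previous per-score JSON extraction with one uniform expression, which is what the harmonization above is after.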

* Remove early return for empty test results in experiment scores

- Allow experiment score functions to handle empty test results
- Some functions may want to return baseline/default scores with no data
- Addresses PR review comment about preventing score function execution

* Add E2E test for experiment scores functionality

- Test verifies experiment scoring functions work end-to-end
- Validates experiment scores appear in evaluation result
- Validates experiment scores are retrievable via SDK API
- Uses compute_max_score function to test score aggregation
- Addresses PR review comment about E2E test coverage

* Enhance experiment score computation to handle empty test results gracefully

- Update condition to return empty list if either scoring functions or test results are absent
- Ensures robustness in score computation logic

* Add Python SDK E2E test for experiment scores

- Tests experiment_scoring_functions parameter in evaluate()
- Verifies experiment scores are computed and returned in result
- Validates scores are persisted to backend API
- Tests aggregate metrics (max, min, avg) computation
- Addresses PR review comment about SDK test coverage

* Revert "Add E2E test for experiment scores functionality"

This reverts commit 50f9f8d.

* Apply DRY principle to score type mapping in ExperimentFeedbackScoresTab

- Extract addScoresToMap helper function to avoid duplication
- Works for both feedback_scores and experiment_scores
- Reduces code duplication and improves maintainability
- Fix parameter ordering (required before optional)

* [FE] Apply DRY principle to feedback/experiment scores handling

- useExperimentsTableConfig: Extract getScoreByName helper, eliminate duplicate accessorFn logic
- useCompareExperimentsChartsData: Extract createScoresMap helper for both score types
- CompareExperimentsDetails: Extract markScores helper to avoid duplicate map calls
- ExperimentsPage: Extract createScoresMap and getScoreNames helpers
- EvaluationSection: Use shared transformExperimentScores utility
- experimentScoreUtils: Refactor with formatScores helper to eliminate duplication

All changes maintain type safety and pass linting/typecheck

* Revision 7: Add missing experiment_scores_agg CTE to FIND query

* Revision 8: Fix experiment_scores sorting to use correct CTE alias 'es'

* Revision 9: Address all 9 PR review comments

Python SDK improvements:
- Import module instead of name for ExperimentScore (comment #2)
- Allow ExperimentScoreFunction to return single or List of ScoreResults (comment #3)
- Move experiment score verification to verify_experiment utility (comment #4)

Backend code quality:
- Simplify TypeReference diamond operator in ExperimentScore.java (comment #5)
- Remove overloaded constructor in FeedbackScoreNames.ScoreName (comment #6)
- Reuse ScoreName instead of ScoreNameWithType in DAO (comment #7)
- Add TODO for full primary key in ORDER BY (comment #8)
- Revert flakiness fix in TemplateUtilsTest.java (comment #9)

* Update return type of get_experiment_data method to use rest_api_types for consistency

* Revision 10: Add full primary key to ORDER BY clause

* Refactor test for standard deviation calculation in experiment scoring functions

Replaced hardcoded expected standard deviation value with a dynamic calculation using the statistics.stdev function for improved accuracy and maintainability.
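
Computing the expectation with `statistics.stdev` keeps the test correct if the fixture data ever changes; for example (illustrative fixture values, not the real test's):

```python
import statistics

scores = [0.25, 0.5, 0.75, 1.0]

# Dynamic expectation: sample standard deviation (n - 1 denominator),
# instead of a hardcoded literal that silently drifts if inputs change.
expected_std = statistics.stdev(scores)

# Equivalent manual computation, to show what stdev() does:
mean = statistics.fmean(scores)
manual = (sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)) ** 0.5
assert abs(expected_std - manual) < 1e-12
```

Note that `statistics.stdev` is the *sample* standard deviation; a hardcoded value computed with the population formula (`pstdev`) would disagree, which is one way such literals go stale.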

* Add experiment_scores column to experiments table in migration 000048

This migration introduces a new column, experiment_scores, to the experiments table to store precomputed metrics. The column is added with a default value of an empty string. A rollback statement is also included to drop the column if necessary.

* Update import statement for Prompt in evaluator.py to reflect new module structure

* Refactor whitespace in verifiers.py for improved readability

This commit removes unnecessary blank lines in the verify_experiment and _verify_experiment_scores functions, enhancing the overall clarity of the code without altering functionality.

* Enhance type hinting in dataset and experiment modules

This commit adds future annotations to the dataset REST operations and introduces TYPE_CHECKING for conditional imports in the experiment module, improving type hinting and code clarity without affecting functionality.

* Update documentation to replace `experiment_scores` with `experiment_scoring_functions` for consistency across evaluation methods

* Refactor score type handling in experiment feedback components

This commit replaces string literals for score types with constants, enhancing type safety and code clarity across various components, including ExperimentFeedbackScoresTab, ExperimentItemsTab, and related utility functions. The changes ensure consistent usage of SCORE_TYPE_FEEDBACK and SCORE_TYPE_EXPERIMENT throughout the codebase.

* Refactor column mapping for sorting functionality

This commit consolidates the logic for converting underscore-prefixed column IDs to dot notation into a single array of sortable prefixes. The `mapComplexColumn` function is updated to iterate over this array, improving code clarity and maintainability while ensuring consistent handling of various column types.

* Implement ExperimentScoreListCell and refactor score handling in data tables

This commit introduces the new ExperimentScoreListCell component for displaying experiment scores and updates the relevant data tables to utilize this component. Additionally, it refactors the handling of score types across various components, replacing string literals with constants for improved type safety and consistency. The changes affect the ExperimentsPage, ProjectsPage, and other related components, ensuring a unified approach to score type management.

* Refactor FeedbackScoresChartsWrapper and FeedbackScoreHoverCard for consistency

This commit updates the FeedbackScoresChartsWrapper component to rename the `isAggregationScores` prop to `areAggregatedScores` for improved clarity. Additionally, it modifies the subtitle text in the FeedbackScoreHoverCard component to use "Aggregated experiment scores" and "Average feedback scores" for consistency in terminology across the application.

* Add experiment scores tab to CompareExperimentsPage and update score handling

This commit introduces a new tab for displaying experiment scores in the CompareExperimentsPage. It updates the ExperimentFeedbackScoresTab component to handle both feedback and experiment scores based on the selected tab. The score retrieval logic is modified to filter scores according to their type, enhancing clarity and usability in the comparison of experiments.

* run fern generate

* Refactor score handling in various components to unify feedback and experiment score logic. Removed experiment score references and updated feedback score components to handle aggregated scores. Adjusted column definitions and metadata across multiple pages for consistency.

* Add migration to include experiment_scores column in experiments table

---------

Co-authored-by: Daniel Dimenshtein <danield@comet.com>
Co-authored-by: Ido Berkovich <ido@comet.com>
Co-authored-by: Boris Feld <boris@comet.com>
Co-authored-by: YarivHashaiComet <yarivh@comet.com>
YarivHashaiComet added a commit that referenced this pull request Jan 20, 2026
- Fix #1: Use trace provider when model not found (provider fallback)
- Fix #3: Add role mapping for external roles (tool, function, human, etc.)
- Fix #4: Support 'type' property for LangChain/LangGraph messages
- Fix #6: Add empty array check in canOpenInPlayground
- Fix #7: Check span input before using, fallback to trace input
- Fix #9/#10: Handle { messages: [] } case properly
thiagohora added a commit that referenced this pull request Feb 26, 2026
- #7: Remove "Used for testing and verification" from getExperimentFromAggregates javadoc
- #8: Replace recursive flatMap with Mono.expand() in populateExperimentItemsInBatches
- #10: Remove unrelated subscribeOn addition from DatasetItemService.createVersionFromDelta
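
Replacing a recursive `flatMap` with `Mono.expand()` keeps batch processing iterative rather than growing a recursive operator chain. The same idea in a hedged Python sketch (cursor-based batching with illustrative names):

```python
def iterate_batches(fetch_batch, batch_size=1000):
    """Yield items batch by batch using a loop instead of recursion,
    analogous to Mono.expand() replacing a recursive flatMap: each step
    feeds the next cursor forward, and depth stays constant."""
    cursor = None
    while True:
        items, cursor = fetch_batch(cursor, batch_size)
        yield from items
        if cursor is None:
            break
```

A usage sketch: `fetch_batch(cursor, limit)` returns `(items, next_cursor)` and signals completion with a `None` cursor, mirroring how an `expand` step terminates by emitting empty.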
thiagohora added a commit that referenced this pull request Mar 2, 2026
…5338)

* [OPIK-4380] [BE] Add experiment aggregates for denormalized metrics

- Add experiment_aggregates and experiment_item_aggregates tables
- Implement ExperimentAggregatesDAO with population and query methods
- Add ExperimentAggregatesService for aggregation management
- Refactor DTOs into organized model classes:
  - ExperimentAggregatesModel: aggregation results
  - ExperimentEntityData: entity models
  - ExperimentSourceData: raw source data
  - ExperimentAggregatesUtils: utilities
- Add FEEDBACK_SCORES_AGGREGATED filter strategy for map-based filtering
- Add comprehensive integration tests (10/10 passing)
- Configure batch size and parallelism settings

* [OPIK-4380] [BE] Add MySQL deadlock retry mechanism for concurrent dataset operations

Problem:
- MySQL deadlock on dataset_version_tags composite PRIMARY KEY (workspace_id, dataset_id, tag)
- Occurred during parallel dataset creation in same workspace
- Multiple threads inserting "latest" tag for different datasets caused lock contention
- Experiments with parallel execution were failing with MySQLTransactionRollbackException

Solution:
- Add handleOnDeadLocks() method in RetryUtils with:
  - 5 retry attempts with exponential backoff (250ms to 2s)
  - 0.5 jitter to reduce thundering herd effect
  - Recursive isDatabaseDeadlock() detection for MySQLTransactionRollbackException
- Apply retry logic in DatasetItemService.setDatasetItemVersion()
- Enables concurrent dataset creation for same workspace

Impact:
- Supports parallel experiment execution with proper deadlock handling
- Test success rate improved from 0/10 to 10/10 in ExperimentAggregatesIntegrationTest
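
The retry policy described above (5 attempts, exponential backoff from 250 ms capped at 2 s, 0.5 jitter) can be sketched language-agnostically; this Python version is illustrative, not the Java `RetryUtils` code:

```python
import random
import time

class DeadlockError(Exception):
    """Stand-in for MySQLTransactionRollbackException."""

def retry_on_deadlock(fn, attempts=5, base=0.25, cap=2.0, jitter=0.5,
                      sleep=time.sleep):
    for attempt in range(attempts):
        try:
            return fn()
        except DeadlockError:
            if attempt == attempts - 1:
                raise  # retries exhausted: surface the deadlock
            # Exponential backoff: 0.25s, 0.5s, 1s, 2s (capped).
            delay = min(base * (2 ** attempt), cap)
            # Jitter spreads retries out to avoid a thundering herd of
            # threads re-acquiring the same locks in lockstep.
            sleep(delay * (1 + random.uniform(-jitter, jitter)))
```

Deadlocks are transient by nature (MySQL aborts one victim transaction), so retrying the whole operation with backoff is usually enough; the jitter matters precisely because the colliding threads started in lockstep.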

* Fix visibility

* [OPIK-4380] [BE] Address PR review comments for experiment aggregates

Fixed 11 automated review comments from baz-reviewer:

CRITICAL fixes:
- Prevent NPE on null span aggregations by adding coalesce() in SQL
- Handle multi-project experiments with LIMIT 1 in GET_PROJECT_ID
- Handle zero-item experiments with empty aggregation helpers
- Bind feedback_scores_percentiles map instead of empty CAST

HIGH priority fixes:
- Use toDecimal128(12) instead of toDecimal64(9) for cost percentiles
- Add null-safe tags handling with Optional.ofNullable()
- Include exception objects in retry logging for stack traces

MEDIUM priority fixes:
- Add missing log_comment to SELECT_EXPERIMENT_BY_ID query
- Add missing log_comment to GET_PROJECT_ID query

LOW priority fixes:
- Remove duplicate "id" binding in bindItemsParameters
- Enhance batchSize config documentation with details

All 11 integration tests passing.

* [OPIK-4380] [BE] Fix table definition

* [OPIK-4380] [BE] Address PR comments and consolidate DatasetItemService methods

- Fix tags NPE in ExperimentAggregatesDAO with defaultIfNull
- Remove unnecessary FINAL clause from GET_EXPERIMENT_DATA query
- Fix test naming in ExperimentAggregatesIntegrationTest
- Consolidate 7 duplicate createVersionFromDelta methods into a single canonical implementation
- Remove debug logger from config-test.yml

* [OPIK-4380] [BE] Fix missing log_comment and centralize search criteria binding

- Fix SELECT_EXPERIMENT_BY_ID to properly render log_comment metadata
  - Use getSTWithLogComment pattern in getExperimentFromAggregates
  - Ensures ClickHouse query logging populates workspace/experiment IDs

- Centralize ExperimentSearchCriteria binding logic
  - Create ExperimentSearchCriteriaBinder utility class
  - Parameterize filter strategies to support both DAO variants
  - Eliminate 29-line duplication between ExperimentDAO and ExperimentAggregatesDAO
  - Single source of truth prevents DAOs from getting out of sync

* [OPIK-4380] [BE] Fix createVersionFromDelta consolidation after rebase

- Update canonical method signature to include new parameters:
  - List<EvaluatorItem> evaluators
  - ExecutionPolicy executionPolicy
  - boolean clearExecutionPolicy

- Update all 5 caller sites to pass new parameters:
  - Use changes.evaluators(), changes.executionPolicy(), changes.clearExecutionPolicy() when available
  - Pass null/false for auto-generated versions that inherit from base

- Add imports for EvaluatorItem and ExecutionPolicy

Fixes compilation errors introduced by rebase with upstream changes to DatasetVersionService

* [OPIK-4380] [BE] Address PR review comments - fix type mismatch, extract constants, remove DAO logging

- Fixed BigDecimal[] to Double[] conversion for experiment_scores (matches ClickHouse Float64)
- Extracted FilterStrategy lists to static final constants to avoid repeated allocations
- Added @NonNull validation to populateExperimentAggregate parameter
- Removed DAO layer logging, keeping service-level logging only

* [OPIK-4380] [BE] Address PR review comments - fix TYPE_REFERENCE visibility, redundant IN subquery, hardcoded context keys, Instant.now in loop, and inline defaultIfNull

* Revision 4: Address remaining JetoPistola review comments (#7, #8, #10)

- #7: Remove "Used for testing and verification" from getExperimentFromAggregates javadoc
- #8: Replace recursive flatMap with Mono.expand() in populateExperimentItemsInBatches
- #10: Remove unrelated subscribeOn addition from DatasetItemService.createVersionFromDelta

* Revision 3: Add switchIfEmpty fallback for deleted traces in populateExperimentAggregate

* Fix tests

* Revision 6: Move countTotal log from DAO to service layer

Operational logs belong in the service layer, not the DAO.

* Revision 7: Apply Spotless formatting

* Revision 8: Make populateAggregations(UUID, int) private

Removes the uncapped public batch size entry point. All callers now go
through the public no-arg overload which reads batchSize safely from config.

* [OPIK-4380] [BE] Add evaluation_method support to experiment_aggregates pipeline

- Add ClickHouse migration (000062) to add evaluation_method column to experiment_aggregates table
- Add evaluationMethod field to ExperimentData record
- Update GET_EXPERIMENT_DATA query to read evaluation_method from experiments
- Update INSERT_EXPERIMENT_AGGREGATE to write evaluation_method to experiment_aggregates
- Update SELECT_EXPERIMENT_BY_ID to read evaluation_method from experiment_aggregates
- Fix Experiment record constructor call: insert EvaluationMethod at correct position (10)

* [OPIK-4380] [BE] Extract shared helper for experiment aggregation queries

Reduce copy-paste in getTraceAggregations, getSpanAggregations, and
getFeedbackScoreAggregations by extracting queryExperimentAggregation,
which centralises the context-aware execution, workspace/experiment/project
parameter binding, and singleOrEmpty pattern shared by all three methods.

* [OPIK-4380] [BE] Enforce non-null contract on countTotal criteria parameter

Add @NonNull to ExperimentSearchCriteria in the interface and implementation
so that a null argument fails fast with an explicit NullPointerException at
the DAO boundary instead of crashing deep inside buildCountTemplate.

* [OPIK-4380] [BE] Fix countTotal ignoring target project IDs in normal path

target_project_ids was only applied inside the project_deleted LEFT JOIN
subquery; the main WHERE had no project restriction, so counts were
workspace-wide. Reuse has_target_projects in the main WHERE so
project_id IN :target_project_ids always takes effect. Also replace
manual null/empty checks with CollectionUtils.isNotEmpty.

* [OPIK-4380] [BE] Apply Spotless formatting
thiagohora added a commit that referenced this pull request Mar 6, 2026
* [OPIK-4380] [BE] Add experiment aggregates for denormalized metrics

- Add experiment_aggregates and experiment_item_aggregates tables
- Implement ExperimentAggregatesDAO with population and query methods
- Add ExperimentAggregatesService for aggregation management
- Refactor DTOs into organized model classes:
  - ExperimentAggregatesModel: aggregation results
  - ExperimentEntityData: entity models
  - ExperimentSourceData: raw source data
  - ExperimentAggregatesUtils: utilities
- Add FEEDBACK_SCORES_AGGREGATED filter strategy for map-based filtering
- Add comprehensive integration tests (10/10 passing)
- Configure batch size and parallelism settings

* [OPIK-4380] [BE] Add MySQL deadlock retry mechanism for concurrent dataset operations

Problem:
- MySQL deadlock on dataset_version_tags composite PRIMARY KEY (workspace_id, dataset_id, tag)
- Occurred during parallel dataset creation in same workspace
- Multiple threads inserting "latest" tag for different datasets caused lock contention
- Experiments with parallel execution were failing with MySQLTransactionRollbackException

Solution:
- Add handleOnDeadLocks() method in RetryUtils with:
  - 5 retry attempts with exponential backoff (250ms to 2s)
  - 0.5 jitter to reduce thundering herd effect
  - Recursive isDatabaseDeadlock() detection for MySQLTransactionRollbackException
- Apply retry logic in DatasetItemService.setDatasetItemVersion()
- Enables concurrent dataset creation for same workspace

Impact:
- Supports parallel experiment execution with proper deadlock handling
- Test success rate improved from 0/10 to 10/10 in ExperimentAggregatesIntegrationTest

* Fix visibility

* [OPIK-4380] [BE] Address PR review comments for experiment aggregates

Fixed 11 automated review comments from baz-reviewer:

CRITICAL fixes:
- Prevent NPE on null span aggregations by adding coalesce() in SQL
- Handle multi-project experiments with LIMIT 1 in GET_PROJECT_ID
- Handle zero-item experiments with empty aggregation helpers
- Bind feedback_scores_percentiles map instead of empty CAST

HIGH priority fixes:
- Use toDecimal128(12) instead of toDecimal64(9) for cost percentiles
- Add null-safe tags handling with Optional.ofNullable()
- Include exception objects in retry logging for stack traces

MEDIUM priority fixes:
- Add missing log_comment to SELECT_EXPERIMENT_BY_ID query
- Add missing log_comment to GET_PROJECT_ID query

LOW priority fixes:
- Remove duplicate "id" binding in bindItemsParameters
- Enhance batchSize config documentation with details

All 11 integration tests passing.

* [OPIK-4382] [BE] Refactor experiment aggregates with import cleanup and Optional patterns

- Add missing imports for IntStream, ProjectStats, and other dependencies
- Replace fully-qualified class names with proper imports across DAO and Service classes
- Fix IS_NOT_EMPTY filter handling for FEEDBACK_SCORES_AGGREGATED strategies
- Refactor null checks to use Optional in mapping methods:
  - mapFeedbackScoreAggregations, mapExperimentFromAggregates
  - mapFeedbackScoreData, mapExperimentGroupAggregationItem
  - Batch insert preparation with Optional chains
- Improve code readability and maintainability with functional patterns

* [OPIK-4380] [BE] Fix table definition

* [OPIK-4380] [BE] Address PR comments and consolidate DatasetItemService methods

- Fix tags NPE in ExperimentAggregatesDAO with defaultIfNull
- Remove unnecessary FINAL clause from GET_EXPERIMENT_DATA query
- Fix test naming in ExperimentAggregatesIntegrationTest
- Consolidate 7 duplicate createVersionFromDelta methods into a single canonical implementation
- Remove debug logger from config-test.yml

* [OPIK-4380] [BE] Fix missing log_comment and centralize search criteria binding

- Fix SELECT_EXPERIMENT_BY_ID to properly render log_comment metadata
  - Use getSTWithLogComment pattern in getExperimentFromAggregates
  - Ensures ClickHouse query logging populates workspace/experiment IDs

- Centralize ExperimentSearchCriteria binding logic
  - Create ExperimentSearchCriteriaBinder utility class
  - Parameterize filter strategies to support both DAO variants
  - Eliminate 29-line duplication between ExperimentDAO and ExperimentAggregatesDAO
  - Single source of truth prevents DAOs from getting out of sync

* [OPIK-4380] [BE] Fix createVersionFromDelta consolidation after rebase

- Update canonical method signature to include new parameters:
  - List<EvaluatorItem> evaluators
  - ExecutionPolicy executionPolicy
  - boolean clearExecutionPolicy

- Update all 5 caller sites to pass new parameters:
  - Use changes.evaluators(), changes.executionPolicy(), changes.clearExecutionPolicy() when available
  - Pass null/false for auto-generated versions that inherit from base

- Add imports for EvaluatorItem and ExecutionPolicy

Fixes compilation errors introduced by rebase with upstream changes to DatasetVersionService

* [OPIK-4380] [BE] Address PR review comments - fix type mismatch, extract constants, remove DAO logging

- Fixed BigDecimal[] to Double[] conversion for experiment_scores (matches ClickHouse Float64)
- Extracted FilterStrategy lists to static final constants to avoid repeated allocations
- Added @NonNull validation to populateExperimentAggregate parameter
- Removed DAO layer logging, keeping service-level logging only

* [OPIK-4380] [BE] Extract shared helper for experiment data pagination

Extract streamWithExperimentPagination() helper method to eliminate
duplication in getTracesData(), getSpansData(), and getFeedbackScoresData().

All three methods followed identical pattern:
- asyncTemplate.stream with connection
- getSTWithLogComment with cursor flag
- Bind workspace_id, experiment_id, project_id, limit
- Optional cursor binding
- Result mapping

Benefits:
- Single source of truth for pagination binding logic
- Prevents divergence when tweaking cursor/limit bindings
- Reduces code from ~20 lines to ~10 lines per method
- Type-safe generic implementation

Note: CTE redundancy (3x experiment_items scan) is intentional to avoid
passing large trace ID lists as parameters, which would cause performance
issues with 10K+ traces.

* [OPIK-4380] [BE] Address PR review comments - fix TYPE_REFERENCE visibility, redundant IN subquery, hardcoded context keys, Instant.now in loop, and inline defaultIfNull

* Revision 4: Address remaining JetoPistola review comments (#7, #8, #10)

- #7: Remove "Used for testing and verification" from getExperimentFromAggregates javadoc
- #8: Replace recursive flatMap with Mono.expand() in populateExperimentItemsInBatches
- #10: Remove unrelated subscribeOn addition from DatasetItemService.createVersionFromDelta

* Revision 3: Add switchIfEmpty fallback for deleted traces in populateExperimentAggregate

* Fix tests

* Revision 6: Move countTotal log from DAO to service layer

Operational logs belong in the service layer, not the DAO.

* Revision 7: Apply Spotless formatting

* Revision 8: Make populateAggregations(UUID, int) private

Removes the uncapped public batch size entry point. All callers now go
through the public no-arg overload which reads batchSize safely from config.

* [OPIK-4380] [BE] Add evaluation_method support to experiment_aggregates pipeline

- Add ClickHouse migration (000062) to add evaluation_method column to experiment_aggregates table
- Add evaluationMethod field to ExperimentData record
- Update GET_EXPERIMENT_DATA query to read evaluation_method from experiments
- Update INSERT_EXPERIMENT_AGGREGATE to write evaluation_method to experiment_aggregates
- Update SELECT_EXPERIMENT_BY_ID to read evaluation_method from experiment_aggregates
- Fix Experiment record constructor call: insert EvaluationMethod at correct position (10)

* [OPIK-4380] [BE] Extract shared helper for experiment aggregation queries

Reduce copy-paste in getTraceAggregations, getSpanAggregations, and
getFeedbackScoreAggregations by extracting queryExperimentAggregation,
which centralises the context-aware execution, workspace/experiment/project
parameter binding, and singleOrEmpty pattern shared by all three methods.

* [OPIK-4380] [BE] Enforce non-null contract on countTotal criteria parameter

Add @NonNull to ExperimentSearchCriteria in the interface and implementation
so that a null argument fails fast with an explicit NullPointerException at
the DAO boundary instead of crashing deep inside buildCountTemplate.

* [OPIK-4380] [BE] Fix countTotal ignoring target project IDs in normal path

target_project_ids was only applied inside the project_deleted LEFT JOIN
subquery; the main WHERE had no project restriction, so counts were
workspace-wide. Reuse has_target_projects in the main WHERE so
project_id IN :target_project_ids always takes effect. Also replace
manual null/empty checks with CollectionUtils.isNotEmpty.

* [OPIK-4380] [BE] Apply Spotless formatting

* [OPIK-4382] [BE] Address PR review comments on experiment aggregates

- Fix :versionId → :version_id parameter naming in SQL templates and bindings
- Fix last_updated_at binding to use item.lastUpdatedAt() instead of Instant.now()
- Fix FEEDBACK_SCORES_AGGREGATED_IS_EMPTY filter: embed generated SQL into templates
  instead of hard-coded static condition, and add missing bind calls
- Fix RetryUtils log duplication (remove getMessage() + pass exception directly)
- Add batchSize = 1000 default in ExperimentAggregatesConfig
- Extract resolveVersionIdForCriteria helper to deduplicate version-id resolution
- Add null/blank/ClickHouse placeholder guards in extractUuidsFromGroupValues
- Extract loadEntityMap helper to deduplicate getEnrichInfoHolder enrichment logic

* Revision 3: Address PR comments E, F, G, H

- Fix E: Extract shared template/bind helpers in ExperimentAggregatesDAO
- Fix F: Bind experiment_ids as UUID[] instead of String[]
- Fix G+H: Extract getEnrichInfoHolder logic into ExperimentGroupEnricher,
  eliminating duplication between ExperimentService and ExperimentAggregatesService
  without introducing a direct dependency between them

* Revision 4: Fix ExperimentServiceTest to include ExperimentGroupEnricher mock

* [OPIK-4382] [BE] Extract shared Row→ExperimentGroup mappers into ExperimentGroupMappers

Pull the duplicated Row→ExperimentGroupItem and Row→ExperimentGroupAggregationItem
conversion logic from ExperimentDAO and ExperimentAggregatesDAO into a shared
ExperimentGroupMappers utility class. Both DAOs now delegate to the same
toExperimentGroupItem / toExperimentGroupAggregationItem helpers, eliminating
the need to mirror DTO mapping changes in two places.

* [OPIK-4382] [BE] Deduplicate bindGroupCriteria into ExperimentGroupMappers

Moves the shared group-criteria binding logic out of ExperimentDAO and
ExperimentAggregatesDAO into ExperimentGroupMappers.bindGroupCriteria(),
following the same pattern as ExperimentSearchCriteriaBinder.  Adding or
fixing a criteria binding now only requires a change in one place.

* [OPIK-4382] [BE] Extract streamGroupQuery helper and fix null percentiles

- Deduplicate findGroups/findGroupsAggregations into a single private
  streamGroupQuery(queryTemplate, criteria, rowMapper) that differs only
  by the query constant and BiFunction row mapper.

- Fix convertToBigDecimal to return null for null/unsupported inputs so
  absent p50/p90/p99 entries in getDuration propagate as null rather than
  BigDecimal.ZERO, preserving the semantic-null that lets callers apply
  COALESCE/fallback logic correctly.
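
The semantic-null behaviour can be sketched as follows (an illustrative version of `convertToBigDecimal`, not the project's exact code):

```java
import java.math.BigDecimal;

// Illustrative version of the null-percentile fix: absent p50/p90/p99 values
// map to null rather than BigDecimal.ZERO, so callers can tell "no data"
// apart from a genuine zero and apply COALESCE/fallback logic.
final class PercentileConversion {
    static BigDecimal convertToBigDecimal(Object value) {
        if (value == null) {
            return null; // absent entry: preserve semantic-null
        }
        if (value instanceof BigDecimal bd) {
            return bd;
        }
        if (value instanceof Number n) {
            return BigDecimal.valueOf(n.doubleValue());
        }
        return null; // unsupported input type
    }
}
```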

* [OPIK-4382] [BE] Consolidate cost/duration helpers into ExperimentGroupMappers

Promote getCostValue and getDuration to public static in
ExperimentGroupMappers and delete the private copies in ExperimentDAO.
ExperimentDAO.mapToDto now delegates to the shared helpers, so any
future change to cost filtering, duration percentile extraction, or the
BigDecimal conversion only needs to be made in one place.

Side-effect: ExperimentDAO.mapToDto also picks up the null-percentile
fix (convertToBigDecimal returns null for absent/unsupported inputs)
that was previously applied only to ExperimentGroupMappers.

* [OPIK-4382] [BE] Fix pagination count and add criteria filter tests

- Remove count() OVER () window function from paged query (returned
  page-scoped count instead of full result-set count)
- Replace with dedicated count query + short-circuit: skip items query
  when count == 0, use DatasetItemPage.empty() for that case
- Extract DatasetItemResultMapper.buildItemFromRow as public static
  helper reused by ExperimentAggregatesDAO
- Add parameterized integration tests for ExperimentGroupCriteria
  filters (name, types, projectId, combined, empty-result) covering
  both findGroups and findGroupsAggregations aggregate paths

* [OPIK-4380] [BE] Extract shared filter helpers into FilterQueryBuilder

Add FilterStrategyParam record, applyFiltersToTemplate and bindFilters
static helpers to FilterQueryBuilder, then replace duplicated per-strategy
loops in DatasetItemVersionDAO and ExperimentAggregatesDAO with single
delegating calls backed by per-DAO strategy constants.

* [OPIK-4382] [BE] Consolidate filter helpers in getExperimentItemsStatsFromAggregates

Add EXPERIMENT_ITEMS_STATS_FILTER_STRATEGY_PARAMS and
EXPERIMENT_ITEMS_STATS_BIND_STRATEGIES constants and replace the
per-strategy toAnalyticsDbFilters/bind blocks in
getExperimentItemsStatsFromAggregates with single delegating calls to
FilterQueryBuilder.applyFiltersToTemplate and bindFilters.

* Revision 9: Extract shared helpers to eliminate duplication across DAOs

- Create DatasetItemSearchCriteriaMapper: centralizes filters + search flag
  wiring for DatasetItemSearchCriteria, shared by DatasetItemVersionDAO and
  ExperimentAggregatesDAO
- Add ExperimentGroupMappers.applyGroupCriteriaToTemplate: centralizes
  ExperimentGroupCriteria → ST template wiring, now shared by ExperimentDAO
  and ExperimentAggregatesDAO
- Update DatasetItemVersionDAO.addFiltersToTemplate and bindSearchAndFilters
  to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentAggregatesDAO.applyDatasetItemFiltersToTemplate and
  bindDatasetItemSearchParams to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentDAO.newGroupTemplate and ExperimentAggregatesDAO.newGroupTemplate
  to delegate to ExperimentGroupMappers.applyGroupCriteriaToTemplate

* [OPIK-4382] [BE] Address PR review comments

- Fix convertToBigDecimal to return BigDecimal.ZERO for unknown Number types
- Use import instead of inline FQN for java.time.Instant in ExperimentEntityData
- Change EMPTY_ARRAY_STR visibility from public to private in ExperimentAggregatesDAO
- Extract resolveLatestVersionId() to deduplicate version-id resolution in ExperimentAggregatesService
- Extract shared test setup (AggregatesTestContext + setupAggregatesTestData) in ExperimentAggregatesIntegrationTest

* [OPIK-4382] [BE] Address PR review: workspace scoping and thread-safe test collections

- Wrap countTotal() in Mono.deferContextual for consistent workspace context handling
- Use Collections.synchronizedList for shared lists mutated in parallel forEach

* [OPIK-4382] [BE] Replace div.* with explicit column list in dataset_item_versions subqueries

Avoids fetching unnecessary heavy columns (e.g. data, metadata) from
dataset_item_versions when they are not needed by the outer query.
thiagohora added a commit that referenced this pull request Mar 9, 2026
…regates recomputation (#5371)

* [OPIK-4380] [BE] Add experiment aggregates for denormalized metrics

- Add experiment_aggregates and experiment_item_aggregates tables
- Implement ExperimentAggregatesDAO with population and query methods
- Add ExperimentAggregatesService for aggregation management
- Refactor DTOs into organized model classes:
  - ExperimentAggregatesModel: aggregation results
  - ExperimentEntityData: entity models
  - ExperimentSourceData: raw source data
  - ExperimentAggregatesUtils: utilities
- Add FEEDBACK_SCORES_AGGREGATED filter strategy for map-based filtering
- Add comprehensive integration tests (10/10 passing)
- Configure batch size and parallelism settings

* [OPIK-4380] [BE] Add MySQL deadlock retry mechanism for concurrent dataset operations

Problem:
- MySQL deadlock on dataset_version_tags composite PRIMARY KEY (workspace_id, dataset_id, tag)
- Occurred during parallel dataset creation in same workspace
- Multiple threads inserting "latest" tag for different datasets caused lock contention
- Experiments with parallel execution were failing with MySQLTransactionRollbackException

Solution:
- Add handleOnDeadLocks() method in RetryUtils with:
  - 5 retry attempts with exponential backoff (250ms to 2s)
  - 0.5 jitter to reduce thundering herd effect
  - Recursive isDatabaseDeadlock() detection for MySQLTransactionRollbackException
- Apply retry logic in DatasetItemService.setDatasetItemVersion()
- Enables concurrent dataset creation for same workspace

Impact:
- Supports parallel experiment execution with proper deadlock handling
- Test success rate improved from 0/10 to 10/10 in ExperimentAggregatesIntegrationTest
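
The retry shape described above can be sketched in plain Java. Names like `withRetry` and the message-based deadlock check are illustrative — the real `RetryUtils.handleOnDeadLocks()` inspects `MySQLTransactionRollbackException` types rather than messages:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

// Illustrative retry helper: up to 5 attempts, exponential backoff from 250ms
// capped at 2s, +/-50% jitter, and a recursive cause-chain walk to decide
// whether the failure is a deadlock. The message-based check stands in for
// the real MySQLTransactionRollbackException type check.
final class DeadlockRetrySketch {
    private static final int MAX_ATTEMPTS = 5;
    private static final long BASE_DELAY_MS = 250;
    private static final long MAX_DELAY_MS = 2_000;

    // Recursive detection, mirroring isDatabaseDeadlock().
    static boolean isDeadlock(Throwable t) {
        if (t == null) {
            return false;
        }
        if (t.getMessage() != null && t.getMessage().contains("Deadlock")) {
            return true;
        }
        return isDeadlock(t.getCause());
    }

    static <T> T withRetry(Supplier<T> action) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (!isDeadlock(e)) {
                    throw e; // only deadlocks are retried
                }
                last = e;
                long delay = Math.min(MAX_DELAY_MS, BASE_DELAY_MS << attempt);
                // 0.5 jitter: spread retries to avoid a thundering herd
                delay += (long) ((ThreadLocalRandom.current().nextDouble() - 0.5) * delay);
                try {
                    Thread.sleep(Math.max(0, delay));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(ie);
                }
            }
        }
        throw last;
    }
}
```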

* Fix visibility

* [OPIK-4380] [BE] Address PR review comments for experiment aggregates

Fixed 11 automated review comments from baz-reviewer:

CRITICAL fixes:
- Prevent NPE on null span aggregations by adding coalesce() in SQL
- Handle multi-project experiments with LIMIT 1 in GET_PROJECT_ID
- Handle zero-item experiments with empty aggregation helpers
- Bind feedback_scores_percentiles map instead of empty CAST

HIGH priority fixes:
- Use toDecimal128(12) instead of toDecimal64(9) for cost percentiles
- Add null-safe tags handling with Optional.ofNullable()
- Include exception objects in retry logging for stack traces

MEDIUM priority fixes:
- Add missing log_comment to SELECT_EXPERIMENT_BY_ID query
- Add missing log_comment to GET_PROJECT_ID query

LOW priority fixes:
- Remove duplicate "id" binding in bindItemsParameters
- Enhance batchSize config documentation with details

All 11 integration tests passing.

* [OPIK-4382] [BE] Refactor experiment aggregates with import cleanup and Optional patterns

- Add missing imports for IntStream, ProjectStats, and other dependencies
- Replace fully-qualified class names with proper imports across DAO and Service classes
- Fix IS_NOT_EMPTY filter handling for FEEDBACK_SCORES_AGGREGATED strategies
- Refactor null checks to use Optional in mapping methods:
  - mapFeedbackScoreAggregations, mapExperimentFromAggregates
  - mapFeedbackScoreData, mapExperimentGroupAggregationItem
  - Batch insert preparation with Optional chains
- Improve code readability and maintainability with functional patterns

* [OPIK-4380] [BE] Fix table definition

* [OPIK-4380] [BE] Address PR comments and consolidate DatasetItemService methods

- Fix tags NPE in ExperimentAggregatesDAO with defaultIfNull
- Remove unnecessary FINAL clause from GET_EXPERIMENT_DATA query
- Fix test naming in ExperimentAggregatesIntegrationTest
- Consolidate 7 duplicate createVersionFromDelta methods into single canonical implementation
- Remove debug logger from config-test.yml

* [OPIK-4380] [BE] Fix missing log_comment and centralize search criteria binding

- Fix SELECT_EXPERIMENT_BY_ID to properly render log_comment metadata
  - Use getSTWithLogComment pattern in getExperimentFromAggregates
  - Ensures ClickHouse query logging populates workspace/experiment IDs

- Centralize ExperimentSearchCriteria binding logic
  - Create ExperimentSearchCriteriaBinder utility class
  - Parameterize filter strategies to support both DAO variants
  - Eliminate 29-line duplication between ExperimentDAO and ExperimentAggregatesDAO
  - Single source of truth prevents DAOs from getting out of sync

* [OPIK-4380] [BE] Fix createVersionFromDelta consolidation after rebase

- Update canonical method signature to include new parameters:
  - List<EvaluatorItem> evaluators
  - ExecutionPolicy executionPolicy
  - boolean clearExecutionPolicy

- Update all 5 caller sites to pass new parameters:
  - Use changes.evaluators(), changes.executionPolicy(), changes.clearExecutionPolicy() when available
  - Pass null/false for auto-generated versions that inherit from base

- Add imports for EvaluatorItem and ExecutionPolicy

Fixes compilation errors introduced by rebase with upstream changes to DatasetVersionService

* [OPIK-4380] [BE] Address PR review comments - fix type mismatch, extract constants, remove DAO logging

- Fixed BigDecimal[] to Double[] conversion for experiment_scores (matches ClickHouse Float64)
- Extracted FilterStrategy lists to static final constants to avoid repeated allocations
- Added @NonNull validation to populateExperimentAggregate parameter
- Removed DAO layer logging, keeping service-level logging only

* [OPIK-4380] [BE] Extract shared helper for experiment data pagination

Extract streamWithExperimentPagination() helper method to eliminate
duplication in getTracesData(), getSpansData(), and getFeedbackScoresData().

All three methods followed identical pattern:
- asyncTemplate.stream with connection
- getSTWithLogComment with cursor flag
- Bind workspace_id, experiment_id, project_id, limit
- Optional cursor binding
- Result mapping

Benefits:
- Single source of truth for pagination binding logic
- Prevents divergence when tweaking cursor/limit bindings
- Reduces code from ~20 lines to ~10 lines per method
- Type-safe generic implementation

Note: CTE redundancy (3x experiment_items scan) is intentional to avoid
passing large trace ID lists as parameters, which would cause performance
issues with 10K+ traces.

* [OPIK-4383] [BE] Add Redis stream subscriber for debounced experiment aggregates recomputation

- Add ExperimentDenormalizationConfig implementing StreamConfiguration with debounce, job lock, and per-experiment aggregation lock settings
- Add ExperimentAggregationMessage as stream message record
- Add ExperimentAggregatesSubscriber consuming from the denormalization stream; acquires a workspace-scoped distributed lock per experiment before calling populateAggregations()
- Add experimentDenormalizationEnabled feature flag to ServiceTogglesConfig and FeatureFlags
- Wire ExperimentDenormalizationConfig into OpikConfiguration
- Update config.yml and config-test.yml with full experimentDenormalization block (enabled for tests)
- Add ExperimentAggregatesSubscriberTest covering lifecycle gating and processEvent success/error paths
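
The debounce effect the subscriber's settings aim for can be illustrated with a toy in-process analogue (this is not the Redis-stream implementation; all names here are invented for the sketch):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Toy debouncer: repeated messages for the same experiment id within the
// debounce window collapse into a single recomputation.
final class DebounceSketch {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable); // daemon so the sketch never blocks JVM exit
                thread.setDaemon(true);
                return thread;
            });
    private final Map<UUID, ScheduledFuture<?>> pending = new ConcurrentHashMap<>();
    private final long debounceMs;
    private final Consumer<UUID> recompute;

    DebounceSketch(long debounceMs, Consumer<UUID> recompute) {
        this.debounceMs = debounceMs;
        this.recompute = recompute;
    }

    void onMessage(UUID experimentId) {
        pending.compute(experimentId, (id, previous) -> {
            if (previous != null) {
                previous.cancel(false); // supersede the earlier pending run
            }
            return scheduler.schedule(() -> {
                pending.remove(id);
                recompute.accept(id);
            }, debounceMs, TimeUnit.MILLISECONDS);
        });
    }

    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Five rapid messages for one experiment id trigger one recomputation once the window elapses.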

* Revision 2: Address PR comments - add config defaults, remove toggle, rename tests

- ExperimentDenormalizationConfig: add sensible defaults to all fields so
  Dropwizard validation doesn't fail when the config block is absent from
  old deployments (config.isEnabled()=false still gates the subscriber)
- Remove experimentDenormalizationEnabled service toggle from
  ServiceTogglesConfig, FeatureFlags, config.yml and config-test.yml -
  the infrastructure gate (config.isEnabled()) is the single control point
- Rename lifecycle test methods to camelCase per project conventions:
  startSkipsStartupWhenDisabled / stopSkipsShutdownWhenDisabled

* Revision 3: Add @Max(500) to consumerBatchSize and @NotNull to jobLockWaitTime

* [OPIK-4380] [BE] Address PR review comments - fix TYPE_REFERENCE visibility, redundant IN subquery, hardcoded context keys, Instant.now in loop, and inline defaultIfNull

* Revision 4: Address remaining JetoPistola review comments (#7, #8, #10)

- #7: Remove "Used for testing and verification" from getExperimentFromAggregates javadoc
- #8: Replace recursive flatMap with Mono.expand() in populateExperimentItemsInBatches
- #10: Remove unrelated subscribeOn addition from DatasetItemService.createVersionFromDelta
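
Reactor's `Mono.expand()` turns the recursive "process a batch, then call yourself for the next one" shape into iteration with the same empty-batch termination. Since Reactor is not stdlib, here is a plain-Java analogue of that shape (names are illustrative):

```java
import java.util.List;
import java.util.function.IntFunction;

// Iterative batch draining: fetch a batch starting at an offset, process it,
// and stop when a batch comes back empty -- the termination shape that
// Mono.expand() gives, without building a deep recursive operator chain.
final class BatchDrainSketch {
    static <T> int drainInBatches(IntFunction<List<T>> fetchBatch) {
        int processed = 0;
        int offset = 0;
        while (true) {
            List<T> batch = fetchBatch.apply(offset);
            if (batch.isEmpty()) {
                break; // empty batch terminates the expansion
            }
            processed += batch.size();
            offset += batch.size();
        }
        return processed;
    }
}
```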

* Revision 3: Add switchIfEmpty fallback for deleted traces in populateExperimentAggregate

* Fix tests

* Revision 6: Move countTotal log from DAO to service layer

Operational logs belong in the service layer, not the DAO.

* Revision 7: Apply Spotless formatting

* Revision 8: Make populateAggregations(UUID, int) private

Removes the uncapped public batch size entry point. All callers now go
through the public no-arg overload which reads batchSize safely from config.

* [OPIK-4380] [BE] Add evaluation_method support to experiment_aggregates pipeline

- Add ClickHouse migration (000062) to add evaluation_method column to experiment_aggregates table
- Add evaluationMethod field to ExperimentData record
- Update GET_EXPERIMENT_DATA query to read evaluation_method from experiments
- Update INSERT_EXPERIMENT_AGGREGATE to write evaluation_method to experiment_aggregates
- Update SELECT_EXPERIMENT_BY_ID to read evaluation_method from experiment_aggregates
- Fix Experiment record constructor call: insert EvaluationMethod at correct position (10)

* [OPIK-4380] [BE] Extract shared helper for experiment aggregation queries

Reduce copy-paste in getTraceAggregations, getSpanAggregations, and
getFeedbackScoreAggregations by extracting queryExperimentAggregation,
which centralises the context-aware execution, workspace/experiment/project
parameter binding, and singleOrEmpty pattern shared by all three methods.

* [OPIK-4380] [BE] Enforce non-null contract on countTotal criteria parameter

Add @NonNull to ExperimentSearchCriteria in the interface and implementation
so that a null argument fails fast with an explicit NullPointerException at
the DAO boundary instead of crashing deep inside buildCountTemplate.

* [OPIK-4380] [BE] Fix countTotal ignoring target project IDs in normal path

target_project_ids was only applied inside the project_deleted LEFT JOIN
subquery; the main WHERE had no project restriction, so counts were
workspace-wide. Reuse has_target_projects in the main WHERE so
project_id IN :target_project_ids always takes effect. Also replace
manual null/empty checks with CollectionUtils.isNotEmpty.

* [OPIK-4380] [BE] Apply Spotless formatting

* [OPIK-4382] [BE] Address PR review comments on experiment aggregates

- Fix :versionId → :version_id parameter naming in SQL templates and bindings
- Fix last_updated_at binding to use item.lastUpdatedAt() instead of Instant.now()
- Fix FEEDBACK_SCORES_AGGREGATED_IS_EMPTY filter: embed generated SQL into templates
  instead of hard-coded static condition, and add missing bind calls
- Fix RetryUtils log duplication (remove getMessage() + pass exception directly)
- Add batchSize = 1000 default in ExperimentAggregatesConfig
- Extract resolveVersionIdForCriteria helper to deduplicate version-id resolution
- Add null/blank/ClickHouse placeholder guards in extractUuidsFromGroupValues
- Extract loadEntityMap helper to deduplicate getEnrichInfoHolder enrichment logic

* Revision 3: Address PR comments E, F, G, H

- Fix E: Extract shared template/bind helpers in ExperimentAggregatesDAO
- Fix F: Bind experiment_ids as UUID[] instead of String[]
- Fix G+H: Extract getEnrichInfoHolder logic into ExperimentGroupEnricher,
  eliminating duplication between ExperimentService and ExperimentAggregatesService
  without introducing a direct dependency between them

* Revision 4: Fix ExperimentServiceTest to include ExperimentGroupEnricher mock

* [OPIK-4382] [BE] Extract shared Row→ExperimentGroup mappers into ExperimentGroupMappers

Pull the duplicated Row→ExperimentGroupItem and Row→ExperimentGroupAggregationItem
conversion logic from ExperimentDAO and ExperimentAggregatesDAO into a shared
ExperimentGroupMappers utility class. Both DAOs now delegate to the same
toExperimentGroupItem / toExperimentGroupAggregationItem helpers, eliminating
the need to mirror DTO mapping changes in two places.

* [OPIK-4382] [BE] Deduplicate bindGroupCriteria into ExperimentGroupMappers

Moves the shared group-criteria binding logic out of ExperimentDAO and
ExperimentAggregatesDAO into ExperimentGroupMappers.bindGroupCriteria(),
following the same pattern as ExperimentSearchCriteriaBinder.  Adding or
fixing a criteria binding now only requires a change in one place.

* [OPIK-4382] [BE] Extract streamGroupQuery helper and fix null percentiles

- Deduplicate findGroups/findGroupsAggregations into a single private
  streamGroupQuery(queryTemplate, criteria, rowMapper) that differs only
  by the query constant and BiFunction row mapper.

- Fix convertToBigDecimal to return null for null/unsupported inputs so
  absent p50/p90/p99 entries in getDuration propagate as null rather than
  BigDecimal.ZERO, preserving the semantic-null that lets callers apply
  COALESCE/fallback logic correctly.

* [OPIK-4382] [BE] Consolidate cost/duration helpers into ExperimentGroupMappers

Promote getCostValue and getDuration to public static in
ExperimentGroupMappers and delete the private copies in ExperimentDAO.
ExperimentDAO.mapToDto now delegates to the shared helpers, so any
future change to cost filtering, duration percentile extraction, or the
BigDecimal conversion only needs to be made in one place.

Side-effect: ExperimentDAO.mapToDto also picks up the null-percentile
fix (convertToBigDecimal returns null for absent/unsupported inputs)
that was previously applied only to ExperimentGroupMappers.

* [OPIK-4382] [BE] Fix pagination count and add criteria filter tests

- Remove count() OVER () window function from paged query (returned
  page-scoped count instead of full result-set count)
- Replace with dedicated count query + short-circuit: skip items query
  when count == 0, use DatasetItemPage.empty() for that case
- Extract DatasetItemResultMapper.buildItemFromRow as public static
  helper reused by ExperimentAggregatesDAO
- Add parameterized integration tests for ExperimentGroupCriteria
  filters (name, types, projectId, combined, empty-result) covering
  both findGroups and findGroupsAggregations aggregate paths
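
The dedicated-count-plus-short-circuit pattern can be sketched like this (record and method names are illustrative, not the project's API):

```java
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Run a cheap count query first, and only run the (heavier) items query when
// there is something to page; a zero count returns an empty page directly.
final class CountShortCircuit {
    record Page<T>(List<T> items, long total) {
        static <T> Page<T> empty() {
            return new Page<>(List.of(), 0L);
        }
    }

    static <T> Page<T> findPage(LongSupplier countQuery, Supplier<List<T>> itemsQuery) {
        long total = countQuery.getAsLong();
        if (total == 0) {
            return Page.empty(); // skip the items query entirely
        }
        return new Page<>(itemsQuery.get(), total);
    }
}
```

Unlike `count() OVER ()` in the paged query, the separate count reflects the full result set rather than the current page.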

* [OPIK-4380] [BE] Extract shared filter helpers into FilterQueryBuilder

Add FilterStrategyParam record, applyFiltersToTemplate and bindFilters
static helpers to FilterQueryBuilder, then replace duplicated per-strategy
loops in DatasetItemVersionDAO and ExperimentAggregatesDAO with single
delegating calls backed by per-DAO strategy constants.

* [OPIK-4382] [BE] Consolidate filter helpers in getExperimentItemsStatsFromAggregates

Add EXPERIMENT_ITEMS_STATS_FILTER_STRATEGY_PARAMS and
EXPERIMENT_ITEMS_STATS_BIND_STRATEGIES constants and replace the
per-strategy toAnalyticsDbFilters/bind blocks in
getExperimentItemsStatsFromAggregates with single delegating calls to
FilterQueryBuilder.applyFiltersToTemplate and bindFilters.

* Revision 9: Extract shared helpers to eliminate duplication across DAOs

- Create DatasetItemSearchCriteriaMapper: centralizes filters + search flag
  wiring for DatasetItemSearchCriteria, shared by DatasetItemVersionDAO and
  ExperimentAggregatesDAO
- Add ExperimentGroupMappers.applyGroupCriteriaToTemplate: centralizes
  ExperimentGroupCriteria → ST template wiring, now shared by ExperimentDAO
  and ExperimentAggregatesDAO
- Update DatasetItemVersionDAO.addFiltersToTemplate and bindSearchAndFilters
  to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentAggregatesDAO.applyDatasetItemFiltersToTemplate and
  bindDatasetItemSearchParams to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentDAO.newGroupTemplate and ExperimentAggregatesDAO.newGroupTemplate
  to delegate to ExperimentGroupMappers.applyGroupCriteriaToTemplate

* [OPIK-4383] [BE] Fix project_deleted filter and comments_dedup scope in ExperimentAggregatesDAO

- Fix project_deleted filter: use zero UUID sentinel instead of empty string
  for FixedString(36) column comparison in FIND_GROUPS and FIND_GROUPS_AGGREGATIONS
- Fix comments_dedup CTE: scope trace_id subquery by dataset_id to avoid
  scanning the entire workspace's comments table
- Add missing streamMaxLen and streamTrimLimit fields to
  ExperimentDenormalizationConfig (implements StreamConfiguration interface)

* [OPIK-4383] [BE] Address PR review comments: extract ZERO_UUID constant and fix config comment

- Promote zero UUID sentinel to shared constant in ExperimentGroupMappers
- Use parameterized :zero_uuid binding in SQL templates instead of hardcoded string
- Fix config.yml comment from "Default: 120s" to "Default: 1m"
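
The sentinel itself is just the all-zero UUID, whose canonical string form is exactly 36 characters and therefore comparable against a `FixedString(36)` column where an empty string would not match; a minimal sketch:

```java
import java.util.UUID;

// The zero-UUID sentinel: a constant 36-character value suitable for
// FixedString(36) comparisons in place of an empty string.
final class ZeroUuidSketch {
    static final UUID ZERO_UUID = new UUID(0L, 0L);
}
```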

* [OPIK-4383] [BE] Add streamMaxLen and streamTrimLimit to experimentDenormalization config
itamargolan pushed a commit that referenced this pull request Mar 9, 2026
…regates recomputation (#5371)

* [OPIK-4380] [BE] Add experiment aggregates for denormalized metrics

- Add experiment_aggregates and experiment_item_aggregates tables
- Implement ExperimentAggregatesDAO with population and query methods
- Add ExperimentAggregatesService for aggregation management
- Refactor DTOs into organized model classes:
  - ExperimentAggregatesModel: aggregation results
  - ExperimentEntityData: entity models
  - ExperimentSourceData: raw source data
  - ExperimentAggregatesUtils: utilities
- Add FEEDBACK_SCORES_AGGREGATED filter strategy for map-based filtering
- Add comprehensive integration tests (10/10 passing)
- Configure batch size and parallelism settings

* [OPIK-4380] [BE] Add MySQL deadlock retry mechanism for concurrent dataset operations

Problem:
- MySQL deadlock on dataset_version_tags composite PRIMARY KEY (workspace_id, dataset_id, tag)
- Occurred during parallel dataset creation in same workspace
- Multiple threads inserting "latest" tag for different datasets caused lock contention
- Experiments with parallel execution were failing with MySQLTransactionRollbackException

Solution:
- Add handleOnDeadLocks() method in RetryUtils with:
  - 5 retry attempts with exponential backoff (250ms to 2s)
  - 0.5 jitter to reduce thundering herd effect
  - Recursive isDatabaseDeadlock() detection for MySQLTransactionRollbackException
- Apply retry logic in DatasetItemService.setDatasetItemVersion()
- Enables concurrent dataset creation for same workspace

Impact:
- Supports parallel experiment execution with proper deadlock handling
- Test success rate improved from 0/10 to 10/10 in ExperimentAggregatesIntegrationTest

* Fix visibility

* [OPIK-4380] [BE] Address PR review comments for experiment aggregates

Fixed 11 automated review comments from baz-reviewer:

CRITICAL fixes:
- Prevent NPE on null span aggregations by adding coalesce() in SQL
- Handle multi-project experiments with LIMIT 1 in GET_PROJECT_ID
- Handle zero-item experiments with empty aggregation helpers
- Bind feedback_scores_percentiles map instead of empty CAST

HIGH priority fixes:
- Use toDecimal128(12) instead of toDecimal64(9) for cost percentiles
- Add null-safe tags handling with Optional.ofNullable()
- Include exception objects in retry logging for stack traces

MEDIUM priority fixes:
- Add missing log_comment to SELECT_EXPERIMENT_BY_ID query
- Add missing log_comment to GET_PROJECT_ID query

LOW priority fixes:
- Remove duplicate "id" binding in bindItemsParameters
- Enhance batchSize config documentation with details

All 11 integration tests passing.

* [OPIK-4382] [BE] Refactor experiment aggregates with import cleanup and Optional patterns

- Add missing imports for IntStream, ProjectStats, and other dependencies
- Replace fully-qualified class names with proper imports across DAO and Service classes
- Fix IS_NOT_EMPTY filter handling for FEEDBACK_SCORES_AGGREGATED strategies
- Refactor null checks to use Optional in mapping methods:
  - mapFeedbackScoreAggregations, mapExperimentFromAggregates
  - mapFeedbackScoreData, mapExperimentGroupAggregationItem
  - Batch insert preparation with Optional chains
- Improve code readability and maintainability with functional patterns

* [OPIK-4380] [BE] Fix table definition

* [OPIK-4380] [BE] Address PR comments and consolidate DatasetItemService methods

- Fix tags NPE in ExperimentAggregatesDAO with defaultIfNull
- Remove unnecessary FINAL clause from GET_EXPERIMENT_DATA query
- Fix test naming in ExperimentAggregatesIntegrationTest
- Consolidate 7 duplicate createVersionFromDelta methods into single canonical implementation
- Remove debug logger from config-test.yml

* [OPIK-4380] [BE] Fix missing log_comment and centralize search criteria binding

- Fix SELECT_EXPERIMENT_BY_ID to properly render log_comment metadata
  - Use getSTWithLogComment pattern in getExperimentFromAggregates
  - Ensures ClickHouse query logging populates workspace/experiment IDs

- Centralize ExperimentSearchCriteria binding logic
  - Create ExperimentSearchCriteriaBinder utility class
  - Parameterize filter strategies to support both DAO variants
  - Eliminate 29-line duplication between ExperimentDAO and ExperimentAggregatesDAO
  - Single source of truth prevents DAOs from getting out of sync

* [OPIK-4380] [BE] Fix createVersionFromDelta consolidation after rebase

- Update canonical method signature to include new parameters:
  - List<EvaluatorItem> evaluators
  - ExecutionPolicy executionPolicy
  - boolean clearExecutionPolicy

- Update all 5 caller sites to pass new parameters:
  - Use changes.evaluators(), changes.executionPolicy(), changes.clearExecutionPolicy() when available
  - Pass null/false for auto-generated versions that inherit from base

- Add imports for EvaluatorItem and ExecutionPolicy

Fixes compilation errors introduced by rebase with upstream changes to DatasetVersionService

* [OPIK-4380] [BE] Address PR review comments - fix type mismatch, extract constants, remove DAO logging

- Fixed BigDecimal[] to Double[] conversion for experiment_scores (matches ClickHouse Float64)
- Extracted FilterStrategy lists to static final constants to avoid repeated allocations
- Added @nonnull validation to populateExperimentAggregate parameter
- Removed DAO layer logging, keeping service-level logging only

* [OPIK-4380] [BE] Extract shared helper for experiment data pagination

Extract streamWithExperimentPagination() helper method to eliminate
duplication in getTracesData(), getSpansData(), and getFeedbackScoresData().

All three methods followed identical pattern:
- asyncTemplate.stream with connection
- getSTWithLogComment with cursor flag
- Bind workspace_id, experiment_id, project_id, limit
- Optional cursor binding
- Result mapping

Benefits:
- Single source of truth for pagination binding logic
- Prevents divergence when tweaking cursor/limit bindings
- Reduces code from ~20 lines to ~10 lines per method
- Type-safe generic implementation

Note: CTE redundancy (3x experiment_items scan) is intentional to avoid
passing large trace ID lists as parameters, which would cause performance
issues with 10K+ traces.

* [OPIK-4383] [BE] Add Redis stream subscriber for debounced experiment aggregates recomputation

- Add ExperimentDenormalizationConfig implementing StreamConfiguration with debounce, job lock, and per-experiment aggregation lock settings
- Add ExperimentAggregationMessage as stream message record
- Add ExperimentAggregatesSubscriber consuming from the denormalization stream; acquires a workspace-scoped distributed lock per experiment before calling populateAggregations()
- Add experimentDenormalizationEnabled feature flag to ServiceTogglesConfig and FeatureFlags
- Wire ExperimentDenormalizationConfig into OpikConfiguration
- Update config.yml and config-test.yml with full experimentDenormalization block (enabled for tests)
- Add ExperimentAggregatesSubscriberTest covering lifecycle gating and processEvent success/error paths

* Revision 2: Address PR comments - add config defaults, remove toggle, rename tests

- ExperimentDenormalizationConfig: add sensible defaults to all fields so
  Dropwizard validation doesn't fail when the config block is absent from
  old deployments (config.isEnabled()=false still gates the subscriber)
- Remove experimentDenormalizationEnabled service toggle from
  ServiceTogglesConfig, FeatureFlags, config.yml and config-test.yml -
  the infrastructure gate (config.isEnabled()) is the single control point
- Rename lifecycle test methods to camelCase per project conventions:
  startSkipsStartupWhenDisabled / stopSkipsShutdownWhenDisabled

* Revision 3: Add @max(500) to consumerBatchSize and @NotNull to jobLockWaitTime

* [OPIK-4380] [BE] Address PR review comments - fix TYPE_REFERENCE visibility, redundant IN subquery, hardcoded context keys, Instant.now in loop, and inline defaultIfNull

* Revision 4: Address remaining JetoPistola review comments (#7, #8, #10)

- #7: Remove "Used for testing and verification" from getExperimentFromAggregates javadoc
- #8: Replace recursive flatMap with Mono.expand() in populateExperimentItemsInBatches
- #10: Remove unrelated subscribeOn addition from DatasetItemService.createVersionFromDelta

* Revision 3: Add switchIfEmpty fallback for deleted traces in populateExperimentAggregate

* Fix tests

* Revision 6: Move countTotal log from DAO to service layer

Operational logs belong in the service layer, not the DAO.

* Revision 7: Apply Spotless formatting

* Revision 8: Make populateAggregations(UUID, int) private

Removes the uncapped public batch size entry point. All callers now go
through the public no-arg overload which reads batchSize safely from config.

* [OPIK-4380] [BE] Add evaluation_method support to experiment_aggregates pipeline

- Add ClickHouse migration (000062) to add evaluation_method column to experiment_aggregates table
- Add evaluationMethod field to ExperimentData record
- Update GET_EXPERIMENT_DATA query to read evaluation_method from experiments
- Update INSERT_EXPERIMENT_AGGREGATE to write evaluation_method to experiment_aggregates
- Update SELECT_EXPERIMENT_BY_ID to read evaluation_method from experiment_aggregates
- Fix Experiment record constructor call: insert EvaluationMethod at correct position (10)

* [OPIK-4380] [BE] Extract shared helper for experiment aggregation queries

Reduce copy-paste in getTraceAggregations, getSpanAggregations, and
getFeedbackScoreAggregations by extracting queryExperimentAggregation,
which centralises the context-aware execution, workspace/experiment/project
parameter binding, and singleOrEmpty pattern shared by all three methods.

* [OPIK-4380] [BE] Enforce non-null contract on countTotal criteria parameter

Add @NonNull to ExperimentSearchCriteria in the interface and implementation
so that a null argument fails fast with an explicit NullPointerException at
the DAO boundary instead of crashing deep inside buildCountTemplate.
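
Lombok's `@NonNull` compiles down to a boundary check roughly like the one below, so a null criteria fails immediately with an explicit message instead of an NPE deep inside template building. The class and return value here are placeholders, not the real DAO:

```java
import java.util.Objects;

class ExperimentAggregatesCounterSketch {
    // Roughly what a @NonNull parameter annotation generates: fail fast at
    // the method boundary with a descriptive NullPointerException.
    long countTotal(Object criteria) {
        Objects.requireNonNull(criteria, "criteria is marked non-null but is null");
        return 0L; // placeholder for the real count query
    }
}
```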

* [OPIK-4380] [BE] Fix countTotal ignoring target project IDs in normal path

target_project_ids was only applied inside the project_deleted LEFT JOIN
subquery; the main WHERE had no project restriction, so counts were
workspace-wide. Reuse has_target_projects in the main WHERE so
project_id IN :target_project_ids always takes effect. Also replace
manual null/empty checks with CollectionUtils.isNotEmpty.

* [OPIK-4380] [BE] Apply Spotless formatting

* [OPIK-4382] [BE] Address PR review comments on experiment aggregates

- Fix :versionId → :version_id parameter naming in SQL templates and bindings
- Fix last_updated_at binding to use item.lastUpdatedAt() instead of Instant.now()
- Fix FEEDBACK_SCORES_AGGREGATED_IS_EMPTY filter: embed generated SQL into templates
  instead of hard-coded static condition, and add missing bind calls
- Fix RetryUtils log duplication (remove getMessage() + pass exception directly)
- Add batchSize = 1000 default in ExperimentAggregatesConfig
- Extract resolveVersionIdForCriteria helper to deduplicate version-id resolution
- Add null/blank/ClickHouse placeholder guards in extractUuidsFromGroupValues
- Extract loadEntityMap helper to deduplicate getEnrichInfoHolder enrichment logic
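
A sketch of the guarded UUID extraction, assuming the "ClickHouse placeholder" is the all-zeros sentinel a `FixedString(36)` column defaults to (the method name mirrors the commit, but the signature is illustrative):

```java
import java.util.Optional;
import java.util.UUID;

final class GroupValuesSketch {
    private static final UUID ZERO_UUID = new UUID(0, 0);

    // Null, blank, unparsable, or placeholder values yield empty instead of
    // throwing, so malformed group values cannot break enrichment.
    static Optional<UUID> extractUuid(String value) {
        if (value == null || value.isBlank()) return Optional.empty();
        try {
            UUID id = UUID.fromString(value);
            return ZERO_UUID.equals(id) ? Optional.empty() : Optional.of(id);
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }
}
```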

* Revision 3: Address PR comments E, F, G, H

- Fix E: Extract shared template/bind helpers in ExperimentAggregatesDAO
- Fix F: Bind experiment_ids as UUID[] instead of String[]
- Fix G+H: Extract getEnrichInfoHolder logic into ExperimentGroupEnricher,
  eliminating duplication between ExperimentService and ExperimentAggregatesService
  without introducing a direct dependency between them

* Revision 4: Fix ExperimentServiceTest to include ExperimentGroupEnricher mock

* [OPIK-4382] [BE] Extract shared Row→ExperimentGroup mappers into ExperimentGroupMappers

Pull the duplicated Row→ExperimentGroupItem and Row→ExperimentGroupAggregationItem
conversion logic from ExperimentDAO and ExperimentAggregatesDAO into a shared
ExperimentGroupMappers utility class. Both DAOs now delegate to the same
toExperimentGroupItem / toExperimentGroupAggregationItem helpers, eliminating
the need to mirror DTO mapping changes in two places.

* [OPIK-4382] [BE] Deduplicate bindGroupCriteria into ExperimentGroupMappers

Moves the shared group-criteria binding logic out of ExperimentDAO and
ExperimentAggregatesDAO into ExperimentGroupMappers.bindGroupCriteria(),
following the same pattern as ExperimentSearchCriteriaBinder. Adding or
fixing a criteria binding now only requires a change in one place.

* [OPIK-4382] [BE] Extract streamGroupQuery helper and fix null percentiles

- Deduplicate findGroups/findGroupsAggregations into a single private
  streamGroupQuery(queryTemplate, criteria, rowMapper) that differs only
  by the query constant and BiFunction row mapper.

- Fix convertToBigDecimal to return null for null/unsupported inputs so
  absent p50/p90/p99 entries in getDuration propagate as null rather than
  BigDecimal.ZERO, preserving the semantic-null that lets callers apply
  COALESCE/fallback logic correctly.
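
The fixed conversion might look like the sketch below (the real method likely handles more input types; this captures the null-propagation contract only):

```java
import java.math.BigDecimal;

final class PercentilesSketch {
    // Unknown or absent values map to null — a semantic "no data" — never to
    // BigDecimal.ZERO, so callers can apply COALESCE/fallback logic.
    static BigDecimal convertToBigDecimal(Object value) {
        if (value instanceof BigDecimal bd) return bd;
        if (value instanceof Number n) return BigDecimal.valueOf(n.doubleValue());
        return null; // null or unsupported type
    }
}
```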

* [OPIK-4382] [BE] Consolidate cost/duration helpers into ExperimentGroupMappers

Promote getCostValue and getDuration to public static in
ExperimentGroupMappers and delete the private copies in ExperimentDAO.
ExperimentDAO.mapToDto now delegates to the shared helpers, so any
future change to cost filtering, duration percentile extraction, or the
BigDecimal conversion only needs to be made in one place.

Side-effect: ExperimentDAO.mapToDto also picks up the null-percentile
fix (convertToBigDecimal returns null for absent/unsupported inputs)
that was previously applied only to ExperimentGroupMappers.

* [OPIK-4382] [BE] Fix pagination count and add criteria filter tests

- Remove count() OVER () window function from paged query (returned
  page-scoped count instead of full result-set count)
- Replace with dedicated count query + short-circuit: skip items query
  when count == 0, use DatasetItemPage.empty() for that case
- Extract DatasetItemResultMapper.buildItemFromRow as public static
  helper reused by ExperimentAggregatesDAO
- Add parameterized integration tests for ExperimentGroupCriteria
  filters (name, types, projectId, combined, empty-result) covering
  both findGroups and findGroupsAggregations aggregate paths
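
The count-then-fetch shape described above can be sketched generically — a dedicated count query (instead of `count() OVER ()` on the paged query, which reports only the page-scoped count) plus a short-circuit that skips the items query entirely when the count is zero. Names here are illustrative, not the real DAO types:

```java
import java.util.List;
import java.util.function.Supplier;

record PageSketch<T>(List<T> items, long total) {
    static <T> PageSketch<T> empty() { return new PageSketch<>(List.of(), 0L); }
}

final class PaginationSketch {
    // Issue the count first; only run the (more expensive) items query when
    // there is something to page through.
    static <T> PageSketch<T> find(Supplier<Long> countQuery, Supplier<List<T>> itemsQuery) {
        long total = countQuery.get();
        return total == 0 ? PageSketch.empty() : new PageSketch<>(itemsQuery.get(), total);
    }
}
```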

* [OPIK-4380] [BE] Extract shared filter helpers into FilterQueryBuilder

Add FilterStrategyParam record, applyFiltersToTemplate and bindFilters
static helpers to FilterQueryBuilder, then replace duplicated per-strategy
loops in DatasetItemVersionDAO and ExperimentAggregatesDAO with single
delegating calls backed by per-DAO strategy constants.

* [OPIK-4382] [BE] Consolidate filter helpers in getExperimentItemsStatsFromAggregates

Add EXPERIMENT_ITEMS_STATS_FILTER_STRATEGY_PARAMS and
EXPERIMENT_ITEMS_STATS_BIND_STRATEGIES constants and replace the
per-strategy toAnalyticsDbFilters/bind blocks in
getExperimentItemsStatsFromAggregates with single delegating calls to
FilterQueryBuilder.applyFiltersToTemplate and bindFilters.

* Revision 9: Extract shared helpers to eliminate duplication across DAOs

- Create DatasetItemSearchCriteriaMapper: centralizes filters + search flag
  wiring for DatasetItemSearchCriteria, shared by DatasetItemVersionDAO and
  ExperimentAggregatesDAO
- Add ExperimentGroupMappers.applyGroupCriteriaToTemplate: centralizes
  ExperimentGroupCriteria → ST template wiring, now shared by ExperimentDAO
  and ExperimentAggregatesDAO
- Update DatasetItemVersionDAO.addFiltersToTemplate and bindSearchAndFilters
  to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentAggregatesDAO.applyDatasetItemFiltersToTemplate and
  bindDatasetItemSearchParams to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentDAO.newGroupTemplate and ExperimentAggregatesDAO.newGroupTemplate
  to delegate to ExperimentGroupMappers.applyGroupCriteriaToTemplate

* [OPIK-4383] [BE] Fix project_deleted filter and comments_dedup scope in ExperimentAggregatesDAO

- Fix project_deleted filter: use zero UUID sentinel instead of empty string
  for FixedString(36) column comparison in FIND_GROUPS and FIND_GROUPS_AGGREGATIONS
- Fix comments_dedup CTE: scope trace_id subquery by dataset_id to avoid
  scanning the entire workspace's comments table
- Add missing streamMaxLen and streamTrimLimit fields to
  ExperimentDenormalizationConfig (implements StreamConfiguration interface)

* [OPIK-4383] [BE] Address PR review comments: extract ZERO_UUID constant and fix config comment

- Promote zero UUID sentinel to shared constant in ExperimentGroupMappers
- Use parameterized :zero_uuid binding in SQL templates instead of hardcoded string
- Fix config.yml comment from "Default: 120s" to "Default: 1m"

* [OPIK-4383] [BE] Add streamMaxLen and streamTrimLimit to experimentDenormalization config
thiagohora added a commit that referenced this pull request Mar 11, 2026
…blisher (#5510)

* [OPIK-4380] [BE] Add experiment aggregates for denormalized metrics

- Add experiment_aggregates and experiment_item_aggregates tables
- Implement ExperimentAggregatesDAO with population and query methods
- Add ExperimentAggregatesService for aggregation management
- Refactor DTOs into organized model classes:
  - ExperimentAggregatesModel: aggregation results
  - ExperimentEntityData: entity models
  - ExperimentSourceData: raw source data
  - ExperimentAggregatesUtils: utilities
- Add FEEDBACK_SCORES_AGGREGATED filter strategy for map-based filtering
- Add comprehensive integration tests (10/10 passing)
- Configure batch size and parallelism settings

* [OPIK-4380] [BE] Add MySQL deadlock retry mechanism for concurrent dataset operations

Problem:
- MySQL deadlock on dataset_version_tags composite PRIMARY KEY (workspace_id, dataset_id, tag)
- Occurred during parallel dataset creation in same workspace
- Multiple threads inserting "latest" tag for different datasets caused lock contention
- Experiments with parallel execution were failing with MySQLTransactionRollbackException

Solution:
- Add handleOnDeadLocks() method in RetryUtils with:
  - 5 retry attempts with exponential backoff (250ms to 2s)
  - 0.5 jitter to reduce thundering herd effect
  - Recursive isDatabaseDeadlock() detection for MySQLTransactionRollbackException
- Apply retry logic in DatasetItemService.setDatasetItemVersion()
- Enables concurrent dataset creation for same workspace

Impact:
- Supports parallel experiment execution with proper deadlock handling
- Test success rate improved from 0/10 to 10/10 in ExperimentAggregatesIntegrationTest
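
One plausible reading of that schedule — exponential backoff doubling from 250ms, capped at 2s, with the 0.5 jitter factor letting each delay vary by up to ±50% — can be sketched as a delay calculator (Reactor's `Retry.backoff(...).jitter(0.5)` computes this internally; exact jitter semantics may differ):

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

final class DeadlockBackoffSketch {
    // Delay for a 0-based retry attempt: 250ms doubled per attempt, capped at
    // 2s, then randomized by up to ±50% to avoid a thundering herd.
    static Duration delayFor(int attempt) {
        long baseMillis = Math.min(250L << attempt, 2_000L);
        double jitter = ThreadLocalRandom.current().nextDouble(-0.5, 0.5);
        return Duration.ofMillis(Math.round(baseMillis * (1 + jitter)));
    }
}
```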

* Fix visibility

* [OPIK-4380] [BE] Address PR review comments for experiment aggregates

Fixed 11 automated review comments from baz-reviewer:

CRITICAL fixes:
- Prevent NPE on null span aggregations by adding coalesce() in SQL
- Handle multi-project experiments with LIMIT 1 in GET_PROJECT_ID
- Handle zero-item experiments with empty aggregation helpers
- Bind feedback_scores_percentiles map instead of empty CAST

HIGH priority fixes:
- Use toDecimal128(12) instead of toDecimal64(9) for cost percentiles
- Add null-safe tags handling with Optional.ofNullable()
- Include exception objects in retry logging for stack traces

MEDIUM priority fixes:
- Add missing log_comment to SELECT_EXPERIMENT_BY_ID query
- Add missing log_comment to GET_PROJECT_ID query

LOW priority fixes:
- Remove duplicate "id" binding in bindItemsParameters
- Enhance batchSize config documentation with details

All 11 integration tests passing.

* [OPIK-4382] [BE] Refactor experiment aggregates with import cleanup and Optional patterns

- Add missing imports for IntStream, ProjectStats, and other dependencies
- Replace fully-qualified class names with proper imports across DAO and Service classes
- Fix IS_NOT_EMPTY filter handling for FEEDBACK_SCORES_AGGREGATED strategies
- Refactor null checks to use Optional in mapping methods:
  - mapFeedbackScoreAggregations, mapExperimentFromAggregates
  - mapFeedbackScoreData, mapExperimentGroupAggregationItem
  - Batch insert preparation with Optional chains
- Improve code readability and maintainability with functional patterns

* [OPIK-4380] [BE] Fix table definition

* [OPIK-4380] [BE] Address PR comments and consolidate DatasetItemService methods

- Fix tags NPE in ExperimentAggregatesDAO with defaultIfNull
- Remove unnecessary FINAL clause from GET_EXPERIMENT_DATA query
- Fix test naming in ExperimentAggregatesIntegrationTest
- Consolidate 7 duplicate createVersionFromDelta methods into single canonical implementation
- Remove debug logger from config-test.yml

* [OPIK-4380] [BE] Fix missing log_comment and centralize search criteria binding

- Fix SELECT_EXPERIMENT_BY_ID to properly render log_comment metadata
  - Use getSTWithLogComment pattern in getExperimentFromAggregates
  - Ensures ClickHouse query logging populates workspace/experiment IDs

- Centralize ExperimentSearchCriteria binding logic
  - Create ExperimentSearchCriteriaBinder utility class
  - Parameterize filter strategies to support both DAO variants
  - Eliminate 29-line duplication between ExperimentDAO and ExperimentAggregatesDAO
  - Single source of truth prevents DAOs from getting out of sync

* [OPIK-4380] [BE] Fix createVersionFromDelta consolidation after rebase

- Update canonical method signature to include new parameters:
  - List<EvaluatorItem> evaluators
  - ExecutionPolicy executionPolicy
  - boolean clearExecutionPolicy

- Update all 5 caller sites to pass new parameters:
  - Use changes.evaluators(), changes.executionPolicy(), changes.clearExecutionPolicy() when available
  - Pass null/false for auto-generated versions that inherit from base

- Add imports for EvaluatorItem and ExecutionPolicy

Fixes compilation errors introduced by rebase with upstream changes to DatasetVersionService

* [OPIK-4380] [BE] Address PR review comments - fix type mismatch, extract constants, remove DAO logging

- Fixed BigDecimal[] to Double[] conversion for experiment_scores (matches ClickHouse Float64)
- Extracted FilterStrategy lists to static final constants to avoid repeated allocations
- Added @NonNull validation to populateExperimentAggregate parameter
- Removed DAO layer logging, keeping service-level logging only

* [OPIK-4380] [BE] Extract shared helper for experiment data pagination

Extract streamWithExperimentPagination() helper method to eliminate
duplication in getTracesData(), getSpansData(), and getFeedbackScoresData().

All three methods followed identical pattern:
- asyncTemplate.stream with connection
- getSTWithLogComment with cursor flag
- Bind workspace_id, experiment_id, project_id, limit
- Optional cursor binding
- Result mapping

Benefits:
- Single source of truth for pagination binding logic
- Prevents divergence when tweaking cursor/limit bindings
- Reduces code from ~20 lines to ~10 lines per method
- Type-safe generic implementation

Note: CTE redundancy (3x experiment_items scan) is intentional to avoid
passing large trace ID lists as parameters, which would cause performance
issues with 10K+ traces.

* [OPIK-4383] [BE] Add Redis stream subscriber for debounced experiment aggregates recomputation

- Add ExperimentDenormalizationConfig implementing StreamConfiguration with debounce, job lock, and per-experiment aggregation lock settings
- Add ExperimentAggregationMessage as stream message record
- Add ExperimentAggregatesSubscriber consuming from the denormalization stream; acquires a workspace-scoped distributed lock per experiment before calling populateAggregations()
- Add experimentDenormalizationEnabled feature flag to ServiceTogglesConfig and FeatureFlags
- Wire ExperimentDenormalizationConfig into OpikConfiguration
- Update config.yml and config-test.yml with full experimentDenormalization block (enabled for tests)
- Add ExperimentAggregatesSubscriberTest covering lifecycle gating and processEvent success/error paths
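
A toy, in-process stand-in for that per-experiment lock — the real subscriber uses a workspace-scoped Redis-backed distributed lock, but the exclusivity contract is the same: only one aggregation runs per experiment key at a time, and concurrent duplicates skip rather than block:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class PerKeyLockSketch {
    private final Set<String> held = ConcurrentHashMap.newKeySet();

    // Returns true if the task ran; false if another holder owned the key.
    boolean runExclusive(String experimentKey, Runnable aggregation) {
        if (!held.add(experimentKey)) return false; // already locked: skip
        try {
            aggregation.run();
            return true;
        } finally {
            held.remove(experimentKey); // always release
        }
    }
}
```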

* [OPIK-4383] [BE] Add experiment aggregate event listener and no-op publisher

* Revision 2: Fix missing import for ExperimentAggregationPublisher

* Fix tests setup

* [OPIK-4383] [BE] Address PR review: move DAO logs to service layer

* [OPIK-4383] [BE] Address PR review: extract shared DAO helper and fix log placement

* [OPIK-4383] [BE] Short-circuit deleteByTraceIds when no spans found

Skip delete, cascading operations, and SpansDeleted event when
getSpanIdsForTraces returns an empty set, preserving the original
no-op behaviour and avoiding the Preconditions.checkArgument failure
in SpanDAO.deleteByIds.

* [OPIK-4383] [BE] Fix cascade deletion failures after trace delete

Two bugs prevented spans and attachments from being deleted when a trace
was deleted via the event-driven cascade:

1. FeedbackScoreService.deleteByTraceIds/deleteBySpanIds had @NonNull on
   projectId which threw NPE when TracesDeleted.projectId() was null.
   EventInterceptor swallowed the NPE, stopping the entire cascade chain.
   Fix: remove @NonNull since the DAO already handles null safely via
   Optional.ofNullable(projectId).

2. SpanDAO.DELETE_BY_IDS had the wrong column (trace_id) and parameter
   name (span_ids) — the ClickHouse R2DBC driver could not resolve :span_ids
   as a named parameter in the DELETE statement. Fixed by using id IN :ids
   to match the working pattern in TraceDAO.DELETE_BY_ID.

* [OPIK-4383] [BE] Remove DAO-level log.info from ExperimentAggregatesDAO methods

Move operational logging responsibility to the service layer, consistent
with earlier fixes for ExperimentItemDAO and SpanDAO in this PR.

* Remove accidentally committed doc files

These files were introduced during merge resolution but should
not be part of the branch.

* [OPIK-4383] [BE] refactor: extract triggerAggregation helper to centralize guard+publish flow

* [OPIK-4383] [BE] fix: restore TagOperations.tagUpdateFragment in SpanDAO BULK_UPDATE

Restores proper tag handling in SpanDAO.BULK_UPDATE query that was
regressed to a simple arrayConcat. Now uses TagOperations.tagUpdateFragment()
which provides arrayDistinct(), tag limit enforcement (max 50), and
tags_to_add/tags_to_remove support. Also adds the required
short_circuit_function_evaluation SETTINGS for throwIf evaluation.

* [OPIK-4383] [BE] Rename segment from delete_by_trace_id to delete_by_span_ids

Fix stale segment name that was not updated when deleteByTraceIds was
refactored into deleteByIds (which now deletes by span IDs).

* [OPIK-4383] [BE] Add Preconditions guard for experimentIds in event classes

Enforce non-empty experimentIds in ExperimentItemsCreated and
ExperimentItemsDeleted constructors to make the contract explicit.
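
A record with a compact constructor captures that contract; the real events use Guava's `Preconditions.checkArgument`, which behaves the same way (this sketch uses a plain stdlib check and a hypothetical type name):

```java
import java.util.Set;

// Sketch of the ExperimentItemsCreated/Deleted guard: construction fails
// loudly when experimentIds is missing or empty, making the contract explicit.
record ExperimentItemsEventSketch(Set<String> experimentIds) {
    ExperimentItemsEventSketch {
        if (experimentIds == null || experimentIds.isEmpty()) {
            throw new IllegalArgumentException("experimentIds must not be empty");
        }
    }
}
```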

* [OPIK-4383] [BE] Remove unreachable empty-experimentIds tests

Producers already guarantee non-empty experimentIds before constructing
ExperimentItemsCreated/Deleted events, so these tests were exercising
an impossible scenario that now correctly fails the Preconditions guard.

* [OPIK-4383] [BE] Fix @NotNull to @NonNull in FeedbackScoreService

* [OPIK-4383] [BE] Remove redundant traceId null checks in SpanService

SpanUpdate.traceId is @NotNull (Jakarta validation), so the null guard
is dead code — the event always fires after validation.

* [OPIK-4383] [BE] Remove orphaned deleteAllThreadScores from FeedbackScoreService

The DAO method was removed during merge from main but the service
interface and implementation were not cleaned up, causing a compilation error.
thiagohora added a commit that referenced this pull request Mar 12, 2026
…alizationJob and tests (#5511)

* [OPIK-4380] [BE] Add experiment aggregates for denormalized metrics

- Add experiment_aggregates and experiment_item_aggregates tables
- Implement ExperimentAggregatesDAO with population and query methods
- Add ExperimentAggregatesService for aggregation management
- Refactor DTOs into organized model classes:
  - ExperimentAggregatesModel: aggregation results
  - ExperimentEntityData: entity models
  - ExperimentSourceData: raw source data
  - ExperimentAggregatesUtils: utilities
- Add FEEDBACK_SCORES_AGGREGATED filter strategy for map-based filtering
- Add comprehensive integration tests (10/10 passing)
- Configure batch size and parallelism settings

* [OPIK-4380] [BE] Add MySQL deadlock retry mechanism for concurrent dataset operations

Problem:
- MySQL deadlock on dataset_version_tags composite PRIMARY KEY (workspace_id, dataset_id, tag)
- Occurred during parallel dataset creation in same workspace
- Multiple threads inserting "latest" tag for different datasets caused lock contention
- Experiments with parallel execution were failing with MySQLTransactionRollbackException

Solution:
- Add handleOnDeadLocks() method in RetryUtils with:
  - 5 retry attempts with exponential backoff (250ms to 2s)
  - 0.5 jitter to reduce thundering herd effect
  - Recursive isDatabaseDeadlock() detection for MySQLTransactionRollbackException
- Apply retry logic in DatasetItemService.setDatasetItemVersion()
- Enables concurrent dataset creation for same workspace

Impact:
- Supports parallel experiment execution with proper deadlock handling
- Test success rate improved from 0/10 to 10/10 in ExperimentAggregatesIntegrationTest

* Fix visibility

* [OPIK-4380] [BE] Address PR review comments for experiment aggregates

Fixed 11 automated review comments from baz-reviewer:

CRITICAL fixes:
- Prevent NPE on null span aggregations by adding coalesce() in SQL
- Handle multi-project experiments with LIMIT 1 in GET_PROJECT_ID
- Handle zero-item experiments with empty aggregation helpers
- Bind feedback_scores_percentiles map instead of empty CAST

HIGH priority fixes:
- Use toDecimal128(12) instead of toDecimal64(9) for cost percentiles
- Add null-safe tags handling with Optional.ofNullable()
- Include exception objects in retry logging for stack traces

MEDIUM priority fixes:
- Add missing log_comment to SELECT_EXPERIMENT_BY_ID query
- Add missing log_comment to GET_PROJECT_ID query

LOW priority fixes:
- Remove duplicate "id" binding in bindItemsParameters
- Enhance batchSize config documentation with details

All 11 integration tests passing.

* [OPIK-4382] [BE] Refactor experiment aggregates with import cleanup and Optional patterns

- Add missing imports for IntStream, ProjectStats, and other dependencies
- Replace fully-qualified class names with proper imports across DAO and Service classes
- Fix IS_NOT_EMPTY filter handling for FEEDBACK_SCORES_AGGREGATED strategies
- Refactor null checks to use Optional in mapping methods:
  - mapFeedbackScoreAggregations, mapExperimentFromAggregates
  - mapFeedbackScoreData, mapExperimentGroupAggregationItem
  - Batch insert preparation with Optional chains
- Improve code readability and maintainability with functional patterns

* [OPIK-4380] [BE] Fix table definition

* [OPIK-4380] [BE] Address PR comments and consolidate DatasetItemService methods

- Fix tags NPE in ExperimentAggregatesDAO with defaultIfNull
- Remove unnecessary FINAL clause from GET_EXPERIMENT_DATA query
- Fix test naming in ExperimentAggregatesIntegrationTest
- Consolidate 7 duplicate createVersionFromDelta methods into single canonical implementation
- Remove debug logger from config-test.yml

* [OPIK-4380] [BE] Fix missing log_comment and centralize search criteria binding

- Fix SELECT_EXPERIMENT_BY_ID to properly render log_comment metadata
  - Use getSTWithLogComment pattern in getExperimentFromAggregates
  - Ensures ClickHouse query logging populates workspace/experiment IDs

- Centralize ExperimentSearchCriteria binding logic
  - Create ExperimentSearchCriteriaBinder utility class
  - Parameterize filter strategies to support both DAO variants
  - Eliminate 29-line duplication between ExperimentDAO and ExperimentAggregatesDAO
  - Single source of truth prevents DAOs from getting out of sync

* [OPIK-4380] [BE] Fix createVersionFromDelta consolidation after rebase

- Update canonical method signature to include new parameters:
  - List<EvaluatorItem> evaluators
  - ExecutionPolicy executionPolicy
  - boolean clearExecutionPolicy

- Update all 5 caller sites to pass new parameters:
  - Use changes.evaluators(), changes.executionPolicy(), changes.clearExecutionPolicy() when available
  - Pass null/false for auto-generated versions that inherit from base

- Add imports for EvaluatorItem and ExecutionPolicy

Fixes compilation errors introduced by rebase with upstream changes to DatasetVersionService

* [OPIK-4380] [BE] Address PR review comments - fix type mismatch, extract constants, remove DAO logging

- Fixed BigDecimal[] to Double[] conversion for experiment_scores (matches ClickHouse Float64)
- Extracted FilterStrategy lists to static final constants to avoid repeated allocations
- Added @NonNull validation to populateExperimentAggregate parameter
- Removed DAO layer logging, keeping service-level logging only

* [OPIK-4380] [BE] Extract shared helper for experiment data pagination

Extract streamWithExperimentPagination() helper method to eliminate
duplication in getTracesData(), getSpansData(), and getFeedbackScoresData().

All three methods followed identical pattern:
- asyncTemplate.stream with connection
- getSTWithLogComment with cursor flag
- Bind workspace_id, experiment_id, project_id, limit
- Optional cursor binding
- Result mapping

Benefits:
- Single source of truth for pagination binding logic
- Prevents divergence when tweaking cursor/limit bindings
- Reduces code from ~20 lines to ~10 lines per method
- Type-safe generic implementation

Note: CTE redundancy (3x experiment_items scan) is intentional to avoid
passing large trace ID lists as parameters, which would cause performance
issues with 10K+ traces.

* [OPIK-4383] [BE] Add Redis stream subscriber for debounced experiment aggregates recomputation

- Add ExperimentDenormalizationConfig implementing StreamConfiguration with debounce, job lock, and per-experiment aggregation lock settings
- Add ExperimentAggregationMessage as stream message record
- Add ExperimentAggregatesSubscriber consuming from the denormalization stream; acquires a workspace-scoped distributed lock per experiment before calling populateAggregations()
- Add experimentDenormalizationEnabled feature flag to ServiceTogglesConfig and FeatureFlags
- Wire ExperimentDenormalizationConfig into OpikConfiguration
- Update config.yml and config-test.yml with full experimentDenormalization block (enabled for tests)
- Add ExperimentAggregatesSubscriberTest covering lifecycle gating and processEvent success/error paths

* Revision 2: Address PR comments - add config defaults, remove toggle, rename tests

- ExperimentDenormalizationConfig: add sensible defaults to all fields so
  Dropwizard validation doesn't fail when the config block is absent from
  old deployments (config.isEnabled()=false still gates the subscriber)
- Remove experimentDenormalizationEnabled service toggle from
  ServiceTogglesConfig, FeatureFlags, config.yml and config-test.yml -
  the infrastructure gate (config.isEnabled()) is the single control point
- Rename lifecycle test methods to camelCase per project conventions:
  startSkipsStartupWhenDisabled / stopSkipsShutdownWhenDisabled

* Revision 3: Add @Max(500) to consumerBatchSize and @NotNull to jobLockWaitTime

* [OPIK-4380] [BE] Address PR review comments - fix TYPE_REFERENCE visibility, redundant IN subquery, hardcoded context keys, Instant.now in loop, and inline defaultIfNull

* Revision 4: Address remaining JetoPistola review comments (#7, #8, #10)

- #7: Remove "Used for testing and verification" from getExperimentFromAggregates javadoc
- #8: Replace recursive flatMap with Mono.expand() in populateExperimentItemsInBatches
- #10: Remove unrelated subscribeOn addition from DatasetItemService.createVersionFromDelta
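For context, the batch iteration that Mono.expand() replaces the recursive flatMap with can be sketched as a plain loop (an illustrative stand-in only — the real populateExperimentItemsInBatches is reactive, and the fetch function and element types below are invented):

```java
import java.util.List;
import java.util.function.Function;

// Plain-loop equivalent of paging with Mono.expand(): fetch a batch, process
// it, advance the cursor to the last seen element, and stop when a short
// batch signals the final page.
public final class BatchPager {
    static int processAll(Function<Integer, List<Integer>> fetchAfter, int batchSize) {
        int processed = 0;
        int cursor = Integer.MIN_VALUE; // "no cursor yet"
        while (true) {
            List<Integer> batch = fetchAfter.apply(cursor);
            if (batch.isEmpty()) {
                break;
            }
            processed += batch.size();
            cursor = batch.get(batch.size() - 1);
            if (batch.size() < batchSize) {
                break; // short batch: last page
            }
        }
        return processed;
    }
}
```

Mono.expand() expresses the same "fetch next page from the previous result" shape declaratively, without recursion growing the operator chain.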

* Revision 3: Add switchIfEmpty fallback for deleted traces in populateExperimentAggregate

* Fix tests

* Revision 6: Move countTotal log from DAO to service layer

Operational logs belong in the service layer, not the DAO.

* Revision 7: Apply Spotless formatting

* Revision 8: Make populateAggregations(UUID, int) private

Removes the uncapped public batch size entry point. All callers now go
through the public no-arg overload which reads batchSize safely from config.

* [OPIK-4380] [BE] Add evaluation_method support to experiment_aggregates pipeline

- Add ClickHouse migration (000062) to add evaluation_method column to experiment_aggregates table
- Add evaluationMethod field to ExperimentData record
- Update GET_EXPERIMENT_DATA query to read evaluation_method from experiments
- Update INSERT_EXPERIMENT_AGGREGATE to write evaluation_method to experiment_aggregates
- Update SELECT_EXPERIMENT_BY_ID to read evaluation_method from experiment_aggregates
- Fix Experiment record constructor call: insert EvaluationMethod at correct position (10)

* [OPIK-4380] [BE] Extract shared helper for experiment aggregation queries

Reduce copy-paste in getTraceAggregations, getSpanAggregations, and
getFeedbackScoreAggregations by extracting queryExperimentAggregation,
which centralises the context-aware execution, workspace/experiment/project
parameter binding, and singleOrEmpty pattern shared by all three methods.

* [OPIK-4380] [BE] Enforce non-null contract on countTotal criteria parameter

Add @NonNull to ExperimentSearchCriteria in the interface and implementation
so that a null argument fails fast with an explicit NullPointerException at
the DAO boundary instead of crashing deep inside buildCountTemplate.

* [OPIK-4380] [BE] Fix countTotal ignoring target project IDs in normal path

target_project_ids was only applied inside the project_deleted LEFT JOIN
subquery; the main WHERE had no project restriction, so counts were
workspace-wide. Reuse has_target_projects in the main WHERE so
project_id IN :target_project_ids always takes effect. Also replace
manual null/empty checks with CollectionUtils.isNotEmpty.

* [OPIK-4380] [BE] Apply Spotless formatting

* [OPIK-4382] [BE] Address PR review comments on experiment aggregates

- Fix :versionId → :version_id parameter naming in SQL templates and bindings
- Fix last_updated_at binding to use item.lastUpdatedAt() instead of Instant.now()
- Fix FEEDBACK_SCORES_AGGREGATED_IS_EMPTY filter: embed generated SQL into templates
  instead of hard-coded static condition, and add missing bind calls
- Fix RetryUtils log duplication (remove getMessage() + pass exception directly)
- Add batchSize = 1000 default in ExperimentAggregatesConfig
- Extract resolveVersionIdForCriteria helper to deduplicate version-id resolution
- Add null/blank/ClickHouse placeholder guards in extractUuidsFromGroupValues
- Extract loadEntityMap helper to deduplicate getEnrichInfoHolder enrichment logic

* Revision 3: Address PR comments E, F, G, H

- Fix E: Extract shared template/bind helpers in ExperimentAggregatesDAO
- Fix F: Bind experiment_ids as UUID[] instead of String[]
- Fix G+H: Extract getEnrichInfoHolder logic into ExperimentGroupEnricher,
  eliminating duplication between ExperimentService and ExperimentAggregatesService
  without introducing a direct dependency between them

* Revision 4: Fix ExperimentServiceTest to include ExperimentGroupEnricher mock

* [OPIK-4382] [BE] Extract shared Row→ExperimentGroup mappers into ExperimentGroupMappers

Pull the duplicated Row→ExperimentGroupItem and Row→ExperimentGroupAggregationItem
conversion logic from ExperimentDAO and ExperimentAggregatesDAO into a shared
ExperimentGroupMappers utility class. Both DAOs now delegate to the same
toExperimentGroupItem / toExperimentGroupAggregationItem helpers, eliminating
the need to mirror DTO mapping changes in two places.

* [OPIK-4382] [BE] Deduplicate bindGroupCriteria into ExperimentGroupMappers

Moves the shared group-criteria binding logic out of ExperimentDAO and
ExperimentAggregatesDAO into ExperimentGroupMappers.bindGroupCriteria(),
following the same pattern as ExperimentSearchCriteriaBinder. Adding or
fixing a criteria binding now only requires a change in one place.

* [OPIK-4382] [BE] Extract streamGroupQuery helper and fix null percentiles

- Deduplicate findGroups/findGroupsAggregations into a single private
  streamGroupQuery(queryTemplate, criteria, rowMapper) that differs only
  by the query constant and BiFunction row mapper.

- Fix convertToBigDecimal to return null for null/unsupported inputs so
  absent p50/p90/p99 entries in getDuration propagate as null rather than
  BigDecimal.ZERO, preserving the semantic-null that lets callers apply
  COALESCE/fallback logic correctly.
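The semantic-null behaviour described above amounts to roughly this (a sketch; the method name matches the commit, the surrounding class is assumed):

```java
import java.math.BigDecimal;

// Null-preserving conversion: absent or unsupported inputs map to null rather
// than BigDecimal.ZERO, so callers can still apply COALESCE/fallback logic.
public final class Percentiles {
    static BigDecimal convertToBigDecimal(Object value) {
        if (value instanceof BigDecimal bd) {
            return bd; // already the target type
        }
        if (value instanceof Number n) {
            return BigDecimal.valueOf(n.doubleValue());
        }
        return null; // null or unsupported type: semantic null, not ZERO
    }
}
```

Returning ZERO here would be indistinguishable from a genuine zero percentile, which is exactly the ambiguity the fix removes.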

* [OPIK-4382] [BE] Consolidate cost/duration helpers into ExperimentGroupMappers

Promote getCostValue and getDuration to public static in
ExperimentGroupMappers and delete the private copies in ExperimentDAO.
ExperimentDAO.mapToDto now delegates to the shared helpers, so any
future change to cost filtering, duration percentile extraction, or the
BigDecimal conversion only needs to be made in one place.

Side-effect: ExperimentDAO.mapToDto also picks up the null-percentile
fix (convertToBigDecimal returns null for absent/unsupported inputs)
that was previously applied only to ExperimentGroupMappers.

* [OPIK-4382] [BE] Fix pagination count and add criteria filter tests

- Remove count() OVER () window function from paged query (returned
  page-scoped count instead of full result-set count)
- Replace with dedicated count query + short-circuit: skip items query
  when count == 0, use DatasetItemPage.empty() for that case
- Extract DatasetItemResultMapper.buildItemFromRow as public static
  helper reused by ExperimentAggregatesDAO
- Add parameterized integration tests for ExperimentGroupCriteria
  filters (name, types, projectId, combined, empty-result) covering
  both findGroups and findGroupsAggregations aggregate paths
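The count-then-fetch short-circuit can be illustrated like this (hypothetical names; the real count and items queries run against ClickHouse):

```java
import java.util.List;
import java.util.function.Supplier;

// Run the cheap count query first and skip the items query entirely when
// nothing matches, returning an empty page.
public final class PagedQuery {
    record Page(List<String> items, long total) {
        static Page empty() {
            return new Page(List.of(), 0);
        }
    }

    static Page find(Supplier<Long> countQuery, Supplier<List<String>> itemsQuery) {
        long total = countQuery.get();
        if (total == 0) {
            return Page.empty(); // short-circuit: items query never executes
        }
        return new Page(itemsQuery.get(), total);
    }
}
```

Unlike count() OVER (), the dedicated count query sees the full result set, not just the current page.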

* [OPIK-4380] [BE] Extract shared filter helpers into FilterQueryBuilder

Add FilterStrategyParam record, applyFiltersToTemplate and bindFilters
static helpers to FilterQueryBuilder, then replace duplicated per-strategy
loops in DatasetItemVersionDAO and ExperimentAggregatesDAO with single
delegating calls backed by per-DAO strategy constants.

* [OPIK-4382] [BE] Consolidate filter helpers in getExperimentItemsStatsFromAggregates

Add EXPERIMENT_ITEMS_STATS_FILTER_STRATEGY_PARAMS and
EXPERIMENT_ITEMS_STATS_BIND_STRATEGIES constants and replace the
per-strategy toAnalyticsDbFilters/bind blocks in
getExperimentItemsStatsFromAggregates with single delegating calls to
FilterQueryBuilder.applyFiltersToTemplate and bindFilters.

* Revision 9: Extract shared helpers to eliminate duplication across DAOs

- Create DatasetItemSearchCriteriaMapper: centralizes filters + search flag
  wiring for DatasetItemSearchCriteria, shared by DatasetItemVersionDAO and
  ExperimentAggregatesDAO
- Add ExperimentGroupMappers.applyGroupCriteriaToTemplate: centralizes
  ExperimentGroupCriteria → ST template wiring, now shared by ExperimentDAO
  and ExperimentAggregatesDAO
- Update DatasetItemVersionDAO.addFiltersToTemplate and bindSearchAndFilters
  to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentAggregatesDAO.applyDatasetItemFiltersToTemplate and
  bindDatasetItemSearchParams to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentDAO.newGroupTemplate and ExperimentAggregatesDAO.newGroupTemplate
  to delegate to ExperimentGroupMappers.applyGroupCriteriaToTemplate

* [OPIK-4383] [BE] Add experiment aggregate event listener and no-op publisher

* Revision 2: Fix missing import for ExperimentAggregationPublisher

* [OPIK-4383] [BE] Add ExperimentAggregationPublisher, ExperimentDenormalizationJob and tests

- ExperimentAggregationPublisher: debounces experiment aggregation triggers
  by writing compound workspaceId:experimentId members to a Redis ZSET scored
  by expiry timestamp (now + debounceDelay), plus a hash storing the userName
  with TTL=2×debounceDelay to handle stale entries.
- ExperimentDenormalizationJob: @Every("5s") job that reads ZSET members with
  score <= now, publishes ExperimentAggregationMessage to the Redis stream,
  then cleans up the ZSET entry and hash bucket. Handles stale entries
  (expired hash) by removing the orphaned ZSET member without publishing.
- Fix processExperiment reactive chain: avoid double index.remove by
  returning Mono<Boolean> from flatMap branches so switchIfEmpty is only
  triggered when the bucket is truly empty.
- ExperimentAggregationPublisherTest: integration tests with real Redis
  container verifying ZSET membership, score, userName storage, TTL,
  workspace isolation, and debounce deduplication.
- ExperimentDenormalizationJobTest: unit tests with Mockito covering disabled
  config, lock not acquired, empty ZSET, happy path, stale entry, and batch.
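The debounce semantics of the ZSET index can be modelled in memory like this (illustration only — the real publisher uses Redisson against Redis, the userName hash is omitted, and these class/method names are invented):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of the Redis ZSET debounce: each publish (re)scores the
// compound member at now + debounceDelay, so rapid re-triggers of the same
// experiment collapse into a single pending entry.
public final class DebounceIndex {
    private final Map<String, Long> scoreByMember = new HashMap<>(); // models the ZSET
    private final long debounceDelayMs;

    public DebounceIndex(long debounceDelayMs) {
        this.debounceDelayMs = debounceDelayMs;
    }

    // Equivalent of ZADD "workspaceId:experimentId" with score now + delay.
    public void publish(String workspaceId, String experimentId, long nowMs) {
        scoreByMember.put(workspaceId + ":" + experimentId, nowMs + debounceDelayMs);
    }

    // Equivalent of the job reading members with score <= now and removing them.
    public List<String> drainDue(long nowMs) {
        List<String> due = new ArrayList<>();
        scoreByMember.entrySet().removeIf(e -> {
            if (e.getValue() <= nowMs) {
                due.add(e.getKey());
                return true;
            }
            return false;
        });
        return due;
    }

    public int size() {
        return scoreByMember.size();
    }
}
```

The key property: re-publishing before expiry pushes the score forward instead of adding a second entry, which is the deduplication the tests verify.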

* Fix tests setup

* [OPIK-4383] [BE] Address PR review: move DAO logs to service layer

* [OPIK-4383] [BE] Address PR review: extract shared DAO helper and fix log placement

* [OPIK-4383] [BE] Short-circuit deleteByTraceIds when no spans found

Skip delete, cascading operations, and SpansDeleted event when
getSpanIdsForTraces returns an empty set, preserving the original
no-op behaviour and avoiding the Preconditions.checkArgument failure
in SpanDAO.deleteByIds.

* [OPIK-4383] [BE] Fix cascade deletion failures after trace delete

Two bugs prevented spans and attachments from being deleted when a trace
was deleted via the event-driven cascade:

1. FeedbackScoreService.deleteByTraceIds/deleteBySpanIds had @nonnull on
   projectId which threw NPE when TracesDeleted.projectId() was null.
   EventInterceptor swallowed the NPE, stopping the entire cascade chain.
   Fix: remove @nonnull since the DAO already handles null safely via
   Optional.ofNullable(projectId).

2. SpanDAO.DELETE_BY_IDS had the wrong column (trace_id) and parameter
   name (span_ids) — the ClickHouse R2DBC driver could not resolve :span_ids
   as a named parameter in the DELETE statement. Fixed by using id IN :ids
   to match the working pattern in TraceDAO.DELETE_BY_ID.

* [OPIK-4383] [BE] Address PR review comments on ExperimentDenormalizationJob

- Centralize Redis constants (EXPERIMENT_KEY_PREFIX, USER_NAME_FIELD,
  MEMBER_SEPARATOR) in ExperimentDenormalizationConfig
- Change ExperimentAggregationPublisher.publish() to return Mono<Void>
  instead of void, so errors propagate to callers
- Make job interval configurable via jobs map in config.yml
- Fix onErrorContinue logging: remove getMessage() duplication
- Demote per-experiment logs from INFO to DEBUG
- Add ZSET pagination using expand() to avoid materializing entire range
- Update tests for all changes

* Fix @Every job interval config key casing and add jobs section to test config

The dropwizard-jobs framework uses WordUtils.uncapitalize(class.getSimpleName())
to look up the interval in the jobs map, so the key must be
'experimentDenormalizationJob' (lowercase first letter). Also adds the missing
jobs section and jobBatchSize to config-test.yml.
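The key derivation amounts to lowercasing the first character of the simple class name, roughly (a stand-alone sketch mirroring the behaviour, not the WordUtils implementation itself):

```java
// Mirrors how dropwizard-jobs derives the jobs-map key from the job class:
// uncapitalize the simple class name, so ExperimentDenormalizationJob looks
// up the key "experimentDenormalizationJob".
public final class JobKey {
    static String uncapitalize(String s) {
        if (s == null || s.isEmpty() || Character.isLowerCase(s.charAt(0))) {
            return s;
        }
        return Character.toLowerCase(s.charAt(0)) + s.substring(1);
    }
}
```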

* Replace @Every annotation with programmatic Quartz scheduling

Remove @Every from ExperimentDenormalizationJob and schedule it
programmatically in OpikGuiceyLifecycleEventListener, following the
same pattern as TraceThreadsClosingJob. Add jobInterval config field
to ExperimentDenormalizationConfig. Remove the jobs YAML section that
caused deserialization errors with JobConfiguration's immutable map.

* Add experiment context to error log and extract publishIfNotEmpty helper

- Include experimentId and workspaceId in onExperimentUpdated error log
- Extract publishIfNotEmpty helper to deduplicate filter+publish logic
  across triggerByExperimentIds, triggerByTraceIds, triggerBySpanIds

* Fix NPE in ExperimentAggregateEventListenerTest mock setup

Stub publisher.publish() to return Mono.empty() in setUp so
.subscribe() calls in production code don't NPE on null.

* [OPIK-4383] [BE] Remove DAO-level log.info from ExperimentAggregatesDAO methods

Move operational logging responsibility to the service layer, consistent
with earlier fixes for ExperimentItemDAO and SpanDAO in this PR.

* Remove accidentally committed doc files

These files were introduced during merge resolution but should
not be part of the branch.

* [OPIK-4383] [BE] refactor: extract triggerAggregation helper to centralize guard+publish flow

* [OPIK-4383] [BE] fix: restore TagOperations.tagUpdateFragment in SpanDAO BULK_UPDATE

Restores proper tag handling in SpanDAO.BULK_UPDATE query that was
regressed to a simple arrayConcat. Now uses TagOperations.tagUpdateFragment()
which provides arrayDistinct(), tag limit enforcement (max 50), and
tags_to_add/tags_to_remove support. Also adds the required
short_circuit_function_evaluation SETTINGS for throwIf evaluation.

* Adding InterruptableJob

* [OPIK-4383] [BE] Address PR review: expand safety valve, env var prefix

- Add batchSize-capped iteration counter to expand() to prevent
  infinite loops when ZSET entries fail to be removed
- Rename EXPERIMENT_DENORM_JOB_INTERVAL to OPIK_EXPERIMENT_DENORM_JOB_INTERVAL
  to follow the OPIK_ prefix convention
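The iteration cap works roughly like this (hypothetical sketch; drainOneBatch stands in for one expand() round over the ZSET):

```java
import java.util.function.IntSupplier;

// Safety valve: cap the number of drain rounds so a ZSET member that fails
// to be removed cannot keep the pagination loop spinning forever.
public final class SafetyValve {
    static int drainWithCap(IntSupplier drainOneBatch, int maxIterations) {
        int total = 0;
        for (int i = 0; i < maxIterations; i++) {
            int n = drainOneBatch.getAsInt();
            if (n == 0) {
                break; // nothing left to drain
            }
            total += n;
        }
        return total;
    }
}
```

If entries are never removed, the loop still terminates after maxIterations rounds instead of looping indefinitely.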
thiagohora added a commit that referenced this pull request Mar 13, 2026
…ndpoints (#5577)

* [OPIK-4380] [BE] Add experiment aggregates for denormalized metrics

- Add experiment_aggregates and experiment_item_aggregates tables
- Implement ExperimentAggregatesDAO with population and query methods
- Add ExperimentAggregatesService for aggregation management
- Refactor DTOs into organized model classes:
  - ExperimentAggregatesModel: aggregation results
  - ExperimentEntityData: entity models
  - ExperimentSourceData: raw source data
  - ExperimentAggregatesUtils: utilities
- Add FEEDBACK_SCORES_AGGREGATED filter strategy for map-based filtering
- Add comprehensive integration tests (10/10 passing)
- Configure batch size and parallelism settings

* [OPIK-4380] [BE] Add MySQL deadlock retry mechanism for concurrent dataset operations

Problem:
- MySQL deadlock on dataset_version_tags composite PRIMARY KEY (workspace_id, dataset_id, tag)
- Occurred during parallel dataset creation in same workspace
- Multiple threads inserting "latest" tag for different datasets caused lock contention
- Experiments with parallel execution were failing with MySQLTransactionRollbackException

Solution:
- Add handleOnDeadLocks() method in RetryUtils with:
  - 5 retry attempts with exponential backoff (250ms to 2s)
  - 0.5 jitter to reduce thundering herd effect
  - Recursive isDatabaseDeadlock() detection for MySQLTransactionRollbackException
- Apply retry logic in DatasetItemService.setDatasetItemVersion()
- Enables concurrent dataset creation for same workspace

Impact:
- Supports parallel experiment execution with proper deadlock handling
- Test success rate improved from 0/10 to 10/10 in ExperimentAggregatesIntegrationTest
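The backoff schedule described above (250 ms base doubling to a 2 s cap, ±50% jitter, 5 attempts) can be sketched as follows — an illustration only, not the actual RetryUtils code; the deadlock detection is reduced here to catching RuntimeException:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

// Minimal sketch of retry-on-deadlock with exponential backoff and jitter.
public final class DeadlockRetry {
    static final int MAX_ATTEMPTS = 5;
    static final long BASE_DELAY_MS = 250;
    static final long MAX_DELAY_MS = 2_000;
    static final double JITTER = 0.5;

    // Exponential backoff: 250 ms, 500 ms, 1 s, then capped at 2 s.
    static long baseDelayMillis(int attempt) {
        return Math.min(BASE_DELAY_MS << attempt, MAX_DELAY_MS);
    }

    // Apply +/-50% jitter so competing transactions don't retry in lockstep
    // (the "thundering herd" the commit message mentions).
    static long jitteredDelayMillis(int attempt) {
        double factor = 1.0 + (ThreadLocalRandom.current().nextDouble() * 2 - 1) * JITTER;
        return (long) (baseDelayMillis(attempt) * factor);
    }

    // Retry the action while it keeps failing; rethrow after the last attempt.
    static <T> T withRetry(Supplier<T> action) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                try {
                    Thread.sleep(jitteredDelayMillis(attempt));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            }
        }
        throw last;
    }
}
```

The real implementation additionally walks the cause chain (recursive isDatabaseDeadlock()) so only MySQLTransactionRollbackException triggers a retry; unrelated failures propagate immediately.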

* Fix visibility

* [OPIK-4380] [BE] Address PR review comments for experiment aggregates

Fixed 11 automated review comments from baz-reviewer:

CRITICAL fixes:
- Prevent NPE on null span aggregations by adding coalesce() in SQL
- Handle multi-project experiments with LIMIT 1 in GET_PROJECT_ID
- Handle zero-item experiments with empty aggregation helpers
- Bind feedback_scores_percentiles map instead of empty CAST

HIGH priority fixes:
- Use toDecimal128(12) instead of toDecimal64(9) for cost percentiles
- Add null-safe tags handling with Optional.ofNullable()
- Include exception objects in retry logging for stack traces

MEDIUM priority fixes:
- Add missing log_comment to SELECT_EXPERIMENT_BY_ID query
- Add missing log_comment to GET_PROJECT_ID query

LOW priority fixes:
- Remove duplicate "id" binding in bindItemsParameters
- Enhance batchSize config documentation with details

All 11 integration tests passing.

* [OPIK-4382] [BE] Refactor experiment aggregates with import cleanup and Optional patterns

- Add missing imports for IntStream, ProjectStats, and other dependencies
- Replace fully-qualified class names with proper imports across DAO and Service classes
- Fix IS_NOT_EMPTY filter handling for FEEDBACK_SCORES_AGGREGATED strategies
- Refactor null checks to use Optional in mapping methods:
  - mapFeedbackScoreAggregations, mapExperimentFromAggregates
  - mapFeedbackScoreData, mapExperimentGroupAggregationItem
  - Batch insert preparation with Optional chains
- Improve code readability and maintainability with functional patterns

* [OPIK-4380] [BE] Fix table definition

* [OPIK-4380] [BE] Address PR comments and consolidate DatasetItemService methods

- Fix tags NPE in ExperimentAggregatesDAO with defaultIfNull
- Remove unnecessary FINAL clause from GET_EXPERIMENT_DATA query
- Fix test naming in ExperimentAggregatesIntegrationTest
- Consolidate 7 duplicate createVersionFromDelta methods into single canonical implementation
- Remove debug logger from config-test.yml

* [OPIK-4380] [BE] Fix missing log_comment and centralize search criteria binding

- Fix SELECT_EXPERIMENT_BY_ID to properly render log_comment metadata
  - Use getSTWithLogComment pattern in getExperimentFromAggregates
  - Ensures ClickHouse query logging populates workspace/experiment IDs

- Centralize ExperimentSearchCriteria binding logic
  - Create ExperimentSearchCriteriaBinder utility class
  - Parameterize filter strategies to support both DAO variants
  - Eliminate 29-line duplication between ExperimentDAO and ExperimentAggregatesDAO
  - Single source of truth prevents DAOs from getting out of sync

* [OPIK-4380] [BE] Fix createVersionFromDelta consolidation after rebase

- Update canonical method signature to include new parameters:
  - List<EvaluatorItem> evaluators
  - ExecutionPolicy executionPolicy
  - boolean clearExecutionPolicy

- Update all 5 caller sites to pass new parameters:
  - Use changes.evaluators(), changes.executionPolicy(), changes.clearExecutionPolicy() when available
  - Pass null/false for auto-generated versions that inherit from base

- Add imports for EvaluatorItem and ExecutionPolicy

Fixes compilation errors introduced by rebase with upstream changes to DatasetVersionService

* [OPIK-4380] [BE] Address PR review comments - fix type mismatch, extract constants, remove DAO logging

- Fixed BigDecimal[] to Double[] conversion for experiment_scores (matches ClickHouse Float64)
- Extracted FilterStrategy lists to static final constants to avoid repeated allocations
- Added @nonnull validation to populateExperimentAggregate parameter
- Removed DAO layer logging, keeping service-level logging only

* [OPIK-4380] [BE] Extract shared helper for experiment data pagination

Extract streamWithExperimentPagination() helper method to eliminate
duplication in getTracesData(), getSpansData(), and getFeedbackScoresData().

All three methods followed identical pattern:
- asyncTemplate.stream with connection
- getSTWithLogComment with cursor flag
- Bind workspace_id, experiment_id, project_id, limit
- Optional cursor binding
- Result mapping

Benefits:
- Single source of truth for pagination binding logic
- Prevents divergence when tweaking cursor/limit bindings
- Reduces code from ~20 lines to ~10 lines per method
- Type-safe generic implementation

Note: CTE redundancy (3x experiment_items scan) is intentional to avoid
passing large trace ID lists as parameters, which would cause performance
issues with 10K+ traces.

* [OPIK-4383] [BE] Add Redis stream subscriber for debounced experiment aggregates recomputation

- Add ExperimentDenormalizationConfig implementing StreamConfiguration with debounce, job lock, and per-experiment aggregation lock settings
- Add ExperimentAggregationMessage as stream message record
- Add ExperimentAggregatesSubscriber consuming from the denormalization stream; acquires a workspace-scoped distributed lock per experiment before calling populateAggregations()
- Add experimentDenormalizationEnabled feature flag to ServiceTogglesConfig and FeatureFlags
- Wire ExperimentDenormalizationConfig into OpikConfiguration
- Update config.yml and config-test.yml with full experimentDenormalization block (enabled for tests)
- Add ExperimentAggregatesSubscriberTest covering lifecycle gating and processEvent success/error paths

* Revision 2: Address PR comments - add config defaults, remove toggle, rename tests

- ExperimentDenormalizationConfig: add sensible defaults to all fields so
  Dropwizard validation doesn't fail when the config block is absent from
  old deployments (config.isEnabled()=false still gates the subscriber)
- Remove experimentDenormalizationEnabled service toggle from
  ServiceTogglesConfig, FeatureFlags, config.yml and config-test.yml -
  the infrastructure gate (config.isEnabled()) is the single control point
- Rename lifecycle test methods to camelCase per project conventions:
  startSkipsStartupWhenDisabled / stopSkipsShutdownWhenDisabled

* Revision 3: Add @max(500) to consumerBatchSize and @NotNull to jobLockWaitTime

* [OPIK-4380] [BE] Address PR review comments - fix TYPE_REFERENCE visibility, redundant IN subquery, hardcoded context keys, Instant.now in loop, and inline defaultIfNull

* Revision 4: Address remaining JetoPistola review comments (#7, #8, #10)

- #7: Remove "Used for testing and verification" from getExperimentFromAggregates javadoc
- #8: Replace recursive flatMap with Mono.expand() in populateExperimentItemsInBatches
- #10: Remove unrelated subscribeOn addition from DatasetItemService.createVersionFromDelta

* Revision 3: Add switchIfEmpty fallback for deleted traces in populateExperimentAggregate

* Fix tests

* Revision 6: Move countTotal log from DAO to service layer

Operational logs belong in the service layer, not the DAO.

* Revision 7: Apply Spotless formatting

* Revision 8: Make populateAggregations(UUID, int) private

Removes the uncapped public batch size entry point. All callers now go
through the public no-arg overload which reads batchSize safely from config.

* [OPIK-4380] [BE] Add evaluation_method support to experiment_aggregates pipeline

- Add ClickHouse migration (000062) to add evaluation_method column to experiment_aggregates table
- Add evaluationMethod field to ExperimentData record
- Update GET_EXPERIMENT_DATA query to read evaluation_method from experiments
- Update INSERT_EXPERIMENT_AGGREGATE to write evaluation_method to experiment_aggregates
- Update SELECT_EXPERIMENT_BY_ID to read evaluation_method from experiment_aggregates
- Fix Experiment record constructor call: insert EvaluationMethod at correct position (10)

* [OPIK-4380] [BE] Extract shared helper for experiment aggregation queries

Reduce copy-paste in getTraceAggregations, getSpanAggregations, and
getFeedbackScoreAggregations by extracting queryExperimentAggregation,
which centralises the context-aware execution, workspace/experiment/project
parameter binding, and singleOrEmpty pattern shared by all three methods.

* [OPIK-4380] [BE] Enforce non-null contract on countTotal criteria parameter

Add @nonnull to ExperimentSearchCriteria in the interface and implementation
so that a null argument fails fast with an explicit NullPointerException at
the DAO boundary instead of crashing deep inside buildCountTemplate.

* [OPIK-4380] [BE] Fix countTotal ignoring target project IDs in normal path

target_project_ids was only applied inside the project_deleted LEFT JOIN
subquery; the main WHERE had no project restriction, so counts were
workspace-wide. Reuse has_target_projects in the main WHERE so
project_id IN :target_project_ids always takes effect. Also replace
manual null/empty checks with CollectionUtils.isNotEmpty.

* [OPIK-4380] [BE] Apply Spotless formatting

* [OPIK-4382] [BE] Address PR review comments on experiment aggregates

- Fix :versionId → :version_id parameter naming in SQL templates and bindings
- Fix last_updated_at binding to use item.lastUpdatedAt() instead of Instant.now()
- Fix FEEDBACK_SCORES_AGGREGATED_IS_EMPTY filter: embed generated SQL into templates
  instead of hard-coded static condition, and add missing bind calls
- Fix RetryUtils log duplication (remove getMessage() + pass exception directly)
- Add batchSize = 1000 default in ExperimentAggregatesConfig
- Extract resolveVersionIdForCriteria helper to deduplicate version-id resolution
- Add null/blank/ClickHouse placeholder guards in extractUuidsFromGroupValues
- Extract loadEntityMap helper to deduplicate getEnrichInfoHolder enrichment logic

* Revision 3: Address PR comments E, F, G, H

- Fix E: Extract shared template/bind helpers in ExperimentAggregatesDAO
- Fix F: Bind experiment_ids as UUID[] instead of String[]
- Fix G+H: Extract getEnrichInfoHolder logic into ExperimentGroupEnricher,
  eliminating duplication between ExperimentService and ExperimentAggregatesService
  without introducing a direct dependency between them

* Revision 4: Fix ExperimentServiceTest to include ExperimentGroupEnricher mock

* [OPIK-4382] [BE] Extract shared Row→ExperimentGroup mappers into ExperimentGroupMappers

Pull the duplicated Row→ExperimentGroupItem and Row→ExperimentGroupAggregationItem
conversion logic from ExperimentDAO and ExperimentAggregatesDAO into a shared
ExperimentGroupMappers utility class. Both DAOs now delegate to the same
toExperimentGroupItem / toExperimentGroupAggregationItem helpers, eliminating
the need to mirror DTO mapping changes in two places.

* [OPIK-4382] [BE] Deduplicate bindGroupCriteria into ExperimentGroupMappers

Moves the shared group-criteria binding logic out of ExperimentDAO and
ExperimentAggregatesDAO into ExperimentGroupMappers.bindGroupCriteria(),
following the same pattern as ExperimentSearchCriteriaBinder.  Adding or
fixing a criteria binding now only requires a change in one place.

* [OPIK-4382] [BE] Extract streamGroupQuery helper and fix null percentiles

- Deduplicate findGroups/findGroupsAggregations into a single private
  streamGroupQuery(queryTemplate, criteria, rowMapper) that differs only
  by the query constant and BiFunction row mapper.

- Fix convertToBigDecimal to return null for null/unsupported inputs so
  absent p50/p90/p99 entries in getDuration propagate as null rather than
  BigDecimal.ZERO, preserving the semantic-null that lets callers apply
  COALESCE/fallback logic correctly.

* [OPIK-4382] [BE] Consolidate cost/duration helpers into ExperimentGroupMappers

Promote getCostValue and getDuration to public static in
ExperimentGroupMappers and delete the private copies in ExperimentDAO.
ExperimentDAO.mapToDto now delegates to the shared helpers, so any
future change to cost filtering, duration percentile extraction, or the
BigDecimal conversion only needs to be made in one place.

Side-effect: ExperimentDAO.mapToDto also picks up the null-percentile
fix (convertToBigDecimal returns null for absent/unsupported inputs)
that was previously applied only to ExperimentGroupMappers.

* [OPIK-4382] [BE] Fix pagination count and add criteria filter tests

- Remove count() OVER () window function from paged query (returned
  page-scoped count instead of full result-set count)
- Replace with dedicated count query + short-circuit: skip items query
  when count == 0, use DatasetItemPage.empty() for that case
- Extract DatasetItemResultMapper.buildItemFromRow as public static
  helper reused by ExperimentAggregatesDAO
- Add parameterized integration tests for ExperimentGroupCriteria
  filters (name, types, projectId, combined, empty-result) covering
  both findGroups and findGroupsAggregations aggregate paths

* [OPIK-4380] [BE] Extract shared filter helpers into FilterQueryBuilder

Add FilterStrategyParam record, applyFiltersToTemplate and bindFilters
static helpers to FilterQueryBuilder, then replace duplicated per-strategy
loops in DatasetItemVersionDAO and ExperimentAggregatesDAO with single
delegating calls backed by per-DAO strategy constants.

* [OPIK-4382] [BE] Consolidate filter helpers in getExperimentItemsStatsFromAggregates

Add EXPERIMENT_ITEMS_STATS_FILTER_STRATEGY_PARAMS and
EXPERIMENT_ITEMS_STATS_BIND_STRATEGIES constants and replace the
per-strategy toAnalyticsDbFilters/bind blocks in
getExperimentItemsStatsFromAggregates with single delegating calls to
FilterQueryBuilder.applyFiltersToTemplate and bindFilters.

* Revision 9: Extract shared helpers to eliminate duplication across DAOs

- Create DatasetItemSearchCriteriaMapper: centralizes filters + search flag
  wiring for DatasetItemSearchCriteria, shared by DatasetItemVersionDAO and
  ExperimentAggregatesDAO
- Add ExperimentGroupMappers.applyGroupCriteriaToTemplate: centralizes
  ExperimentGroupCriteria → ST template wiring, now shared by ExperimentDAO
  and ExperimentAggregatesDAO
- Update DatasetItemVersionDAO.addFiltersToTemplate and bindSearchAndFilters
  to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentAggregatesDAO.applyDatasetItemFiltersToTemplate and
  bindDatasetItemSearchParams to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentDAO.newGroupTemplate and ExperimentAggregatesDAO.newGroupTemplate
  to delegate to ExperimentGroupMappers.applyGroupCriteriaToTemplate

* [OPIK-4383] [BE] Add experiment aggregate event listener and no-op publisher

* Revision 2: Fix missing import for ExperimentAggregationPublisher

* [OPIK-4383] [BE] Add ExperimentAggregationPublisher, ExperimentDenormalizationJob and tests

- ExperimentAggregationPublisher: debounces experiment aggregation triggers
  by writing compound workspaceId:experimentId members to a Redis ZSET scored
  by expiry timestamp (now + debounceDelay), plus a hash storing the userName
  with TTL=2×debounceDelay to handle stale entries.
- ExperimentDenormalizationJob: @Every("5s") job that reads ZSET members with
  score <= now, publishes ExperimentAggregationMessage to the Redis stream,
  then cleans up the ZSET entry and hash bucket. Handles stale entries
  (expired hash) by removing the orphaned ZSET member without publishing.
- Fix processExperiment reactive chain: avoid double index.remove by
  returning Mono<Boolean> from flatMap branches so switchIfEmpty is only
  triggered when the bucket is truly empty.
- ExperimentAggregationPublisherTest: integration tests with real Redis
  container verifying ZSET membership, score, userName storage, TTL,
  workspace isolation, and debounce deduplication.
- ExperimentDenormalizationJobTest: unit tests with Mockito covering disabled
  config, lock not acquired, empty ZSET, happy path, stale entry, and batch.
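The debounce mechanism above can be modelled in a few lines (an in-memory stand-in for the Redis ZSET, with hypothetical names — the real code uses Redisson against Redis): the member is `workspaceId:experimentId`, the score is `now + debounceDelay`, and re-publishing before expiry simply overwrites the score, collapsing bursts into a single aggregation run.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory model of the Redis ZSET debounce index (illustrative only).
public class DebounceIndex {

    private final Map<String, Long> zset = new LinkedHashMap<>(); // member -> score
    private final long debounceDelayMs;

    public DebounceIndex(long debounceDelayMs) {
        this.debounceDelayMs = debounceDelayMs;
    }

    // Equivalent of ZADD: re-publishing overwrites the score, pushing expiry out.
    public void publish(String workspaceId, String experimentId, long nowMs) {
        zset.put(workspaceId + ":" + experimentId, nowMs + debounceDelayMs);
    }

    // Equivalent of the job reading members with score <= now, then removing them.
    public List<String> drainDue(long nowMs) {
        List<String> due = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = zset.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (e.getValue() <= nowMs) {
                due.add(e.getKey());
                it.remove();
            }
        }
        return due;
    }
}
```

Two publishes inside the window produce a single due member whose expiry is the later of the two, which is exactly the deduplication the integration tests verify.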

* Fix tests setup

* [OPIK-4383] [BE] Address PR review: move DAO logs to service layer

* [OPIK-4383] [BE] Address PR review: extract shared DAO helper and fix log placement

* [OPIK-4383] [BE] Short-circuit deleteByTraceIds when no spans found

Skip delete, cascading operations, and SpansDeleted event when
getSpanIdsForTraces returns an empty set, preserving the original
no-op behaviour and avoiding the Preconditions.checkArgument failure
in SpanDAO.deleteByIds.

* [OPIK-4383] [BE] Fix cascade deletion failures after trace delete

Two bugs prevented spans and attachments from being deleted when a trace
was deleted via the event-driven cascade:

1. FeedbackScoreService.deleteByTraceIds/deleteBySpanIds had @NonNull on
   projectId which threw NPE when TracesDeleted.projectId() was null.
   EventInterceptor swallowed the NPE, stopping the entire cascade chain.
   Fix: remove @NonNull since the DAO already handles null safely via
   Optional.ofNullable(projectId).

2. SpanDAO.DELETE_BY_IDS had the wrong column (trace_id) and parameter
   name (span_ids) — the ClickHouse R2DBC driver could not resolve :span_ids
   as a named parameter in the DELETE statement. Fixed by using id IN :ids
   to match the working pattern in TraceDAO.DELETE_BY_ID.

* [OPIK-4383] [BE] Address PR review comments on ExperimentDenormalizationJob

- Centralize Redis constants (EXPERIMENT_KEY_PREFIX, USER_NAME_FIELD,
  MEMBER_SEPARATOR) in ExperimentDenormalizationConfig
- Change ExperimentAggregationPublisher.publish() to return Mono<Void>
  instead of void, so errors propagate to callers
- Make job interval configurable via jobs map in config.yml
- Fix onErrorContinue logging: remove getMessage() duplication
- Demote per-experiment logs from INFO to DEBUG
- Add ZSET pagination using expand() to avoid materializing entire range
- Update tests for all changes

* Fix @Every job interval config key casing and add jobs section to test config

The dropwizard-jobs framework uses WordUtils.uncapitalize(class.getSimpleName())
to look up the interval in the jobs map, so the key must be
'experimentDenormalizationJob' (lowercase first letter). Also adds the missing
jobs section and jobBatchSize to config-test.yml.
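The key-casing rule can be sketched as follows (a minimal reimplementation of what `WordUtils.uncapitalize(class.getSimpleName())` produces; the nested class is a stand-in so the example is self-contained):

```java
public class JobKeys {

    // The jobs-map key is the job class's simple name with the
    // first letter lower-cased.
    public static String jobConfigKey(Class<?> jobClass) {
        String name = jobClass.getSimpleName();
        return Character.toLowerCase(name.charAt(0)) + name.substring(1);
    }

    // Stand-in class so the example is self-contained.
    public static final class ExperimentDenormalizationJob {}

    public static void main(String[] args) {
        System.out.println(jobConfigKey(ExperimentDenormalizationJob.class));
        // experimentDenormalizationJob
    }
}
```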

* Replace @Every annotation with programmatic Quartz scheduling

Remove @Every from ExperimentDenormalizationJob and schedule it
programmatically in OpikGuiceyLifecycleEventListener, following the
same pattern as TraceThreadsClosingJob. Add jobInterval config field
to ExperimentDenormalizationConfig. Remove the jobs YAML section that
caused deserialization errors with JobConfiguration's immutable map.

* Add experiment context to error log and extract publishIfNotEmpty helper

- Include experimentId and workspaceId in onExperimentUpdated error log
- Extract publishIfNotEmpty helper to deduplicate filter+publish logic
  across triggerByExperimentIds, triggerByTraceIds, triggerBySpanIds

* Fix NPE in ExperimentAggregateEventListenerTest mock setup

Stub publisher.publish() to return Mono.empty() in setUp so
.subscribe() calls in production code don't NPE on null.

* [OPIK-4385] [BE] Use pre-computed aggregation tables for experiment endpoints

Apply UNION ALL hybrid pattern to ExperimentDAO (FIND, FIND_GROUPS,
FIND_GROUPS_AGGREGATIONS) and ExperimentItemDAO (STREAM,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_COUNT,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_STATS) so that
experiments present in experiment_aggregates / experiment_item_aggregates
use pre-computed values, while others fall back to live JOIN computation.

Add ExperimentAggregatesIntegrationTest covering all 7 affected queries
with parameterized filter, pagination, and consistency scenarios.

* [OPIK-4384] [BE] Fix missing zero_uuid binding and experiment_scores sort alias

- Bind zero_uuid parameter in getById, getByIds, and get(ExperimentStreamRequest)
  methods that use the FIND query; the UNION ALL refactor introduced an
  experiments_from_aggregates CTE that requires this parameter but only the
  main find() method was binding it, causing 500 errors on those paths
- Fix SortingQueryBuilder to reference the outer column alias experiment_scores_agg
  instead of es.experiment_scores; the ORDER BY sits outside the UNION ALL so the
  inner es alias is out of scope, while experiment_scores_agg is the consistent
  output alias exposed by both branches

* [OPIK-4384] [BE] Fix null row injection from LEFT JOIN miss in feedback_scores and comments aggregation

Pre-aggregate feedback_scores_final and comments_final into subqueries
(GROUP BY entity_id) before LEFT JOIN in DatasetItemVersionDAO.STREAM.
When a LEFT JOIN has no match against a pre-aggregated subquery the
joined columns are NULL, so any(NULL) returns NULL instead of a
default-valued row with epoch timestamps that caused Instant.parse()
failures.

Also adds a regression test covering the no-scores path in
ExperimentAggregatesIntegrationTest.

* [OPIK-4383] [BE] Remove DAO-level log.info from ExperimentAggregatesDAO methods

Move operational logging responsibility to the service layer, consistent
with earlier fixes for ExperimentItemDAO and SpanDAO in this PR.

* Remove accidentally committed doc files

These files were introduced during merge resolution but should
not be part of the branch.

* [OPIK-4383] [BE] refactor: extract triggerAggregation helper to centralize guard+publish flow

* [OPIK-4383] [BE] fix: restore TagOperations.tagUpdateFragment in SpanDAO BULK_UPDATE

Restores proper tag handling in SpanDAO.BULK_UPDATE query that was
regressed to a simple arrayConcat. Now uses TagOperations.tagUpdateFragment()
which provides arrayDistinct(), tag limit enforcement (max 50), and
tags_to_add/tags_to_remove support. Also adds the required
short_circuit_function_evaluation SETTINGS for throwIf evaluation.

* Adding InterruptableJob

* [OPIK-4383] [BE] Address PR review: expand safety valve, env var prefix

- Add batchSize-capped iteration counter to expand() to prevent
  infinite loops when ZSET entries fail to be removed
- Rename EXPERIMENT_DENORM_JOB_INTERVAL to OPIK_EXPERIMENT_DENORM_JOB_INTERVAL
  to follow the OPIK_ prefix convention

* [OPIK-4384] [BE] Add branch optimization and CTE split to experiment queries

Use pre-computed experiment_aggregates table to optimize query execution:
- Add has_aggregated/has_raw flags to skip unnecessary UNION ALL branches in FIND/FIND_COUNT
- Add getAggregationBranchCounts pre-query to determine which branches are needed
- Apply CTE split pattern to FIND_GROUPS and FIND_GROUPS_AGGREGATIONS
- Update getById to leverage branch optimization via single-ID branch count query
- Add <if(id)> filter to SELECT_AGGREGATED_EXPERIMENT_IDS for getById support
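The has_aggregated/has_raw branch optimization above boils down to a small decision: run a cheap pre-query counting which experiments have pre-computed rows, then render only the needed UNION ALL branches. A hedged sketch (hypothetical names; the real templates are StringTemplate-driven):

```java
public class AggregationBranches {

    // Hypothetical pre-query result: how many requested experiments have
    // rows in experiment_aggregates vs. need the live (raw) JOIN path.
    public record BranchCounts(long aggregatedCount, long rawCount) {
        public boolean hasAggregated() { return aggregatedCount > 0; }
        public boolean hasRaw() { return rawCount > 0; }
    }

    // Mirrors the <if(has_aggregated)>/<if(has_raw)> template flags:
    // emit a UNION ALL branch only when its population is non-empty.
    public static String renderBranches(BranchCounts counts) {
        StringBuilder sql = new StringBuilder();
        if (counts.hasAggregated()) {
            sql.append("SELECT ... FROM experiments_from_aggregates");
        }
        if (counts.hasAggregated() && counts.hasRaw()) {
            sql.append(" UNION ALL ");
        }
        if (counts.hasRaw()) {
            sql.append("SELECT ... FROM experiments_raw");
        }
        return sql.toString();
    }
}
```

When all experiments are aggregated (or all raw), the dead branch is skipped entirely; the mixed case emits both branches joined by UNION ALL.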

* [OPIK-4384] [BE] Add missing 7-arg overload for getDatasetItemsWithExperimentItems

Fix test compilation error from merge: the remote branch added callers
with (UUID, List, null, null, List<SortingField>, String, String) signature
which needs a bridge overload to the 9-arg method.

* [OPIK-4384] [BE] Add conditional LIMIT push-up, missing CTE, and fix test precision

- Add conditional LIMIT push-up in STREAM query: push LIMIT into CTE
  when only one branch (raw or aggregated) is active for performance
- Add missing experiment_item_aggr_trace_scope CTE for aggregated branch
- Add AggregatedExperimentCounts record for experiment-level branching
- Fix MultiValueFeedbackScoresE2ETest precision assertion: use isEqualTo
  instead of isEqualByComparingTo to respect custom BigDecimal comparator

* [OPIK-4384] [BE] Push OFFSET into top_dataset_items CTE and fix BigDecimal comparator in DatasetsResourceTest

* [OPIK-4384] [BE] Add pass rate aggregation to experiment aggregates

Add pass_rate, passed_count, and total_count columns to experiment_aggregates
table and compute them during aggregation. Update ExperimentDAO queries to
select these columns from both raw and aggregated paths, returning NULL for
non-evaluation-suite experiments.

* Fix get by id

* Fix mapping

* [OPIK-4384] [BE] Use pre-aggregated comments from aggregate tables with ISO 8601 date formatting

Update retrieval queries in ExperimentDAO, DatasetItemVersionDAO, and ExperimentAggregatesDAO
to read comments_array_agg as JSON String from aggregate tables instead of live-querying the
comments table. Ensure UNION ALL type compatibility by wrapping raw paths with toJSONString()
and formatting dates as ISO 8601 for proper Jackson deserialization.

* [OPIK-4384] [BE] Use parameterized binding for dynamic sort keys and add deterministic tiebreaker

- Replace literal string interpolation in getTopSortExpression with
  parameterized bind variables (sf.bindKey()) to prevent SQL injection
- Remove fieldMapping filter from bindDynamicKeys so all dynamic keys
  are bound, including those used in the top_sorting SELECT expression
- Add deterministic tiebreaker (id DESC / dataset_item_id DESC) to both
  the push-top-limit CTE and the main ORDER BY for consistent pagination
- Fix experiment_items deduplication: use FINAL where DISTINCT was used
  and vice versa for consistency across query branches

* [OPIK-4384] [BE] Add mixed-state aggregation test for UNION ALL hybrid

Test creates 3 experiments, aggregates only 1, and queries all 3 to
exercise the UNION ALL hybrid path where has_aggregated and has_raw
are both true simultaneously.

* [OPIK-4384] [BE] Add isNotEmpty assertions to parameterized filter tests

Ensure filter scenarios actually match data by asserting content()
is not empty before and after aggregation in all parameterized filter
tests (find, findGroups, findGroupsAggregations).

* [OPIK-4384] [BE] refactor: extract assertion helpers to remove duplication in ExperimentAggregatesIntegrationTest

* [OPIK-4384] [BE] refactor: rename parseFlexibleInstant to parseInstant in FeedbackScoreMapper

* [OPIK-4384] [BE] Make LIMIT unconditional in FIND query

The LIMIT clause was gated on filter/sort flags, so plain paged requests
(only limit/offset) at the outer query level would not emit LIMIT.
Simplify to always emit LIMIT when the limit parameter is provided.

* [OPIK-4384] [BE] Fix comment ordering assertion in tests

ClickHouse groupUniqArray does not guarantee ordering, so comment
assertions must use ignoringCollectionOrder to avoid flaky failures.

* [OPIK-4384] [BE] Add branch conditionals to FIND_GROUPS/FIND_GROUPS_AGGREGATIONS and revert unconditional LIMIT

- Wrap SELECT branches in FIND_GROUPS and FIND_GROUPS_AGGREGATIONS with
  <if(has_aggregated)>/<if(has_raw)> conditionals to skip unnecessary
  branches when all experiments are aggregated or all are raw
- Add no-args getAggregationBranchCounts() overload for workspace-only
  pre-query (used by group/aggregation queries that lack experiment IDs)
- Update executeQueryWithTargetProjects to run both pre-queries in
  parallel via Mono.zip
- Revert commit 215a3f9 (unconditional LIMIT) which caused double
  LIMIT/OFFSET bug: CTE-level LIMIT + outer LIMIT made page 2+ return
  0 results. The complex conditional is correct — outer LIMIT is only
  needed when post-CTE processing may alter the result set.
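The double LIMIT/OFFSET bug described above is easy to reproduce in miniature: paging once inside the CTE and again in the outer query skips the offset twice, so page 2 onwards comes back empty. A simplified stand-in using lists instead of SQL:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class DoubleLimitBug {

    // One application of LIMIT/OFFSET, modelled on a list of rows.
    public static List<Integer> page(List<Integer> rows, int limit, int offset) {
        return rows.stream().skip(offset).limit(limit).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> rows = IntStream.range(0, 10).boxed().collect(Collectors.toList());
        int limit = 3, offset = 3; // page 2

        List<Integer> correct = page(rows, limit, offset);
        // Applying the same LIMIT/OFFSET again to the already-paged CTE output:
        List<Integer> doublePaged = page(correct, limit, offset);

        System.out.println(correct);     // [3, 4, 5]
        System.out.println(doublePaged); // []
    }
}
```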

* [OPIK-4384] [BE] Add branch conditionals to SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_COUNT

Wrap the UNION ALL in the count query with <if(has_aggregated)>/<if(has_raw)>
conditionals to skip unnecessary branches. Pass branch flags through
getCountWithExperimentFilters from the existing pre-query results.

* [OPIK-4384] [BE] Fix ClickHouse column resolution in COUNT query

Alias dataset_item_id as di_id in the COUNT subquery branches
to avoid column name ambiguity when ClickHouse 25.3's query
analyzer resolves COUNT(DISTINCT dataset_item_id) through a
LEFT JOIN with dataset_items_resolved which also has that column.

* [OPIK-4384] [BE] Use pre-computed comments in STREAM query and fix UNION ALL type mismatch

Aggregated branch now reads comments_array_agg directly from experiment_item_aggregates
instead of doing an expensive JOIN to the comments table. Raw branch converts comments
to JSON String via toJSONString(CAST(...)) so both branches output compatible types.

* [OPIK-4384] [BE] Fix target_project_ids bind error in FIND_GROUPS aggregated branch

* [OPIK-4384] [BE] Unify aggregation branch counts with shared ExperimentAggregationSql

Extract SELECT_AGGREGATED_EXPERIMENT_IDS SQL and AggregatedExperimentCounts
into shared ExperimentAggregationSql utility class. Introduce
AggregationBranchCountsCriteria DTO to unify getAggregationBranchCounts
overloads across ExperimentDAO, ExperimentItemDAO, and DatasetItemVersionDAO.

* [OPIK-4384] [BE] Move getAggregationBranchCounts to ExperimentAggregatesDAO

Consolidate aggregation branch counting logic into ExperimentAggregatesDAO
instead of a separate utility class. Extract DTOs into their own files
in the experiments.aggregations package.

* [OPIK-4384] [BE] Deduplicate experiment_aggregates subquery with SELECT DISTINCT

Prevent inflated counts from ReplacingMergeTree pre-merge duplicates
in the aggregation branch counting query.
thiagohora added a commit that referenced this pull request Mar 16, 2026
…ment by ID (#5579)

* [OPIK-4380] [BE] Add experiment aggregates for denormalized metrics

- Add experiment_aggregates and experiment_item_aggregates tables
- Implement ExperimentAggregatesDAO with population and query methods
- Add ExperimentAggregatesService for aggregation management
- Refactor DTOs into organized model classes:
  - ExperimentAggregatesModel: aggregation results
  - ExperimentEntityData: entity models
  - ExperimentSourceData: raw source data
  - ExperimentAggregatesUtils: utilities
- Add FEEDBACK_SCORES_AGGREGATED filter strategy for map-based filtering
- Add comprehensive integration tests (10/10 passing)
- Configure batch size and parallelism settings

* [OPIK-4380] [BE] Add MySQL deadlock retry mechanism for concurrent dataset operations

Problem:
- MySQL deadlock on dataset_version_tags composite PRIMARY KEY (workspace_id, dataset_id, tag)
- Occurred during parallel dataset creation in same workspace
- Multiple threads inserting "latest" tag for different datasets caused lock contention
- Experiments with parallel execution were failing with MySQLTransactionRollbackException

Solution:
- Add handleOnDeadLocks() method in RetryUtils with:
  - 5 retry attempts with exponential backoff (250ms to 2s)
  - 0.5 jitter to reduce thundering herd effect
  - Recursive isDatabaseDeadlock() detection for MySQLTransactionRollbackException
- Apply retry logic in DatasetItemService.setDatasetItemVersion()
- Enables concurrent dataset creation for same workspace

Impact:
- Supports parallel experiment execution with proper deadlock handling
- Test success rate improved from 0/10 to 10/10 in ExperimentAggregatesIntegrationTest
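The retry schedule above (exponential backoff from 250ms to a 2s cap, with jitter to spread retries) can be sketched as a pure delay computation; the `random01` argument is injected so the formula stays deterministic and testable — the real `handleOnDeadLocks()` wiring in RetryUtils is more involved.

```java
public class DeadlockBackoff {

    // Illustrative backoff formula: exponential growth from baseMs, capped
    // at capMs, then multiplicative jitter in [1 - j, 1 + j).
    // random01 is a uniform sample in [0, 1).
    public static long backoffDelayMs(int attempt, long baseMs, long capMs,
                                      double jitterFactor, double random01) {
        long exp = Math.min(capMs, baseMs * (1L << attempt)); // 250, 500, 1000, 2000 (capped)
        // jitter spreads concurrent retries apart to avoid a thundering herd
        double jitter = 1.0 + jitterFactor * (2.0 * random01 - 1.0);
        return Math.round(exp * jitter);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 5; attempt++) {
            System.out.println(backoffDelayMs(attempt, 250, 2000, 0.5, 0.5));
        }
    }
}
```

With `random01 = 0.5` the jitter term is neutral, so the schedule is 250, 500, 1000, 2000, 2000ms across the five attempts.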

* Fix visibility

* [OPIK-4380] [BE] Address PR review comments for experiment aggregates

Fixed 11 automated review comments from baz-reviewer:

CRITICAL fixes:
- Prevent NPE on null span aggregations by adding coalesce() in SQL
- Handle multi-project experiments with LIMIT 1 in GET_PROJECT_ID
- Handle zero-item experiments with empty aggregation helpers
- Bind feedback_scores_percentiles map instead of empty CAST

HIGH priority fixes:
- Use toDecimal128(12) instead of toDecimal64(9) for cost percentiles
- Add null-safe tags handling with Optional.ofNullable()
- Include exception objects in retry logging for stack traces

MEDIUM priority fixes:
- Add missing log_comment to SELECT_EXPERIMENT_BY_ID query
- Add missing log_comment to GET_PROJECT_ID query

LOW priority fixes:
- Remove duplicate "id" binding in bindItemsParameters
- Enhance batchSize config documentation with details

All 11 integration tests passing.

* [OPIK-4382] [BE] Refactor experiment aggregates with import cleanup and Optional patterns

- Add missing imports for IntStream, ProjectStats, and other dependencies
- Replace fully-qualified class names with proper imports across DAO and Service classes
- Fix IS_NOT_EMPTY filter handling for FEEDBACK_SCORES_AGGREGATED strategies
- Refactor null checks to use Optional in mapping methods:
  - mapFeedbackScoreAggregations, mapExperimentFromAggregates
  - mapFeedbackScoreData, mapExperimentGroupAggregationItem
  - Batch insert preparation with Optional chains
- Improve code readability and maintainability with functional patterns

* [OPIK-4380] [BE] Fix table definition

* [OPIK-4380] [BE] Address PR comments and consolidate DatasetItemService methods

- Fix tags NPE in ExperimentAggregatesDAO with defaultIfNull
- Remove unnecessary FINAL clause from GET_EXPERIMENT_DATA query
- Fix test naming in ExperimentAggregatesIntegrationTest
- Consolidate 7 duplicate createVersionFromDelta methods into single canonical implementation
- Remove debug logger from config-test.yml

* [OPIK-4380] [BE] Fix missing log_comment and centralize search criteria binding

- Fix SELECT_EXPERIMENT_BY_ID to properly render log_comment metadata
  - Use getSTWithLogComment pattern in getExperimentFromAggregates
  - Ensures ClickHouse query logging populates workspace/experiment IDs

- Centralize ExperimentSearchCriteria binding logic
  - Create ExperimentSearchCriteriaBinder utility class
  - Parameterize filter strategies to support both DAO variants
  - Eliminate 29-line duplication between ExperimentDAO and ExperimentAggregatesDAO
  - Single source of truth prevents DAOs from getting out of sync

* [OPIK-4380] [BE] Fix createVersionFromDelta consolidation after rebase

- Update canonical method signature to include new parameters:
  - List<EvaluatorItem> evaluators
  - ExecutionPolicy executionPolicy
  - boolean clearExecutionPolicy

- Update all 5 caller sites to pass new parameters:
  - Use changes.evaluators(), changes.executionPolicy(), changes.clearExecutionPolicy() when available
  - Pass null/false for auto-generated versions that inherit from base

- Add imports for EvaluatorItem and ExecutionPolicy

Fixes compilation errors introduced by rebase with upstream changes to DatasetVersionService

* [OPIK-4380] [BE] Address PR review comments - fix type mismatch, extract constants, remove DAO logging

- Fixed BigDecimal[] to Double[] conversion for experiment_scores (matches ClickHouse Float64)
- Extracted FilterStrategy lists to static final constants to avoid repeated allocations
- Added @NonNull validation to populateExperimentAggregate parameter
- Removed DAO layer logging, keeping service-level logging only

* [OPIK-4380] [BE] Extract shared helper for experiment data pagination

Extract streamWithExperimentPagination() helper method to eliminate
duplication in getTracesData(), getSpansData(), and getFeedbackScoresData().

All three methods followed identical pattern:
- asyncTemplate.stream with connection
- getSTWithLogComment with cursor flag
- Bind workspace_id, experiment_id, project_id, limit
- Optional cursor binding
- Result mapping

Benefits:
- Single source of truth for pagination binding logic
- Prevents divergence when tweaking cursor/limit bindings
- Reduces code from ~20 lines to ~10 lines per method
- Type-safe generic implementation

Note: CTE redundancy (3x experiment_items scan) is intentional to avoid
passing large trace ID lists as parameters, which would cause performance
issues with 10K+ traces.

* [OPIK-4383] [BE] Add Redis stream subscriber for debounced experiment aggregates recomputation

- Add ExperimentDenormalizationConfig implementing StreamConfiguration with debounce, job lock, and per-experiment aggregation lock settings
- Add ExperimentAggregationMessage as stream message record
- Add ExperimentAggregatesSubscriber consuming from the denormalization stream; acquires a workspace-scoped distributed lock per experiment before calling populateAggregations()
- Add experimentDenormalizationEnabled feature flag to ServiceTogglesConfig and FeatureFlags
- Wire ExperimentDenormalizationConfig into OpikConfiguration
- Update config.yml and config-test.yml with full experimentDenormalization block (enabled for tests)
- Add ExperimentAggregatesSubscriberTest covering lifecycle gating and processEvent success/error paths

* Revision 2: Address PR comments - add config defaults, remove toggle, rename tests

- ExperimentDenormalizationConfig: add sensible defaults to all fields so
  Dropwizard validation doesn't fail when the config block is absent from
  old deployments (config.isEnabled()=false still gates the subscriber)
- Remove experimentDenormalizationEnabled service toggle from
  ServiceTogglesConfig, FeatureFlags, config.yml and config-test.yml -
  the infrastructure gate (config.isEnabled()) is the single control point
- Rename lifecycle test methods to camelCase per project conventions:
  startSkipsStartupWhenDisabled / stopSkipsShutdownWhenDisabled

* Revision 3: Add @Max(500) to consumerBatchSize and @NotNull to jobLockWaitTime

* [OPIK-4380] [BE] Address PR review comments - fix TYPE_REFERENCE visibility, redundant IN subquery, hardcoded context keys, Instant.now in loop, and inline defaultIfNull

* Revision 4: Address remaining JetoPistola review comments (#7, #8, #10)

- #7: Remove "Used for testing and verification" from getExperimentFromAggregates javadoc
- #8: Replace recursive flatMap with Mono.expand() in populateExperimentItemsInBatches
- #10: Remove unrelated subscribeOn addition from DatasetItemService.createVersionFromDelta

* Revision 3: Add switchIfEmpty fallback for deleted traces in populateExperimentAggregate

* Fix tests

* Revision 6: Move countTotal log from DAO to service layer

Operational logs belong in the service layer, not the DAO.

* Revision 7: Apply Spotless formatting

* Revision 8: Make populateAggregations(UUID, int) private

Removes the uncapped public batch size entry point. All callers now go
through the public no-arg overload which reads batchSize safely from config.

* [OPIK-4380] [BE] Add evaluation_method support to experiment_aggregates pipeline

- Add ClickHouse migration (000062) to add evaluation_method column to experiment_aggregates table
- Add evaluationMethod field to ExperimentData record
- Update GET_EXPERIMENT_DATA query to read evaluation_method from experiments
- Update INSERT_EXPERIMENT_AGGREGATE to write evaluation_method to experiment_aggregates
- Update SELECT_EXPERIMENT_BY_ID to read evaluation_method from experiment_aggregates
- Fix Experiment record constructor call: insert EvaluationMethod at correct position (10)

* [OPIK-4380] [BE] Extract shared helper for experiment aggregation queries

Reduce copy-paste in getTraceAggregations, getSpanAggregations, and
getFeedbackScoreAggregations by extracting queryExperimentAggregation,
which centralises the context-aware execution, workspace/experiment/project
parameter binding, and singleOrEmpty pattern shared by all three methods.

* [OPIK-4380] [BE] Enforce non-null contract on countTotal criteria parameter

Add @NonNull to ExperimentSearchCriteria in the interface and implementation
so that a null argument fails fast with an explicit NullPointerException at
the DAO boundary instead of crashing deep inside buildCountTemplate.

* [OPIK-4380] [BE] Fix countTotal ignoring target project IDs in normal path

target_project_ids was only applied inside the project_deleted LEFT JOIN
subquery; the main WHERE had no project restriction, so counts were
workspace-wide. Reuse has_target_projects in the main WHERE so
project_id IN :target_project_ids always takes effect. Also replace
manual null/empty checks with CollectionUtils.isNotEmpty.

* [OPIK-4380] [BE] Apply Spotless formatting

* [OPIK-4382] [BE] Address PR review comments on experiment aggregates

- Fix :versionId → :version_id parameter naming in SQL templates and bindings
- Fix last_updated_at binding to use item.lastUpdatedAt() instead of Instant.now()
- Fix FEEDBACK_SCORES_AGGREGATED_IS_EMPTY filter: embed generated SQL into templates
  instead of hard-coded static condition, and add missing bind calls
- Fix RetryUtils log duplication (remove getMessage() + pass exception directly)
- Add batchSize = 1000 default in ExperimentAggregatesConfig
- Extract resolveVersionIdForCriteria helper to deduplicate version-id resolution
- Add null/blank/ClickHouse placeholder guards in extractUuidsFromGroupValues
- Extract loadEntityMap helper to deduplicate getEnrichInfoHolder enrichment logic

* Revision 3: Address PR comments E, F, G, H

- Fix E: Extract shared template/bind helpers in ExperimentAggregatesDAO
- Fix F: Bind experiment_ids as UUID[] instead of String[]
- Fix G+H: Extract getEnrichInfoHolder logic into ExperimentGroupEnricher,
  eliminating duplication between ExperimentService and ExperimentAggregatesService
  without introducing a direct dependency between them

* Revision 4: Fix ExperimentServiceTest to include ExperimentGroupEnricher mock

* [OPIK-4382] [BE] Extract shared Row→ExperimentGroup mappers into ExperimentGroupMappers

Pull the duplicated Row→ExperimentGroupItem and Row→ExperimentGroupAggregationItem
conversion logic from ExperimentDAO and ExperimentAggregatesDAO into a shared
ExperimentGroupMappers utility class. Both DAOs now delegate to the same
toExperimentGroupItem / toExperimentGroupAggregationItem helpers, eliminating
the need to mirror DTO mapping changes in two places.

* [OPIK-4382] [BE] Deduplicate bindGroupCriteria into ExperimentGroupMappers

Moves the shared group-criteria binding logic out of ExperimentDAO and
ExperimentAggregatesDAO into ExperimentGroupMappers.bindGroupCriteria(),
following the same pattern as ExperimentSearchCriteriaBinder.  Adding or
fixing a criteria binding now only requires a change in one place.

* [OPIK-4382] [BE] Extract streamGroupQuery helper and fix null percentiles

- Deduplicate findGroups/findGroupsAggregations into a single private
  streamGroupQuery(queryTemplate, criteria, rowMapper) that differs only
  by the query constant and BiFunction row mapper.

- Fix convertToBigDecimal to return null for null/unsupported inputs so
  absent p50/p90/p99 entries in getDuration propagate as null rather than
  BigDecimal.ZERO, preserving the semantic-null that lets callers apply
  COALESCE/fallback logic correctly.

* [OPIK-4382] [BE] Consolidate cost/duration helpers into ExperimentGroupMappers

Promote getCostValue and getDuration to public static in
ExperimentGroupMappers and delete the private copies in ExperimentDAO.
ExperimentDAO.mapToDto now delegates to the shared helpers, so any
future change to cost filtering, duration percentile extraction, or the
BigDecimal conversion only needs to be made in one place.

Side-effect: ExperimentDAO.mapToDto also picks up the null-percentile
fix (convertToBigDecimal returns null for absent/unsupported inputs)
that was previously applied only to ExperimentGroupMappers.

* [OPIK-4382] [BE] Fix pagination count and add criteria filter tests

- Remove count() OVER () window function from paged query (returned
  page-scoped count instead of full result-set count)
- Replace with dedicated count query + short-circuit: skip items query
  when count == 0, use DatasetItemPage.empty() for that case
- Extract DatasetItemResultMapper.buildItemFromRow as public static
  helper reused by ExperimentAggregatesDAO
- Add parameterized integration tests for ExperimentGroupCriteria
  filters (name, types, projectId, combined, empty-result) covering
  both findGroups and findGroupsAggregations aggregate paths

* [OPIK-4380] [BE] Extract shared filter helpers into FilterQueryBuilder

Add FilterStrategyParam record, applyFiltersToTemplate and bindFilters
static helpers to FilterQueryBuilder, then replace duplicated per-strategy
loops in DatasetItemVersionDAO and ExperimentAggregatesDAO with single
delegating calls backed by per-DAO strategy constants.

* [OPIK-4382] [BE] Consolidate filter helpers in getExperimentItemsStatsFromAggregates

Add EXPERIMENT_ITEMS_STATS_FILTER_STRATEGY_PARAMS and
EXPERIMENT_ITEMS_STATS_BIND_STRATEGIES constants and replace the
per-strategy toAnalyticsDbFilters/bind blocks in
getExperimentItemsStatsFromAggregates with single delegating calls to
FilterQueryBuilder.applyFiltersToTemplate and bindFilters.

* Revision 9: Extract shared helpers to eliminate duplication across DAOs

- Create DatasetItemSearchCriteriaMapper: centralizes filters + search flag
  wiring for DatasetItemSearchCriteria, shared by DatasetItemVersionDAO and
  ExperimentAggregatesDAO
- Add ExperimentGroupMappers.applyGroupCriteriaToTemplate: centralizes
  ExperimentGroupCriteria → ST template wiring, now shared by ExperimentDAO
  and ExperimentAggregatesDAO
- Update DatasetItemVersionDAO.addFiltersToTemplate and bindSearchAndFilters
  to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentAggregatesDAO.applyDatasetItemFiltersToTemplate and
  bindDatasetItemSearchParams to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentDAO.newGroupTemplate and ExperimentAggregatesDAO.newGroupTemplate
  to delegate to ExperimentGroupMappers.applyGroupCriteriaToTemplate

* [OPIK-4383] [BE] Add experiment aggregate event listener and no-op publisher

* Revision 2: Fix missing import for ExperimentAggregationPublisher

* [OPIK-4383] [BE] Add ExperimentAggregationPublisher, ExperimentDenormalizationJob and tests

- ExperimentAggregationPublisher: debounces experiment aggregation triggers
  by writing compound workspaceId:experimentId members to a Redis ZSET scored
  by expiry timestamp (now + debounceDelay), plus a hash storing the userName
  with TTL=2×debounceDelay to handle stale entries.
- ExperimentDenormalizationJob: @Every("5s") job that reads ZSET members with
  score <= now, publishes ExperimentAggregationMessage to the Redis stream,
  then cleans up the ZSET entry and hash bucket. Handles stale entries
  (expired hash) by removing the orphaned ZSET member without publishing.
- Fix processExperiment reactive chain: avoid double index.remove by
  returning Mono<Boolean> from flatMap branches so switchIfEmpty is only
  triggered when the bucket is truly empty.
- ExperimentAggregationPublisherTest: integration tests with real Redis
  container verifying ZSET membership, score, userName storage, TTL,
  workspace isolation, and debounce deduplication.
- ExperimentDenormalizationJobTest: unit tests with Mockito covering disabled
  config, lock not acquired, empty ZSET, happy path, stale entry, and batch.

* Fix tests setup

* [OPIK-4383] [BE] Address PR review: move DAO logs to service layer

* [OPIK-4383] [BE] Address PR review: extract shared DAO helper and fix log placement

* [OPIK-4383] [BE] Short-circuit deleteByTraceIds when no spans found

Skip delete, cascading operations, and SpansDeleted event when
getSpanIdsForTraces returns an empty set, preserving the original
no-op behaviour and avoiding the Preconditions.checkArgument failure
in SpanDAO.deleteByIds.

* [OPIK-4383] [BE] Fix cascade deletion failures after trace delete

Two bugs prevented spans and attachments from being deleted when a trace
was deleted via the event-driven cascade:

1. FeedbackScoreService.deleteByTraceIds/deleteBySpanIds had @NonNull on
   projectId, which threw an NPE when TracesDeleted.projectId() was null.
   EventInterceptor swallowed the NPE, stopping the entire cascade chain.
   Fix: remove @NonNull since the DAO already handles null safely via
   Optional.ofNullable(projectId).

2. SpanDAO.DELETE_BY_IDS had the wrong column (trace_id) and parameter
   name (span_ids) — the ClickHouse R2DBC driver could not resolve :span_ids
   as a named parameter in the DELETE statement. Fixed by using id IN :ids
   to match the working pattern in TraceDAO.DELETE_BY_ID.

* [OPIK-4383] [BE] Address PR review comments on ExperimentDenormalizationJob

- Centralize Redis constants (EXPERIMENT_KEY_PREFIX, USER_NAME_FIELD,
  MEMBER_SEPARATOR) in ExperimentDenormalizationConfig
- Change ExperimentAggregationPublisher.publish() to return Mono<Void>
  instead of void, so errors propagate to callers
- Make job interval configurable via jobs map in config.yml
- Fix onErrorContinue logging: remove getMessage() duplication
- Demote per-experiment logs from INFO to DEBUG
- Add ZSET pagination using expand() to avoid materializing entire range
- Update tests for all changes

* Fix @Every job interval config key casing and add jobs section to test config

The dropwizard-jobs framework uses WordUtils.uncapitalize(class.getSimpleName())
to look up the interval in the jobs map, so the key must be
'experimentDenormalizationJob' (lowercase first letter). Also adds the missing
jobs section and jobBatchSize to config-test.yml.
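The key derivation is just the simple class name with its first letter lowercased; a plain-JDK equivalent of the WordUtils.uncapitalize lookup (the helper name here is illustrative, not the framework's actual code):

```java
// Derives the dropwizard-jobs config key from a job's simple class name,
// mirroring WordUtils.uncapitalize with plain JDK calls.
final class JobKeys {
    static String configKey(String simpleClassName) {
        if (simpleClassName.isEmpty()) return simpleClassName;
        return Character.toLowerCase(simpleClassName.charAt(0)) + simpleClassName.substring(1);
    }
}
```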

* Replace @Every annotation with programmatic Quartz scheduling

Remove @Every from ExperimentDenormalizationJob and schedule it
programmatically in OpikGuiceyLifecycleEventListener, following the
same pattern as TraceThreadsClosingJob. Add jobInterval config field
to ExperimentDenormalizationConfig. Remove the jobs YAML section that
caused deserialization errors with JobConfiguration's immutable map.

* Add experiment context to error log and extract publishIfNotEmpty helper

- Include experimentId and workspaceId in onExperimentUpdated error log
- Extract publishIfNotEmpty helper to deduplicate filter+publish logic
  across triggerByExperimentIds, triggerByTraceIds, triggerBySpanIds

* Fix NPE in ExperimentAggregateEventListenerTest mock setup

Stub publisher.publish() to return Mono.empty() in setUp so
.subscribe() calls in production code don't NPE on null.

* [OPIK-4385] [BE] Use pre-computed aggregation tables for experiment endpoints

Apply UNION ALL hybrid pattern to ExperimentDAO (FIND, FIND_GROUPS,
FIND_GROUPS_AGGREGATIONS) and ExperimentItemDAO (STREAM,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_COUNT,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_STATS) so that
experiments present in experiment_aggregates / experiment_item_aggregates
use pre-computed values, while others fall back to live JOIN computation.

Add ExperimentAggregatesIntegrationTest covering all 7 affected queries
with parameterized filter, pagination, and consistency scenarios.

* [OPIK-4386] [BE] Trigger lazy aggregation via publisher on GET experiment by ID

When fetching an experiment by ID, if the experiment is in COMPLETED or
CANCELLED state and is not yet present in the experiment_aggregates table,
enqueue it for aggregation using ExperimentAggregationPublisher instead of
computing aggregations synchronously. The check and publish are performed
off the critical path via doOnEach, so the caller receives the experiment
immediately without waiting for the side effect to complete.

* [OPIK-4384] [BE] Fix missing zero_uuid binding and experiment_scores sort alias

- Bind zero_uuid parameter in getById, getByIds, and get(ExperimentStreamRequest)
  methods that use the FIND query; the UNION ALL refactor introduced an
  experiments_from_aggregates CTE that requires this parameter but only the
  main find() method was binding it, causing 500 errors on those paths
- Fix SortingQueryBuilder to reference the outer column alias experiment_scores_agg
  instead of es.experiment_scores; the ORDER BY sits outside the UNION ALL so the
  inner es alias is out of scope, while experiment_scores_agg is the consistent
  output alias exposed by both branches

* [OPIK-4384] [BE] Fix null row injection from LEFT JOIN miss in feedback_scores and comments aggregation

Pre-aggregate feedback_scores_final and comments_final into subqueries
(GROUP BY entity_id) before LEFT JOIN in DatasetItemVersionDAO.STREAM.
When a LEFT JOIN has no match against a pre-aggregated subquery the
joined columns are NULL, so any(NULL) returns NULL instead of a
default-valued row with epoch timestamps that caused Instant.parse()
failures.

Also adds a regression test covering the no-scores path in
ExperimentAggregatesIntegrationTest.

* [OPIK-4383] [BE] Remove DAO-level log.info from ExperimentAggregatesDAO methods

Move operational logging responsibility to the service layer, consistent
with earlier fixes for ExperimentItemDAO and SpanDAO in this PR.

* Remove accidentally committed doc files

These files were introduced during merge resolution but should
not be part of the branch.

* [OPIK-4383] [BE] refactor: extract triggerAggregation helper to centralize guard+publish flow

* [OPIK-4386] [BE] fix: demote lazy aggregation check log to DEBUG

* [OPIK-4383] [BE] fix: restore TagOperations.tagUpdateFragment in SpanDAO BULK_UPDATE

Restores proper tag handling in SpanDAO.BULK_UPDATE query that was
regressed to a simple arrayConcat. Now uses TagOperations.tagUpdateFragment()
which provides arrayDistinct(), tag limit enforcement (max 50), and
tags_to_add/tags_to_remove support. Also adds the required
short_circuit_function_evaluation SETTINGS for throwIf evaluation.

* Adding InterruptableJob

* [OPIK-4383] [BE] Address PR review: expand safety valve, env var prefix

- Add batchSize-capped iteration counter to expand() to prevent
  infinite loops when ZSET entries fail to be removed
- Rename EXPERIMENT_DENORM_JOB_INTERVAL to OPIK_EXPERIMENT_DENORM_JOB_INTERVAL
  to follow the OPIK_ prefix convention

* [OPIK-4384] [BE] Add branch optimization and CTE split to experiment queries

Use pre-computed experiment_aggregates table to optimize query execution:
- Add has_aggregated/has_raw flags to skip unnecessary UNION ALL branches in FIND/FIND_COUNT
- Add getAggregationBranchCounts pre-query to determine which branches are needed
- Apply CTE split pattern to FIND_GROUPS and FIND_GROUPS_AGGREGATIONS
- Update getById to leverage branch optimization via single-ID branch count query
- Add <if(id)> filter to SELECT_AGGREGATED_EXPERIMENT_IDS for getById support

* [OPIK-4384] [BE] Add missing 7-arg overload for getDatasetItemsWithExperimentItems

Fix test compilation error from merge: the remote branch added callers
with (UUID, List, null, null, List<SortingField>, String, String) signature
which needs a bridge overload to the 9-arg method.

* [OPIK-4384] [BE] Add conditional LIMIT push-up, missing CTE, and fix test precision

- Add conditional LIMIT push-up in STREAM query: push LIMIT into CTE
  when only one branch (raw or aggregated) is active for performance
- Add missing experiment_item_aggr_trace_scope CTE for aggregated branch
- Add AggregatedExperimentCounts record for experiment-level branching
- Fix MultiValueFeedbackScoresE2ETest precision assertion: use isEqualTo
  instead of isEqualByComparingTo to respect custom BigDecimal comparator

* [OPIK-4384] [BE] Push OFFSET into top_dataset_items CTE and fix BigDecimal comparator in DatasetsResourceTest

* [OPIK-4384] [BE] Add pass rate aggregation to experiment aggregates

Add pass_rate, passed_count, and total_count columns to experiment_aggregates
table and compute them during aggregation. Update ExperimentDAO queries to
select these columns from both raw and aggregated paths, returning NULL for
non-evaluation-suite experiments.

* Fix format

* Fix get by id

* Fix mapping

* Fix mapping

* [OPIK-4384] [BE] Use pre-aggregated comments from aggregate tables with ISO 8601 date formatting

Update retrieval queries in ExperimentDAO, DatasetItemVersionDAO, and ExperimentAggregatesDAO
to read comments_array_agg as JSON String from aggregate tables instead of live-querying the
comments table. Ensure UNION ALL type compatibility by wrapping raw paths with toJSONString()
and formatting dates as ISO 8601 for proper Jackson deserialization.

* [OPIK-4386] [BE] Increase debounceDelay in test config to prevent race condition

The denormalization job was processing finished experiments during test
execution with incomplete ClickHouse data, causing stale aggregated
values to be returned instead of fresh raw computations.

* [OPIK-4384] [BE] Use parameterized binding for dynamic sort keys and add deterministic tiebreaker

- Replace literal string interpolation in getTopSortExpression with
  parameterized bind variables (sf.bindKey()) to prevent SQL injection
- Remove fieldMapping filter from bindDynamicKeys so all dynamic keys
  are bound, including those used in the top_sorting SELECT expression
- Add deterministic tiebreaker (id DESC / dataset_item_id DESC) to both
  the push-top-limit CTE and the main ORDER BY for consistent pagination
- Fix experiment_items deduplication: use FINAL where DISTINCT was used
  and vice versa for consistency across query branches
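Why the tiebreaker matters for pagination: when two rows share the same primary sort value, their relative order is otherwise unspecified and can differ between queries, so page boundaries can overlap or skip rows. A sketch of the comparator semantics, with illustrative record and field names:

```java
import java.util.Comparator;

// Rows with equal primary sort values get a deterministic order via the
// unique id as a secondary key (id DESC), so repeated paged queries never
// overlap or skip rows at page boundaries.
record Row(long id, double score) {
    static final Comparator<Row> PAGED_ORDER =
            Comparator.comparingDouble(Row::score).reversed()
                    .thenComparing(Comparator.comparingLong(Row::id).reversed());
}
```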

* [OPIK-4384] [BE] Add mixed-state aggregation test for UNION ALL hybrid

Test creates 3 experiments, aggregates only 1, and queries all 3 to
exercise the UNION ALL hybrid path where has_aggregated and has_raw
are both true simultaneously.

* [OPIK-4384] [BE] Add isNotEmpty assertions to parameterized filter tests

Ensure filter scenarios actually match data by asserting content()
is not empty before and after aggregation in all parameterized filter
tests (find, findGroups, findGroupsAggregations).

* [OPIK-4384] [BE] refactor: extract assertion helpers to remove duplication in ExperimentAggregatesIntegrationTest

* [OPIK-4384] [BE] refactor: rename parseFlexibleInstant to parseInstant in FeedbackScoreMapper

* [OPIK-4384] [BE] Make LIMIT unconditional in FIND query

The LIMIT clause was gated on filter/sort flags, so plain paged requests
(only limit/offset) at the outer query level would not emit LIMIT.
Simplify to always emit LIMIT when the limit parameter is provided.

* [OPIK-4384] [BE] Fix comment ordering assertion in tests

ClickHouse groupUniqArray does not guarantee ordering, so comment
assertions must use ignoringCollectionOrder to avoid flaky failures.
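The order-insensitive assertion boils down to comparing the two collections as multisets; a plain-JDK equivalent of what ignoringCollectionOrder achieves for flat string lists (the helper name is illustrative):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Two lists hold the same comments regardless of groupUniqArray's ordering
// iff their element -> count maps are equal (multiset equality).
final class UnorderedAssert {
    static boolean sameElements(List<String> a, List<String> b) {
        Function<List<String>, Map<String, Long>> counts =
                xs -> xs.stream().collect(Collectors.groupingBy(x -> x, Collectors.counting()));
        return counts.apply(a).equals(counts.apply(b));
    }
}
```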

* [OPIK-4384] [BE] Add branch conditionals to FIND_GROUPS/FIND_GROUPS_AGGREGATIONS and revert unconditional LIMIT

- Wrap SELECT branches in FIND_GROUPS and FIND_GROUPS_AGGREGATIONS with
  <if(has_aggregated)>/<if(has_raw)> conditionals to skip unnecessary
  branches when all experiments are aggregated or all are raw
- Add no-args getAggregationBranchCounts() overload for workspace-only
  pre-query (used by group/aggregation queries that lack experiment IDs)
- Update executeQueryWithTargetProjects to run both pre-queries in
  parallel via Mono.zip
- Revert commit 215a3f9 (unconditional LIMIT) which caused double
  LIMIT/OFFSET bug: CTE-level LIMIT + outer LIMIT made page 2+ return
  0 results. The complex conditional is correct — outer LIMIT is only
  needed when post-CTE processing may alter the result set.

* [OPIK-4384] [BE] Add branch conditionals to SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_COUNT

Wrap the UNION ALL in the count query with <if(has_aggregated)>/<if(has_raw)>
conditionals to skip unnecessary branches. Pass branch flags through
getCountWithExperimentFilters from the existing pre-query results.

* [OPIK-4384] [BE] Fix ClickHouse column resolution in COUNT query

Alias dataset_item_id as di_id in the COUNT subquery branches
to avoid column name ambiguity when ClickHouse 25.3's query
analyzer resolves COUNT(DISTINCT dataset_item_id) through a
LEFT JOIN with dataset_items_resolved which also has that column.

* [OPIK-4384] [BE] Use pre-computed comments in STREAM query and fix UNION ALL type mismatch

Aggregated branch now reads comments_array_agg directly from experiment_item_aggregates
instead of doing an expensive JOIN to the comments table. Raw branch converts comments
to JSON String via toJSONString(CAST(...)) so both branches output compatible types.

* [OPIK-4384] [BE] Fix target_project_ids bind error in FIND_GROUPS aggregated branch

* Update config-test.yml

* Update ExperimentService.java

* Remove old unused query

* [OPIK-4386] [BE] Address PR review comments: demote log to debug, add try-catch for context safety, add getById lazy aggregation tests

* [OPIK-4386] [BE] Add workspaceId to lazy aggregation log messages
JetoPistola pushed a commit that referenced this pull request Mar 16, 2026
…ment by ID (#5579)

* [OPIK-4380] [BE] Add experiment aggregates for denormalized metrics

- Add experiment_aggregates and experiment_item_aggregates tables
- Implement ExperimentAggregatesDAO with population and query methods
- Add ExperimentAggregatesService for aggregation management
- Refactor DTOs into organized model classes:
  - ExperimentAggregatesModel: aggregation results
  - ExperimentEntityData: entity models
  - ExperimentSourceData: raw source data
  - ExperimentAggregatesUtils: utilities
- Add FEEDBACK_SCORES_AGGREGATED filter strategy for map-based filtering
- Add comprehensive integration tests (10/10 passing)
- Configure batch size and parallelism settings

* [OPIK-4380] [BE] Add MySQL deadlock retry mechanism for concurrent dataset operations

Problem:
- MySQL deadlock on dataset_version_tags composite PRIMARY KEY (workspace_id, dataset_id, tag)
- Occurred during parallel dataset creation in same workspace
- Multiple threads inserting "latest" tag for different datasets caused lock contention
- Experiments with parallel execution were failing with MySQLTransactionRollbackException

Solution:
- Add handleOnDeadLocks() method in RetryUtils with:
  - 5 retry attempts with exponential backoff (250ms to 2s)
  - 0.5 jitter to reduce thundering herd effect
  - Recursive isDatabaseDeadlock() detection for MySQLTransactionRollbackException
- Apply retry logic in DatasetItemService.setDatasetItemVersion()
- Enables concurrent dataset creation for same workspace

Impact:
- Supports parallel experiment execution with proper deadlock handling
- Test success rate improved from 0/10 to 10/10 in ExperimentAggregatesIntegrationTest
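The retry mechanism described above can be sketched as follows. This is a hedged sketch assuming the parameters from the commit message (5 attempts, 250ms to 2s exponential backoff, 0.5 jitter); the class and method names are illustrative stand-ins for RetryUtils, and deadlock detection is approximated by matching the exception class name and MySQL's deadlock message instead of the real MySQLTransactionRollbackException type.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

final class DeadlockRetry {
    static final int MAX_ATTEMPTS = 5;

    // Walks the cause chain, mirroring the recursive isDatabaseDeadlock check.
    static boolean isDeadlock(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c.getClass().getSimpleName().contains("TransactionRollback")
                    || String.valueOf(c.getMessage()).contains("Deadlock")) {
                return true;
            }
        }
        return false;
    }

    static <T> T withRetry(Callable<T> op) throws Exception {
        long backoffMs = 250; // first delay; doubles up to the 2s cap
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (!isDeadlock(e) || attempt == MAX_ATTEMPTS) throw e;
                // 0.5 jitter: actual sleep lands in [0.5, 1.5] x backoff,
                // spreading out competing threads (thundering herd).
                double jitter = 0.5 + ThreadLocalRandom.current().nextDouble();
                Thread.sleep((long) (backoffMs * jitter));
                backoffMs = Math.min(backoffMs * 2, 2_000);
            }
        }
    }
}
```

Non-deadlock exceptions and the final failed attempt propagate unchanged, so only transient lock contention is retried.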

* Fix visibility

* [OPIK-4380] [BE] Address PR review comments for experiment aggregates

Fixed 11 automated review comments from baz-reviewer:

CRITICAL fixes:
- Prevent NPE on null span aggregations by adding coalesce() in SQL
- Handle multi-project experiments with LIMIT 1 in GET_PROJECT_ID
- Handle zero-item experiments with empty aggregation helpers
- Bind feedback_scores_percentiles map instead of empty CAST

HIGH priority fixes:
- Use toDecimal128(12) instead of toDecimal64(9) for cost percentiles
- Add null-safe tags handling with Optional.ofNullable()
- Include exception objects in retry logging for stack traces

MEDIUM priority fixes:
- Add missing log_comment to SELECT_EXPERIMENT_BY_ID query
- Add missing log_comment to GET_PROJECT_ID query

LOW priority fixes:
- Remove duplicate "id" binding in bindItemsParameters
- Enhance batchSize config documentation with details

All 11 integration tests passing.

* [OPIK-4382] [BE] Refactor experiment aggregates with import cleanup and Optional patterns

- Add missing imports for IntStream, ProjectStats, and other dependencies
- Replace fully-qualified class names with proper imports across DAO and Service classes
- Fix IS_NOT_EMPTY filter handling for FEEDBACK_SCORES_AGGREGATED strategies
- Refactor null checks to use Optional in mapping methods:
  - mapFeedbackScoreAggregations, mapExperimentFromAggregates
  - mapFeedbackScoreData, mapExperimentGroupAggregationItem
  - Batch insert preparation with Optional chains
- Improve code readability and maintainability with functional patterns

* [OPIK-4380] [BE] Fix table definition

* [OPIK-4380] [BE] Address PR comments and consolidate DatasetItemService methods

- Fix tags NPE in ExperimentAggregatesDAO with defaultIfNull
- Remove unnecessary FINAL clause from GET_EXPERIMENT_DATA query
- Fix test naming in ExperimentAggregatesIntegrationTest
- Consolidate 7 duplicate createVersionFromDelta methods into single canonical implementation
- Remove debug logger from config-test.yml

* [OPIK-4380] [BE] Fix missing log_comment and centralize search criteria binding

- Fix SELECT_EXPERIMENT_BY_ID to properly render log_comment metadata
  - Use getSTWithLogComment pattern in getExperimentFromAggregates
  - Ensures ClickHouse query logging populates workspace/experiment IDs

- Centralize ExperimentSearchCriteria binding logic
  - Create ExperimentSearchCriteriaBinder utility class
  - Parameterize filter strategies to support both DAO variants
  - Eliminate 29-line duplication between ExperimentDAO and ExperimentAggregatesDAO
  - Single source of truth prevents DAOs from getting out of sync

* [OPIK-4380] [BE] Fix createVersionFromDelta consolidation after rebase

- Update canonical method signature to include new parameters:
  - List<EvaluatorItem> evaluators
  - ExecutionPolicy executionPolicy
  - boolean clearExecutionPolicy

- Update all 5 caller sites to pass new parameters:
  - Use changes.evaluators(), changes.executionPolicy(), changes.clearExecutionPolicy() when available
  - Pass null/false for auto-generated versions that inherit from base

- Add imports for EvaluatorItem and ExecutionPolicy

Fixes compilation errors introduced by rebase with upstream changes to DatasetVersionService

* [OPIK-4380] [BE] Address PR review comments - fix type mismatch, extract constants, remove DAO logging

- Fixed BigDecimal[] to Double[] conversion for experiment_scores (matches ClickHouse Float64)
- Extracted FilterStrategy lists to static final constants to avoid repeated allocations
- Added @NonNull validation to populateExperimentAggregate parameter
- Removed DAO layer logging, keeping service-level logging only

* [OPIK-4380] [BE] Extract shared helper for experiment data pagination

Extract streamWithExperimentPagination() helper method to eliminate
duplication in getTracesData(), getSpansData(), and getFeedbackScoresData().

All three methods followed identical pattern:
- asyncTemplate.stream with connection
- getSTWithLogComment with cursor flag
- Bind workspace_id, experiment_id, project_id, limit
- Optional cursor binding
- Result mapping

Benefits:
- Single source of truth for pagination binding logic
- Prevents divergence when tweaking cursor/limit bindings
- Reduces code from ~20 lines to ~10 lines per method
- Type-safe generic implementation

Note: CTE redundancy (3x experiment_items scan) is intentional to avoid
passing large trace ID lists as parameters, which would cause performance
issues with 10K+ traces.

* [OPIK-4383] [BE] Add Redis stream subscriber for debounced experiment aggregates recomputation

- Add ExperimentDenormalizationConfig implementing StreamConfiguration with debounce, job lock, and per-experiment aggregation lock settings
- Add ExperimentAggregationMessage as stream message record
- Add ExperimentAggregatesSubscriber consuming from the denormalization stream; acquires a workspace-scoped distributed lock per experiment before calling populateAggregations()
- Add experimentDenormalizationEnabled feature flag to ServiceTogglesConfig and FeatureFlags
- Wire ExperimentDenormalizationConfig into OpikConfiguration
- Update config.yml and config-test.yml with full experimentDenormalization block (enabled for tests)
- Add ExperimentAggregatesSubscriberTest covering lifecycle gating and processEvent success/error paths

* Revision 2: Address PR comments - add config defaults, remove toggle, rename tests

- ExperimentDenormalizationConfig: add sensible defaults to all fields so
  Dropwizard validation doesn't fail when the config block is absent from
  old deployments (config.isEnabled()=false still gates the subscriber)
- Remove experimentDenormalizationEnabled service toggle from
  ServiceTogglesConfig, FeatureFlags, config.yml and config-test.yml -
  the infrastructure gate (config.isEnabled()) is the single control point
- Rename lifecycle test methods to camelCase per project conventions:
  startSkipsStartupWhenDisabled / stopSkipsShutdownWhenDisabled

* Revision 3: Add @Max(500) to consumerBatchSize and @NotNull to jobLockWaitTime

* [OPIK-4380] [BE] Address PR review comments - fix TYPE_REFERENCE visibility, redundant IN subquery, hardcoded context keys, Instant.now in loop, and inline defaultIfNull

* Revision 4: Address remaining JetoPistola review comments (#7, #8, #10)

- #7: Remove "Used for testing and verification" from getExperimentFromAggregates javadoc
- #8: Replace recursive flatMap with Mono.expand() in populateExperimentItemsInBatches
- #10: Remove unrelated subscribeOn addition from DatasetItemService.createVersionFromDelta

* Revision 3: Add switchIfEmpty fallback for deleted traces in populateExperimentAggregate

* Fix tests

* Revision 6: Move countTotal log from DAO to service layer

Operational logs belong in the service layer, not the DAO.

* Revision 7: Apply Spotless formatting

* Revision 8: Make populateAggregations(UUID, int) private

Removes the uncapped public batch size entry point. All callers now go
through the public no-arg overload which reads batchSize safely from config.

* [OPIK-4380] [BE] Add evaluation_method support to experiment_aggregates pipeline

- Add ClickHouse migration (000062) to add evaluation_method column to experiment_aggregates table
- Add evaluationMethod field to ExperimentData record
- Update GET_EXPERIMENT_DATA query to read evaluation_method from experiments
- Update INSERT_EXPERIMENT_AGGREGATE to write evaluation_method to experiment_aggregates
- Update SELECT_EXPERIMENT_BY_ID to read evaluation_method from experiment_aggregates
- Fix Experiment record constructor call: insert EvaluationMethod at correct position (10)

* [OPIK-4380] [BE] Extract shared helper for experiment aggregation queries

Reduce copy-paste in getTraceAggregations, getSpanAggregations, and
getFeedbackScoreAggregations by extracting queryExperimentAggregation,
which centralises the context-aware execution, workspace/experiment/project
parameter binding, and singleOrEmpty pattern shared by all three methods.

* [OPIK-4380] [BE] Enforce non-null contract on countTotal criteria parameter

Add @NonNull to ExperimentSearchCriteria in the interface and implementation
so that a null argument fails fast with an explicit NullPointerException at
the DAO boundary instead of crashing deep inside buildCountTemplate.

* [OPIK-4380] [BE] Fix countTotal ignoring target project IDs in normal path

target_project_ids was only applied inside the project_deleted LEFT JOIN
subquery; the main WHERE had no project restriction, so counts were
workspace-wide. Reuse has_target_projects in the main WHERE so
project_id IN :target_project_ids always takes effect. Also replace
manual null/empty checks with CollectionUtils.isNotEmpty.

* [OPIK-4380] [BE] Apply Spotless formatting

* [OPIK-4382] [BE] Address PR review comments on experiment aggregates

- Fix :versionId → :version_id parameter naming in SQL templates and bindings
- Fix last_updated_at binding to use item.lastUpdatedAt() instead of Instant.now()
- Fix FEEDBACK_SCORES_AGGREGATED_IS_EMPTY filter: embed generated SQL into templates
  instead of hard-coded static condition, and add missing bind calls
- Fix RetryUtils log duplication (remove getMessage() + pass exception directly)
- Add batchSize = 1000 default in ExperimentAggregatesConfig
- Extract resolveVersionIdForCriteria helper to deduplicate version-id resolution
- Add null/blank/ClickHouse placeholder guards in extractUuidsFromGroupValues
- Extract loadEntityMap helper to deduplicate getEnrichInfoHolder enrichment logic

* Revision 3: Address PR comments E, F, G, H

- Fix E: Extract shared template/bind helpers in ExperimentAggregatesDAO
- Fix F: Bind experiment_ids as UUID[] instead of String[]
- Fix G+H: Extract getEnrichInfoHolder logic into ExperimentGroupEnricher,
  eliminating duplication between ExperimentService and ExperimentAggregatesService
  without introducing a direct dependency between them

* Revision 4: Fix ExperimentServiceTest to include ExperimentGroupEnricher mock

* [OPIK-4382] [BE] Extract shared Row→ExperimentGroup mappers into ExperimentGroupMappers

Pull the duplicated Row→ExperimentGroupItem and Row→ExperimentGroupAggregationItem
conversion logic from ExperimentDAO and ExperimentAggregatesDAO into a shared
ExperimentGroupMappers utility class. Both DAOs now delegate to the same
toExperimentGroupItem / toExperimentGroupAggregationItem helpers, eliminating
the need to mirror DTO mapping changes in two places.

* [OPIK-4382] [BE] Deduplicate bindGroupCriteria into ExperimentGroupMappers

Moves the shared group-criteria binding logic out of ExperimentDAO and
ExperimentAggregatesDAO into ExperimentGroupMappers.bindGroupCriteria(),
following the same pattern as ExperimentSearchCriteriaBinder. Adding or
fixing a criteria binding now only requires a change in one place.

* [OPIK-4382] [BE] Extract streamGroupQuery helper and fix null percentiles

- Deduplicate findGroups/findGroupsAggregations into a single private
  streamGroupQuery(queryTemplate, criteria, rowMapper) that differs only
  by the query constant and BiFunction row mapper.

- Fix convertToBigDecimal to return null for null/unsupported inputs so
  absent p50/p90/p99 entries in getDuration propagate as null rather than
  BigDecimal.ZERO, preserving the semantic-null that lets callers apply
  COALESCE/fallback logic correctly.
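The null-propagation fix can be read as: convert the types you recognize, and return null (never BigDecimal.ZERO) for everything else. A hedged sketch; the real DAO helper likely recognizes more numeric types than shown here.

```java
import java.math.BigDecimal;

// Null and unsupported inputs propagate as null (semantic "no data"),
// never BigDecimal.ZERO, so callers can apply COALESCE-style fallbacks.
final class PercentileMapper {
    static BigDecimal convertToBigDecimal(Object value) {
        if (value instanceof BigDecimal bd) return bd;
        if (value instanceof Double d) return BigDecimal.valueOf(d);
        if (value instanceof Long l) return BigDecimal.valueOf(l);
        return null;
    }
}
```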

* [OPIK-4382] [BE] Consolidate cost/duration helpers into ExperimentGroupMappers

Promote getCostValue and getDuration to public static in
ExperimentGroupMappers and delete the private copies in ExperimentDAO.
ExperimentDAO.mapToDto now delegates to the shared helpers, so any
future change to cost filtering, duration percentile extraction, or the
BigDecimal conversion only needs to be made in one place.

Side-effect: ExperimentDAO.mapToDto also picks up the null-percentile
fix (convertToBigDecimal returns null for absent/unsupported inputs)
that was previously applied only to ExperimentGroupMappers.

* [OPIK-4382] [BE] Fix pagination count and add criteria filter tests

- Remove count() OVER () window function from paged query (returned
  page-scoped count instead of full result-set count)
- Replace with dedicated count query + short-circuit: skip items query
  when count == 0, use DatasetItemPage.empty() for that case
- Extract DatasetItemResultMapper.buildItemFromRow as public static
  helper reused by ExperimentAggregatesDAO
- Add parameterized integration tests for ExperimentGroupCriteria
  filters (name, types, projectId, combined, empty-result) covering
  both findGroups and findGroupsAggregations aggregate paths

* [OPIK-4380] [BE] Extract shared filter helpers into FilterQueryBuilder

Add FilterStrategyParam record, applyFiltersToTemplate and bindFilters
static helpers to FilterQueryBuilder, then replace duplicated per-strategy
loops in DatasetItemVersionDAO and ExperimentAggregatesDAO with single
delegating calls backed by per-DAO strategy constants.

* [OPIK-4382] [BE] Consolidate filter helpers in getExperimentItemsStatsFromAggregates

Add EXPERIMENT_ITEMS_STATS_FILTER_STRATEGY_PARAMS and
EXPERIMENT_ITEMS_STATS_BIND_STRATEGIES constants and replace the
per-strategy toAnalyticsDbFilters/bind blocks in
getExperimentItemsStatsFromAggregates with single delegating calls to
FilterQueryBuilder.applyFiltersToTemplate and bindFilters.

* Revision 9: Extract shared helpers to eliminate duplication across DAOs

- Create DatasetItemSearchCriteriaMapper: centralizes filters + search flag
  wiring for DatasetItemSearchCriteria, shared by DatasetItemVersionDAO and
  ExperimentAggregatesDAO
- Add ExperimentGroupMappers.applyGroupCriteriaToTemplate: centralizes
  ExperimentGroupCriteria → ST template wiring, now shared by ExperimentDAO
  and ExperimentAggregatesDAO
- Update DatasetItemVersionDAO.addFiltersToTemplate and bindSearchAndFilters
  to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentAggregatesDAO.applyDatasetItemFiltersToTemplate and
  bindDatasetItemSearchParams to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentDAO.newGroupTemplate and ExperimentAggregatesDAO.newGroupTemplate
  to delegate to ExperimentGroupMappers.applyGroupCriteriaToTemplate

* [OPIK-4383] [BE] Add experiment aggregate event listener and no-op publisher

* Revision 2: Fix missing import for ExperimentAggregationPublisher

* [OPIK-4383] [BE] Add ExperimentAggregationPublisher, ExperimentDenormalizationJob and tests

- ExperimentAggregationPublisher: debounces experiment aggregation triggers
  by writing compound workspaceId:experimentId members to a Redis ZSET scored
  by expiry timestamp (now + debounceDelay), plus a hash storing the userName
  with TTL=2×debounceDelay to handle stale entries.
- ExperimentDenormalizationJob: @Every("5s") job that reads ZSET members with
  score <= now, publishes ExperimentAggregationMessage to the Redis stream,
  then cleans up the ZSET entry and hash bucket. Handles stale entries
  (expired hash) by removing the orphaned ZSET member without publishing.
- Fix processExperiment reactive chain: avoid double index.remove by
  returning Mono<Boolean> from flatMap branches so switchIfEmpty is only
  triggered when the bucket is truly empty.
- ExperimentAggregationPublisherTest: integration tests with real Redis
  container verifying ZSET membership, score, userName storage, TTL,
  workspace isolation, and debounce deduplication.
- ExperimentDenormalizationJobTest: unit tests with Mockito covering disabled
  config, lock not acquired, empty ZSET, happy path, stale entry, and batch.

* Fix tests setup

* [OPIK-4383] [BE] Address PR review: move DAO logs to service layer

* [OPIK-4383] [BE] Address PR review: extract shared DAO helper and fix log placement

* [OPIK-4383] [BE] Short-circuit deleteByTraceIds when no spans found

Skip delete, cascading operations, and SpansDeleted event when
getSpanIdsForTraces returns an empty set, preserving the original
no-op behaviour and avoiding the Preconditions.checkArgument failure
in SpanDAO.deleteByIds.

* [OPIK-4383] [BE] Fix cascade deletion failures after trace delete

Two bugs prevented spans and attachments from being deleted when a trace
was deleted via the event-driven cascade:

1. FeedbackScoreService.deleteByTraceIds/deleteBySpanIds had @nonnull on
   projectId which threw NPE when TracesDeleted.projectId() was null.
   EventInterceptor swallowed the NPE, stopping the entire cascade chain.
   Fix: remove @NonNull since the DAO already handles null safely via
   Optional.ofNullable(projectId).

2. SpanDAO.DELETE_BY_IDS had the wrong column (trace_id) and parameter
   name (span_ids) — the ClickHouse R2DBC driver could not resolve :span_ids
   as a named parameter in the DELETE statement. Fixed by using id IN :ids
   to match the working pattern in TraceDAO.DELETE_BY_ID.

* [OPIK-4383] [BE] Address PR review comments on ExperimentDenormalizationJob

- Centralize Redis constants (EXPERIMENT_KEY_PREFIX, USER_NAME_FIELD,
  MEMBER_SEPARATOR) in ExperimentDenormalizationConfig
- Change ExperimentAggregationPublisher.publish() to return Mono<Void>
  instead of void, so errors propagate to callers
- Make job interval configurable via jobs map in config.yml
- Fix onErrorContinue logging: remove getMessage() duplication
- Demote per-experiment logs from INFO to DEBUG
- Add ZSET pagination using expand() to avoid materializing entire range
- Update tests for all changes

* Fix @Every job interval config key casing and add jobs section to test config

The dropwizard-jobs framework uses WordUtils.uncapitalize on the job class's simple name
to look up the interval in the jobs map, so the key must be
'experimentDenormalizationJob' (lowercase first letter). Also adds the missing
jobs section and jobBatchSize to config-test.yml.
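
The key-casing rule can be illustrated with a minimal stand-in for the uncapitalize step (the real lookup lives inside dropwizard-jobs; this helper only mirrors the behaviour for illustration):

```java
// Minimal stand-in for WordUtils.uncapitalize as applied by dropwizard-jobs
// to derive the jobs-map key from a job class's simple name.
class JobKeySketch {
    static String uncapitalize(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        // Lowercase only the first character; the rest keeps its camelCase.
        return Character.toLowerCase(s.charAt(0)) + s.substring(1);
    }

    static String jobKey(Class<?> jobClass) {
        return uncapitalize(jobClass.getSimpleName());
    }
}
```

So a class named `ExperimentDenormalizationJob` is looked up under `experimentDenormalizationJob`, which is why the capitalized key silently failed to match.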

* Replace @Every annotation with programmatic Quartz scheduling

Remove @Every from ExperimentDenormalizationJob and schedule it
programmatically in OpikGuiceyLifecycleEventListener, following the
same pattern as TraceThreadsClosingJob. Add jobInterval config field
to ExperimentDenormalizationConfig. Remove the jobs YAML section that
caused deserialization errors with JobConfiguration's immutable map.

* Add experiment context to error log and extract publishIfNotEmpty helper

- Include experimentId and workspaceId in onExperimentUpdated error log
- Extract publishIfNotEmpty helper to deduplicate filter+publish logic
  across triggerByExperimentIds, triggerByTraceIds, triggerBySpanIds

* Fix NPE in ExperimentAggregateEventListenerTest mock setup

Stub publisher.publish() to return Mono.empty() in setUp so
.subscribe() calls in production code don't NPE on null.

* [OPIK-4385] [BE] Use pre-computed aggregation tables for experiment endpoints

Apply UNION ALL hybrid pattern to ExperimentDAO (FIND, FIND_GROUPS,
FIND_GROUPS_AGGREGATIONS) and ExperimentItemDAO (STREAM,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_COUNT,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_STATS) so that
experiments present in experiment_aggregates / experiment_item_aggregates
use pre-computed values, while others fall back to live JOIN computation.

Add ExperimentAggregatesIntegrationTest covering all 7 affected queries
with parameterized filter, pagination, and consistency scenarios.

* [OPIK-4386] [BE] Trigger lazy aggregation via publisher on GET experiment by ID

When fetching an experiment by ID, if the experiment is in COMPLETED or
CANCELLED state and is not yet present in the experiment_aggregates table,
enqueue it for aggregation using ExperimentAggregationPublisher instead of
computing aggregations synchronously. The check and publish are performed
off the critical path via doOnEach, so the caller receives the experiment
immediately without waiting for the side effect to complete.
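
The off-the-critical-path pattern can be sketched without Reactor: return the value to the caller immediately and fire the enqueue as a detached asynchronous side effect. CompletableFuture here stands in for the doOnEach + subscribe combination; the names are illustrative, not the production API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Illustrative fire-and-forget side effect: the caller gets the value back
// immediately, the aggregation trigger runs asynchronously, and any failure
// in the trigger is contained instead of failing the read path.
class LazyTriggerSketch {
    static <T> T returnAndTrigger(T value, Consumer<T> sideEffect) {
        CompletableFuture.runAsync(() -> sideEffect.accept(value))
                .exceptionally(ex -> null); // swallow: side effect must not break the read
        return value;
    }
}
```

The important property is that the returned value never waits on, and never observes errors from, the side effect.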

* [OPIK-4384] [BE] Fix missing zero_uuid binding and experiment_scores sort alias

- Bind zero_uuid parameter in getById, getByIds, and get(ExperimentStreamRequest)
  methods that use the FIND query; the UNION ALL refactor introduced an
  experiments_from_aggregates CTE that requires this parameter but only the
  main find() method was binding it, causing 500 errors on those paths
- Fix SortingQueryBuilder to reference the outer column alias experiment_scores_agg
  instead of es.experiment_scores; the ORDER BY sits outside the UNION ALL so the
  inner es alias is out of scope, while experiment_scores_agg is the consistent
  output alias exposed by both branches

* [OPIK-4384] [BE] Fix null row injection from LEFT JOIN miss in feedback_scores and comments aggregation

Pre-aggregate feedback_scores_final and comments_final into subqueries
(GROUP BY entity_id) before LEFT JOIN in DatasetItemVersionDAO.STREAM.
When a LEFT JOIN has no match against a pre-aggregated subquery the
joined columns are NULL, so any(NULL) returns NULL instead of a
default-valued row with epoch timestamps that caused Instant.parse()
failures.

Also adds a regression test covering the no-scores path in
ExperimentAggregatesIntegrationTest.

* [OPIK-4383] [BE] Remove DAO-level log.info from ExperimentAggregatesDAO methods

Move operational logging responsibility to the service layer, consistent
with earlier fixes for ExperimentItemDAO and SpanDAO in this PR.

* Remove accidentally committed doc files

These files were introduced during merge resolution but should
not be part of the branch.

* [OPIK-4383] [BE] refactor: extract triggerAggregation helper to centralize guard+publish flow

* [OPIK-4386] [BE] fix: demote lazy aggregation check log to DEBUG

* [OPIK-4383] [BE] fix: restore TagOperations.tagUpdateFragment in SpanDAO BULK_UPDATE

Restores proper tag handling in SpanDAO.BULK_UPDATE query that was
regressed to a simple arrayConcat. Now uses TagOperations.tagUpdateFragment()
which provides arrayDistinct(), tag limit enforcement (max 50), and
tags_to_add/tags_to_remove support. Also adds the required
short_circuit_function_evaluation SETTINGS for throwIf evaluation.

* Adding InterruptableJob

* [OPIK-4383] [BE] Address PR review: expand safety valve, env var prefix

- Add batchSize-capped iteration counter to expand() to prevent
  infinite loops when ZSET entries fail to be removed
- Rename EXPERIMENT_DENORM_JOB_INTERVAL to OPIK_EXPERIMENT_DENORM_JOB_INTERVAL
  to follow the OPIK_ prefix convention
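
The safety valve can be sketched independently of Reactor's expand(): a paged drain that stops after a bounded number of iterations even if entries are never removed from the underlying index. This is an illustrative model, not the production code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Illustrative bounded drain: keep fetching pages until the source is empty,
// but never loop more than maxIterations times, so a page that fails to be
// removed from the underlying index cannot spin the job forever.
class BoundedDrainSketch {
    static <T> List<T> drain(Supplier<List<T>> fetchPage, int maxIterations) {
        List<T> all = new ArrayList<>();
        for (int i = 0; i < maxIterations; i++) {
            List<T> page = fetchPage.get();
            if (page.isEmpty()) {
                break; // source exhausted: normal termination
            }
            all.addAll(page);
        }
        return all; // capped even if fetchPage keeps returning the same page
    }
}
```

Without the cap, a ZSET member that survives its own removal (e.g. due to a transient Redis error) would be re-fetched on every expansion step indefinitely.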

* [OPIK-4384] [BE] Add branch optimization and CTE split to experiment queries

Use pre-computed experiment_aggregates table to optimize query execution:
- Add has_aggregated/has_raw flags to skip unnecessary UNION ALL branches in FIND/FIND_COUNT
- Add getAggregationBranchCounts pre-query to determine which branches are needed
- Apply CTE split pattern to FIND_GROUPS and FIND_GROUPS_AGGREGATIONS
- Update getById to leverage branch optimization via single-ID branch count query
- Add <if(id)> filter to SELECT_AGGREGATED_EXPERIMENT_IDS for getById support

* [OPIK-4384] [BE] Add missing 7-arg overload for getDatasetItemsWithExperimentItems

Fix test compilation error from merge: the remote branch added callers
with (UUID, List, null, null, List<SortingField>, String, String) signature
which needs a bridge overload to the 9-arg method.

* [OPIK-4384] [BE] Add conditional LIMIT push-up, missing CTE, and fix test precision

- Add conditional LIMIT push-up in STREAM query: push LIMIT into CTE
  when only one branch (raw or aggregated) is active for performance
- Add missing experiment_item_aggr_trace_scope CTE for aggregated branch
- Add AggregatedExperimentCounts record for experiment-level branching
- Fix MultiValueFeedbackScoresE2ETest precision assertion: use isEqualTo
  instead of isEqualByComparingTo to respect custom BigDecimal comparator

* [OPIK-4384] [BE] Push OFFSET into top_dataset_items CTE and fix BigDecimal comparator in DatasetsResourceTest

* [OPIK-4384] [BE] Add pass rate aggregation to experiment aggregates

Add pass_rate, passed_count, and total_count columns to experiment_aggregates
table and compute them during aggregation. Update ExperimentDAO queries to
select these columns from both raw and aggregated paths, returning NULL for
non-evaluation-suite experiments.
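
The pass-rate column is a simple ratio of the two counters. A minimal sketch (names are illustrative), returning null when there are no evaluated items to mirror the NULL returned for non-evaluation-suite experiments:

```java
// Illustrative pass-rate computation matching the columns described above:
// pass_rate = passed_count / total_count, undefined (null) when total is zero.
class PassRateSketch {
    static Double passRate(long passedCount, long totalCount) {
        if (totalCount == 0) {
            return null; // no evaluated items: no meaningful rate
        }
        return (double) passedCount / totalCount;
    }
}
```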

* Fix format

* Fix get by id

* Fix mapping

* Fix mapping

* [OPIK-4384] [BE] Use pre-aggregated comments from aggregate tables with ISO 8601 date formatting

Update retrieval queries in ExperimentDAO, DatasetItemVersionDAO, and ExperimentAggregatesDAO
to read comments_array_agg as JSON String from aggregate tables instead of live-querying the
comments table. Ensure UNION ALL type compatibility by wrapping raw paths with toJSONString()
and formatting dates as ISO 8601 for proper Jackson deserialization.

* [OPIK-4386] [BE] Increase debounceDelay in test config to prevent race condition

The denormalization job was processing finished experiments during test
execution with incomplete ClickHouse data, causing stale aggregated
values to be returned instead of fresh raw computations.

* [OPIK-4384] [BE] Use parameterized binding for dynamic sort keys and add deterministic tiebreaker

- Replace literal string interpolation in getTopSortExpression with
  parameterized bind variables (sf.bindKey()) to prevent SQL injection
- Remove fieldMapping filter from bindDynamicKeys so all dynamic keys
  are bound, including those used in the top_sorting SELECT expression
- Add deterministic tiebreaker (id DESC / dataset_item_id DESC) to both
  the push-top-limit CTE and the main ORDER BY for consistent pagination
- Fix experiment_items deduplication: use FINAL where DISTINCT was used
  and vice versa for consistency across query branches

* [OPIK-4384] [BE] Add mixed-state aggregation test for UNION ALL hybrid

Test creates 3 experiments, aggregates only 1, and queries all 3 to
exercise the UNION ALL hybrid path where has_aggregated and has_raw
are both true simultaneously.

* [OPIK-4384] [BE] Add isNotEmpty assertions to parameterized filter tests

Ensure filter scenarios actually match data by asserting content()
is not empty before and after aggregation in all parameterized filter
tests (find, findGroups, findGroupsAggregations).

* [OPIK-4384] [BE] refactor: extract assertion helpers to remove duplication in ExperimentAggregatesIntegrationTest

* [OPIK-4384] [BE] refactor: rename parseFlexibleInstant to parseInstant in FeedbackScoreMapper

* [OPIK-4384] [BE] Make LIMIT unconditional in FIND query

The LIMIT clause was gated on filter/sort flags, so plain paged requests
(only limit/offset) at the outer query level would not emit LIMIT.
Simplify to always emit LIMIT when the limit parameter is provided.

* [OPIK-4384] [BE] Fix comment ordering assertion in tests

ClickHouse groupUniqArray does not guarantee ordering, so comment
assertions must use ignoringCollectionOrder to avoid flaky failures.

* [OPIK-4384] [BE] Add branch conditionals to FIND_GROUPS/FIND_GROUPS_AGGREGATIONS and revert unconditional LIMIT

- Wrap SELECT branches in FIND_GROUPS and FIND_GROUPS_AGGREGATIONS with
  <if(has_aggregated)>/<if(has_raw)> conditionals to skip unnecessary
  branches when all experiments are aggregated or all are raw
- Add no-args getAggregationBranchCounts() overload for workspace-only
  pre-query (used by group/aggregation queries that lack experiment IDs)
- Update executeQueryWithTargetProjects to run both pre-queries in
  parallel via Mono.zip
- Revert commit 215a3f9 (unconditional LIMIT) which caused double
  LIMIT/OFFSET bug: CTE-level LIMIT + outer LIMIT made page 2+ return
  0 results. The complex conditional is correct — outer LIMIT is only
  needed when post-CTE processing may alter the result set.

* [OPIK-4384] [BE] Add branch conditionals to SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_COUNT

Wrap the UNION ALL in the count query with <if(has_aggregated)>/<if(has_raw)>
conditionals to skip unnecessary branches. Pass branch flags through
getCountWithExperimentFilters from the existing pre-query results.

* [OPIK-4384] [BE] Fix ClickHouse column resolution in COUNT query

Alias dataset_item_id as di_id in the COUNT subquery branches
to avoid column name ambiguity when ClickHouse 25.3's query
analyzer resolves COUNT(DISTINCT dataset_item_id) through a
LEFT JOIN with dataset_items_resolved which also has that column.

* [OPIK-4384] [BE] Use pre-computed comments in STREAM query and fix UNION ALL type mismatch

Aggregated branch now reads comments_array_agg directly from experiment_item_aggregates
instead of doing an expensive JOIN to the comments table. Raw branch converts comments
to JSON String via toJSONString(CAST(...)) so both branches output compatible types.

* [OPIK-4384] [BE] Fix target_project_ids bind error in FIND_GROUPS aggregated branch

* Update config-test.yml

* Update ExperimentService.java

* Remove old unused query

* [OPIK-4386] [BE] Address PR review comments: demote log to debug, add try-catch for context safety, add getById lazy aggregation tests

* [OPIK-4386] [BE] Add workspaceId to lazy aggregation log messages
thiagohora added a commit that referenced this pull request Mar 16, 2026
…nts endpoint (#5583)

* [OPIK-4380] [BE] Add experiment aggregates for denormalized metrics

- Add experiment_aggregates and experiment_item_aggregates tables
- Implement ExperimentAggregatesDAO with population and query methods
- Add ExperimentAggregatesService for aggregation management
- Refactor DTOs into organized model classes:
  - ExperimentAggregatesModel: aggregation results
  - ExperimentEntityData: entity models
  - ExperimentSourceData: raw source data
  - ExperimentAggregatesUtils: utilities
- Add FEEDBACK_SCORES_AGGREGATED filter strategy for map-based filtering
- Add comprehensive integration tests (10/10 passing)
- Configure batch size and parallelism settings

* [OPIK-4380] [BE] Add MySQL deadlock retry mechanism for concurrent dataset operations

Problem:
- MySQL deadlock on dataset_version_tags composite PRIMARY KEY (workspace_id, dataset_id, tag)
- Occurred during parallel dataset creation in same workspace
- Multiple threads inserting "latest" tag for different datasets caused lock contention
- Experiments with parallel execution were failing with MySQLTransactionRollbackException

Solution:
- Add handleOnDeadLocks() method in RetryUtils with:
  - 5 retry attempts with exponential backoff (250ms to 2s)
  - 0.5 jitter to reduce thundering herd effect
  - Recursive isDatabaseDeadlock() detection for MySQLTransactionRollbackException
- Apply retry logic in DatasetItemService.setDatasetItemVersion()
- Enables concurrent dataset creation for same workspace

Impact:
- Supports parallel experiment execution with proper deadlock handling
- Test success rate improved from 0/10 to 10/10 in ExperimentAggregatesIntegrationTest
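
The retry schedule described above (exponential backoff from 250 ms capped at 2 s, with 0.5 jitter) can be sketched as a pure delay calculation. The real mechanism lives in RetryUtils; these names and the exact doubling/cap formula are illustrative assumptions:

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative delay schedule: exponential backoff doubling from a base delay,
// capped at a maximum, with +/- jitterFactor randomisation to spread retries
// from concurrent writers (the "thundering herd" mentioned above).
class BackoffSketch {
    static long baseDelayMs(long initialMs, long maxMs, int attempt) {
        long delay = initialMs << Math.min(attempt, 30); // initial * 2^attempt
        return Math.min(delay, maxMs);
    }

    static long jitteredDelayMs(long initialMs, long maxMs, int attempt, double jitterFactor) {
        long base = baseDelayMs(initialMs, maxMs, attempt);
        // Scale the base delay by a random factor in [1 - jitter, 1 + jitter).
        double factor = 1.0 + (ThreadLocalRandom.current().nextDouble() * 2 - 1) * jitterFactor;
        return Math.max(0, Math.round(base * factor));
    }
}
```

With jitter, two threads that deadlock against each other retry at different moments instead of colliding again on the same schedule.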

* Fix visibility

* [OPIK-4380] [BE] Address PR review comments for experiment aggregates

Fixed 11 automated review comments from baz-reviewer:

CRITICAL fixes:
- Prevent NPE on null span aggregations by adding coalesce() in SQL
- Handle multi-project experiments with LIMIT 1 in GET_PROJECT_ID
- Handle zero-item experiments with empty aggregation helpers
- Bind feedback_scores_percentiles map instead of empty CAST

HIGH priority fixes:
- Use toDecimal128(12) instead of toDecimal64(9) for cost percentiles
- Add null-safe tags handling with Optional.ofNullable()
- Include exception objects in retry logging for stack traces

MEDIUM priority fixes:
- Add missing log_comment to SELECT_EXPERIMENT_BY_ID query
- Add missing log_comment to GET_PROJECT_ID query

LOW priority fixes:
- Remove duplicate "id" binding in bindItemsParameters
- Enhance batchSize config documentation with details

All 11 integration tests passing.

* [OPIK-4382] [BE] Refactor experiment aggregates with import cleanup and Optional patterns

- Add missing imports for IntStream, ProjectStats, and other dependencies
- Replace fully-qualified class names with proper imports across DAO and Service classes
- Fix IS_NOT_EMPTY filter handling for FEEDBACK_SCORES_AGGREGATED strategies
- Refactor null checks to use Optional in mapping methods:
  - mapFeedbackScoreAggregations, mapExperimentFromAggregates
  - mapFeedbackScoreData, mapExperimentGroupAggregationItem
  - Batch insert preparation with Optional chains
- Improve code readability and maintainability with functional patterns

* [OPIK-4380] [BE] Fix table definition

* [OPIK-4380] [BE] Address PR comments and consolidate DatasetItemService methods

- Fix tags NPE in ExperimentAggregatesDAO with defaultIfNull
- Remove unnecessary FINAL clause from GET_EXPERIMENT_DATA query
- Fix test naming in ExperimentAggregatesIntegrationTest
- Consolidate 7 duplicate createVersionFromDelta methods into single canonical implementation
- Remove debug logger from config-test.yml

* [OPIK-4380] [BE] Fix missing log_comment and centralize search criteria binding

- Fix SELECT_EXPERIMENT_BY_ID to properly render log_comment metadata
  - Use getSTWithLogComment pattern in getExperimentFromAggregates
  - Ensures ClickHouse query logging populates workspace/experiment IDs

- Centralize ExperimentSearchCriteria binding logic
  - Create ExperimentSearchCriteriaBinder utility class
  - Parameterize filter strategies to support both DAO variants
  - Eliminate 29-line duplication between ExperimentDAO and ExperimentAggregatesDAO
  - Single source of truth prevents DAOs from getting out of sync

* [OPIK-4380] [BE] Fix createVersionFromDelta consolidation after rebase

- Update canonical method signature to include new parameters:
  - List<EvaluatorItem> evaluators
  - ExecutionPolicy executionPolicy
  - boolean clearExecutionPolicy

- Update all 5 caller sites to pass new parameters:
  - Use changes.evaluators(), changes.executionPolicy(), changes.clearExecutionPolicy() when available
  - Pass null/false for auto-generated versions that inherit from base

- Add imports for EvaluatorItem and ExecutionPolicy

Fixes compilation errors introduced by rebase with upstream changes to DatasetVersionService

* [OPIK-4380] [BE] Address PR review comments - fix type mismatch, extract constants, remove DAO logging

- Fixed BigDecimal[] to Double[] conversion for experiment_scores (matches ClickHouse Float64)
- Extracted FilterStrategy lists to static final constants to avoid repeated allocations
- Added @NonNull validation to populateExperimentAggregate parameter
- Removed DAO layer logging, keeping service-level logging only

* [OPIK-4380] [BE] Extract shared helper for experiment data pagination

Extract streamWithExperimentPagination() helper method to eliminate
duplication in getTracesData(), getSpansData(), and getFeedbackScoresData().

All three methods followed identical pattern:
- asyncTemplate.stream with connection
- getSTWithLogComment with cursor flag
- Bind workspace_id, experiment_id, project_id, limit
- Optional cursor binding
- Result mapping

Benefits:
- Single source of truth for pagination binding logic
- Prevents divergence when tweaking cursor/limit bindings
- Reduces code from ~20 lines to ~10 lines per method
- Type-safe generic implementation

Note: CTE redundancy (3x experiment_items scan) is intentional to avoid
passing large trace ID lists as parameters, which would cause performance
issues with 10K+ traces.

* [OPIK-4383] [BE] Add Redis stream subscriber for debounced experiment aggregates recomputation

- Add ExperimentDenormalizationConfig implementing StreamConfiguration with debounce, job lock, and per-experiment aggregation lock settings
- Add ExperimentAggregationMessage as stream message record
- Add ExperimentAggregatesSubscriber consuming from the denormalization stream; acquires a workspace-scoped distributed lock per experiment before calling populateAggregations()
- Add experimentDenormalizationEnabled feature flag to ServiceTogglesConfig and FeatureFlags
- Wire ExperimentDenormalizationConfig into OpikConfiguration
- Update config.yml and config-test.yml with full experimentDenormalization block (enabled for tests)
- Add ExperimentAggregatesSubscriberTest covering lifecycle gating and processEvent success/error paths

* Revision 2: Address PR comments - add config defaults, remove toggle, rename tests

- ExperimentDenormalizationConfig: add sensible defaults to all fields so
  Dropwizard validation doesn't fail when the config block is absent from
  old deployments (config.isEnabled()=false still gates the subscriber)
- Remove experimentDenormalizationEnabled service toggle from
  ServiceTogglesConfig, FeatureFlags, config.yml and config-test.yml -
  the infrastructure gate (config.isEnabled()) is the single control point
- Rename lifecycle test methods to camelCase per project conventions:
  startSkipsStartupWhenDisabled / stopSkipsShutdownWhenDisabled

* Revision 3: Add @Max(500) to consumerBatchSize and @NotNull to jobLockWaitTime

* [OPIK-4380] [BE] Address PR review comments - fix TYPE_REFERENCE visibility, redundant IN subquery, hardcoded context keys, Instant.now in loop, and inline defaultIfNull

* Revision 4: Address remaining JetoPistola review comments (#7, #8, #10)

- #7: Remove "Used for testing and verification" from getExperimentFromAggregates javadoc
- #8: Replace recursive flatMap with Mono.expand() in populateExperimentItemsInBatches
- #10: Remove unrelated subscribeOn addition from DatasetItemService.createVersionFromDelta

* Revision 3: Add switchIfEmpty fallback for deleted traces in populateExperimentAggregate

* Fix tests

* Revision 6: Move countTotal log from DAO to service layer

Operational logs belong in the service layer, not the DAO.

* Revision 7: Apply Spotless formatting

* Revision 8: Make populateAggregations(UUID, int) private

Removes the uncapped public batch size entry point. All callers now go
through the public no-arg overload which reads batchSize safely from config.

* [OPIK-4380] [BE] Add evaluation_method support to experiment_aggregates pipeline

- Add ClickHouse migration (000062) to add evaluation_method column to experiment_aggregates table
- Add evaluationMethod field to ExperimentData record
- Update GET_EXPERIMENT_DATA query to read evaluation_method from experiments
- Update INSERT_EXPERIMENT_AGGREGATE to write evaluation_method to experiment_aggregates
- Update SELECT_EXPERIMENT_BY_ID to read evaluation_method from experiment_aggregates
- Fix Experiment record constructor call: insert EvaluationMethod at correct position (10)

* [OPIK-4380] [BE] Extract shared helper for experiment aggregation queries

Reduce copy-paste in getTraceAggregations, getSpanAggregations, and
getFeedbackScoreAggregations by extracting queryExperimentAggregation,
which centralises the context-aware execution, workspace/experiment/project
parameter binding, and singleOrEmpty pattern shared by all three methods.

* [OPIK-4380] [BE] Enforce non-null contract on countTotal criteria parameter

Add @NonNull to ExperimentSearchCriteria in the interface and implementation
so that a null argument fails fast with an explicit NullPointerException at
the DAO boundary instead of crashing deep inside buildCountTemplate.

* [OPIK-4380] [BE] Fix countTotal ignoring target project IDs in normal path

target_project_ids was only applied inside the project_deleted LEFT JOIN
subquery; the main WHERE had no project restriction, so counts were
workspace-wide. Reuse has_target_projects in the main WHERE so
project_id IN :target_project_ids always takes effect. Also replace
manual null/empty checks with CollectionUtils.isNotEmpty.

* [OPIK-4380] [BE] Apply Spotless formatting

* [OPIK-4382] [BE] Address PR review comments on experiment aggregates

- Fix :versionId → :version_id parameter naming in SQL templates and bindings
- Fix last_updated_at binding to use item.lastUpdatedAt() instead of Instant.now()
- Fix FEEDBACK_SCORES_AGGREGATED_IS_EMPTY filter: embed generated SQL into templates
  instead of hard-coded static condition, and add missing bind calls
- Fix RetryUtils log duplication (remove getMessage() + pass exception directly)
- Add batchSize = 1000 default in ExperimentAggregatesConfig
- Extract resolveVersionIdForCriteria helper to deduplicate version-id resolution
- Add null/blank/ClickHouse placeholder guards in extractUuidsFromGroupValues
- Extract loadEntityMap helper to deduplicate getEnrichInfoHolder enrichment logic

* Revision 3: Address PR comments E, F, G, H

- Fix E: Extract shared template/bind helpers in ExperimentAggregatesDAO
- Fix F: Bind experiment_ids as UUID[] instead of String[]
- Fix G+H: Extract getEnrichInfoHolder logic into ExperimentGroupEnricher,
  eliminating duplication between ExperimentService and ExperimentAggregatesService
  without introducing a direct dependency between them

* Revision 4: Fix ExperimentServiceTest to include ExperimentGroupEnricher mock

* [OPIK-4382] [BE] Extract shared Row→ExperimentGroup mappers into ExperimentGroupMappers

Pull the duplicated Row→ExperimentGroupItem and Row→ExperimentGroupAggregationItem
conversion logic from ExperimentDAO and ExperimentAggregatesDAO into a shared
ExperimentGroupMappers utility class. Both DAOs now delegate to the same
toExperimentGroupItem / toExperimentGroupAggregationItem helpers, eliminating
the need to mirror DTO mapping changes in two places.

* [OPIK-4382] [BE] Deduplicate bindGroupCriteria into ExperimentGroupMappers

Moves the shared group-criteria binding logic out of ExperimentDAO and
ExperimentAggregatesDAO into ExperimentGroupMappers.bindGroupCriteria(),
following the same pattern as ExperimentSearchCriteriaBinder. Adding or
fixing a criteria binding now only requires a change in one place.

* [OPIK-4382] [BE] Extract streamGroupQuery helper and fix null percentiles

- Deduplicate findGroups/findGroupsAggregations into a single private
  streamGroupQuery(queryTemplate, criteria, rowMapper) that differs only
  by the query constant and BiFunction row mapper.

- Fix convertToBigDecimal to return null for null/unsupported inputs so
  absent p50/p90/p99 entries in getDuration propagate as null rather than
  BigDecimal.ZERO, preserving the semantic-null that lets callers apply
  COALESCE/fallback logic correctly.

* [OPIK-4382] [BE] Consolidate cost/duration helpers into ExperimentGroupMappers

Promote getCostValue and getDuration to public static in
ExperimentGroupMappers and delete the private copies in ExperimentDAO.
ExperimentDAO.mapToDto now delegates to the shared helpers, so any
future change to cost filtering, duration percentile extraction, or the
BigDecimal conversion only needs to be made in one place.

Side-effect: ExperimentDAO.mapToDto also picks up the null-percentile
fix (convertToBigDecimal returns null for absent/unsupported inputs)
that was previously applied only to ExperimentGroupMappers.

* [OPIK-4382] [BE] Fix pagination count and add criteria filter tests

- Remove count() OVER () window function from paged query (returned
  page-scoped count instead of full result-set count)
- Replace with dedicated count query + short-circuit: skip items query
  when count == 0, use DatasetItemPage.empty() for that case
- Extract DatasetItemResultMapper.buildItemFromRow as public static
  helper reused by ExperimentAggregatesDAO
- Add parameterized integration tests for ExperimentGroupCriteria
  filters (name, types, projectId, combined, empty-result) covering
  both findGroups and findGroupsAggregations aggregate paths

* [OPIK-4380] [BE] Extract shared filter helpers into FilterQueryBuilder

Add FilterStrategyParam record, applyFiltersToTemplate and bindFilters
static helpers to FilterQueryBuilder, then replace duplicated per-strategy
loops in DatasetItemVersionDAO and ExperimentAggregatesDAO with single
delegating calls backed by per-DAO strategy constants.

* [OPIK-4382] [BE] Consolidate filter helpers in getExperimentItemsStatsFromAggregates

Add EXPERIMENT_ITEMS_STATS_FILTER_STRATEGY_PARAMS and
EXPERIMENT_ITEMS_STATS_BIND_STRATEGIES constants and replace the
per-strategy toAnalyticsDbFilters/bind blocks in
getExperimentItemsStatsFromAggregates with single delegating calls to
FilterQueryBuilder.applyFiltersToTemplate and bindFilters.

* Revision 9: Extract shared helpers to eliminate duplication across DAOs

- Create DatasetItemSearchCriteriaMapper: centralizes filters + search flag
  wiring for DatasetItemSearchCriteria, shared by DatasetItemVersionDAO and
  ExperimentAggregatesDAO
- Add ExperimentGroupMappers.applyGroupCriteriaToTemplate: centralizes
  ExperimentGroupCriteria → ST template wiring, now shared by ExperimentDAO
  and ExperimentAggregatesDAO
- Update DatasetItemVersionDAO.addFiltersToTemplate and bindSearchAndFilters
  to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentAggregatesDAO.applyDatasetItemFiltersToTemplate and
  bindDatasetItemSearchParams to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentDAO.newGroupTemplate and ExperimentAggregatesDAO.newGroupTemplate
  to delegate to ExperimentGroupMappers.applyGroupCriteriaToTemplate

* [OPIK-4383] [BE] Add experiment aggregate event listener and no-op publisher

* Revision 2: Fix missing import for ExperimentAggregationPublisher

* [OPIK-4383] [BE] Add ExperimentAggregationPublisher, ExperimentDenormalizationJob and tests

- ExperimentAggregationPublisher: debounces experiment aggregation triggers
  by writing compound workspaceId:experimentId members to a Redis ZSET scored
  by expiry timestamp (now + debounceDelay), plus a hash storing the userName
  with TTL=2×debounceDelay to handle stale entries.
- ExperimentDenormalizationJob: @Every("5s") job that reads ZSET members with
  score <= now, publishes ExperimentAggregationMessage to the Redis stream,
  then cleans up the ZSET entry and hash bucket. Handles stale entries
  (expired hash) by removing the orphaned ZSET member without publishing.
- Fix processExperiment reactive chain: avoid double index.remove by
  returning Mono<Boolean> from flatMap branches so switchIfEmpty is only
  triggered when the bucket is truly empty.
- ExperimentAggregationPublisherTest: integration tests with real Redis
  container verifying ZSET membership, score, userName storage, TTL,
  workspace isolation, and debounce deduplication.
- ExperimentDenormalizationJobTest: unit tests with Mockito covering disabled
  config, lock not acquired, empty ZSET, happy path, stale entry, and batch.

* Fix tests setup

* [OPIK-4383] [BE] Address PR review: move DAO logs to service layer

* [OPIK-4383] [BE] Address PR review: extract shared DAO helper and fix log placement

* [OPIK-4383] [BE] Short-circuit deleteByTraceIds when no spans found

Skip delete, cascading operations, and SpansDeleted event when
getSpanIdsForTraces returns an empty set, preserving the original
no-op behaviour and avoiding the Preconditions.checkArgument failure
in SpanDAO.deleteByIds.

* [OPIK-4383] [BE] Fix cascade deletion failures after trace delete

Two bugs prevented spans and attachments from being deleted when a trace
was deleted via the event-driven cascade:

1. FeedbackScoreService.deleteByTraceIds/deleteBySpanIds had @NonNull on
   projectId which threw NPE when TracesDeleted.projectId() was null.
   EventInterceptor swallowed the NPE, stopping the entire cascade chain.
   Fix: remove @NonNull since the DAO already handles null safely via
   Optional.ofNullable(projectId).

2. SpanDAO.DELETE_BY_IDS had the wrong column (trace_id) and parameter
   name (span_ids) — the ClickHouse R2DBC driver could not resolve :span_ids
   as a named parameter in the DELETE statement. Fixed by using id IN :ids
   to match the working pattern in TraceDAO.DELETE_BY_ID.

* [OPIK-4383] [BE] Address PR review comments on ExperimentDenormalizationJob

- Centralize Redis constants (EXPERIMENT_KEY_PREFIX, USER_NAME_FIELD,
  MEMBER_SEPARATOR) in ExperimentDenormalizationConfig
- Change ExperimentAggregationPublisher.publish() to return Mono<Void>
  instead of void, so errors propagate to callers
- Make job interval configurable via jobs map in config.yml
- Fix onErrorContinue logging: remove getMessage() duplication
- Demote per-experiment logs from INFO to DEBUG
- Add ZSET pagination using expand() to avoid materializing entire range
- Update tests for all changes

* Fix @Every job interval config key casing and add jobs section to test config

The dropwizard-jobs framework uses WordUtils.uncapitalize on the job class's simple name
to look up the interval in the jobs map, so the key must be
'experimentDenormalizationJob' (lowercase first letter). Also adds the missing
jobs section and jobBatchSize to config-test.yml.
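
The lookup convention described above can be sketched in a few lines (illustrative Python mirror of `WordUtils.uncapitalize(class.getSimpleName())`, not the framework's actual code):

```python
def job_config_key(cls: type) -> str:
    """Derive the jobs-map key the way dropwizard-jobs does:
    uncapitalize the simple class name (lowercase only the first letter)."""
    name = cls.__name__
    return name[0].lower() + name[1:] if name else name


class ExperimentDenormalizationJob:
    pass


# The config key must therefore start with a lowercase letter.
print(job_config_key(ExperimentDenormalizationJob))  # experimentDenormalizationJob
```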

* Replace @every annotation with programmatic Quartz scheduling

Remove @every from ExperimentDenormalizationJob and schedule it
programmatically in OpikGuiceyLifecycleEventListener, following the
same pattern as TraceThreadsClosingJob. Add jobInterval config field
to ExperimentDenormalizationConfig. Remove the jobs YAML section that
caused deserialization errors with JobConfiguration's immutable map.

* Add experiment context to error log and extract publishIfNotEmpty helper

- Include experimentId and workspaceId in onExperimentUpdated error log
- Extract publishIfNotEmpty helper to deduplicate filter+publish logic
  across triggerByExperimentIds, triggerByTraceIds, triggerBySpanIds

* Fix NPE in ExperimentAggregateEventListenerTest mock setup

Stub publisher.publish() to return Mono.empty() in setUp so
.subscribe() calls in production code don't NPE on null.

* [OPIK-4385] [BE] Use pre-computed aggregation tables for experiment endpoints

Apply UNION ALL hybrid pattern to ExperimentDAO (FIND, FIND_GROUPS,
FIND_GROUPS_AGGREGATIONS) and ExperimentItemDAO (STREAM,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_COUNT,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_STATS) so that
experiments present in experiment_aggregates / experiment_item_aggregates
use pre-computed values, while others fall back to live JOIN computation.

Add ExperimentAggregatesIntegrationTest covering all 7 affected queries
with parameterized filter, pagination, and consistency scenarios.

* [OPIK-4386] [BE] Trigger lazy aggregation via publisher on GET experiment by ID

When fetching an experiment by ID, if the experiment is in COMPLETED or
CANCELLED state and is not yet present in the experiment_aggregates table,
enqueue it for aggregation using ExperimentAggregationPublisher instead of
computing aggregations synchronously. The check and publish are performed
off the critical path via doOnEach, so the caller receives the experiment
immediately without waiting for the side effect to complete.

* [OPIK-4384] [BE] Fix missing zero_uuid binding and experiment_scores sort alias

- Bind zero_uuid parameter in getById, getByIds, and get(ExperimentStreamRequest)
  methods that use the FIND query; the UNION ALL refactor introduced an
  experiments_from_aggregates CTE that requires this parameter but only the
  main find() method was binding it, causing 500 errors on those paths
- Fix SortingQueryBuilder to reference the outer column alias experiment_scores_agg
  instead of es.experiment_scores; the ORDER BY sits outside the UNION ALL so the
  inner es alias is out of scope, while experiment_scores_agg is the consistent
  output alias exposed by both branches

* [OPIK-4384] [BE] Fix null row injection from LEFT JOIN miss in feedback_scores and comments aggregation

Pre-aggregate feedback_scores_final and comments_final into subqueries
(GROUP BY entity_id) before LEFT JOIN in DatasetItemVersionDAO.STREAM.
When a LEFT JOIN has no match against a pre-aggregated subquery the
joined columns are NULL, so any(NULL) returns NULL instead of a
default-valued row with epoch timestamps that caused Instant.parse()
failures.

Also adds a regression test covering the no-scores path in
ExperimentAggregatesIntegrationTest.

* [OPIK-4383] [BE] Remove DAO-level log.info from ExperimentAggregatesDAO methods

Move operational logging responsibility to the service layer, consistent
with earlier fixes for ExperimentItemDAO and SpanDAO in this PR.

* Remove accidentally committed doc files

These files were introduced during merge resolution but should
not be part of the branch.

* [OPIK-4383] [BE] refactor: extract triggerAggregation helper to centralize guard+publish flow

* [OPIK-4386] [BE] fix: demote lazy aggregation check log to DEBUG

* [OPIK-4387] [BE] feat: wire aggregation publisher into finishExperiments endpoint

Chain experimentAggregationPublisher.publish() after AlertEvent in
finishExperiments() so experiments finished via POST /v1/private/experiments/finish
are published to Redis for aggregation computation.

* [OPIK-4383] [BE] fix: restore TagOperations.tagUpdateFragment in SpanDAO BULK_UPDATE

Restores proper tag handling in SpanDAO.BULK_UPDATE query that was
regressed to a simple arrayConcat. Now uses TagOperations.tagUpdateFragment()
which provides arrayDistinct(), tag limit enforcement (max 50), and
tags_to_add/tags_to_remove support. Also adds the required
short_circuit_function_evaluation SETTINGS for throwIf evaluation.

* [OPIK-4387] [BE] feat: add stream trimming to experiment denormalization XADD

Add streamMaxLen and streamTrimLimit configuration to bound Redis stream
growth on the experiment denormalization producer (ExperimentDenormalizationJob).
Uses Redisson's trimNonStrict().maxLen().limit() API for approximate trimming.

* [OPIK-4387] [BE] fix: make aggregation publish best-effort in finishExperiments

Swallow and log Redis/publish errors so finishExperiments returns 204
even when Redis is down. Aggregation will be retried by the lazy trigger
or next job cycle.

* [OPIK-4387] [BE] refactor: centralize Redis stream XADD trimming in RedisStreamUtils

Extract duplicate StreamAddArgs.entry().trimNonStrict().maxLen().limit()
into RedisStreamUtils.buildAddArgs() so stream trimming settings live in
one place. Updates all 5 producers.
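
The trimming behaviour can be modelled in plain Python (a toy list-backed stream; the real code uses Redisson's `trimNonStrict().maxLen().limit()`, and redis-py exposes the same knobs via `xadd(..., maxlen=..., approximate=True, limit=...)`):

```python
def xadd_with_trim(stream: list, entry: dict, max_len: int, trim_limit: int) -> None:
    """Append an entry, then evict oldest entries so the stream stays near
    max_len. trim_limit caps how many entries one call may evict, which is
    what makes the trimming non-strict (approximate) rather than exact."""
    stream.append(entry)
    evictable = min(max(len(stream) - max_len, 0), trim_limit)
    del stream[:evictable]


stream: list = []
for i in range(1500):
    xadd_with_trim(stream, {"seq": i}, max_len=1000, trim_limit=100)
print(len(stream))  # 1000
```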

* [OPIK-4387] [BE] fix: defer aggregation publish and update test for best-effort behavior

Wrap aggregation publisher in Mono.defer() so it subscribes only after
upstream completes, and update unit test to expect completion instead of
error propagation.

* Adding InterruptableJob

* [OPIK-4383] [BE] Address PR review: expand safety valve, env var prefix

- Add batchSize-capped iteration counter to expand() to prevent
  infinite loops when ZSET entries fail to be removed
- Rename EXPERIMENT_DENORM_JOB_INTERVAL to OPIK_EXPERIMENT_DENORM_JOB_INTERVAL
  to follow the OPIK_ prefix convention
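
The safety valve can be sketched as a capped drain loop (hypothetical helper names; the real implementation uses Reactor's `expand()`):

```python
def drain_zset(fetch_page, remove, batch_size: int, max_batches: int = 1000):
    """Page through a sorted set, removing each processed page. The iteration
    cap is a safety valve: if removal silently fails, the same entries would
    be re-fetched forever, so we bail out after max_batches pages."""
    processed = []
    for _ in range(max_batches):
        page = fetch_page(batch_size)
        if not page:
            break
        processed.extend(page)
        remove(page)
    return processed


backing = list(range(25))
out = drain_zset(lambda n: backing[:n],
                 lambda page: backing.__delitem__(slice(0, len(page))),
                 batch_size=10)
print(len(out))  # 25
```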

* [OPIK-4384] [BE] Add branch optimization and CTE split to experiment queries

Use pre-computed experiment_aggregates table to optimize query execution:
- Add has_aggregated/has_raw flags to skip unnecessary UNION ALL branches in FIND/FIND_COUNT
- Add getAggregationBranchCounts pre-query to determine which branches are needed
- Apply CTE split pattern to FIND_GROUPS and FIND_GROUPS_AGGREGATIONS
- Update getById to leverage branch optimization via single-ID branch count query
- Add <if(id)> filter to SELECT_AGGREGATED_EXPERIMENT_IDS for getById support

* [OPIK-4384] [BE] Add missing 7-arg overload for getDatasetItemsWithExperimentItems

Fix test compilation error from merge: the remote branch added callers
with (UUID, List, null, null, List<SortingField>, String, String) signature
which needs a bridge overload to the 9-arg method.

* [OPIK-4384] [BE] Add conditional LIMIT push-up, missing CTE, and fix test precision

- Add conditional LIMIT push-up in STREAM query: push LIMIT into CTE
  when only one branch (raw or aggregated) is active for performance
- Add missing experiment_item_aggr_trace_scope CTE for aggregated branch
- Add AggregatedExperimentCounts record for experiment-level branching
- Fix MultiValueFeedbackScoresE2ETest precision assertion: use isEqualTo
  instead of isEqualByComparingTo to respect custom BigDecimal comparator

* [OPIK-4384] [BE] Push OFFSET into top_dataset_items CTE and fix BigDecimal comparator in DatasetsResourceTest

* [OPIK-4384] [BE] Add pass rate aggregation to experiment aggregates

Add pass_rate, passed_count, and total_count columns to experiment_aggregates
table and compute them during aggregation. Update ExperimentDAO queries to
select these columns from both raw and aggregated paths, returning NULL for
non-evaluation-suite experiments.

* Fix format

* Fix get by id

* Fix mapping

* Fix mapping

* [OPIK-4384] [BE] Use pre-aggregated comments from aggregate tables with ISO 8601 date formatting

Update retrieval queries in ExperimentDAO, DatasetItemVersionDAO, and ExperimentAggregatesDAO
to read comments_array_agg as JSON String from aggregate tables instead of live-querying the
comments table. Ensure UNION ALL type compatibility by wrapping raw paths with toJSONString()
and formatting dates as ISO 8601 for proper Jackson deserialization.

* [OPIK-4386] [BE] Increase debounceDelay in test config to prevent race condition

The denormalization job was processing finished experiments during test
execution with incomplete ClickHouse data, causing stale aggregated
values to be returned instead of fresh raw computations.

* [OPIK-4384] [BE] Use parameterized binding for dynamic sort keys and add deterministic tiebreaker

- Replace literal string interpolation in getTopSortExpression with
  parameterized bind variables (sf.bindKey()) to prevent SQL injection
- Remove fieldMapping filter from bindDynamicKeys so all dynamic keys
  are bound, including those used in the top_sorting SELECT expression
- Add deterministic tiebreaker (id DESC / dataset_item_id DESC) to both
  the push-top-limit CTE and the main ORDER BY for consistent pagination
- Fix experiment_items deduplication: use FINAL where DISTINCT was used
  and vice versa for consistency across query branches
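
The binding pattern can be illustrated with a small sketch (the function, placeholder syntax, and column names here are hypothetical, not the actual `SortingQueryBuilder` code):

```python
def build_sort_expression(sort_fields: list) -> tuple:
    """Build an ORDER BY fragment with named bind parameters instead of
    splicing user-supplied keys into the SQL string. Each dynamic key gets
    a placeholder (:sort_key_N) and the driver binds its value, so a
    malicious key can never change the query shape. A deterministic
    tiebreaker (id DESC) is always appended for stable pagination."""
    parts, params = [], {}
    for i, (key, direction) in enumerate(sort_fields):
        bind = f"sort_key_{i}"
        parts.append(f"scores[:{bind}] {direction}")
        params[bind] = key
    parts.append("id DESC")
    return "ORDER BY " + ", ".join(parts), params


sql, params = build_sort_expression([("accuracy", "DESC")])
print(sql)     # ORDER BY scores[:sort_key_0] DESC, id DESC
print(params)  # {'sort_key_0': 'accuracy'}
```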

* [OPIK-4384] [BE] Add mixed-state aggregation test for UNION ALL hybrid

Test creates 3 experiments, aggregates only 1, and queries all 3 to
exercise the UNION ALL hybrid path where has_aggregated and has_raw
are both true simultaneously.

* [OPIK-4384] [BE] Add isNotEmpty assertions to parameterized filter tests

Ensure filter scenarios actually match data by asserting content()
is not empty before and after aggregation in all parameterized filter
tests (find, findGroups, findGroupsAggregations).

* [OPIK-4384] [BE] refactor: extract assertion helpers to remove duplication in ExperimentAggregatesIntegrationTest

* [OPIK-4384] [BE] refactor: rename parseFlexibleInstant to parseInstant in FeedbackScoreMapper

* [OPIK-4384] [BE] Make LIMIT unconditional in FIND query

The LIMIT clause was gated on filter/sort flags, so plain paged requests
(only limit/offset) at the outer query level would not emit LIMIT.
Simplify to always emit LIMIT when the limit parameter is provided.

* [OPIK-4384] [BE] Fix comment ordering assertion in tests

ClickHouse groupUniqArray does not guarantee ordering, so comment
assertions must use ignoringCollectionOrder to avoid flaky failures.

* [OPIK-4384] [BE] Add branch conditionals to FIND_GROUPS/FIND_GROUPS_AGGREGATIONS and revert unconditional LIMIT

- Wrap SELECT branches in FIND_GROUPS and FIND_GROUPS_AGGREGATIONS with
  <if(has_aggregated)>/<if(has_raw)> conditionals to skip unnecessary
  branches when all experiments are aggregated or all are raw
- Add no-args getAggregationBranchCounts() overload for workspace-only
  pre-query (used by group/aggregation queries that lack experiment IDs)
- Update executeQueryWithTargetProjects to run both pre-queries in
  parallel via Mono.zip
- Revert commit 215a3f9 (unconditional LIMIT) which caused double
  LIMIT/OFFSET bug: CTE-level LIMIT + outer LIMIT made page 2+ return
  0 results. The complex conditional is correct — outer LIMIT is only
  needed when post-CTE processing may alter the result set.

* [OPIK-4384] [BE] Add branch conditionals to SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_COUNT

Wrap the UNION ALL in the count query with <if(has_aggregated)>/<if(has_raw)>
conditionals to skip unnecessary branches. Pass branch flags through
getCountWithExperimentFilters from the existing pre-query results.

* [OPIK-4384] [BE] Fix ClickHouse column resolution in COUNT query

Alias dataset_item_id as di_id in the COUNT subquery branches
to avoid column name ambiguity when ClickHouse 25.3's query
analyzer resolves COUNT(DISTINCT dataset_item_id) through a
LEFT JOIN with dataset_items_resolved which also has that column.

* [OPIK-4384] [BE] Use pre-computed comments in STREAM query and fix UNION ALL type mismatch

Aggregated branch now reads comments_array_agg directly from experiment_item_aggregates
instead of doing an expensive JOIN to the comments table. Raw branch converts comments
to JSON String via toJSONString(CAST(...)) so both branches output compatible types.

* [OPIK-4384] [BE] Fix target_project_ids bind error in FIND_GROUPS aggregated branch

* Fix issues

* [OPIK-4387] [BE] Fix missing closing brace in ExperimentServiceTest
YarivHashaiComet pushed a commit that referenced this pull request Mar 17, 2026
* design doc

* FE communication and ERD additions

* ui reporting events in flow

* changes

* add reason to TrialItemRun

* [NA] [SDK] feat: add greenfield optimization framework package

Implements a new optimization framework (`apps/opik-optimizer`) that
decouples optimizer algorithms from experiment execution, persistence,
and UI concerns. Integrates via the existing optimization studio pipeline
(Redis queue → Python backend → subprocess).

Key components:
- Orchestrator: central lifecycle controller with sampler, validator,
  materializer, result aggregator, and event emitter
- StupidOptimizer: 2-step test optimizer (3 candidates → best → 2 more)
- EvaluationAdapter: wraps SDK evaluate_optimization_suite_trial()
- Backend integration: new Redis queue, framework_optimizer job processor,
  framework_runner subprocess entry point

Also adds evaluate_optimization_suite_trial() to the Python SDK, combining
optimization trial linkage with evaluation suite behavior (evaluators and
execution policy from the dataset).

53 unit + integration tests passing. Verified end-to-end against Comet cloud
with real LLM calls, UI progress chart, prompt display, and score tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Adjustments for UI and framework review

* fix: address PR review comments - dict access bug and theme color

- Fix AttributeError in framework_runner.py: dataset.get_items() returns
  dicts, use item["id"] instead of item.id
- Fix hard-coded hex color in TrialPassedCell.tsx: use text-success CSS
  class instead of text-[#12B76A] for proper dark theme support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address remaining PR review comments

- Add opik:optimizer-framework to default RQ queue names so framework
  jobs actually get consumed by workers
- Add dataset size guard in orchestrator before sample_split to provide
  a clear error message for datasets with fewer than 2 items
- Extract shared optimizer_job_helper.py to deduplicate identical logic
  between optimizer.py and framework_optimizer.py
- Extract checkIsEvaluationSuite helper in optimizations.ts to
  deduplicate predicate shared between CompareTrialsPage and
  useCompareOptimizationsData
- Fix hardcoded "pass_rate" in experiment_executor.py to use the actual
  metric_type parameter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: separate experiment scores from feedback scores and handle single-item datasets

Splits the combined feedback/experiment scores into distinct fields in the
Optimization API and DAO so the frontend can fall back to experiment_scores
when feedback_scores lack the objective. Allows single-item datasets by
returning a train-only split instead of raising. Extracts shared runner
environment setup into runner_common.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: extract shared getBestOptimizationScore helper to deduplicate logic

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: evaluate baseline on full dataset instead of validation split only

The baseline was evaluated on split.validation_item_ids, which with an
80/20 split ratio meant only 1 out of 5 items was used. This gave an
unrepresentative baseline score. Now uses the full dataset_item_ids list.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: enrich GEPA experiment metadata for optimization visualization

Add rich metadata to each experiment so the UI can aggregate and
visualize the optimization trajectory. Key changes:

- step_index increments only when candidate changes (not per eval)
- candidate_id is stable across re-evaluations of the same prompt
- parent_candidate_ids always set correctly for derived candidates
- New metadata fields: batch_index, num_items, capture_traces, eval_purpose
- Refactor optimizer package: protocol + factory pattern for registration
- Add GEPA adapter bridging GEPA callbacks to framework metadata
- Fix BE tests for experimentScores null and queue routing
- Add docs: ADDING_AN_OPTIMIZER.md and GEPA_IMPLEMENTATION.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review comments and simplify optimizer factory

- Remove register_optimizer public API and OptimizerFactory class;
  replace with a simple dict in _load_registry()
- framework_runner: avoid holding full dataset items in memory
- Update docs and tests to match simplified factory

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: lineage-based step_index and parent_candidate_ids for GEPA experiments

- Replace sequential step_index counter with parent-lineage derivation
  (max parent step + 1), so all re-evaluations of the same candidate
  share the same step_index
- Ensure every non-baseline experiment carries parent_candidate_ids,
  enabling the UI to draw lineage graphs
- Pass batch_index, num_items, capture_traces, and eval_purpose through
  to experiment metadata for richer visualization
- Revert runner scripts to direct invocation (remove runner_common.py)
- Update unit tests to match new metadata contract

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
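
The lineage rule (max parent step + 1) can be sketched as follows (illustrative names, not the SDK's actual API):

```python
def derive_step_index(parent_ids, step_by_candidate: dict) -> int:
    """Lineage-based step index: a candidate's step is one past its deepest
    parent, so re-evaluations of the same candidate share one step index
    instead of bumping a global counter on every evaluation."""
    if not parent_ids:
        return 0  # baseline has no parents
    return max(step_by_candidate[p] for p in parent_ids) + 1


steps = {"baseline": 0}
steps["cand-a"] = derive_step_index(["baseline"], steps)
steps["cand-b"] = derive_step_index(["baseline"], steps)
steps["cand-c"] = derive_step_index(["cand-a", "cand-b"], steps)
print(steps)  # {'baseline': 0, 'cand-a': 1, 'cand-b': 1, 'cand-c': 2}
```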

* refactor: remove unused config_hash and merge event emitters

- Remove canonical_config_hash from Candidate and TrialResult types,
  candidate_materializer, experiment_executor, and all tests
- Delete util/hashing.py module (unused — GEPA does minibatching so
  config-hash dedup would block valid re-evaluations)
- Merge SdkEventEmitter and LoggingEventEmitter into a single
  EventEmitter class with optional optimization_id
- Update GEPA_IMPLEMENTATION.md to reflect parent_ids tracking fixes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: make CandidateConfig a plain dict and pass baseline_config through context

- Replace CandidateConfig dataclass with dict[str, Any] type alias
- Add baseline_config field to OptimizationContext (caller-provided, opaque)
- Orchestrator passes baseline_config through without knowing its structure
- Optimizers copy baseline_config and override prompt_messages only
- Remove result_aggregator module (inlined into evaluation_adapter)
- Move gepa imports to runtime (lazy) for optional dependency
- Fix protocol.py training_set/validation_set types to list[dict]
- Update ADDING_AN_OPTIMIZER.md to reflect all changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
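
The pass-through contract can be sketched like this (field and key names are illustrative; the point is that the optimizer copies the opaque config and overrides only the keys it optimizes):

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class OptimizationContext:
    # Caller-provided and opaque to the orchestrator.
    baseline_config: dict[str, Any] = field(default_factory=dict)


def make_candidate(context: OptimizationContext, new_prompt_messages: list) -> dict:
    """Copy everything the caller provided; override only the optimized key."""
    candidate = dict(context.baseline_config)
    candidate["prompt_messages"] = new_prompt_messages
    return candidate


ctx = OptimizationContext(baseline_config={
    "model": "gpt-4o-mini",
    "temperature": 0.0,
    "prompt_messages": [{"role": "system", "content": "v0"}],
})
cand = make_candidate(ctx, [{"role": "system", "content": "v1"}])
```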

* refactor: move gepa tests to library_integration to avoid unit suite dependency on gepa

The gepa tests patch gepa.core.adapter.EvaluationBatch and gepa.optimize,
requiring the optional gepa package at import time. Moving them to
tests/library_integration/gepa/ with pytest.importorskip("gepa") keeps
the unit suite fast and dependency-free.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: remove event_emitter from optimizer interface, auto-emit step progress

Optimizers no longer receive or call event_emitter directly. The
EvaluationAdapter now auto-detects step_index changes during evaluate()
and emits on_step_started internally. GEPAProgressCallback simplified
to only forward GEPA events to the adapter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: assert on actual log messages in event emitter tests

Use caplog to verify logger.info output includes optimization ID and
event details, instead of just checking calls don't crash.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: set evaluation_method on optimizer trial experiments for correct UI detection

evaluate_optimization_suite_trial was creating experiments without
evaluation_method="evaluation_suite", causing the backend to default
to "dataset". The frontend checkIsEvaluationSuite now uses the explicit
evaluation_method field instead of heuristic score detection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: validate dataset is evaluation suite before running suite evaluation

Adds a guard to evaluate_suite and evaluate_optimization_suite_trial that
checks dataset.dataset_type == "evaluation_suite" before proceeding. This
prevents silently running an ineffective suite trial on a plain dataset
with no scoring rules.

- Add dataset_type param to Dataset constructor, populated at all call sites
- Add dataset_type property to Dataset
- Add _validate_dataset_is_evaluation_suite in evaluator.py
- Update tests and add rejection test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: extract _run_suite_evaluation to deduplicate suite evaluation flow

evaluate_suite and evaluate_optimization_suite_trial had their entire body
duplicated. Extract shared logic into _run_suite_evaluation, parameterized
by optimization_id and dataset filters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE][BE] feat: optimization studio UI improvements

Comprehensive face-lift for optimizer screens including new KPI cards,
metric comparison cells, configuration diff views, progress charts,
trial status indicators, and backend dataset_item_count support.
Also adds backward compatibility for SDK-based optimizations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] feat: optimizer screens face-lift

- Dataset name column: hover icon instead of clickable link
- Split Accuracy into Pass rate + Accuracy columns with compact metric display
- Conditionally hide Accuracy column when no old-type optimizations exist
- Remove Logs/Configuration tabs from single optimization page
- Fall back to studio_config for configuration display on old optimizations
- Chart tooltip: remove pass rate percentage background color
- Fix dataset hover icon vertical centering
- Restore feature toggle for optimization studio

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] fix: center trend arrow icons and rename tooltip label

- Fix arrow icon vertical centering in compact metric Tag
- Rename "Avg. runtime cost" to "Runtime cost" in chart tooltip

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] fix: polish optimizer screens UI consistency

- Fix chart tooltip background (use --background instead of --popover)
- Align column types with correct icons (cost, duration, numberDictionary)
- Align KPI card icons to match table column type icons
- Lowercase labels: Evaluation results, Best configuration, Runtime cost, Opt. cost, Optimization cost
- Darken success green color for better readability
- Remove Traces KPI card from trial view

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4687] [SDK] feat: GEPA v2 optimizer with reflection-based prompt evolution (#5547)

* [OPIK-4687] [SDK] feat: integrate GEPA v2 optimizer into framework

Add GepaV2Optimizer that delegates to the external gepa library (v0.1.0+)
for genetic-Pareto prompt optimization. Includes adapter bridging GEPA's
evaluate/reflect interface to the framework's EvaluationAdapter, lifecycle
event tracking via callbacks, result caching, and a reflection prompt that
encourages generalizable instructions while preserving template variables.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(gepa-v2): improve reflection feedback with structured assertions and dynamic inputs

- Extract template variables from prompt messages for dynamic input field mapping
- Store per-assertion structure (name, value, reason) instead of flat reason strings
- Show only failed assertions in reflection feedback for focused improvement signals

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(gepa-v2): adapter reflection control, FE chart filtering, experiment typing

- Move reflection to adapter's propose_new_texts with custom prompt template
- Use msg["name"] as candidate key when provided, fallback to {role}_{index}
- Strip echoed parameter prefix from reflection LLM output
- Disable GEPA evaluation cache so validations produce full-dataset experiments
- Tag exploration evals as mini-batch, only baseline/init/validation as trial
- FE: filter mini-batch experiments from optimization progress chart
- FE: show individual assertion score columns alongside "passed" for eval suites
- Update E2E script: no dataset split, max_candidates=10, reflection log capture

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(gepa-v2): improve reflection quality with structured feedback and template filtering

- Show FAILED and PASSED assertions separately in reflection feedback
- Keep worst run per item (most failed assertions) for reflection
- Sort reflective dataset records by failure count (most failures first)
- Exclude template-only messages (e.g. {question}) from GEPA seed candidate
- Rewrite reflection prompt: focus on failures, preserve what works, 500-word limit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(gepa-v2): classify experiment type by batch size, not eval purpose

The purpose-based classification was unreliable: GEPA calls evaluate()
with capture_traces=False for both full validations and minibatch
evaluations of new candidates, making them indistinguishable by purpose.

Now records the full dataset size on the first evaluate call (initialization)
and classifies any call with fewer items as mini-batch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
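
The size-based classification reduces to a small stateful check (a sketch with hypothetical names, not the adapter's actual code):

```python
class EvalClassifier:
    """Classify evaluate() calls by batch size: the first call (the
    initialization pass) fixes the full dataset size; any later call with
    fewer items is a mini-batch, anything at full size is a trial."""

    def __init__(self):
        self.full_size = None

    def classify(self, num_items: int) -> str:
        if self.full_size is None:
            self.full_size = num_items
            return "trial"
        return "mini-batch" if num_items < self.full_size else "trial"


c = EvalClassifier()
print(c.classify(20))  # trial (initialization fixes the full size)
print(c.classify(6))   # mini-batch
print(c.classify(20))  # trial (full validation)
```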

* feat(gepa-v2): improve scoring, stopping, and reflection quality

- Use mean instead of min for per-item assertion scores, giving GEPA
  granular signal instead of binary 0/1
- Track total_runs/passed_runs per item so reflection prompt shows
  whether failures are consistent or intermittent
- Stop on trial.score (framework experiment score) instead of GEPA's
  internal mean, so pass_threshold semantics are respected
- Rewrite reflection template with 4-step structure: diagnose, keep
  what works, write assertion-matched rules, generalize
- Increase max_metric_calls multiplier to 5x for deeper exploration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
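
The mean-vs-min change is easy to see on a single item (illustrative sketch):

```python
def item_score_min(assertion_scores):
    """Old behaviour: one failed assertion zeroes the item, a binary signal."""
    return min(assertion_scores)


def item_score_mean(assertion_scores):
    """New behaviour: partial credit, a gradient GEPA can actually climb."""
    return sum(assertion_scores) / len(assertion_scores)


# One failed assertion out of four:
scores = [1.0, 1.0, 1.0, 0.0]
print(item_score_min(scores))   # 0.0
print(item_score_mean(scores))  # 0.75
```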

* fix(gepa-v2): hide mini-batch trials from table, use domain-neutral examples

- Filter mini-batch experiments from the trials table rows so only
  full evaluation trials are shown
- Replace customer-support-specific examples in the reflection
  template with domain-neutral ones

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(gepa-v2): show all runs per item in reflection feedback, log rendered prompt

Previously kept only the worst run per item for reflection. Now all runs
are preserved and shown separately (Run 1/3, Run 2/3, etc.) so the
reflection LLM can see what varies across attempts. Also captures the
fully rendered reflection prompt in the reflection log for debugging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(gepa-v2): consolidate runs per input in reflection dataset, label assertion/reason

Consolidate multiple runs for the same input into a single record with
a Runs field and per-item Summary (pass count + consistent failures).
This eliminates input duplication (~40% token savings) and makes cross-run
comparison trivial. Also separates Assertion/Reason onto labeled lines
for clearer parsing by the reflection LLM.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
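
The consolidation step can be sketched as a group-by over runs (record field names here are illustrative approximations of the reflective-dataset format):

```python
from collections import defaultdict


def consolidate_runs(runs):
    """Group runs by input so each record carries the input text once, plus
    a Runs list and a per-item Summary, instead of duplicating the input
    for every run."""
    by_input = defaultdict(list)
    for run in runs:
        by_input[run["input"]].append(run)
    records = []
    for text, grouped in by_input.items():
        passed = sum(r["passed"] for r in grouped)
        records.append({
            "Input": text,
            "Runs": [f"Run {i + 1}/{len(grouped)}: {r['feedback']}"
                     for i, r in enumerate(grouped)],
            "Summary": f"{passed}/{len(grouped)} runs passed",
        })
    return records


runs = [
    {"input": "q1", "passed": True, "feedback": "ok"},
    {"input": "q1", "passed": False, "feedback": "missed assertion on tone"},
]
recs = consolidate_runs(runs)
```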

* test(gepa-v2): add feedback format coverage for reflection dataset

Add tests for single-run flat keys, multi-run Assertion/Reason labels
in Runs field, and failed assertions with empty reason.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(gepa-v2): tell reflection LLM that examples are sorted by priority

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(gepa-v2): use flat config instead of prompt_messages

GEPA adapter now works with flat dict[str, str] candidates instead of
knowing about message roles. baseline_config is the single source of
truth with system_prompt and user_message keys. Added LLMChatTask that
constructs LLM messages from flat config keys, replacing the
prompt_messages reconstruction path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(gepa-v2): use TrialResult.config instead of prompt_messages, fix validator and UI prompt display

Replace TrialResult.prompt_messages with TrialResult.config so config is
the single source of truth. Update candidate_validator to accept flat
message keys (system_prompt, user_message) in addition to prompt_messages.
Populate experiment metadata "prompt" from flat keys so the UI displays
prompts correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(gepa-v2): add optimizable_keys to OptimizationContext

Replace hardcoded PROMPT_KEYS in GepaV2Optimizer with
context.optimizable_keys so the caller explicitly controls which
baseline config keys get optimized.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(gepa-v2): add failure-aware batch sampler for minibatch item selection

Replace GEPA's default uniform sampler with FailureAwareBatchSampler that
guarantees failed items from the last full eval appear in subsequent
minibatches, giving the reflection LLM actionable signal instead of wasting
iterations on easy items.

Parameters: min_failed_per_batch, min_unseen_per_batch, failure_threshold.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
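
The guarantee can be sketched as follows (a simplified, hypothetical stand-in for FailureAwareBatchSampler, worst-first over last-eval failure counts):

```python
import random


def sample_minibatch(failure_counts: dict, batch_size: int,
                     min_failed_per_batch: int, rng=random):
    """Guarantee the worst-failing items from the last full evaluation appear
    in the minibatch; fill the remaining slots uniformly from the rest."""
    worst_first = sorted(failure_counts, key=failure_counts.get, reverse=True)
    failed = [i for i in worst_first
              if failure_counts[i] > 0][:min_failed_per_batch]
    rest = [i for i in failure_counts if i not in failed]
    fill = rng.sample(rest, min(max(batch_size - len(failed), 0), len(rest)))
    return failed + fill


counts = {"a": 3, "b": 0, "c": 1, "d": 0, "e": 2}
batch = sample_minibatch(counts, batch_size=4, min_failed_per_batch=2)
# "a" and "e" (the two worst failures) are always included.
```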

* refactor(gepa-v2): strict types in sampler, worst-first failed selection, update implementation doc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(experiment): surface optimizable keys in experiment configuration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(gepa-v2): failure streak tracking, history-aware reflection prompt, optimizable keys in config

- Track per-item failure streaks and failing assertion names in sampler
- Annotate reflective dataset records with "Failure History" for stuck items
- Rewrite reflection prompt: failure history step, structured output, topic headers
- Surface optimizable_keys in experiment config and baseline evaluation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(gepa-v2): balanced reflection prompt, lower failure history threshold

- Rewrite reflection prompt to balance conservative and aggressive approaches:
  preserve working rules while encouraging grouped topic headers (## Empathy,
  ## Resolution, etc.) instead of flat numbered lists
- Lower failure history threshold from streak >= 2 to >= 1 so the reflection
  LLM sees failure context from the first repeated failure
- Guard failure history annotation with `if stuck` to avoid empty annotations
- Relax "3 unreturned callbacks" assertion to "multiple unreturned callbacks"
  (the exact-count version was too brittle for gpt-4o-mini to satisfy reliably)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(gepa-v2): balanced 50/50 minibatch sampling, 20-item e2e suite

Balanced sampling: split minibatches ~50/50 between failed (worst-first)
and passed (random) items. Previously batches were almost entirely failed
items, causing the reflection LLM to over-correct and regress passing
behaviors (catastrophic 0.0 scores). Passed items now act as behavioral
anchors.

- Remove unseen item tracking (mark_seen, min_unseen_per_batch)
- Default min_failed_per_batch=1 (was batch_size-1)
- Minimum reflection_minibatch_size=4 (ensures 2+2 split)
- Redesign e2e suite: 20 items (5 easy, 7 medium, 8 hard)
- Fix contradicting assertions (hedging language vs no promises)
- Remove impossible assertions (specific loyalty benefits)
- Add problematic items summary to reflection log
- Save reflection log from orchestrator finally block
- Update GEPA_IMPLEMENTATION.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
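The balanced 50/50 split described in this commit can be sketched roughly as follows (hypothetical names and signature; the actual sampler in the SDK differs):

```python
import random

def sample_minibatch(failed, passed, batch_size, rng=random):
    """Split a minibatch ~50/50 between failed and passed items.

    Failed items give the reflection LLM failure signal; passed items
    act as behavioral anchors so working rules are not regressed.
    """
    n_failed = min(len(failed), max(1, batch_size // 2))
    n_passed = min(len(passed), batch_size - n_failed)
    # Top up with extra failed items if there are not enough passed ones.
    n_failed = min(len(failed), batch_size - n_passed)
    batch = rng.sample(failed, n_failed) + rng.sample(passed, n_passed)
    rng.shuffle(batch)
    return batch
```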

* refactor(gepa-v2): extract collaborators from adapter with DI

Split FrameworkGEPAAdapter into three injectable collaborators:
- CandidateTracker: candidate identity, parent lineage, GEPA index mapping
- ReflectiveDatasetBuilder: feedback dataset construction for reflection LLM
- ReflectionProposer: reflection LLM interaction and logging

The adapter is now a thin facade (~300 lines, down from 664) that
orchestrates evaluation and delegates to collaborators. Compatibility
properties ensure all existing tests pass unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(gepa-v2): move reflection template to ReflectionProposer

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(gepa-v2): task-agnostic reflection template with prompt descriptions and sibling awareness

Rewrite the reflection template to be domain-neutral, add optional
prompt_descriptions to OptimizationContext so the reflection LLM
understands what each parameter does, and include sibling parameter
context so the LLM knows what other params exist without modifying them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(gepa-v2): update implementation doc and reflection prompt algorithm

Update GEPA_IMPLEMENTATION.md with prompt descriptions, sibling awareness,
and task-agnostic template details. Rewrite REFLECTION_PROMPT_EXAMPLE.md
to document the full reflection prompt assembly algorithm with a rendered
example showing the new header format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(gepa-v2): robust header stripping, markdown formatting in reflection template

The LLM sometimes echoes header metadata (Parameter:, Description:,
param name) in reformulated form. Replace exact-prefix matching with
line-by-line stripping of metadata patterns. Add IMPORTANT instruction
to not include metadata in output. Request markdown ## headers in STEP 4.

Add 11 unit tests for ReflectionProposer: header stripping edge cases,
build_header with/without descriptions, template content assertions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
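The line-by-line stripping this commit introduces could look roughly like the sketch below (the metadata patterns here are illustrative; the real ReflectionProposer may match more):

```python
import re

# Hypothetical metadata patterns the LLM tends to echo from the header.
_METADATA_RE = re.compile(r"^\s*(Parameter|Description)\s*:", re.IGNORECASE)

def strip_header_metadata(text: str, param_name: str) -> str:
    """Drop leading lines that merely echo header metadata."""
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        if not line or _METADATA_RE.match(line) or line == param_name:
            i += 1
        else:
            break
    return "\n".join(lines[i:])
```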

* revert(fe): remove debug FE changes (mini-batch filtering, column reorder)

These were temporary UI tweaks for debugging the GEPA v2 optimizer.
They'll be re-implemented properly in a separate FE PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(gepa-v2): preserve template variables in reflection prompt

Instruct the reflection LLM to keep all template variables (e.g.
{var}, {{var}}, <var>, {% var %}) intact during prompt rewriting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
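A minimal validation pass for this, assuming regex-based detection of the four variable styles named above (the actual SDK check may differ):

```python
import re

# Matches {{var}}, {% var %}, {var}, and <var> style template variables.
_VAR_RE = re.compile(r"\{\{\s*\w+\s*\}\}|\{%\s*\w+\s*%\}|\{\w+\}|<\w+>")

def template_variables(prompt: str) -> set:
    return set(_VAR_RE.findall(prompt))

def proposal_keeps_variables(parent: str, proposal: str) -> bool:
    """Reject a rewrite that drops any variable present in the parent."""
    return template_variables(parent) <= template_variables(proposal)
```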

* refactor: rename gepa_v2 to gepa, clean up OptimizationContext

- Rename gepa_v2/ -> gepa/ (now the primary optimizer)
- Rename gepa/ -> gepa_old/ (legacy optimizer)
- Rename GepaV2Optimizer -> GepaOptimizer
- Rename GepaOptimizer -> GepaLegacyOptimizer
- Remove unused fields from OptimizationContext: prompt_messages,
  metric_parameters, model_parameters
- Rename prompt_descriptions -> config_descriptions
- Delete SimpleOptimizer and its tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add configurable split_strategy to OptimizationContext

Add split_strategy field ("80_20" default, "no_split" for GEPA) so the
orchestrator handles dataset splitting instead of individual optimizers.
Remove internal train+val dedup logic from GepaOptimizer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(gepa): clean up adapter API and fix code quality issues

- Make adapter facade properties public (remove underscore prefix)
- Add standalone reflection_log fallback to prevent silent data loss
- Rename consume_pending_capture_traces → get_pending_capture_traces
- Remove dead guard in _build_evaluation_batch
- Move SYSTEM_PROMPT_KEY constant to test file
- Fix update_scores type annotation in failure_aware_sampler

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: remove prompt_messages logic, validate optimizable_keys

- Adapt LLMTask to use config dict, remove LLMChatTask duplicate
- Simplify candidate_validator to check optimizable_keys from adapter
- Remove prompt_messages fallback from experiment_executor metadata
- Update all tests, fixtures, scripts, and docs to flat key format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(scripts): remove stale prompt_messages and API references

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(gepa): remove optimizable_keys from config dict to fix caching

optimizable_keys was being injected into CandidateConfig by both
_make_config_builder and the orchestrator, causing cache key mismatches
between baseline and initialization evaluations. Pass it as an explicit
parameter through the evaluation chain instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(scripts): rename gepa_v2 scripts to gepa, delete run_optimization_e2e

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: use optimizable_keys generically in ADDING_AN_OPTIMIZER guide

Remove hardcoded system_prompt references from code examples.
Optimizers should iterate over context.optimizable_keys instead
of assuming specific key names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(fe): address baz review comments

- Use Tag component variants instead of hard-coded color spans for
  theme-aware diff badges (Added/Removed/Changed)
- Clamp formatAsPercentage input to [0, 1] range to prevent >100% or
  negative percentage display
- Read baseline score from experiment_scores as fallback when
  feedback_scores lacks the objective (evaluation-suite support)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(fe): extract getObjectiveScoreValue shared helper

Move the feedback_scores -> experiment_scores fallback into a reusable
getObjectiveScoreValue helper in feedback-scores.tsx. Replace all 4 call
sites (CompareTrialsPage, TrialKPICards, useOptimizationScores,
useCompareOptimizationsData) with the shared helper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(be): resolve CI failures - migration conflict and test ignored fields

- Rename migration 000063 → 000064 to avoid prefix conflict with main
- Add datasetItemCount to EXPERIMENT_IGNORED_FIELDS and test builder
- Add datasetName to OPTIMIZATION_IGNORED_FIELDS (transient field)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(fe): extract shared aggregateExperimentMetrics helper

Deduplicate weighted score/cost/latency accumulation logic that was
duplicated between TrialKPICards and useCompareOptimizationsData.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add optimization_id index on experiments and remove dead code

- Add minmax index on experiments.optimization_id to speed up
  optimization queries that join experiments by optimization_id
- Remove unused OptimizationDiffView component (dead code from iteration)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update Helm documentation

* [OPIK-4383] [BE] Redis stream subscriber for debounced experiment aggregates recomputation (#5371)

* [OPIK-4380] [BE] Add experiment aggregates for denormalized metrics

- Add experiment_aggregates and experiment_item_aggregates tables
- Implement ExperimentAggregatesDAO with population and query methods
- Add ExperimentAggregatesService for aggregation management
- Refactor DTOs into organized model classes:
  - ExperimentAggregatesModel: aggregation results
  - ExperimentEntityData: entity models
  - ExperimentSourceData: raw source data
  - ExperimentAggregatesUtils: utilities
- Add FEEDBACK_SCORES_AGGREGATED filter strategy for map-based filtering
- Add comprehensive integration tests (10/10 passing)
- Configure batch size and parallelism settings

* [OPIK-4380] [BE] Add MySQL deadlock retry mechanism for concurrent dataset operations

Problem:
- MySQL deadlock on dataset_version_tags composite PRIMARY KEY (workspace_id, dataset_id, tag)
- Occurred during parallel dataset creation in same workspace
- Multiple threads inserting "latest" tag for different datasets caused lock contention
- Experiments with parallel execution were failing with MySQLTransactionRollbackException

Solution:
- Add handleOnDeadLocks() method in RetryUtils with:
  - 5 retry attempts with exponential backoff (250ms to 2s)
  - 0.5 jitter to reduce thundering herd effect
  - Recursive isDatabaseDeadlock() detection for MySQLTransactionRollbackException
- Apply retry logic in DatasetItemService.setDatasetItemVersion()
- Enables concurrent dataset creation for same workspace

Impact:
- Supports parallel experiment execution with proper deadlock handling
- Test success rate improved from 0/10 to 10/10 in ExperimentAggregatesIntegrationTest
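The retry shape this commit describes (5 attempts, 250ms→2s exponential backoff, 0.5 jitter) looks roughly like the Python sketch below; the real handleOnDeadLocks() lives in Java's RetryUtils, and the names here are illustrative:

```python
import random
import time

def retry_on_deadlock(op, is_deadlock, attempts=5,
                      base_delay=0.25, max_delay=2.0, jitter=0.5,
                      sleep=time.sleep, rng=random.random):
    """Retry op() with exponential backoff and jitter on deadlocks.

    The delay doubles from base_delay up to max_delay; jitter randomizes
    each delay by up to +/-50% to reduce the thundering-herd effect.
    """
    for attempt in range(attempts):
        try:
            return op()
        except Exception as exc:
            if attempt == attempts - 1 or not is_deadlock(exc):
                raise
            delay = min(base_delay * (2 ** attempt), max_delay)
            delay *= 1 + jitter * (2 * rng() - 1)
            sleep(delay)
```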

* Fix visibility

* [OPIK-4380] [BE] Address PR review comments for experiment aggregates

Fixed 11 automated review comments from baz-reviewer:

CRITICAL fixes:
- Prevent NPE on null span aggregations by adding coalesce() in SQL
- Handle multi-project experiments with LIMIT 1 in GET_PROJECT_ID
- Handle zero-item experiments with empty aggregation helpers
- Bind feedback_scores_percentiles map instead of empty CAST

HIGH priority fixes:
- Use toDecimal128(12) instead of toDecimal64(9) for cost percentiles
- Add null-safe tags handling with Optional.ofNullable()
- Include exception objects in retry logging for stack traces

MEDIUM priority fixes:
- Add missing log_comment to SELECT_EXPERIMENT_BY_ID query
- Add missing log_comment to GET_PROJECT_ID query

LOW priority fixes:
- Remove duplicate "id" binding in bindItemsParameters
- Enhance batchSize config documentation with details

All 11 integration tests passing.

* [OPIK-4382] [BE] Refactor experiment aggregates with import cleanup and Optional patterns

- Add missing imports for IntStream, ProjectStats, and other dependencies
- Replace fully-qualified class names with proper imports across DAO and Service classes
- Fix IS_NOT_EMPTY filter handling for FEEDBACK_SCORES_AGGREGATED strategies
- Refactor null checks to use Optional in mapping methods:
  - mapFeedbackScoreAggregations, mapExperimentFromAggregates
  - mapFeedbackScoreData, mapExperimentGroupAggregationItem
  - Batch insert preparation with Optional chains
- Improve code readability and maintainability with functional patterns

* [OPIK-4380] [BE] Fix table definition

* [OPIK-4380] [BE] Address PR comments and consolidate DatasetItemService methods

- Fix tags NPE in ExperimentAggregatesDAO with defaultIfNull
- Remove unnecessary FINAL clause from GET_EXPERIMENT_DATA query
- Fix test naming in ExperimentAggregatesIntegrationTest
- Consolidate 7 duplicate createVersionFromDelta methods into single canonical implementation
- Remove debug logger from config-test.yml

* [OPIK-4380] [BE] Fix missing log_comment and centralize search criteria binding

- Fix SELECT_EXPERIMENT_BY_ID to properly render log_comment metadata
  - Use getSTWithLogComment pattern in getExperimentFromAggregates
  - Ensures ClickHouse query logging populates workspace/experiment IDs

- Centralize ExperimentSearchCriteria binding logic
  - Create ExperimentSearchCriteriaBinder utility class
  - Parameterize filter strategies to support both DAO variants
  - Eliminate 29-line duplication between ExperimentDAO and ExperimentAggregatesDAO
  - Single source of truth prevents DAOs from getting out of sync

* [OPIK-4380] [BE] Fix createVersionFromDelta consolidation after rebase

- Update canonical method signature to include new parameters:
  - List<EvaluatorItem> evaluators
  - ExecutionPolicy executionPolicy
  - boolean clearExecutionPolicy

- Update all 5 caller sites to pass new parameters:
  - Use changes.evaluators(), changes.executionPolicy(), changes.clearExecutionPolicy() when available
  - Pass null/false for auto-generated versions that inherit from base

- Add imports for EvaluatorItem and ExecutionPolicy

Fixes compilation errors introduced by rebase with upstream changes to DatasetVersionService

* [OPIK-4380] [BE] Address PR review comments - fix type mismatch, extract constants, remove DAO logging

- Fixed BigDecimal[] to Double[] conversion for experiment_scores (matches ClickHouse Float64)
- Extracted FilterStrategy lists to static final constants to avoid repeated allocations
- Added @NonNull validation to populateExperimentAggregate parameter
- Removed DAO layer logging, keeping service-level logging only

* [OPIK-4380] [BE] Extract shared helper for experiment data pagination

Extract streamWithExperimentPagination() helper method to eliminate
duplication in getTracesData(), getSpansData(), and getFeedbackScoresData().

All three methods followed an identical pattern:
- asyncTemplate.stream with connection
- getSTWithLogComment with cursor flag
- Bind workspace_id, experiment_id, project_id, limit
- Optional cursor binding
- Result mapping

Benefits:
- Single source of truth for pagination binding logic
- Prevents divergence when tweaking cursor/limit bindings
- Reduces code from ~20 lines to ~10 lines per method
- Type-safe generic implementation

Note: CTE redundancy (3x experiment_items scan) is intentional to avoid
passing large trace ID lists as parameters, which would cause performance
issues with 10K+ traces.

* [OPIK-4383] [BE] Add Redis stream subscriber for debounced experiment aggregates recomputation

- Add ExperimentDenormalizationConfig implementing StreamConfiguration with debounce, job lock, and per-experiment aggregation lock settings
- Add ExperimentAggregationMessage as stream message record
- Add ExperimentAggregatesSubscriber consuming from the denormalization stream; acquires a workspace-scoped distributed lock per experiment before calling populateAggregations()
- Add experimentDenormalizationEnabled feature flag to ServiceTogglesConfig and FeatureFlags
- Wire ExperimentDenormalizationConfig into OpikConfiguration
- Update config.yml and config-test.yml with full experimentDenormalization block (enabled for tests)
- Add ExperimentAggregatesSubscriberTest covering lifecycle gating and processEvent success/error paths

* Revision 2: Address PR comments - add config defaults, remove toggle, rename tests

- ExperimentDenormalizationConfig: add sensible defaults to all fields so
  Dropwizard validation doesn't fail when the config block is absent from
  old deployments (config.isEnabled()=false still gates the subscriber)
- Remove experimentDenormalizationEnabled service toggle from
  ServiceTogglesConfig, FeatureFlags, config.yml and config-test.yml -
  the infrastructure gate (config.isEnabled()) is the single control point
- Rename lifecycle test methods to camelCase per project conventions:
  startSkipsStartupWhenDisabled / stopSkipsShutdownWhenDisabled

* Revision 3: Add @Max(500) to consumerBatchSize and @NotNull to jobLockWaitTime

* [OPIK-4380] [BE] Address PR review comments - fix TYPE_REFERENCE visibility, redundant IN subquery, hardcoded context keys, Instant.now in loop, and inline defaultIfNull

* Revision 4: Address remaining JetoPistola review comments (#7, #8, #10)

- #7: Remove "Used for testing and verification" from getExperimentFromAggregates javadoc
- #8: Replace recursive flatMap with Mono.expand() in populateExperimentItemsInBatches
- #10: Remove unrelated subscribeOn addition from DatasetItemService.createVersionFromDelta

* Revision 3: Add switchIfEmpty fallback for deleted traces in populateExperimentAggregate

* Fix tests

* Revision 6: Move countTotal log from DAO to service layer

Operational logs belong in the service layer, not the DAO.

* Revision 7: Apply Spotless formatting

* Revision 8: Make populateAggregations(UUID, int) private

Removes the uncapped public batch size entry point. All callers now go
through the public no-arg overload which reads batchSize safely from config.

* [OPIK-4380] [BE] Add evaluation_method support to experiment_aggregates pipeline

- Add ClickHouse migration (000062) to add evaluation_method column to experiment_aggregates table
- Add evaluationMethod field to ExperimentData record
- Update GET_EXPERIMENT_DATA query to read evaluation_method from experiments
- Update INSERT_EXPERIMENT_AGGREGATE to write evaluation_method to experiment_aggregates
- Update SELECT_EXPERIMENT_BY_ID to read evaluation_method from experiment_aggregates
- Fix Experiment record constructor call: insert EvaluationMethod at correct position (10)

* [OPIK-4380] [BE] Extract shared helper for experiment aggregation queries

Reduce copy-paste in getTraceAggregations, getSpanAggregations, and
getFeedbackScoreAggregations by extracting queryExperimentAggregation,
which centralises the context-aware execution, workspace/experiment/project
parameter binding, and singleOrEmpty pattern shared by all three methods.

* [OPIK-4380] [BE] Enforce non-null contract on countTotal criteria parameter

Add @NonNull to ExperimentSearchCriteria in the interface and implementation
so that a null argument fails fast with an explicit NullPointerException at
the DAO boundary instead of crashing deep inside buildCountTemplate.

* [OPIK-4380] [BE] Fix countTotal ignoring target project IDs in normal path

target_project_ids was only applied inside the project_deleted LEFT JOIN
subquery; the main WHERE had no project restriction, so counts were
workspace-wide. Reuse has_target_projects in the main WHERE so
project_id IN :target_project_ids always takes effect. Also replace
manual null/empty checks with CollectionUtils.isNotEmpty.

* [OPIK-4380] [BE] Apply Spotless formatting

* [OPIK-4382] [BE] Address PR review comments on experiment aggregates

- Fix :versionId → :version_id parameter naming in SQL templates and bindings
- Fix last_updated_at binding to use item.lastUpdatedAt() instead of Instant.now()
- Fix FEEDBACK_SCORES_AGGREGATED_IS_EMPTY filter: embed generated SQL into templates
  instead of hard-coded static condition, and add missing bind calls
- Fix RetryUtils log duplication (remove getMessage() + pass exception directly)
- Add batchSize = 1000 default in ExperimentAggregatesConfig
- Extract resolveVersionIdForCriteria helper to deduplicate version-id resolution
- Add null/blank/ClickHouse placeholder guards in extractUuidsFromGroupValues
- Extract loadEntityMap helper to deduplicate getEnrichInfoHolder enrichment logic

* Revision 3: Address PR comments E, F, G, H

- Fix E: Extract shared template/bind helpers in ExperimentAggregatesDAO
- Fix F: Bind experiment_ids as UUID[] instead of String[]
- Fix G+H: Extract getEnrichInfoHolder logic into ExperimentGroupEnricher,
  eliminating duplication between ExperimentService and ExperimentAggregatesService
  without introducing a direct dependency between them

* Revision 4: Fix ExperimentServiceTest to include ExperimentGroupEnricher mock

* [OPIK-4382] [BE] Extract shared Row→ExperimentGroup mappers into ExperimentGroupMappers

Pull the duplicated Row→ExperimentGroupItem and Row→ExperimentGroupAggregationItem
conversion logic from ExperimentDAO and ExperimentAggregatesDAO into a shared
ExperimentGroupMappers utility class. Both DAOs now delegate to the same
toExperimentGroupItem / toExperimentGroupAggregationItem helpers, eliminating
the need to mirror DTO mapping changes in two places.

* [OPIK-4382] [BE] Deduplicate bindGroupCriteria into ExperimentGroupMappers

Moves the shared group-criteria binding logic out of ExperimentDAO and
ExperimentAggregatesDAO into ExperimentGroupMappers.bindGroupCriteria(),
following the same pattern as ExperimentSearchCriteriaBinder. Adding or
fixing a criteria binding now only requires a change in one place.

* [OPIK-4382] [BE] Extract streamGroupQuery helper and fix null percentiles

- Deduplicate findGroups/findGroupsAggregations into a single private
  streamGroupQuery(queryTemplate, criteria, rowMapper) that differs only
  by the query constant and BiFunction row mapper.

- Fix convertToBigDecimal to return null for null/unsupported inputs so
  absent p50/p90/p99 entries in getDuration propagate as null rather than
  BigDecimal.ZERO, preserving the semantic-null that lets callers apply
  COALESCE/fallback logic correctly.

* [OPIK-4382] [BE] Consolidate cost/duration helpers into ExperimentGroupMappers

Promote getCostValue and getDuration to public static in
ExperimentGroupMappers and delete the private copies in ExperimentDAO.
ExperimentDAO.mapToDto now delegates to the shared helpers, so any
future change to cost filtering, duration percentile extraction, or the
BigDecimal conversion only needs to be made in one place.

Side-effect: ExperimentDAO.mapToDto also picks up the null-percentile
fix (convertToBigDecimal returns null for absent/unsupported inputs)
that was previously applied only to ExperimentGroupMappers.

* [OPIK-4382] [BE] Fix pagination count and add criteria filter tests

- Remove count() OVER () window function from paged query (returned
  page-scoped count instead of full result-set count)
- Replace with dedicated count query + short-circuit: skip items query
  when count == 0, use DatasetItemPage.empty() for that case
- Extract DatasetItemResultMapper.buildItemFromRow as public static
  helper reused by ExperimentAggregatesDAO
- Add parameterized integration tests for ExperimentGroupCriteria
  filters (name, types, projectId, combined, empty-result) covering
  both findGroups and findGroupsAggregations aggregate paths

* [OPIK-4380] [BE] Extract shared filter helpers into FilterQueryBuilder

Add FilterStrategyParam record, applyFiltersToTemplate and bindFilters
static helpers to FilterQueryBuilder, then replace duplicated per-strategy
loops in DatasetItemVersionDAO and ExperimentAggregatesDAO with single
delegating calls backed by per-DAO strategy constants.

* [OPIK-4382] [BE] Consolidate filter helpers in getExperimentItemsStatsFromAggregates

Add EXPERIMENT_ITEMS_STATS_FILTER_STRATEGY_PARAMS and
EXPERIMENT_ITEMS_STATS_BIND_STRATEGIES constants and replace the
per-strategy toAnalyticsDbFilters/bind blocks in
getExperimentItemsStatsFromAggregates with single delegating calls to
FilterQueryBuilder.applyFiltersToTemplate and bindFilters.

* Revision 9: Extract shared helpers to eliminate duplication across DAOs

- Create DatasetItemSearchCriteriaMapper: centralizes filters + search flag
  wiring for DatasetItemSearchCriteria, shared by DatasetItemVersionDAO and
  ExperimentAggregatesDAO
- Add ExperimentGroupMappers.applyGroupCriteriaToTemplate: centralizes
  ExperimentGroupCriteria → ST template wiring, now shared by ExperimentDAO
  and ExperimentAggregatesDAO
- Update DatasetItemVersionDAO.addFiltersToTemplate and bindSearchAndFilters
  to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentAggregatesDAO.applyDatasetItemFiltersToTemplate and
  bindDatasetItemSearchParams to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentDAO.newGroupTemplate and ExperimentAggregatesDAO.newGroupTemplate
  to delegate to ExperimentGroupMappers.applyGroupCriteriaToTemplate

* [OPIK-4383] [BE] Fix project_deleted filter and comments_dedup scope in ExperimentAggregatesDAO

- Fix project_deleted filter: use zero UUID sentinel instead of empty string
  for FixedString(36) column comparison in FIND_GROUPS and FIND_GROUPS_AGGREGATIONS
- Fix comments_dedup CTE: scope trace_id subquery by dataset_id to avoid
  scanning the entire workspace's comments table
- Add missing streamMaxLen and streamTrimLimit fields to
  ExperimentDenormalizationConfig (implements StreamConfiguration interface)

* [OPIK-4383] [BE] Address PR review comments: extract ZERO_UUID constant and fix config comment

- Promote zero UUID sentinel to shared constant in ExperimentGroupMappers
- Use parameterized :zero_uuid binding in SQL templates instead of hardcoded string
- Fix config.yml comment from "Default: 120s" to "Default: 1m"

* [OPIK-4383] [BE] Add streamMaxLen and streamTrimLimit to experimentDenormalization config

* [OPIK-4727] fix: remove old GEPA code, fix aggregates test, add migration rollback docs

- Remove gepa_old/ optimizer source and tests, clean factory registry
- Add datasetItemCount to EXPERIMENT_AGGREGATED_FIELDS_TO_IGNORE (not stored in aggregates table)
- Add rollback documentation to mutation experiment type migration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] refactor: deduplicate KPI cards, metric cells, and cleanup

- Extract shared KPICard/MetricKPICard to pages-shared/experiments/KPICard
- Extract calcPercentageVsBaseline helper and TrialMetricCellContent to
  deduplicate percentage calculation across 3 trial metric cells
- Remove unused OptimizationUpdate interface from types
- Fix inconsistent color token (text-light-slate → text-muted-slate)
- Move IIFE out of JSX in MetricComparisonCell compact mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] refactor: rename Compare* to Optimization/Trial, simplify URL structure

- Rename CompareOptimizations* → Optimization* and CompareTrials* → Trial*
- Simplify optimization URL from /$datasetId/$optimizationId to /$optimizationId
- Change trial route from /compare to /trials
- Add OptimizationCompareRedirect for legacy URL backwards compatibility
- Update all navigation references across pages (OptimizationsPage, HomePage, BestPrompt, ResourceLink, etc.)
- Fix breadcrumbs: show raw optimization ID, "Trial #N" for trials
- Split optimization detail into Report & Trials tabs with underline style
- Replace ToggleGroup with underline Tabs on trial page

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] feat: rename tabs, add diff vs parent, fix word-level diffs

- Rename "Report" tab to "Overview" on optimization page
- Rename "Best configuration" to "Best trial configuration"
- Change "Diff" button to "Diff vs. baseline" in configuration sections
- Add "Diff vs. parent" option in trial configuration tab
- Fix prompt diff to use word-level mode for inline change highlights
- Fix TextDiff word-mode layout to flow inline instead of dropping lines

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] refactor: extract shared config flattening, add redirect tests

- Extract flattenConfig, EXCLUDED_CONFIG_KEYS, shouldSkipRedundantKey
  into configuration-renderer.ts (shared by TrialConfigurationSection
  and ConfigurationDiffContent)
- Convert ConfigViewMode string union to CONFIG_VIEW_MODE const object
- Add missing `replace` prop on fallback Navigate in
  OptimizationCompareRedirect
- Restore isArray guard in ConfigurationDiffContent collectPrompts
- Add unit tests for configuration-renderer (21 tests)
- Add unit tests for OptimizationCompareRedirect (4 tests)
- Add E2E Playwright test for legacy /compare URL redirect (2 tests)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] fix: PR review fixes - generic flattenConfig, NamedPrompts diff, parent fallback

- Make flattenConfig accept generic skipKey callback instead of hardcoded
  filtering (addresses Baz review comment)
- Fix NamedPrompts format not recognized as "prompt" type in
  detectConfigValueType, causing JSON-level diff instead of word-level
- Add parent experiment fallback for old optimizations using chronological
  ordering (enables "Diff vs. parent" for non-GEPA v2)
- Fix PromptDiff fallback paths to use mode="words" for word-level diffs
- Add tests for NamedPrompts detection and generic skipKey behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] fix: PR review - shared makeSkipKey, parent fallback, label tweak

- Extract makeSkipKey helper into configuration-renderer.ts to eliminate
  duplicate skipKey predicates in TrialConfigurationSection and
  ConfigurationDiffContent
- Fix parentCandidateIds lookup: fall through to chronological fallback
  when GEPA v2 metadata exists but no matching parent is found
- Use shared skipKey in collectPrompts instead of inline checks
- Remove period from "Diff vs." labels

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] fix: rename Config toggle label to Configuration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4838] [SDK] feat: GEPA convergence improvements (#5570)

* [OPIK-4838] [SDK] feat: GEPA convergence improvements

- Add GepaConfig dataclass for centralized algorithm parameters
- Cache parent scores during minibatch gate with configurable tolerance
  to absorb LLM judge noise (gate_tolerance=0.1)
- Rewrite FailureAwareBatchSampler to use assertion-based pass/fail
  instead of score threshold, prioritize by failing assertion count
- Switch default candidate selection to current_best
- Add docs on scoring pipeline and candidate selection strategies

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore(scripts): use GepaConfig defaults in e2e scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove local-only docs from PR

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address baz review comments, tighten e2e assertions

- Remove unused _global_assertion_failures Counter from sampler
- Update sampler docstring to match implementation (assertion count, not frequency)
- Bound _cached_full_eval_scores to max_candidates entries with FIFO eviction
- Tighten e2e assertions: add context-awareness checks to EASY tier, sharpen MEDIUM tier

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: remove gate_tolerance, require strict minibatch improvement

Cached parent scores are still used for deterministic comparison,
but mutations must now strictly beat the parent on the minibatch
without any tolerance cushion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: shuffle failed items randomly instead of sorting by priority

Simplifies minibatch sampling — all failed items are shuffled equally
rather than sorted by assertion failure count then shuffled within tiers.
This gives better variety across iterations since most items share the
same tier anyway.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: top-tier sampling, template var validation, improved reflection prompt

- Sampler splits failed items into top/rest by assertion failure count,
  draws randomly from top tier first (configurable top_failed_fraction)
- Reject reflection proposals that drop template variables (e.g. {question})
- Reflection prompt encourages surgical edits over full rewrites

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
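The top-tier draw described here could be sketched as follows (hypothetical names; top_failed_fraction is the configurable split point mentioned above):

```python
import random

def sample_failed(failed_counts, k, top_failed_fraction=0.5, rng=random):
    """Draw k failed items, favoring those with the most failing assertions.

    failed_counts: dict mapping item id -> number of failing assertions.
    Items are ranked by failure count, the top fraction is sampled first,
    and the remainder is topped up randomly from the rest.
    """
    ranked = sorted(failed_counts, key=failed_counts.get, reverse=True)
    n_top = max(1, int(len(ranked) * top_failed_fraction))
    top, rest = ranked[:n_top], ranked[n_top:]
    picked = rng.sample(top, min(k, len(top)))
    if len(picked) < k:
        picked += rng.sample(rest, min(k - len(picked), len(rest)))
    return picked
```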

* fix: only update best_trial from full evaluations, not minibatches

A minibatch scoring 1.0 on 4 items was being reported as best_trial
even when the full evaluation only reached 0.9 on 20 items. Now
best_trial is only updated when experiment_type is None (full eval).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove accidentally committed reflection logs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update GEPA docs for top-tier sampling, 5-step reflection, template var validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: increase default reflection_minibatch_size from 4 to 6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: thread evaluator_model through pipeline, increase max_candidates to 25

The default LLMJudge model (gpt-5-nano) was too lenient, causing
pass_rate to always report 1.0. Thread evaluator_model from
OptimizationContext through EvaluationAdapter and experiment_executor
so callers can specify a more capable judge model.

Also increase GEPA max_candidates default from 5 to 25 to allow
longer optimization runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: configurable blended scoring with assertion-level tiebreaker

Add ScoringConfig with strategy ("blended" | "pass_rate"), configurable
weights, and auto-computed epsilon (1/(num_items+1)) that guarantees
pass_rate always dominates while giving the algorithm gradient signal
from individual assertion progress.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
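
The epsilon guarantee above can be sketched in a few lines. This is an illustrative formulation of the "blended" strategy, assuming the tiebreaker is added on top of pass_rate; the real ScoringConfig wiring may differ.

```python
def blended_score(pass_rate, assertion_score, num_items):
    """Blend pass_rate with an assertion-level tiebreaker. epsilon =
    1/(num_items+1) is strictly smaller than the minimum pass_rate step
    of 1/num_items, so the tiebreaker can never flip an ordering that
    pass_rate alone has already decided."""
    epsilon = 1.0 / (num_items + 1)
    return pass_rate + epsilon * assertion_score
```

For 20 items, a candidate at pass_rate 0.90 always beats one at 0.85 regardless of assertion progress, while two candidates tied on pass_rate are separated by their assertion scores.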

* fix: log raw pass_rate to UI, use blended score only for algorithm

_extract_score now returns (optimization_score, display_score) tuple.
The blended score drives the algorithm's acceptance gate, while the
raw pass_rate is logged as the experiment score for the UI chart.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: align GEPA per-item scoring with pass_rate, expose blended score as internal

- Replace GEPA's mean-of-assertions scoring with a pass_rate-aligned formula:
  passing items score 1.0, failing items score ε × assertion_frac
  (where ε = 1/(num_items+1)), preserving gradient for the subsample gate
- Use build_suite_result as source of truth for item pass/fail
- Store pass_rate in trial.score (user-facing), blended score in
  internal_optimization_score (algorithm-only)
- Use pass_rate for stop condition threshold comparison
- Add type hints and ScoreResult type to _extract_per_item_feedback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
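
The per-item formula can be sketched as below. The sketch also folds in the later edge-case fix for failed items with empty assertions; `item_score` is a hypothetical name, not the SDK's actual function.

```python
def item_score(passed, assertions_passed, assertions_total, num_items):
    """Per-item score aligned with pass_rate: a passing item contributes
    1.0; a failing item contributes only a sub-epsilon fraction, so the
    sum of all failing-item contributions can never outweigh a single
    passing item."""
    epsilon = 1.0 / (num_items + 1)
    if passed:
        return 1.0
    if assertions_total == 0:
        return 0.0  # a failed item with no assertions is not a pass
    return epsilon * (assertions_passed / assertions_total)
```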

* fix: only record full evaluations as visible trials

Skip appending subsample/minibatch evals and cache hits to state.trials
so the UI only shows meaningful trial progression. Internal evals are
still returned to the optimizer for its scoring logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update GEPA docs for scoring contract, trial visibility, per-item scoring

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: make reflection prompt less conservative to reduce stagnation

Relax the "surgical edits / MINIMAL EDIT" constraints that prevented the
reflection LLM from making meaningful structural changes when persistent
failures are detected. Key changes: graduated edit aggressiveness based
on Failure History, concrete escalation strategies (restructure, step-by-step
procedures, conditional logic, section rewrite), and removal of the
single-rule-per-failure cap.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: cumulative assertion failure tracking with persistent failure threshold

- Track per-assertion total failures and total evaluations across the
  entire optimization run in FailureAwareBatchSampler
- Show Failure History only when an assertion has failed >=10 times
  (persistent failure threshold) AND failed again in the current eval
- Include failures/evals ratio so the reflection LLM can gauge severity
- Remove streak-based logic in favor of cumulative counts
- Simplify reflection prompt: 5 steps → 4, merge write+apply steps,
  remove duplicated escalation, cleaner multiline formatting
- Remove duplicate cumulative info from Summary's Blocking assertions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
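
The gating rule (cumulative threshold AND failing in the current eval, with a failures/evals ratio) can be sketched like this. Names and the rendered line format are illustrative; the default threshold of 10 was later made configurable (default 7).

```python
def failure_history_lines(totals, evals, current_failures, threshold=10):
    """Render a Failure History entry only for assertions that are both
    persistently failing (cumulative failures >= threshold) and failing
    again in the current evaluation. `totals` and `evals` map assertion
    name -> cumulative failure count / evaluation count."""
    lines = []
    for name in sorted(current_failures):
        failed, seen = totals.get(name, 0), evals.get(name, 0)
        if failed >= threshold and seen:
            lines.append(f"{name}: failed {failed}/{seen} evaluations")
    return lines
```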

* feat: show only worst run for multi-run items, reorder Failure History before run output

Reduces feedback verbosity by showing only the worst run instead of all runs
for multi-run items. Places Failure History right after Inputs in the record
so the reflection LLM sees persistent failure context before the run details.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
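
Worst-run selection reduces to a single `max` over failed-assertion counts. A minimal sketch, assuming each run carries its assertion results as dicts with a `passed` flag:

```python
def worst_run(runs):
    """Pick the run with the most failed assertions to represent a
    multi-run item in the reflection feedback."""
    return max(
        runs,
        key=lambda run: sum(1 for a in run["assertions"] if not a["passed"]),
    )
```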

* feat: guide reflection LLM toward general rules, allow specificity for persistent failures

Step 3 now instructs to abstract specific examples into general categories.
Step 2 allows more specific rules for persistently failing assertions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update GEPA reflection prompt docs for current algorithm

Updates template (4 steps), feedback format (worst-run-only, Failure History
with cumulative counts and Z=10 threshold), and generalization guidance.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: reduce prompt overfitting — prefer updating existing rules, group by behavior pattern

Step 3: check whether an existing rule covers the failing behavior before
adding a new one. Step 4: group by behavior pattern, not scenario type.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: make persistent failure threshold configurable, default to 7

Add persistent_failure_threshold to GepaConfig (default=7), thread through
GepaOptimizer → FrameworkGEPAAdapter → ReflectiveDatasetBuilder.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: strengthen anti-overfitting in reflection prompt

Step 3: NEVER copy specific names/details/scenarios from feedback — they
are samples that change at runtime. Step 2: persistent failure specificity
still avoids non-generalizable details.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update GEPA docs for current algorithm state

Update both implementation guide and reflection prompt docs to match
current code: 4-step template, worst-run-only feedback, cumulative
failure threshold (configurable, default 7), anti-overfitting guidance,
and corrected config defaults.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle edge cases in scoring and trial recording

- _item_score: return 0.0 (not 1.0) for failed items with empty
  assertions, so they don't get scored as passes
- evaluation_adapter: guard trial is not None before appending to
  state.trials and accessing trial.optimization_score

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(BE): exclude minibatch/mutation experiments from pass_rate computation

The FIND query's best_objective_score was computed as a weighted average
across ALL experiments including minibatch and mutation, causing the UI
to show incorrect pass_rate during optimization. Filter experiment_candidates
to only include full-eval experiments (regular/trial types).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): replace ambiguous XYZ headphones e2e item with clear bluetooth speaker scenario

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(test): use assertions= shorthand and typed ExecutionPolicy in e2e scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(optimizer): add early stopping when pass_rate plateaus

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
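
A plateau check over the full-eval pass_rate history might look like the following. `patience` and `min_delta` are hypothetical parameter names; the actual stopping criterion in the optimizer may be shaped differently.

```python
def should_stop_early(history, patience=3, min_delta=0.0):
    """Stop when the best pass_rate has not improved by more than
    min_delta over the last `patience` full evaluations, compared to
    the best score seen before that window."""
    if len(history) <= patience:
        return False  # not enough evaluations to judge a plateau
    best_before = max(history[:-patience])
    recent_best = max(history[-patience:])
    return recent_best <= best_before + min_delta
```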

* fix(test): update reflective dataset tests to use Worst Run instead of Runs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(optimizer): update assertion failure counters on minibatch evals too

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* [NA] [SDK] refactor: convert reflection template to triple-quoted string

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: extract apps/opik-optimizer to comet-ml/opik-optimizer repo

Remove apps/opik-optimizer/ directory (moved to separate repo).
Clean up python-backend framework optimizer references:
- Remove framework_optimizer.py and framework_runner.py
- Remove OPTIMIZER_FRAMEWORK queue from Java Queue enum
- Remove opik-optimizer additional_contexts from docker-compose and CI
- Simplify resolveQueue to always use OPTIMIZER_CLOUD

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: revert python-backend DRY refactor that was only needed for framework optimizer

Restore optimizer.py and rq_worker_manager.py to main state, delete
optimizer_job_helper.py which only existed to share code with the
now-removed framework_optimizer.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [NA] [FE] fix: trials table sorting, metric trend precision, and word diff readability

- Implement client-side sorting for optimization trials table (all columns)
- Default sort by Trial # ascending
- Show 0% trend when formatted values are identical (below display resolution)
- Fall back to block diff when word changes exceed 60% of content

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [NA] [FE] fix: improve diff readability and optimization progress status

- Use hybrid line+word diff: line-level diff first to find changed regions,
  then word-level refinement within paired lines for precise highlights
- Use diffTrimmedLines to ignore trailing whitespace differences
- Fall back to separate removed/added blocks when line pairs are too different
- Add "Running initial calculations..." status for early GEPA phases
- Unify Changed/Added/Removed tag styling in prompt diff view

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [NA] [FE] fix: address PR review — sortCandidates tests, diffLines, formatter comments

- Add unit tests for sortCandidates covering all sort branches
- Switch TextDiff from diffTrimmedLines to diffLines for whitespace detection
- Add comments explaining intentional formatter comparison in percentage calc
- Export sortCandidates and CANDIDATE_SORT_FIELD_MAP for testability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [NA] [FE] fix: address PR feedback — shared percentage helper, migration renumber, test timezone fixes, dataset button layout

- Extract calcFormatterAwarePercentage into shared lib/percentage.ts (review comment)
- Renumber migrations 000064→000065, 000065→000066 to avoid prefix conflict with main
- Fix timezone-sensitive test failures in MetricDateRangeSelect/utils.test.ts
- Fix dataset NavigationTag rendering as block in OptimizationHeader
- Fix mypy return type for _run_suite_evaluation in evaluator.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [NA] [FE] feat: dynamic chart legend, clickable ghost dot, dataset button width fix

- Chart legend now shows only statuses present in the data
- Ghost (in-progress) dot is clickable to select the trial
- Dataset NavigationTag constrained to content width with w-fit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [NA] [FE] refactor: simplify trial statuses, ghost dot color, best candidate breathing animation

- Remove "evaluating" status — candidates are now passed (scored > parent) or pruned (scored <= parent)
- Simplify computeCandidateStatuses to compare against parent score
- Ghost dot uses running status color (yellow) instead of hardcoded blue
- Best candidate dot breathes (opacity pulse) when optimization is active but no ghost dot is shown
- Clean up unused isOptimizationFinished/inProgressStepIndex props from columns and cells

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [NA] [FE] fix: optimization cost duration from created_at, live timer, absolute time in trials table

- Duration now starts from optimization created_at instead of first experiment
- Live ticking timer while optimization is in progress
- Trials table "Created" column shows absolute date/time instead of relative

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [NA] [SDK] fix: correct return type of evaluate_optimization_suite_trial

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Remove orphaned test for evaluate_optimization_suite_trial

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revert unused dataset_type additions in Python SDK

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] fix: improve non-eval-suite optimization display

- All chart dots blue for non-eval-suite optimizations (no pruned status)
- Chart legend shows metric name instead of status labels
- Table shows "Baseline" for step 0, "Passed" for scored candidates
- Column header shows metric name (e.g. "Accuracy (geval)")
- Add reason tooltip (speech bubble) to trial items score columns
- Remove jailbreak password demo template
- Revert temporary feature flag override

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: renumber migration prefixes 000065→000067, 000066→000068 to avoid conflicts with main

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4928] [BE] fix: use correct column to lookup execution policies for experiment items

The execution policy lookup query was filtering by dataset_item_versions.dataset_item_id
instead of dataset_item_versions.id. With dataset versioning enabled, experiment items
reference dataset_item_versions.id as their datasetItemId, causing the lookup to miss
and fall back to the default policy {runs_per_item:1, pass_threshold:1}.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add missing dataset_item_count to aggregated experiment CTEs

The experiments_from_aggregates_final CTEs in both FIND and
FIND_GROUPS_AGGREGATIONS queries were missing dataset_item_count,
causing ClickHouse UNKNOWN_IDENTIFIER errors when the aggregated
branch was used. Maps ea.experiment_items_count to dataset_item_count.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove duplicate SELECT in FIND_GROUPS_AGGREGATIONS and deduplicate chart empty states

- The FIND_GROUPS_AGGREGATIONS query had two outer SELECTs after the
  subquery, causing a ClickHouse syntax error. Merged dataset_item_count
  into the single outer SELECT and removed the duplicate.
- Collapsed identical spinner/NoData branches in OptimizationProgressChartContainer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [OPIK-4928] [BE] fix: remove feature flag gate preventing execution policy resolution

Root cause: ExperimentItemService.fetchItemPolicies() was gated behind
isDatasetVersioningEnabled(). When disabled, item-level execution
policies were never fetched from dataset_item_versions, so all
experiment items fell back to ExecutionPolicy.DEFAULT {1, 1}.

Also reverts the incorrect DatasetItemVersionDAO column change from
576791acc — experiment_items.dataset_item_id stores the logical
dataset_items.id which matches dataset_item_versions.dataset_item_id.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR review comments and fix execution_po…
andriidudar pushed a commit that referenced this pull request Mar 18, 2026
* design doc

* FE communication and ERD additions

* ui reporting events in flow

* changes

* add reason to TrialItemRun

* [NA] [SDK] feat: add greenfield optimization framework package

Implements a new optimization framework (`apps/opik-optimizer`) that
decouples optimizer algorithms from experiment execution, persistence,
and UI concerns. Integrates via the existing optimization studio pipeline
(Redis queue → Python backend → subprocess).

Key components:
- Orchestrator: central lifecycle controller with sampler, validator,
  materializer, result aggregator, and event emitter
- StupidOptimizer: 2-step test optimizer (3 candidates → best → 2 more)
- EvaluationAdapter: wraps SDK evaluate_optimization_suite_trial()
- Backend integration: new Redis queue, framework_optimizer job processor,
  framework_runner subprocess entry point

Also adds evaluate_optimization_suite_trial() to the Python SDK, combining
optimization trial linkage with evaluation suite behavior (evaluators and
execution policy from the dataset).

53 unit + integration tests passing. Verified end-to-end against Comet cloud
with real LLM calls, UI progress chart, prompt display, and score tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Adjustments for UI and framework review

* fix: address PR review comments - dict access bug and theme color

- Fix AttributeError in framework_runner.py: dataset.get_items() returns
  dicts, use item["id"] instead of item.id
- Fix hard-coded hex color in TrialPassedCell.tsx: use text-success CSS
  class instead of text-[#12B76A] for proper dark theme support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address remaining PR review comments

- Add opik:optimizer-framework to default RQ queue names so framework
  jobs actually get consumed by workers
- Add dataset size guard in orchestrator before sample_split to provide
  a clear error message for datasets with fewer than 2 items
- Extract shared optimizer_job_helper.py to deduplicate identical logic
  between optimizer.py and framework_optimizer.py
- Extract checkIsEvaluationSuite helper in optimizations.ts to
  deduplicate predicate shared between CompareTrialsPage and
  useCompareOptimizationsData
- Fix hardcoded "pass_rate" in experiment_executor.py to use the actual
  metric_type parameter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: separate experiment scores from feedback scores and handle single-item datasets

Splits the combined feedback/experiment scores into distinct fields in the
Optimization API and DAO so the frontend can fall back to experiment_scores
when feedback_scores lack the objective. Allows single-item datasets by
returning a train-only split instead of raising. Extracts shared runner
environment setup into runner_common.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: extract shared getBestOptimizationScore helper to deduplicate logic

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: evaluate baseline on full dataset instead of validation split only

The baseline was evaluated on split.validation_item_ids, which with an
80/20 split ratio meant only 1 out of 5 items was used. This gave an
unrepresentative baseline score. Now uses the full dataset_item_ids list.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: enrich GEPA experiment metadata for optimization visualization

Add rich metadata to each experiment so the UI can aggregate and
visualize the optimization trajectory. Key changes:

- step_index increments only when candidate changes (not per eval)
- candidate_id is stable across re-evaluations of the same prompt
- parent_candidate_ids always set correctly for derived candidates
- New metadata fields: batch_index, num_items, capture_traces, eval_purpose
- Refactor optimizer package: protocol + factory pattern for registration
- Add GEPA adapter bridging GEPA callbacks to framework metadata
- Fix BE tests for experimentScores null and queue routing
- Add docs: ADDING_AN_OPTIMIZER.md and GEPA_IMPLEMENTATION.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review comments and simplify optimizer factory

- Remove register_optimizer public API and OptimizerFactory class;
  replace with a simple dict in _load_registry()
- framework_runner: avoid holding full dataset items in memory
- Update docs and tests to match simplified factory

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: lineage-based step_index and parent_candidate_ids for GEPA experiments

- Replace sequential step_index counter with parent-lineage derivation
  (max parent step + 1), so all re-evaluations of the same candidate
  share the same step_index
- Ensure every non-baseline experiment carries parent_candidate_ids,
  enabling the UI to draw lineage graphs
- Pass batch_index, num_items, capture_traces, and eval_purpose through
  to experiment metadata for richer visualization
- Revert runner scripts to direct invocation (remove runner_common.py)
- Update unit tests to match new metadata contract

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
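
The lineage derivation (max parent step + 1, with re-evaluations reusing the stored index) can be sketched as:

```python
def derive_step_index(candidate_id, parent_ids, step_by_candidate):
    """Derive step_index from parent lineage: baseline candidates get 0,
    derived candidates get max(parent step) + 1. Re-evaluations of the
    same candidate reuse the stored index, so all experiments for one
    candidate share a step_index."""
    if candidate_id in step_by_candidate:
        return step_by_candidate[candidate_id]
    step = 0 if not parent_ids else max(step_by_candidate[p] for p in parent_ids) + 1
    step_by_candidate[candidate_id] = step
    return step
```

This is an illustrative reduction of the metadata contract, not the adapter's exact code.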

* refactor: remove unused config_hash and merge event emitters

- Remove canonical_config_hash from Candidate and TrialResult types,
  candidate_materializer, experiment_executor, and all tests
- Delete util/hashing.py module (unused — GEPA does minibatching so
  config-hash dedup would block valid re-evaluations)
- Merge SdkEventEmitter and LoggingEventEmitter into a single
  EventEmitter class with optional optimization_id
- Update GEPA_IMPLEMENTATION.md to reflect parent_ids tracking fixes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: make CandidateConfig a plain dict and pass baseline_config through context

- Replace CandidateConfig dataclass with dict[str, Any] type alias
- Add baseline_config field to OptimizationContext (caller-provided, opaque)
- Orchestrator passes baseline_config through without knowing its structure
- Optimizers copy baseline_config and override prompt_messages only
- Remove result_aggregator module (inlined into evaluation_adapter)
- Move gepa imports to runtime (lazy) for optional dependency
- Fix protocol.py training_set/validation_set types to list[dict]
- Update ADDING_AN_OPTIMIZER.md to reflect all changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: move gepa tests to library_integration to avoid unit suite dependency on gepa

The gepa tests patch gepa.core.adapter.EvaluationBatch and gepa.optimize,
requiring the optional gepa package at import time. Moving them to
tests/library_integration/gepa/ with pytest.importorskip("gepa") keeps
the unit suite fast and dependency-free.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: remove event_emitter from optimizer interface, auto-emit step progress

Optimizers no longer receive or call event_emitter directly. The
EvaluationAdapter now auto-detects step_index changes during evaluate()
and emits on_step_started internally. GEPAProgressCallback simplified
to only forward GEPA events to the adapter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: assert on actual log messages in event emitter tests

Use caplog to verify logger.info output includes optimization ID and
event details, instead of just checking calls don't crash.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: set evaluation_method on optimizer trial experiments for correct UI detection

evaluate_optimization_suite_trial was creating experiments without
evaluation_method="evaluation_suite", causing the backend to default
to "dataset". The frontend checkIsEvaluationSuite now uses the explicit
evaluation_method field instead of heuristic score detection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: validate dataset is evaluation suite before running suite evaluation

Adds a guard to evaluate_suite and evaluate_optimization_suite_trial that
checks dataset.dataset_type == "evaluation_suite" before proceeding. This
prevents silently running an ineffective suite trial on a plain dataset
with no scoring rules.

- Add dataset_type param to Dataset constructor, populated at all call sites
- Add dataset_type property to Dataset
- Add _validate_dataset_is_evaluation_suite in evaluator.py
- Update tests and add rejection test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: extract _run_suite_evaluation to deduplicate suite evaluation flow

evaluate_suite and evaluate_optimization_suite_trial had their entire body
duplicated. Extract shared logic into _run_suite_evaluation, parameterized
by optimization_id and dataset filters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE][BE] feat: optimization studio UI improvements

Comprehensive face-lift for optimizer screens including new KPI cards,
metric comparison cells, configuration diff views, progress charts,
trial status indicators, and backend dataset_item_count support.
Also adds backward compatibility for SDK-based optimizations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] feat: optimizer screens face-lift

- Dataset name column: hover icon instead of clickable link
- Split Accuracy into Pass rate + Accuracy columns with compact metric display
- Conditionally hide Accuracy column when no old-type optimizations exist
- Remove Logs/Configuration tabs from single optimization page
- Fall back to studio_config for configuration display on old optimizations
- Chart tooltip: remove pass rate percentage background color
- Fix dataset hover icon vertical centering
- Restore feature toggle for optimization studio

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] fix: center trend arrow icons and rename tooltip label

- Fix arrow icon vertical centering in compact metric Tag
- Rename "Avg. runtime cost" to "Runtime cost" in chart tooltip

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] fix: polish optimizer screens UI consistency

- Fix chart tooltip background (use --background instead of --popover)
- Align column types with correct icons (cost, duration, numberDictionary)
- Align KPI card icons to match table column type icons
- Lowercase labels: Evaluation results, Best configuration, Runtime cost, Opt. cost, Optimization cost
- Darken success green color for better readability
- Remove Traces KPI card from trial view

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4687] [SDK] feat: GEPA v2 optimizer with reflection-based prompt evolution (#5547)

* [OPIK-4687] [SDK] feat: integrate GEPA v2 optimizer into framework

Add GepaV2Optimizer that delegates to the external gepa library (v0.1.0+)
for genetic-Pareto prompt optimization. Includes adapter bridging GEPA's
evaluate/reflect interface to the framework's EvaluationAdapter, lifecycle
event tracking via callbacks, result caching, and a reflection prompt that
encourages generalizable instructions while preserving template variables.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(gepa-v2): improve reflection feedback with structured assertions and dynamic inputs

- Extract template variables from prompt messages for dynamic input field mapping
- Store per-assertion structure (name, value, reason) instead of flat reason strings
- Show only failed assertions in reflection feedback for focused improvement signals

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(gepa-v2): adapter reflection control, FE chart filtering, experiment typing

- Move reflection to adapter's propose_new_texts with custom prompt template
- Use msg["name"] as candidate key when provided, fallback to {role}_{index}
- Strip echoed parameter prefix from reflection LLM output
- Disable GEPA evaluation cache so validations produce full-dataset experiments
- Tag exploration evals as mini-batch, only baseline/init/validation as trial
- FE: filter mini-batch experiments from optimization progress chart
- FE: show individual assertion score columns alongside "passed" for eval suites
- Update E2E script: no dataset split, max_candidates=10, reflection log capture

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(gepa-v2): improve reflection quality with structured feedback and template filtering

- Show FAILED and PASSED assertions separately in reflection feedback
- Keep worst run per item (most failed assertions) for reflection
- Sort reflective dataset records by failure count (most failures first)
- Exclude template-only messages (e.g. {question}) from GEPA seed candidate
- Rewrite reflection prompt: focus on failures, preserve what works, 500-word limit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(gepa-v2): classify experiment type by batch size, not eval purpose

The purpose-based classification was unreliable: GEPA calls evaluate()
with capture_traces=False for both full validations and minibatch
evaluations of new candidates, making them indistinguishable by purpose.

Now records the full dataset size on the first evaluate call (initialization)
and classifies any call with fewer items as mini-batch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
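
The size-based classification reduces to remembering the first batch size seen. A minimal sketch with hypothetical names:

```python
class ExperimentTypeClassifier:
    """Record the full dataset size on the first evaluate() call
    (initialization), then classify any smaller batch as a mini-batch
    evaluation and any full-size batch as a trial."""

    def __init__(self):
        self.full_size = None

    def classify(self, batch_size):
        if self.full_size is None:
            self.full_size = batch_size  # first call establishes the full size
            return "trial"
        return "mini-batch" if batch_size < self.full_size else "trial"
```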

* feat(gepa-v2): improve scoring, stopping, and reflection quality

- Use mean instead of min for per-item assertion scores, giving GEPA
  granular signal instead of binary 0/1
- Track total_runs/passed_runs per item so reflection prompt shows
  whether failures are consistent or intermittent
- Stop on trial.score (framework experiment score) instead of GEPA's
  internal mean, so pass_threshold semantics are respected
- Rewrite reflection template with 4-step structure: diagnose, keep
  what works, write assertion-matched rules, generalize
- Increase max_metric_calls multiplier to 5x for deeper exploration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(gepa-v2): hide mini-batch trials from table, use domain-neutral examples

- Filter mini-batch experiments from the trials table rows so only
  full evaluation trials are shown
- Replace customer-support-specific examples in the reflection
  template with domain-neutral ones

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(gepa-v2): show all runs per item in reflection feedback, log rendered prompt

Previously kept only the worst run per item for reflection. Now all runs
are preserved and shown separately (Run 1/3, Run 2/3, etc.) so the
reflection LLM can see what varies across attempts. Also captures the
fully rendered reflection prompt in the reflection log for debugging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(gepa-v2): consolidate runs per input in reflection dataset, label assertion/reason

Consolidate multiple runs for the same input into a single record with
a Runs field and per-item Summary (pass count + consistent failures).
This eliminates input duplication (~40% token savings) and makes cross-run
comparison trivial. Also separates Assertion/Reason onto labeled lines
for clearer parsing by the reflection LLM.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(gepa-v2): add feedback format coverage for reflection dataset

Add tests for single-run flat keys, multi-run Assertion/Reason labels
in Runs field, and failed assertions with empty reason.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(gepa-v2): tell reflection LLM that examples are sorted by priority

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(gepa-v2): use flat config instead of prompt_messages

GEPA adapter now works with flat dict[str, str] candidates instead of
knowing about message roles. baseline_config is the single source of
truth with system_prompt and user_message keys. Added LLMChatTask that
constructs LLM messages from flat config keys, replacing the
prompt_messages reconstruction path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(gepa-v2): use TrialResult.config instead of prompt_messages, fix validator and UI prompt display

Replace TrialResult.prompt_messages with TrialResult.config so config is
the single source of truth. Update candidate_validator to accept flat
message keys (system_prompt, user_message) in addition to prompt_messages.
Populate experiment metadata "prompt" from flat keys so the UI displays
prompts correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(gepa-v2): add optimizable_keys to OptimizationContext

Replace hardcoded PROMPT_KEYS in GepaV2Optimizer with
context.optimizable_keys so the caller explicitly controls which
baseline config keys get optimized.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(gepa-v2): add failure-aware batch sampler for minibatch item selection

Replace GEPA's default uniform sampler with FailureAwareBatchSampler that
guarantees failed items from the last full eval appear in subsequent
minibatches, giving the reflection LLM actionable signal instead of wasting
iterations on easy items.

Parameters: min_failed_per_batch, min_unseen_per_batch, failure_threshold.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(gepa-v2): strict types in sampler, worst-first failed selection, update implementation doc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(experiment): surface optimizable keys in experiment configuration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(gepa-v2): failure streak tracking, history-aware reflection prompt, optimizable keys in config

- Track per-item failure streaks and failing assertion names in sampler
- Annotate reflective dataset records with "Failure History" for stuck items
- Rewrite reflection prompt: failure history step, structured output, topic headers
- Surface optimizable_keys in experiment config and baseline evaluation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(gepa-v2): balanced reflection prompt, lower failure history threshold

- Rewrite reflection prompt to balance conservative and aggressive approaches:
  preserve working rules while encouraging grouped topic headers (## Empathy,
  ## Resolution, etc.) instead of flat numbered lists
- Lower failure history threshold from streak >= 2 to >= 1 so the reflection
  LLM sees failure context from the first repeated failure
- Guard failure history annotation with `if stuck` to avoid empty annotations
- Relax "3 unreturned callbacks" assertion to "multiple unreturned callbacks"
  (the exact-count version was too brittle for gpt-4o-mini to satisfy reliably)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(gepa-v2): balanced 50/50 minibatch sampling, 20-item e2e suite

Balanced sampling: split minibatches ~50/50 between failed (worst-first)
and passed (random) items. Previously batches were almost entirely failed
items, causing the reflection LLM to over-correct and regress passing
behaviors (catastrophic 0.0 scores). Passed items now act as behavioral
anchors.

- Remove unseen item tracking (mark_seen, min_unseen_per_batch)
- Default min_failed_per_batch=1 (was batch_size-1)
- Minimum reflection_minibatch_size=4 (ensures 2+2 split)
- Redesign e2e suite: 20 items (5 easy, 7 medium, 8 hard)
- Fix contradicting assertions (hedging language vs no promises)
- Remove impossible assertions (specific loyalty benefits)
- Add problematic items summary to reflection log
- Save reflection log from orchestrator finally block
- Update GEPA_IMPLEMENTATION.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
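The balanced sampling described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual `FailureAwareBatchSampler` API; names like `failed_ids`, `passed_ids`, and `batch_size` are assumptions:

```python
import random

def sample_balanced_minibatch(failed_ids, passed_ids, batch_size, rng=random):
    """Split a minibatch ~50/50 between failed and passed items.

    Failed items (assumed ordered worst-first) carry the signal for the
    reflection LLM; passed items are random "behavioral anchors" that
    discourage over-correction and regressions on passing behaviors.
    """
    n_failed = min(len(failed_ids), max(1, batch_size // 2))
    n_passed = min(len(passed_ids), batch_size - n_failed)
    batch = list(failed_ids[:n_failed])          # worst-first selection
    batch += rng.sample(list(passed_ids), n_passed)  # random anchors
    # Top up from the remaining failed items if the passed pool ran short.
    remaining = [i for i in failed_ids[n_failed:] if i not in batch]
    batch += remaining[: batch_size - len(batch)]
    rng.shuffle(batch)
    return batch
```

With `batch_size=4` this yields the 2+2 split the commit mentions, which is why the minimum `reflection_minibatch_size` is 4.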

* refactor(gepa-v2): extract collaborators from adapter with DI

Split FrameworkGEPAAdapter into three injectable collaborators:
- CandidateTracker: candidate identity, parent lineage, GEPA index mapping
- ReflectiveDatasetBuilder: feedback dataset construction for reflection LLM
- ReflectionProposer: reflection LLM interaction and logging

The adapter is now a thin facade (~300 lines, down from 664) that
orchestrates evaluation and delegates to collaborators. Compatibility
properties ensure all existing tests pass unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(gepa-v2): move reflection template to ReflectionProposer

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(gepa-v2): task-agnostic reflection template with prompt descriptions and sibling awareness

Rewrite the reflection template to be domain-neutral, add optional
prompt_descriptions to OptimizationContext so the reflection LLM
understands what each parameter does, and include sibling parameter
context so the LLM knows what other params exist without modifying them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(gepa-v2): update implementation doc and reflection prompt algorithm

Update GEPA_IMPLEMENTATION.md with prompt descriptions, sibling awareness,
and task-agnostic template details. Rewrite REFLECTION_PROMPT_EXAMPLE.md
to document the full reflection prompt assembly algorithm with a rendered
example showing the new header format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(gepa-v2): robust header stripping, markdown formatting in reflection template

The LLM sometimes echoes header metadata (Parameter:, Description:,
param name) in reformulated form. Replace exact-prefix matching with
line-by-line stripping of metadata patterns. Add IMPORTANT instruction
to not include metadata in output. Request markdown ## headers in STEP 4.

Add 11 unit tests for ReflectionProposer: header stripping edge cases,
build_header with/without descriptions, template content assertions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* revert(fe): remove debug FE changes (mini-batch filtering, column reorder)

These were temporary UI tweaks for debugging the GEPA v2 optimizer.
They'll be re-implemented properly in a separate FE PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(gepa-v2): preserve template variables in reflection prompt

Instruct the reflection LLM to keep all template variables (e.g.
{var}, {{var}}, <var>, {% var %}) intact during prompt rewriting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
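A check like the one this commit describes can be sketched with a regex scan over the placeholder formats it lists. The patterns and function names below are assumptions for illustration, not the SDK's implementation:

```python
import re

# Matches {var}, {{var}}, <var>, and {% var %} style placeholders.
# Order matters: the double-brace and {% %} forms must be tried before
# the single-brace form so they are not matched partially.
_VAR_PATTERN = re.compile(
    r"\{\{\s*\w+\s*\}\}|\{%\s*\w+\s*%\}|\{\s*\w+\s*\}|<\s*\w+\s*>"
)

def template_variables(text: str) -> set[str]:
    """Return the set of raw placeholder tokens found in a prompt."""
    return set(_VAR_PATTERN.findall(text))

def proposal_preserves_variables(original: str, proposal: str) -> bool:
    """Reject proposals that drop any placeholder the original prompt used."""
    return template_variables(original) <= template_variables(proposal)
```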

* refactor: rename gepa_v2 to gepa, clean up OptimizationContext

- Rename gepa_v2/ -> gepa/ (now the primary optimizer)
- Rename gepa/ -> gepa_old/ (legacy optimizer)
- Rename GepaV2Optimizer -> GepaOptimizer
- Rename GepaOptimizer -> GepaLegacyOptimizer
- Remove unused fields from OptimizationContext: prompt_messages,
  metric_parameters, model_parameters
- Rename prompt_descriptions -> config_descriptions
- Delete SimpleOptimizer and its tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add configurable split_strategy to OptimizationContext

Add split_strategy field ("80_20" default, "no_split" for GEPA) so the
orchestrator handles dataset splitting instead of individual optimizers.
Remove internal train+val dedup logic from GepaOptimizer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(gepa): clean up adapter API and fix code quality issues

- Make adapter facade properties public (remove underscore prefix)
- Add standalone reflection_log fallback to prevent silent data loss
- Rename consume_pending_capture_traces → get_pending_capture_traces
- Remove dead guard in _build_evaluation_batch
- Move SYSTEM_PROMPT_KEY constant to test file
- Fix update_scores type annotation in failure_aware_sampler

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: remove prompt_messages logic, validate optimizable_keys

- Adapt LLMTask to use config dict, remove LLMChatTask duplicate
- Simplify candidate_validator to check optimizable_keys from adapter
- Remove prompt_messages fallback from experiment_executor metadata
- Update all tests, fixtures, scripts, and docs to flat key format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(scripts): remove stale prompt_messages and API references

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(gepa): remove optimizable_keys from config dict to fix caching

optimizable_keys was being injected into CandidateConfig by both
_make_config_builder and the orchestrator, causing cache key mismatches
between baseline and initialization evaluations. Pass it as an explicit
parameter through the evaluation chain instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
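The caching bug fixed here is an instance of a general pattern: when bookkeeping metadata is injected into the config dict by only one of two code paths, dict-derived cache keys diverge for what is logically the same candidate. A minimal illustration with hypothetical names (not the optimizer's real cache):

```python
import json

def cache_key(config: dict) -> str:
    # Key derived from the full config dict: any extra key changes it.
    return json.dumps(config, sort_keys=True)

base = {"system_prompt": "You are helpful."}

# Path A (baseline evaluation) hashes the config as-is...
key_a = cache_key(base)
# ...while path B (initialization) injects metadata first.
key_b = cache_key({**base, "optimizable_keys": ["system_prompt"]})
assert key_a != key_b  # same logical candidate, guaranteed cache miss

# Fix: keep metadata out of the config and pass it explicitly, so the
# key depends only on the candidate itself.
def evaluate(config: dict, optimizable_keys: list[str]) -> str:
    return cache_key(config)

assert evaluate(base, ["system_prompt"]) == key_a
```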

* refactor(scripts): rename gepa_v2 scripts to gepa, delete run_optimization_e2e

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: use optimizable_keys generically in ADDING_AN_OPTIMIZER guide

Remove hardcoded system_prompt references from code examples.
Optimizers should iterate over context.optimizable_keys instead
of assuming specific key names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(fe): address baz review comments

- Use Tag component variants instead of hard-coded color spans for
  theme-aware diff badges (Added/Removed/Changed)
- Clamp formatAsPercentage input to [0, 1] range to prevent >100% or
  negative percentage display
- Read baseline score from experiment_scores as fallback when
  feedback_scores lacks the objective (evaluation-suite support)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(fe): extract getObjectiveScoreValue shared helper

Move the feedback_scores -> experiment_scores fallback into a reusable
getObjectiveScoreValue helper in feedback-scores.tsx. Replace all 4 call
sites (CompareTrialsPage, TrialKPICards, useOptimizationScores,
useCompareOptimizationsData) with the shared helper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(be): resolve CI failures - migration conflict and test ignored fields

- Rename migration 000063 → 000064 to avoid prefix conflict with main
- Add datasetItemCount to EXPERIMENT_IGNORED_FIELDS and test builder
- Add datasetName to OPTIMIZATION_IGNORED_FIELDS (transient field)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(fe): extract shared aggregateExperimentMetrics helper

Deduplicate weighted score/cost/latency accumulation logic that was
duplicated between TrialKPICards and useCompareOptimizationsData.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add optimization_id index on experiments and remove dead code

- Add minmax index on experiments.optimization_id to speed up
  optimization queries that join experiments by optimization_id
- Remove unused OptimizationDiffView component (dead code from iteration)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update Helm documentation

* [OPIK-4383] [BE] Redis stream subscriber for debounced experiment aggregates recomputation (#5371)

* [OPIK-4380] [BE] Add experiment aggregates for denormalized metrics

- Add experiment_aggregates and experiment_item_aggregates tables
- Implement ExperimentAggregatesDAO with population and query methods
- Add ExperimentAggregatesService for aggregation management
- Refactor DTOs into organized model classes:
  - ExperimentAggregatesModel: aggregation results
  - ExperimentEntityData: entity models
  - ExperimentSourceData: raw source data
  - ExperimentAggregatesUtils: utilities
- Add FEEDBACK_SCORES_AGGREGATED filter strategy for map-based filtering
- Add comprehensive integration tests (10/10 passing)
- Configure batch size and parallelism settings

* [OPIK-4380] [BE] Add MySQL deadlock retry mechanism for concurrent dataset operations

Problem:
- MySQL deadlock on dataset_version_tags composite PRIMARY KEY (workspace_id, dataset_id, tag)
- Occurred during parallel dataset creation in same workspace
- Multiple threads inserting "latest" tag for different datasets caused lock contention
- Experiments with parallel execution were failing with MySQLTransactionRollbackException

Solution:
- Add handleOnDeadLocks() method in RetryUtils with:
  - 5 retry attempts with exponential backoff (250ms to 2s)
  - 0.5 jitter to reduce thundering herd effect
  - Recursive isDatabaseDeadlock() detection for MySQLTransactionRollbackException
- Apply retry logic in DatasetItemService.setDatasetItemVersion()
- Enables concurrent dataset creation for same workspace

Impact:
- Supports parallel experiment execution with proper deadlock handling
- Test success rate improved from 0/10 to 10/10 in ExperimentAggregatesIntegrationTest
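The retry policy above (5 attempts, exponential backoff from 250 ms capped at 2 s, with jitter) can be illustrated language-neutrally. This Python sketch mirrors the described parameters but not the actual Java `RetryUtils.handleOnDeadLocks()` code; `DeadlockError` stands in for `MySQLTransactionRollbackException`:

```python
import random
import time

class DeadlockError(Exception):
    """Stand-in for MySQLTransactionRollbackException."""

def with_deadlock_retry(op, attempts=5, base_delay=0.25, max_delay=2.0,
                        jitter=0.5, sleep=time.sleep):
    """Retry op() on deadlock with capped exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return op()
        except DeadlockError:
            if attempt == attempts - 1:
                raise  # retries exhausted, surface the deadlock
            delay = min(base_delay * (2 ** attempt), max_delay)
            # Jitter spreads retries out so contending threads don't all
            # re-acquire the same lock at once (thundering herd).
            delay *= 1 + random.uniform(-jitter, jitter)
            sleep(delay)
```

The jitter factor matters precisely because the contention came from multiple threads inserting the "latest" tag concurrently: without it, all losers would retry on the same schedule and deadlock again.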

* Fix visibility

* [OPIK-4380] [BE] Address PR review comments for experiment aggregates

Fixed 11 automated review comments from baz-reviewer:

CRITICAL fixes:
- Prevent NPE on null span aggregations by adding coalesce() in SQL
- Handle multi-project experiments with LIMIT 1 in GET_PROJECT_ID
- Handle zero-item experiments with empty aggregation helpers
- Bind feedback_scores_percentiles map instead of empty CAST

HIGH priority fixes:
- Use toDecimal128(12) instead of toDecimal64(9) for cost percentiles
- Add null-safe tags handling with Optional.ofNullable()
- Include exception objects in retry logging for stack traces

MEDIUM priority fixes:
- Add missing log_comment to SELECT_EXPERIMENT_BY_ID query
- Add missing log_comment to GET_PROJECT_ID query

LOW priority fixes:
- Remove duplicate "id" binding in bindItemsParameters
- Enhance batchSize config documentation with details

All 11 integration tests passing.

* [OPIK-4382] [BE] Refactor experiment aggregates with import cleanup and Optional patterns

- Add missing imports for IntStream, ProjectStats, and other dependencies
- Replace fully-qualified class names with proper imports across DAO and Service classes
- Fix IS_NOT_EMPTY filter handling for FEEDBACK_SCORES_AGGREGATED strategies
- Refactor null checks to use Optional in mapping methods:
  - mapFeedbackScoreAggregations, mapExperimentFromAggregates
  - mapFeedbackScoreData, mapExperimentGroupAggregationItem
  - Batch insert preparation with Optional chains
- Improve code readability and maintainability with functional patterns

* [OPIK-4380] [BE] Fix table definition

* [OPIK-4380] [BE] Address PR comments and consolidate DatasetItemService methods

- Fix tags NPE in ExperimentAggregatesDAO with defaultIfNull
- Remove unnecessary FINAL clause from GET_EXPERIMENT_DATA query
- Fix test naming in ExperimentAggregatesIntegrationTest
- Consolidate 7 duplicate createVersionFromDelta methods into single canonical implementation
- Remove debug logger from config-test.yml

* [OPIK-4380] [BE] Fix missing log_comment and centralize search criteria binding

- Fix SELECT_EXPERIMENT_BY_ID to properly render log_comment metadata
  - Use getSTWithLogComment pattern in getExperimentFromAggregates
  - Ensures ClickHouse query logging populates workspace/experiment IDs

- Centralize ExperimentSearchCriteria binding logic
  - Create ExperimentSearchCriteriaBinder utility class
  - Parameterize filter strategies to support both DAO variants
  - Eliminate 29-line duplication between ExperimentDAO and ExperimentAggregatesDAO
  - Single source of truth prevents DAOs from getting out of sync

* [OPIK-4380] [BE] Fix createVersionFromDelta consolidation after rebase

- Update canonical method signature to include new parameters:
  - List<EvaluatorItem> evaluators
  - ExecutionPolicy executionPolicy
  - boolean clearExecutionPolicy

- Update all 5 caller sites to pass new parameters:
  - Use changes.evaluators(), changes.executionPolicy(), changes.clearExecutionPolicy() when available
  - Pass null/false for auto-generated versions that inherit from base

- Add imports for EvaluatorItem and ExecutionPolicy

Fixes compilation errors introduced by rebase with upstream changes to DatasetVersionService

* [OPIK-4380] [BE] Address PR review comments - fix type mismatch, extract constants, remove DAO logging

- Fixed BigDecimal[] to Double[] conversion for experiment_scores (matches ClickHouse Float64)
- Extracted FilterStrategy lists to static final constants to avoid repeated allocations
- Added @NonNull validation to populateExperimentAggregate parameter
- Removed DAO layer logging, keeping service-level logging only

* [OPIK-4380] [BE] Extract shared helper for experiment data pagination

Extract streamWithExperimentPagination() helper method to eliminate
duplication in getTracesData(), getSpansData(), and getFeedbackScoresData().

All three methods followed the same pattern:
- asyncTemplate.stream with connection
- getSTWithLogComment with cursor flag
- Bind workspace_id, experiment_id, project_id, limit
- Optional cursor binding
- Result mapping

Benefits:
- Single source of truth for pagination binding logic
- Prevents divergence when tweaking cursor/limit bindings
- Reduces code from ~20 lines to ~10 lines per method
- Type-safe generic implementation

Note: CTE redundancy (3x experiment_items scan) is intentional to avoid
passing large trace ID lists as parameters, which would cause performance
issues with 10K+ traces.

* [OPIK-4383] [BE] Add Redis stream subscriber for debounced experiment aggregates recomputation

- Add ExperimentDenormalizationConfig implementing StreamConfiguration with debounce, job lock, and per-experiment aggregation lock settings
- Add ExperimentAggregationMessage as stream message record
- Add ExperimentAggregatesSubscriber consuming from the denormalization stream; acquires a workspace-scoped distributed lock per experiment before calling populateAggregations()
- Add experimentDenormalizationEnabled feature flag to ServiceTogglesConfig and FeatureFlags
- Wire ExperimentDenormalizationConfig into OpikConfiguration
- Update config.yml and config-test.yml with full experimentDenormalization block (enabled for tests)
- Add ExperimentAggregatesSubscriberTest covering lifecycle gating and processEvent success/error paths

* Revision 2: Address PR comments - add config defaults, remove toggle, rename tests

- ExperimentDenormalizationConfig: add sensible defaults to all fields so
  Dropwizard validation doesn't fail when the config block is absent from
  old deployments (config.isEnabled()=false still gates the subscriber)
- Remove experimentDenormalizationEnabled service toggle from
  ServiceTogglesConfig, FeatureFlags, config.yml and config-test.yml -
  the infrastructure gate (config.isEnabled()) is the single control point
- Rename lifecycle test methods to camelCase per project conventions:
  startSkipsStartupWhenDisabled / stopSkipsShutdownWhenDisabled

* Revision 3: Add @Max(500) to consumerBatchSize and @NotNull to jobLockWaitTime

* [OPIK-4380] [BE] Address PR review comments - fix TYPE_REFERENCE visibility, redundant IN subquery, hardcoded context keys, Instant.now in loop, and inline defaultIfNull

* Revision 4: Address remaining JetoPistola review comments (#7, #8, #10)

- #7: Remove "Used for testing and verification" from getExperimentFromAggregates javadoc
- #8: Replace recursive flatMap with Mono.expand() in populateExperimentItemsInBatches
- #10: Remove unrelated subscribeOn addition from DatasetItemService.createVersionFromDelta

* Revision 3: Add switchIfEmpty fallback for deleted traces in populateExperimentAggregate

* Fix tests

* Revision 6: Move countTotal log from DAO to service layer

Operational logs belong in the service layer, not the DAO.

* Revision 7: Apply Spotless formatting

* Revision 8: Make populateAggregations(UUID, int) private

Removes the uncapped public batch size entry point. All callers now go
through the public no-arg overload which reads batchSize safely from config.

* [OPIK-4380] [BE] Add evaluation_method support to experiment_aggregates pipeline

- Add ClickHouse migration (000062) to add evaluation_method column to experiment_aggregates table
- Add evaluationMethod field to ExperimentData record
- Update GET_EXPERIMENT_DATA query to read evaluation_method from experiments
- Update INSERT_EXPERIMENT_AGGREGATE to write evaluation_method to experiment_aggregates
- Update SELECT_EXPERIMENT_BY_ID to read evaluation_method from experiment_aggregates
- Fix Experiment record constructor call: insert EvaluationMethod at correct position (10)

* [OPIK-4380] [BE] Extract shared helper for experiment aggregation queries

Reduce copy-paste in getTraceAggregations, getSpanAggregations, and
getFeedbackScoreAggregations by extracting queryExperimentAggregation,
which centralises the context-aware execution, workspace/experiment/project
parameter binding, and singleOrEmpty pattern shared by all three methods.

* [OPIK-4380] [BE] Enforce non-null contract on countTotal criteria parameter

Add @NonNull to ExperimentSearchCriteria in the interface and implementation
so that a null argument fails fast with an explicit NullPointerException at
the DAO boundary instead of crashing deep inside buildCountTemplate.

* [OPIK-4380] [BE] Fix countTotal ignoring target project IDs in normal path

target_project_ids was only applied inside the project_deleted LEFT JOIN
subquery; the main WHERE had no project restriction, so counts were
workspace-wide. Reuse has_target_projects in the main WHERE so
project_id IN :target_project_ids always takes effect. Also replace
manual null/empty checks with CollectionUtils.isNotEmpty.

* [OPIK-4380] [BE] Apply Spotless formatting

* [OPIK-4382] [BE] Address PR review comments on experiment aggregates

- Fix :versionId → :version_id parameter naming in SQL templates and bindings
- Fix last_updated_at binding to use item.lastUpdatedAt() instead of Instant.now()
- Fix FEEDBACK_SCORES_AGGREGATED_IS_EMPTY filter: embed generated SQL into templates
  instead of hard-coded static condition, and add missing bind calls
- Fix RetryUtils log duplication (remove getMessage() + pass exception directly)
- Add batchSize = 1000 default in ExperimentAggregatesConfig
- Extract resolveVersionIdForCriteria helper to deduplicate version-id resolution
- Add null/blank/ClickHouse placeholder guards in extractUuidsFromGroupValues
- Extract loadEntityMap helper to deduplicate getEnrichInfoHolder enrichment logic

* Revision 3: Address PR comments E, F, G, H

- Fix E: Extract shared template/bind helpers in ExperimentAggregatesDAO
- Fix F: Bind experiment_ids as UUID[] instead of String[]
- Fix G+H: Extract getEnrichInfoHolder logic into ExperimentGroupEnricher,
  eliminating duplication between ExperimentService and ExperimentAggregatesService
  without introducing a direct dependency between them

* Revision 4: Fix ExperimentServiceTest to include ExperimentGroupEnricher mock

* [OPIK-4382] [BE] Extract shared Row→ExperimentGroup mappers into ExperimentGroupMappers

Pull the duplicated Row→ExperimentGroupItem and Row→ExperimentGroupAggregationItem
conversion logic from ExperimentDAO and ExperimentAggregatesDAO into a shared
ExperimentGroupMappers utility class. Both DAOs now delegate to the same
toExperimentGroupItem / toExperimentGroupAggregationItem helpers, eliminating
the need to mirror DTO mapping changes in two places.

* [OPIK-4382] [BE] Deduplicate bindGroupCriteria into ExperimentGroupMappers

Moves the shared group-criteria binding logic out of ExperimentDAO and
ExperimentAggregatesDAO into ExperimentGroupMappers.bindGroupCriteria(),
following the same pattern as ExperimentSearchCriteriaBinder. Adding or
fixing a criteria binding now only requires a change in one place.

* [OPIK-4382] [BE] Extract streamGroupQuery helper and fix null percentiles

- Deduplicate findGroups/findGroupsAggregations into a single private
  streamGroupQuery(queryTemplate, criteria, rowMapper) that differs only
  by the query constant and BiFunction row mapper.

- Fix convertToBigDecimal to return null for null/unsupported inputs so
  absent p50/p90/p99 entries in getDuration propagate as null rather than
  BigDecimal.ZERO, preserving the semantic-null that lets callers apply
  COALESCE/fallback logic correctly.

* [OPIK-4382] [BE] Consolidate cost/duration helpers into ExperimentGroupMappers

Promote getCostValue and getDuration to public static in
ExperimentGroupMappers and delete the private copies in ExperimentDAO.
ExperimentDAO.mapToDto now delegates to the shared helpers, so any
future change to cost filtering, duration percentile extraction, or the
BigDecimal conversion only needs to be made in one place.

Side-effect: ExperimentDAO.mapToDto also picks up the null-percentile
fix (convertToBigDecimal returns null for absent/unsupported inputs)
that was previously applied only to ExperimentGroupMappers.

* [OPIK-4382] [BE] Fix pagination count and add criteria filter tests

- Remove count() OVER () window function from paged query (returned
  page-scoped count instead of full result-set count)
- Replace with dedicated count query + short-circuit: skip items query
  when count == 0, use DatasetItemPage.empty() for that case
- Extract DatasetItemResultMapper.buildItemFromRow as public static
  helper reused by ExperimentAggregatesDAO
- Add parameterized integration tests for ExperimentGroupCriteria
  filters (name, types, projectId, combined, empty-result) covering
  both findGroups and findGroupsAggregations aggregate paths

* [OPIK-4380] [BE] Extract shared filter helpers into FilterQueryBuilder

Add FilterStrategyParam record, applyFiltersToTemplate and bindFilters
static helpers to FilterQueryBuilder, then replace duplicated per-strategy
loops in DatasetItemVersionDAO and ExperimentAggregatesDAO with single
delegating calls backed by per-DAO strategy constants.

* [OPIK-4382] [BE] Consolidate filter helpers in getExperimentItemsStatsFromAggregates

Add EXPERIMENT_ITEMS_STATS_FILTER_STRATEGY_PARAMS and
EXPERIMENT_ITEMS_STATS_BIND_STRATEGIES constants and replace the
per-strategy toAnalyticsDbFilters/bind blocks in
getExperimentItemsStatsFromAggregates with single delegating calls to
FilterQueryBuilder.applyFiltersToTemplate and bindFilters.

* Revision 9: Extract shared helpers to eliminate duplication across DAOs

- Create DatasetItemSearchCriteriaMapper: centralizes filters + search flag
  wiring for DatasetItemSearchCriteria, shared by DatasetItemVersionDAO and
  ExperimentAggregatesDAO
- Add ExperimentGroupMappers.applyGroupCriteriaToTemplate: centralizes
  ExperimentGroupCriteria → ST template wiring, now shared by ExperimentDAO
  and ExperimentAggregatesDAO
- Update DatasetItemVersionDAO.addFiltersToTemplate and bindSearchAndFilters
  to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentAggregatesDAO.applyDatasetItemFiltersToTemplate and
  bindDatasetItemSearchParams to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentDAO.newGroupTemplate and ExperimentAggregatesDAO.newGroupTemplate
  to delegate to ExperimentGroupMappers.applyGroupCriteriaToTemplate

* [OPIK-4383] [BE] Fix project_deleted filter and comments_dedup scope in ExperimentAggregatesDAO

- Fix project_deleted filter: use zero UUID sentinel instead of empty string
  for FixedString(36) column comparison in FIND_GROUPS and FIND_GROUPS_AGGREGATIONS
- Fix comments_dedup CTE: scope trace_id subquery by dataset_id to avoid
  scanning the entire workspace's comments table
- Add missing streamMaxLen and streamTrimLimit fields to
  ExperimentDenormalizationConfig (implements StreamConfiguration interface)

* [OPIK-4383] [BE] Address PR review comments: extract ZERO_UUID constant and fix config comment

- Promote zero UUID sentinel to shared constant in ExperimentGroupMappers
- Use parameterized :zero_uuid binding in SQL templates instead of hardcoded string
- Fix config.yml comment from "Default: 120s" to "Default: 1m"

* [OPIK-4383] [BE] Add streamMaxLen and streamTrimLimit to experimentDenormalization config

* [OPIK-4727] fix: remove old GEPA code, fix aggregates test, add migration rollback docs

- Remove gepa_old/ optimizer source and tests, clean factory registry
- Add datasetItemCount to EXPERIMENT_AGGREGATED_FIELDS_TO_IGNORE (not stored in aggregates table)
- Add rollback documentation to mutation experiment type migration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] refactor: deduplicate KPI cards, metric cells, and cleanup

- Extract shared KPICard/MetricKPICard to pages-shared/experiments/KPICard
- Extract calcPercentageVsBaseline helper and TrialMetricCellContent to
  deduplicate percentage calculation across 3 trial metric cells
- Remove unused OptimizationUpdate interface from types
- Fix inconsistent color token (text-light-slate → text-muted-slate)
- Move IIFE out of JSX in MetricComparisonCell compact mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] refactor: rename Compare* to Optimization/Trial, simplify URL structure

- Rename CompareOptimizations* → Optimization* and CompareTrials* → Trial*
- Simplify optimization URL from /$datasetId/$optimizationId to /$optimizationId
- Change trial route from /compare to /trials
- Add OptimizationCompareRedirect for legacy URL backwards compatibility
- Update all navigation references across pages (OptimizationsPage, HomePage, BestPrompt, ResourceLink, etc.)
- Fix breadcrumbs: show raw optimization ID, "Trial #N" for trials
- Split optimization detail into Report & Trials tabs with underline style
- Replace ToggleGroup with underline Tabs on trial page

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] feat: rename tabs, add diff vs parent, fix word-level diffs

- Rename "Report" tab to "Overview" on optimization page
- Rename "Best configuration" to "Best trial configuration"
- Change "Diff" button to "Diff vs. baseline" in configuration sections
- Add "Diff vs. parent" option in trial configuration tab
- Fix prompt diff to use word-level mode for inline change highlights
- Fix TextDiff word-mode layout to flow inline instead of dropping lines

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] refactor: extract shared config flattening, add redirect tests

- Extract flattenConfig, EXCLUDED_CONFIG_KEYS, shouldSkipRedundantKey
  into configuration-renderer.ts (shared by TrialConfigurationSection
  and ConfigurationDiffContent)
- Convert ConfigViewMode string union to CONFIG_VIEW_MODE const object
- Add missing `replace` prop on fallback Navigate in
  OptimizationCompareRedirect
- Restore isArray guard in ConfigurationDiffContent collectPrompts
- Add unit tests for configuration-renderer (21 tests)
- Add unit tests for OptimizationCompareRedirect (4 tests)
- Add E2E Playwright test for legacy /compare URL redirect (2 tests)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] fix: PR review fixes - generic flattenConfig, NamedPrompts diff, parent fallback

- Make flattenConfig accept generic skipKey callback instead of hardcoded
  filtering (addresses Baz review comment)
- Fix NamedPrompts format not recognized as "prompt" type in
  detectConfigValueType, causing JSON-level diff instead of word-level
- Add parent experiment fallback for old optimizations using chronological
  ordering (enables "Diff vs. parent" for non-GEPA v2)
- Fix PromptDiff fallback paths to use mode="words" for word-level diffs
- Add tests for NamedPrompts detection and generic skipKey behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] fix: PR review - shared makeSkipKey, parent fallback, label tweak

- Extract makeSkipKey helper into configuration-renderer.ts to eliminate
  duplicate skipKey predicates in TrialConfigurationSection and
  ConfigurationDiffContent
- Fix parentCandidateIds lookup: fall through to chronological fallback
  when GEPA v2 metadata exists but no matching parent is found
- Use shared skipKey in collectPrompts instead of inline checks
- Remove period from "Diff vs." labels

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] fix: rename Config toggle label to Configuration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4838] [SDK] feat: GEPA convergence improvements (#5570)

* [OPIK-4838] [SDK] feat: GEPA convergence improvements

- Add GepaConfig dataclass for centralized algorithm parameters
- Cache parent scores during minibatch gate with configurable tolerance
  to absorb LLM judge noise (gate_tolerance=0.1)
- Rewrite FailureAwareBatchSampler to use assertion-based pass/fail
  instead of score threshold, prioritize by failing assertion count
- Switch default candidate selection to current_best
- Add docs on scoring pipeline and candidate selection strategies

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore(scripts): use GepaConfig defaults in e2e scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove local-only docs from PR

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address baz review comments, tighten e2e assertions

- Remove unused _global_assertion_failures Counter from sampler
- Update sampler docstring to match implementation (assertion count, not frequency)
- Bound _cached_full_eval_scores to max_candidates entries with FIFO eviction
- Tighten e2e assertions: add context-awareness checks to EASY tier, sharpen MEDIUM tier

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: remove gate_tolerance, require strict minibatch improvement

Cached parent scores are still used for deterministic comparison,
but mutations must now strictly beat the parent on the minibatch
without any tolerance cushion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: shuffle failed items randomly instead of sorting by priority

Simplifies minibatch sampling — all failed items are shuffled equally
rather than sorted by assertion failure count then shuffled within tiers.
This gives better variety across iterations since most items share the
same tier anyway.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: top-tier sampling, template var validation, improved reflection prompt

- Sampler splits failed items into top/rest by assertion failure count,
  draws randomly from top tier first (configurable top_failed_fraction)
- Reject reflection proposals that drop template variables (e.g. {question})
- Reflection prompt encourages surgical edits over full rewrites

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
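The template-variable check described above can be sketched like this (an illustrative sketch assuming `{name}`-style placeholders; function names are hypothetical, not the SDK's actual API):

```python
import re

def template_vars(prompt):
    # extract {name}-style template variables from a prompt
    return set(re.findall(r"\{(\w+)\}", prompt))

def keeps_template_vars(parent, proposal):
    # reject reflection proposals that drop any variable present in the parent
    return template_vars(parent) <= template_vars(proposal)
```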

* fix: only update best_trial from full evaluations, not minibatches

A minibatch scoring 1.0 on 4 items was being reported as best_trial
even when the full evaluation only reached 0.9 on 20 items. Now
best_trial is only updated when experiment_type is None (full eval).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
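The gating logic can be sketched as below — a hedged illustration using dicts in place of the real trial objects; `maybe_update_best` is a hypothetical name:

```python
def maybe_update_best(best_trial, trial):
    # only full evaluations (experiment_type is None) may become best_trial;
    # a minibatch score over a few items is not comparable to a full-eval score
    if trial.get("experiment_type") is not None:
        return best_trial
    if best_trial is None or trial["score"] > best_trial["score"]:
        return trial
    return best_trial
```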

* chore: remove accidentally committed reflection logs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update GEPA docs for top-tier sampling, 5-step reflection, template var validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: increase default reflection_minibatch_size from 4 to 6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: thread evaluator_model through pipeline, increase max_candidates to 25

The default LLMJudge model (gpt-5-nano) was too lenient, causing
pass_rate to always report 1.0. Thread evaluator_model from
OptimizationContext through EvaluationAdapter and experiment_executor
so callers can specify a more capable judge model.

Also increase GEPA max_candidates default from 5 to 25 to allow
longer optimization runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: configurable blended scoring with assertion-level tiebreaker

Add ScoringConfig with strategy ("blended" | "pass_rate"), configurable
weights, and auto-computed epsilon (1/(num_items+1)) that guarantees
pass_rate always dominates while giving the algorithm gradient signal
from individual assertion progress.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: log raw pass_rate to UI, use blended score only for algorithm

_extract_score now returns (optimization_score, display_score) tuple.
The blended score drives the algorithm's acceptance gate, while the
raw pass_rate is logged as the experiment score for the UI chart.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: align GEPA per-item scoring with pass_rate, expose blended score as internal

- Replace GEPA's mean-of-assertions scoring with pass_rate-aligned formula:
  passing items score 1.0, failing items score ε × assertion_frac
  (where ε = 1/(num_items+1)), preserving gradient for subsample gate
- Use build_suite_result as source of truth for item pass/fail
- Store pass_rate in trial.score (user-facing), blended score in
  internal_optimization_score (algorithm-only)
- Use pass_rate for stop condition threshold comparison
- Add type hints and ScoreResult type to _extract_per_item_feedback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
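The pass_rate-aligned formula above can be sketched as follows (illustrative names; the real implementation lives in the optimizer's scoring module). The key property is that ε = 1/(n+1) caps the total contribution of all failing items below the value of one passing item, so more passes always wins:

```python
def blended_score(items):
    """items: list of (passed, assertion_frac) pairs for one candidate.
    Passing items score 1.0; failing items score epsilon * assertion_frac,
    with epsilon = 1/(n+1) so pass_rate always dominates the blend."""
    n = len(items)
    epsilon = 1.0 / (n + 1)
    total = sum(1.0 if passed else epsilon * frac for passed, frac in items)
    return total / n

def pass_rate(items):
    # user-facing score: fraction of items that passed
    return sum(1 for passed, _ in items if passed) / len(items)
```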

* fix: only record full evaluations as visible trials

Skip appending subsample/minibatch evals and cache hits to state.trials
so the UI only shows meaningful trial progression. Internal evals are
still returned to the optimizer for its scoring logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update GEPA docs for scoring contract, trial visibility, per-item scoring

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: make reflection prompt less conservative to reduce stagnation

Relax the "surgical edits / MINIMAL EDIT" constraints that prevented the
reflection LLM from making meaningful structural changes when persistent
failures are detected. Key changes: graduated edit aggressiveness based
on Failure History, concrete escalation strategies (restructure, step-by-step
procedures, conditional logic, section rewrite), and removal of the
single-rule-per-failure cap.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: cumulative assertion failure tracking with persistent failure threshold

- Track per-assertion total failures and total evaluations across the
  entire optimization run in FailureAwareBatchSampler
- Show Failure History only when an assertion has failed >=10 times
  (persistent failure threshold) AND failed again in current eval
- Include failures/evals ratio so the reflection LLM can gauge severity
- Remove streak-based logic in favor of cumulative counts
- Simplify reflection prompt: 5 steps → 4, merge write+apply steps,
  remove duplicated escalation, cleaner multiline formatting
- Remove duplicate cumulative info from Summary's Blocking assertions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
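The cumulative tracking scheme can be sketched as below (a hedged sketch; the class and method names are illustrative stand-ins for the sampler's internal counters):

```python
from collections import Counter

class AssertionFailureTracker:
    """Cumulative per-assertion failure counts across the whole run."""

    def __init__(self, persistent_failure_threshold=10):
        self.threshold = persistent_failure_threshold
        self.failures = Counter()
        self.evals = Counter()

    def record(self, assertion, passed):
        self.evals[assertion] += 1
        if not passed:
            self.failures[assertion] += 1

    def show_failure_history(self, assertion, failed_now):
        # surface Failure History only when the assertion has failed at least
        # `threshold` times cumulatively AND failed again in the current eval
        return failed_now and self.failures[assertion] >= self.threshold

    def ratio(self, assertion):
        # failures/evals ratio so the reflection LLM can gauge severity
        return self.failures[assertion] / max(self.evals[assertion], 1)
```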

* feat: show only worst run for multi-run items, reorder Failure History before run output

Reduces feedback verbosity by showing only the worst run instead of all runs
for multi-run items. Places Failure History right after Inputs in the record
so the reflection LLM sees persistent failure context before the run details.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
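The worst-run selection amounts to picking the run with the most failing assertions — sketched here with dict-shaped runs as an assumption, not the actual record types:

```python
def worst_run(runs):
    # for a multi-run item, keep only the run with the most failing assertions
    def failing_count(run):
        return sum(1 for a in run["assertions"] if not a["passed"])
    return max(runs, key=failing_count)
```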

* feat: guide reflection LLM toward general rules, allow specificity for persistent failures

Step 3 now instructs to abstract specific examples into general categories.
Step 2 allows more specific rules for persistently failing assertions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update GEPA reflection prompt docs for current algorithm

Updates template (4 steps), feedback format (worst-run-only, Failure History
with cumulative counts and >=10 threshold), and generalization guidance.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: reduce prompt overfitting — prefer updating existing rules, group by behavior pattern

Step 3: check whether an existing rule covers the failing behavior before
adding a new one. Step 4: group by behavior pattern, not scenario type.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: make persistent failure threshold configurable, default to 7

Add persistent_failure_threshold to GepaConfig (default=7), thread through
GepaOptimizer → FrameworkGEPAAdapter → ReflectiveDatasetBuilder.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: strengthen anti-overfitting in reflection prompt

Step 3: NEVER copy specific names/details/scenarios from feedback — they
are samples that change at runtime. Step 2: persistent failure specificity
still avoids non-generalizable details.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update GEPA docs for current algorithm state

Update both implementation guide and reflection prompt docs to match
current code: 4-step template, worst-run-only feedback, cumulative
failure threshold (configurable, default 7), anti-overfitting guidance,
and corrected config defaults.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle edge cases in scoring and trial recording

- _item_score: return 0.0 (not 1.0) for failed items with empty
  assertions, so they don't get scored as passes
- evaluation_adapter: guard trial is not None before appending to
  state.trials and accessing trial.optimization_score

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(BE): exclude minibatch/mutation experiments from pass_rate computation

The FIND query's best_objective_score was computed as a weighted average
across ALL experiments including minibatch and mutation, causing the UI
to show incorrect pass_rate during optimization. Filter experiment_candidates
to only include full-eval experiments (regular/trial types).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): replace ambiguous XYZ headphones e2e item with clear bluetooth speaker scenario

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(test): use assertions= shorthand and typed ExecutionPolicy in e2e scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(optimizer): add early stopping when pass_rate plateaus

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
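A plateau detector of this shape could look like the following — a minimal sketch under the assumption that the optimizer keeps a list of full-eval pass_rates; the patience value and function name are illustrative:

```python
def plateaued(pass_rates, patience=3, min_delta=1e-9):
    # stop early when the last `patience` full evals failed to improve on the
    # best pass_rate seen before them
    if len(pass_rates) <= patience:
        return False
    best_before = max(pass_rates[:-patience])
    return max(pass_rates[-patience:]) <= best_before + min_delta
```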

* fix(test): update reflective dataset tests to use Worst Run instead of Runs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(optimizer): update assertion failure counters on minibatch evals too

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* [NA] [SDK] refactor: convert reflection template to triple-quoted string

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: extract apps/opik-optimizer to comet-ml/opik-optimizer repo

Remove apps/opik-optimizer/ directory (moved to separate repo).
Clean up python-backend framework optimizer references:
- Remove framework_optimizer.py and framework_runner.py
- Remove OPTIMIZER_FRAMEWORK queue from Java Queue enum
- Remove opik-optimizer additional_contexts from docker-compose and CI
- Simplify resolveQueue to always use OPTIMIZER_CLOUD

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: revert python-backend DRY refactor that was only needed for framework optimizer

Restore optimizer.py and rq_worker_manager.py to main state, delete
optimizer_job_helper.py which only existed to share code with the
now-removed framework_optimizer.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [NA] [FE] fix: trials table sorting, metric trend precision, and word diff readability

- Implement client-side sorting for optimization trials table (all columns)
- Default sort by Trial # ascending
- Show 0% trend when formatted values are identical (below display resolution)
- Fall back to block diff when word changes exceed 60% of content

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
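The "0% trend when formatted values are identical" behaviour can be sketched in Python (the real code is TypeScript in the FE; the formatter and names here are assumptions for illustration):

```python
def trend_percentage(prev, curr, fmt=lambda v: f"{v:.0%}"):
    # if both values render identically at display resolution, show a 0% trend
    # instead of a misleading tiny delta
    if fmt(prev) == fmt(curr):
        return 0.0
    if prev == 0:
        return None  # no meaningful baseline
    return (curr - prev) / prev * 100.0
```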

* [NA] [FE] fix: improve diff readability and optimization progress status

- Use hybrid line+word diff: line-level diff first to find changed regions,
  then word-level refinement within paired lines for precise highlights
- Use diffTrimmedLines to ignore trailing whitespace differences
- Fall back to separate removed/added blocks when line pairs are too different
- Add "Running initial calculations..." status for early GEPA phases
- Unify Changed/Added/Removed tag styling in prompt diff view

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [NA] [FE] fix: address PR review — sortCandidates tests, diffLines, formatter comments

- Add unit tests for sortCandidates covering all sort branches
- Switch TextDiff from diffTrimmedLines to diffLines for whitespace detection
- Add comments explaining intentional formatter comparison in percentage calc
- Export sortCandidates and CANDIDATE_SORT_FIELD_MAP for testability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [NA] [FE] fix: address PR feedback — shared percentage helper, migration renumber, test timezone fixes, dataset button layout

- Extract calcFormatterAwarePercentage into shared lib/percentage.ts (review comment)
- Renumber migrations 000064→000065, 000065→000066 to avoid prefix conflict with main
- Fix timezone-sensitive test failures in MetricDateRangeSelect/utils.test.ts
- Fix dataset NavigationTag rendering as block in OptimizationHeader
- Fix mypy return type for _run_suite_evaluation in evaluator.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [NA] [FE] feat: dynamic chart legend, clickable ghost dot, dataset button width fix

- Chart legend now shows only statuses present in the data
- Ghost (in-progress) dot is clickable to select the trial
- Dataset NavigationTag constrained to content width with w-fit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [NA] [FE] refactor: simplify trial statuses, ghost dot color, best candidate breathing animation

- Remove "evaluating" status — candidates are now passed (scored > parent) or pruned (scored <= parent)
- Simplify computeCandidateStatuses to compare against parent score
- Ghost dot uses running status color (yellow) instead of hardcoded blue
- Best candidate dot breathes (opacity pulse) when optimization is active but no ghost dot is shown
- Clean up unused isOptimizationFinished/inProgressStepIndex props from columns and cells

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [NA] [FE] fix: optimization cost duration from created_at, live timer, absolute time in trials table

- Duration now starts from optimization created_at instead of first experiment
- Live ticking timer while optimization is in progress
- Trials table "Created" column shows absolute date/time instead of relative

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [NA] [SDK] fix: correct return type of evaluate_optimization_suite_trial

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Remove orphaned test for evaluate_optimization_suite_trial

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revert unused dataset_type additions in Python SDK

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4727] [FE] fix: improve non-eval-suite optimization display

- All chart dots blue for non-eval-suite optimizations (no pruned status)
- Chart legend shows metric name instead of status labels
- Table shows "Baseline" for step 0, "Passed" for scored candidates
- Column header shows metric name (e.g. "Accuracy (geval)")
- Add reason tooltip (speech bubble) to trial items score columns
- Remove jailbreak password demo template
- Revert temporary feature flag override

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: renumber migration prefixes 000065→000067, 000066→000068 to avoid conflicts with main

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4928] [BE] fix: use correct column to lookup execution policies for experiment items

The execution policy lookup query was filtering by dataset_item_versions.dataset_item_id
instead of dataset_item_versions.id. With dataset versioning enabled, experiment items
reference dataset_item_versions.id as their datasetItemId, causing the lookup to miss
and fall back to the default policy {runs_per_item:1, pass_threshold:1}.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add missing dataset_item_count to aggregated experiment CTEs

The experiments_from_aggregates_final CTEs in both FIND and
FIND_GROUPS_AGGREGATIONS queries were missing dataset_item_count,
causing ClickHouse UNKNOWN_IDENTIFIER errors when the aggregated
branch was used. Maps ea.experiment_items_count to dataset_item_count.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove duplicate SELECT in FIND_GROUPS_AGGREGATIONS and deduplicate chart empty states

- The FIND_GROUPS_AGGREGATIONS query had two outer SELECTs after the
  subquery, causing a ClickHouse syntax error. Merged dataset_item_count
  into the single outer SELECT and removed the duplicate.
- Collapsed identical spinner/NoData branches in OptimizationProgressChartContainer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [OPIK-4928] [BE] fix: remove feature flag gate preventing execution policy resolution

Root cause: ExperimentItemService.fetchItemPolicies() was gated behind
isDatasetVersioningEnabled(). When disabled, item-level execution
policies were never fetched from dataset_item_versions, so all
experiment items fell back to ExecutionPolicy.DEFAULT {1, 1}.

Also reverts the incorrect DatasetItemVersionDAO column change from
576791acc — experiment_items.dataset_item_id stores the logical
dataset_items.id which matches dataset_item_versions.dataset_item_id.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR review comments and fix execution_po…
miguelgrc pushed a commit that referenced this pull request Mar 19, 2026
…ndpoints (#5577)

* [OPIK-4380] [BE] Add experiment aggregates for denormalized metrics

- Add experiment_aggregates and experiment_item_aggregates tables
- Implement ExperimentAggregatesDAO with population and query methods
- Add ExperimentAggregatesService for aggregation management
- Refactor DTOs into organized model classes:
  - ExperimentAggregatesModel: aggregation results
  - ExperimentEntityData: entity models
  - ExperimentSourceData: raw source data
  - ExperimentAggregatesUtils: utilities
- Add FEEDBACK_SCORES_AGGREGATED filter strategy for map-based filtering
- Add comprehensive integration tests (10/10 passing)
- Configure batch size and parallelism settings

* [OPIK-4380] [BE] Add MySQL deadlock retry mechanism for concurrent dataset operations

Problem:
- MySQL deadlock on dataset_version_tags composite PRIMARY KEY (workspace_id, dataset_id, tag)
- Occurred during parallel dataset creation in same workspace
- Multiple threads inserting "latest" tag for different datasets caused lock contention
- Experiments with parallel execution were failing with MySQLTransactionRollbackException

Solution:
- Add handleOnDeadLocks() method in RetryUtils with:
  - 5 retry attempts with exponential backoff (250ms to 2s)
  - 0.5 jitter to reduce thundering herd effect
  - Recursive isDatabaseDeadlock() detection for MySQLTransactionRollbackException
- Apply retry logic in DatasetItemService.setDatasetItemVersion()
- Enables concurrent dataset creation for same workspace

Impact:
- Supports parallel experiment execution with proper deadlock handling
- Test success rate improved from 0/10 to 10/10 in ExperimentAggregatesIntegrationTest
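The retry scheme (5 attempts, 250ms→2s exponential backoff, 0.5 jitter) can be sketched in Python — a hedged illustration of the pattern, not the Java `RetryUtils.handleOnDeadLocks()` code:

```python
import random
import time

def retry_on_deadlock(fn, attempts=5, base_delay=0.25, max_delay=2.0,
                      jitter=0.5,
                      is_deadlock=lambda e: "deadlock" in str(e).lower(),
                      sleep=time.sleep):
    # exponential backoff (base_delay doubling, capped at max_delay) with
    # jitter to spread out retries from concurrently colliding transactions
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts - 1 or not is_deadlock(exc):
                raise
            delay = min(base_delay * (2 ** attempt), max_delay)
            # jitter=0.5 scales the delay by a random factor in (0.5, 1.0]
            sleep(delay * (1 - jitter * random.random()))
```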

* Fix visibility

* [OPIK-4380] [BE] Address PR review comments for experiment aggregates

Fixed 11 automated review comments from baz-reviewer:

CRITICAL fixes:
- Prevent NPE on null span aggregations by adding coalesce() in SQL
- Handle multi-project experiments with LIMIT 1 in GET_PROJECT_ID
- Handle zero-item experiments with empty aggregation helpers
- Bind feedback_scores_percentiles map instead of empty CAST

HIGH priority fixes:
- Use toDecimal128(12) instead of toDecimal64(9) for cost percentiles
- Add null-safe tags handling with Optional.ofNullable()
- Include exception objects in retry logging for stack traces

MEDIUM priority fixes:
- Add missing log_comment to SELECT_EXPERIMENT_BY_ID query
- Add missing log_comment to GET_PROJECT_ID query

LOW priority fixes:
- Remove duplicate "id" binding in bindItemsParameters
- Enhance batchSize config documentation with details

All 11 integration tests passing.

* [OPIK-4382] [BE] Refactor experiment aggregates with import cleanup and Optional patterns

- Add missing imports for IntStream, ProjectStats, and other dependencies
- Replace fully-qualified class names with proper imports across DAO and Service classes
- Fix IS_NOT_EMPTY filter handling for FEEDBACK_SCORES_AGGREGATED strategies
- Refactor null checks to use Optional in mapping methods:
  - mapFeedbackScoreAggregations, mapExperimentFromAggregates
  - mapFeedbackScoreData, mapExperimentGroupAggregationItem
  - Batch insert preparation with Optional chains
- Improve code readability and maintainability with functional patterns

* [OPIK-4380] [BE] Fix table definition

* [OPIK-4380] [BE] Address PR comments and consolidate DatasetItemService methods

- Fix tags NPE in ExperimentAggregatesDAO with defaultIfNull
- Remove unnecessary FINAL clause from GET_EXPERIMENT_DATA query
- Fix test naming in ExperimentAggregatesIntegrationTest
- Consolidate 7 duplicate createVersionFromDelta methods into a single canonical implementation
- Remove debug logger from config-test.yml

* [OPIK-4380] [BE] Fix missing log_comment and centralize search criteria binding

- Fix SELECT_EXPERIMENT_BY_ID to properly render log_comment metadata
  - Use getSTWithLogComment pattern in getExperimentFromAggregates
  - Ensures ClickHouse query logging populates workspace/experiment IDs

- Centralize ExperimentSearchCriteria binding logic
  - Create ExperimentSearchCriteriaBinder utility class
  - Parameterize filter strategies to support both DAO variants
  - Eliminate 29-line duplication between ExperimentDAO and ExperimentAggregatesDAO
  - Single source of truth prevents DAOs from getting out of sync

* [OPIK-4380] [BE] Fix createVersionFromDelta consolidation after rebase

- Update canonical method signature to include new parameters:
  - List<EvaluatorItem> evaluators
  - ExecutionPolicy executionPolicy
  - boolean clearExecutionPolicy

- Update all 5 caller sites to pass new parameters:
  - Use changes.evaluators(), changes.executionPolicy(), changes.clearExecutionPolicy() when available
  - Pass null/false for auto-generated versions that inherit from base

- Add imports for EvaluatorItem and ExecutionPolicy

Fixes compilation errors introduced by rebase with upstream changes to DatasetVersionService

* [OPIK-4380] [BE] Address PR review comments - fix type mismatch, extract constants, remove DAO logging

- Fixed BigDecimal[] to Double[] conversion for experiment_scores (matches ClickHouse Float64)
- Extracted FilterStrategy lists to static final constants to avoid repeated allocations
- Added @NonNull validation to populateExperimentAggregate parameter
- Removed DAO layer logging, keeping service-level logging only

* [OPIK-4380] [BE] Extract shared helper for experiment data pagination

Extract streamWithExperimentPagination() helper method to eliminate
duplication in getTracesData(), getSpansData(), and getFeedbackScoresData().

All three methods followed identical pattern:
- asyncTemplate.stream with connection
- getSTWithLogComment with cursor flag
- Bind workspace_id, experiment_id, project_id, limit
- Optional cursor binding
- Result mapping

Benefits:
- Single source of truth for pagination binding logic
- Prevents divergence when tweaking cursor/limit bindings
- Reduces code from ~20 lines to ~10 lines per method
- Type-safe generic implementation

Note: CTE redundancy (3x experiment_items scan) is intentional to avoid
passing large trace ID lists as parameters, which would cause performance
issues with 10K+ traces.

* [OPIK-4383] [BE] Add Redis stream subscriber for debounced experiment aggregates recomputation

- Add ExperimentDenormalizationConfig implementing StreamConfiguration with debounce, job lock, and per-experiment aggregation lock settings
- Add ExperimentAggregationMessage as stream message record
- Add ExperimentAggregatesSubscriber consuming from the denormalization stream; acquires a workspace-scoped distributed lock per experiment before calling populateAggregations()
- Add experimentDenormalizationEnabled feature flag to ServiceTogglesConfig and FeatureFlags
- Wire ExperimentDenormalizationConfig into OpikConfiguration
- Update config.yml and config-test.yml with full experimentDenormalization block (enabled for tests)
- Add ExperimentAggregatesSubscriberTest covering lifecycle gating and processEvent success/error paths

* Revision 2: Address PR comments - add config defaults, remove toggle, rename tests

- ExperimentDenormalizationConfig: add sensible defaults to all fields so
  Dropwizard validation doesn't fail when the config block is absent from
  old deployments (config.isEnabled()=false still gates the subscriber)
- Remove experimentDenormalizationEnabled service toggle from
  ServiceTogglesConfig, FeatureFlags, config.yml and config-test.yml -
  the infrastructure gate (config.isEnabled()) is the single control point
- Rename lifecycle test methods to camelCase per project conventions:
  startSkipsStartupWhenDisabled / stopSkipsShutdownWhenDisabled

* Revision 3: Add @Max(500) to consumerBatchSize and @NotNull to jobLockWaitTime

* [OPIK-4380] [BE] Address PR review comments - fix TYPE_REFERENCE visibility, redundant IN subquery, hardcoded context keys, Instant.now in loop, and inline defaultIfNull

* Revision 4: Address remaining JetoPistola review comments (#7, #8, #10)

- #7: Remove "Used for testing and verification" from getExperimentFromAggregates javadoc
- #8: Replace recursive flatMap with Mono.expand() in populateExperimentItemsInBatches
- #10: Remove unrelated subscribeOn addition from DatasetItemService.createVersionFromDelta

* Revision 3: Add switchIfEmpty fallback for deleted traces in populateExperimentAggregate

* Fix tests

* Revision 6: Move countTotal log from DAO to service layer

Operational logs belong in the service layer, not the DAO.

* Revision 7: Apply Spotless formatting

* Revision 8: Make populateAggregations(UUID, int) private

Removes the uncapped public batch size entry point. All callers now go
through the public no-arg overload which reads batchSize safely from config.

* [OPIK-4380] [BE] Add evaluation_method support to experiment_aggregates pipeline

- Add ClickHouse migration (000062) to add evaluation_method column to experiment_aggregates table
- Add evaluationMethod field to ExperimentData record
- Update GET_EXPERIMENT_DATA query to read evaluation_method from experiments
- Update INSERT_EXPERIMENT_AGGREGATE to write evaluation_method to experiment_aggregates
- Update SELECT_EXPERIMENT_BY_ID to read evaluation_method from experiment_aggregates
- Fix Experiment record constructor call: insert EvaluationMethod at correct position (10)

* [OPIK-4380] [BE] Extract shared helper for experiment aggregation queries

Reduce copy-paste in getTraceAggregations, getSpanAggregations, and
getFeedbackScoreAggregations by extracting queryExperimentAggregation,
which centralises the context-aware execution, workspace/experiment/project
parameter binding, and singleOrEmpty pattern shared by all three methods.

* [OPIK-4380] [BE] Enforce non-null contract on countTotal criteria parameter

Add @NonNull to ExperimentSearchCriteria in the interface and implementation
so that a null argument fails fast with an explicit NullPointerException at
the DAO boundary instead of crashing deep inside buildCountTemplate.

* [OPIK-4380] [BE] Fix countTotal ignoring target project IDs in normal path

target_project_ids was only applied inside the project_deleted LEFT JOIN
subquery; the main WHERE had no project restriction, so counts were
workspace-wide. Reuse has_target_projects in the main WHERE so
project_id IN :target_project_ids always takes effect. Also replace
manual null/empty checks with CollectionUtils.isNotEmpty.

* [OPIK-4380] [BE] Apply Spotless formatting

* [OPIK-4382] [BE] Address PR review comments on experiment aggregates

- Fix :versionId → :version_id parameter naming in SQL templates and bindings
- Fix last_updated_at binding to use item.lastUpdatedAt() instead of Instant.now()
- Fix FEEDBACK_SCORES_AGGREGATED_IS_EMPTY filter: embed generated SQL into templates
  instead of hard-coded static condition, and add missing bind calls
- Fix RetryUtils log duplication (remove getMessage() + pass exception directly)
- Add batchSize = 1000 default in ExperimentAggregatesConfig
- Extract resolveVersionIdForCriteria helper to deduplicate version-id resolution
- Add null/blank/ClickHouse placeholder guards in extractUuidsFromGroupValues
- Extract loadEntityMap helper to deduplicate getEnrichInfoHolder enrichment logic

* Revision 3: Address PR comments E, F, G, H

- Fix E: Extract shared template/bind helpers in ExperimentAggregatesDAO
- Fix F: Bind experiment_ids as UUID[] instead of String[]
- Fix G+H: Extract getEnrichInfoHolder logic into ExperimentGroupEnricher,
  eliminating duplication between ExperimentService and ExperimentAggregatesService
  without introducing a direct dependency between them

* Revision 4: Fix ExperimentServiceTest to include ExperimentGroupEnricher mock

* [OPIK-4382] [BE] Extract shared Row→ExperimentGroup mappers into ExperimentGroupMappers

Pull the duplicated Row→ExperimentGroupItem and Row→ExperimentGroupAggregationItem
conversion logic from ExperimentDAO and ExperimentAggregatesDAO into a shared
ExperimentGroupMappers utility class. Both DAOs now delegate to the same
toExperimentGroupItem / toExperimentGroupAggregationItem helpers, eliminating
the need to mirror DTO mapping changes in two places.

* [OPIK-4382] [BE] Deduplicate bindGroupCriteria into ExperimentGroupMappers

Moves the shared group-criteria binding logic out of ExperimentDAO and
ExperimentAggregatesDAO into ExperimentGroupMappers.bindGroupCriteria(),
following the same pattern as ExperimentSearchCriteriaBinder. Adding or
fixing a criteria binding now only requires a change in one place.

* [OPIK-4382] [BE] Extract streamGroupQuery helper and fix null percentiles

- Deduplicate findGroups/findGroupsAggregations into a single private
  streamGroupQuery(queryTemplate, criteria, rowMapper) that differs only
  by the query constant and BiFunction row mapper.

- Fix convertToBigDecimal to return null for null/unsupported inputs so
  absent p50/p90/p99 entries in getDuration propagate as null rather than
  BigDecimal.ZERO, preserving the semantic-null that lets callers apply
  COALESCE/fallback logic correctly.

* [OPIK-4382] [BE] Consolidate cost/duration helpers into ExperimentGroupMappers

Promote getCostValue and getDuration to public static in
ExperimentGroupMappers and delete the private copies in ExperimentDAO.
ExperimentDAO.mapToDto now delegates to the shared helpers, so any
future change to cost filtering, duration percentile extraction, or the
BigDecimal conversion only needs to be made in one place.

Side-effect: ExperimentDAO.mapToDto also picks up the null-percentile
fix (convertToBigDecimal returns null for absent/unsupported inputs)
that was previously applied only to ExperimentGroupMappers.

* [OPIK-4382] [BE] Fix pagination count and add criteria filter tests

- Remove count() OVER () window function from paged query (returned
  page-scoped count instead of full result-set count)
- Replace with dedicated count query + short-circuit: skip items query
  when count == 0, use DatasetItemPage.empty() for that case
- Extract DatasetItemResultMapper.buildItemFromRow as public static
  helper reused by ExperimentAggregatesDAO
- Add parameterized integration tests for ExperimentGroupCriteria
  filters (name, types, projectId, combined, empty-result) covering
  both findGroups and findGroupsAggregations aggregate paths

* [OPIK-4380] [BE] Extract shared filter helpers into FilterQueryBuilder

Add FilterStrategyParam record, applyFiltersToTemplate and bindFilters
static helpers to FilterQueryBuilder, then replace duplicated per-strategy
loops in DatasetItemVersionDAO and ExperimentAggregatesDAO with single
delegating calls backed by per-DAO strategy constants.

* [OPIK-4382] [BE] Consolidate filter helpers in getExperimentItemsStatsFromAggregates

Add EXPERIMENT_ITEMS_STATS_FILTER_STRATEGY_PARAMS and
EXPERIMENT_ITEMS_STATS_BIND_STRATEGIES constants and replace the
per-strategy toAnalyticsDbFilters/bind blocks in
getExperimentItemsStatsFromAggregates with single delegating calls to
FilterQueryBuilder.applyFiltersToTemplate and bindFilters.

* Revision 9: Extract shared helpers to eliminate duplication across DAOs

- Create DatasetItemSearchCriteriaMapper: centralizes filters + search flag
  wiring for DatasetItemSearchCriteria, shared by DatasetItemVersionDAO and
  ExperimentAggregatesDAO
- Add ExperimentGroupMappers.applyGroupCriteriaToTemplate: centralizes
  ExperimentGroupCriteria → ST template wiring, now shared by ExperimentDAO
  and ExperimentAggregatesDAO
- Update DatasetItemVersionDAO.addFiltersToTemplate and bindSearchAndFilters
  to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentAggregatesDAO.applyDatasetItemFiltersToTemplate and
  bindDatasetItemSearchParams to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentDAO.newGroupTemplate and ExperimentAggregatesDAO.newGroupTemplate
  to delegate to ExperimentGroupMappers.applyGroupCriteriaToTemplate

* [OPIK-4383] [BE] Add experiment aggregate event listener and no-op publisher

* Revision 2: Fix missing import for ExperimentAggregationPublisher

* [OPIK-4383] [BE] Add ExperimentAggregationPublisher, ExperimentDenormalizationJob and tests

- ExperimentAggregationPublisher: debounces experiment aggregation triggers
  by writing compound workspaceId:experimentId members to a Redis ZSET scored
  by expiry timestamp (now + debounceDelay), plus a hash storing the userName
  with TTL=2×debounceDelay to handle stale entries.
- ExperimentDenormalizationJob: @Every("5s") job that reads ZSET members with
  score <= now, publishes ExperimentAggregationMessage to the Redis stream,
  then cleans up the ZSET entry and hash bucket. Handles stale entries
  (expired hash) by removing the orphaned ZSET member without publishing.
- Fix processExperiment reactive chain: avoided double index.remove by
  returning Mono<Boolean> from flatMap branches so switchIfEmpty is only
  triggered when the bucket is truly empty.
- ExperimentAggregationPublisherTest: integration tests with real Redis
  container verifying ZSET membership, score, userName storage, TTL,
  workspace isolation, and debounce deduplication.
- ExperimentDenormalizationJobTest: unit tests with Mockito covering disabled
  config, lock not acquired, empty ZSET, happy path, stale entry, and batch.
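The debounce index described above can be sketched with plain `java.util` maps standing in for the Redis ZSET and hash. All class and method names here are illustrative, not the actual Opik code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the debounce index: a map from
// "workspaceId:experimentId" to expiry timestamp stands in for the Redis ZSET,
// and a second map stands in for the userName hash.
public class DebounceIndex {
    private final long debounceDelayMs;
    private final Map<String, Long> expiryByMember = new HashMap<>();     // ZSET stand-in
    private final Map<String, String> userNameByMember = new HashMap<>(); // hash stand-in

    public DebounceIndex(long debounceDelayMs) {
        this.debounceDelayMs = debounceDelayMs;
    }

    // Each trigger (re)schedules the member with score = now + debounceDelay.
    // Repeated triggers within the window just push the expiry forward (dedup).
    public void trigger(String workspaceId, String experimentId, String userName, long nowMs) {
        String member = workspaceId + ":" + experimentId;
        expiryByMember.put(member, nowMs + debounceDelayMs);
        userNameByMember.put(member, userName);
    }

    // The job drains members whose score is <= now, like ZRANGEBYSCORE -inf..now,
    // removing both the ZSET entry and the associated hash entry.
    public List<String> drainDue(long nowMs) {
        List<String> due = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = expiryByMember.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (e.getValue() <= nowMs) {
                due.add(e.getKey());
                userNameByMember.remove(e.getKey());
                it.remove();
            }
        }
        return due;
    }

    public int size() {
        return expiryByMember.size();
    }
}
```

A re-trigger before the window elapses only moves the expiry forward, which is what makes bursts of trace/span updates collapse into a single aggregation run.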

* Fix tests setup

* [OPIK-4383] [BE] Address PR review: move DAO logs to service layer

* [OPIK-4383] [BE] Address PR review: extract shared DAO helper and fix log placement

* [OPIK-4383] [BE] Short-circuit deleteByTraceIds when no spans found

Skip delete, cascading operations, and SpansDeleted event when
getSpanIdsForTraces returns an empty set, preserving the original
no-op behaviour and avoiding the Preconditions.checkArgument failure
in SpanDAO.deleteByIds.

* [OPIK-4383] [BE] Fix cascade deletion failures after trace delete

Two bugs prevented spans and attachments from being deleted when a trace
was deleted via the event-driven cascade:

1. FeedbackScoreService.deleteByTraceIds/deleteBySpanIds had @NonNull on
   projectId which threw NPE when TracesDeleted.projectId() was null.
   EventInterceptor swallowed the NPE, stopping the entire cascade chain.
   Fix: remove @NonNull since the DAO already handles null safely via
   Optional.ofNullable(projectId).

2. SpanDAO.DELETE_BY_IDS had the wrong column (trace_id) and parameter
   name (span_ids) — the ClickHouse R2DBC driver could not resolve :span_ids
   as a named parameter in the DELETE statement. Fixed by using id IN :ids
   to match the working pattern in TraceDAO.DELETE_BY_ID.

* [OPIK-4383] [BE] Address PR review comments on ExperimentDenormalizationJob

- Centralize Redis constants (EXPERIMENT_KEY_PREFIX, USER_NAME_FIELD,
  MEMBER_SEPARATOR) in ExperimentDenormalizationConfig
- Change ExperimentAggregationPublisher.publish() to return Mono<Void>
  instead of void, so errors propagate to callers
- Make job interval configurable via jobs map in config.yml
- Fix onErrorContinue logging: remove getMessage() duplication
- Demote per-experiment logs from INFO to DEBUG
- Add ZSET pagination using expand() to avoid materializing entire range
- Update tests for all changes

* Fix @Every job interval config key casing and add jobs section to test config

The dropwizard-jobs framework uses WordUtils.uncapitalize(class.getSimpleName())
to look up the interval in the jobs map, so the key must be
'experimentDenormalizationJob' (lowercase first letter). Also adds the missing
jobs section and jobBatchSize to config-test.yml.
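The key derivation described above (lowercase first letter of the simple class name) can be reproduced with a tiny stdlib helper; this mirrors WordUtils.uncapitalize semantics for this case and is illustrative only, not the dropwizard-jobs source:

```java
// Sketch of how the jobs-map config key is derived from a job class name.
public class JobKeys {
    public static String jobConfigKey(Class<?> jobClass) {
        String name = jobClass.getSimpleName();
        if (name.isEmpty()) {
            return name;
        }
        // WordUtils.uncapitalize-style: only the first character is lowercased.
        return Character.toLowerCase(name.charAt(0)) + name.substring(1);
    }

    // Stand-in for the real job class, used only to demonstrate the derived key.
    static class ExperimentDenormalizationJob {}

    public static void main(String[] args) {
        System.out.println(jobConfigKey(ExperimentDenormalizationJob.class));
    }
}
```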

* Replace @Every annotation with programmatic Quartz scheduling

Remove @Every from ExperimentDenormalizationJob and schedule it
programmatically in OpikGuiceyLifecycleEventListener, following the
same pattern as TraceThreadsClosingJob. Add jobInterval config field
to ExperimentDenormalizationConfig. Remove the jobs YAML section that
caused deserialization errors with JobConfiguration's immutable map.

* Add experiment context to error log and extract publishIfNotEmpty helper

- Include experimentId and workspaceId in onExperimentUpdated error log
- Extract publishIfNotEmpty helper to deduplicate filter+publish logic
  across triggerByExperimentIds, triggerByTraceIds, triggerBySpanIds

* Fix NPE in ExperimentAggregateEventListenerTest mock setup

Stub publisher.publish() to return Mono.empty() in setUp so
.subscribe() calls in production code don't NPE on null.

* [OPIK-4385] [BE] Use pre-computed aggregation tables for experiment endpoints

Apply UNION ALL hybrid pattern to ExperimentDAO (FIND, FIND_GROUPS,
FIND_GROUPS_AGGREGATIONS) and ExperimentItemDAO (STREAM,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_COUNT,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_STATS) so that
experiments present in experiment_aggregates / experiment_item_aggregates
use pre-computed values, while others fall back to live JOIN computation.

Add ExperimentAggregatesIntegrationTest covering all 7 affected queries
with parameterized filter, pagination, and consistency scenarios.
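The hybrid read path described above can be sketched with plain maps: each requested experiment either comes from the pre-computed table or falls back to live computation, and the two branches are unioned. Names are illustrative, not the actual DAO code:

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative UNION ALL hybrid: experiments present in the pre-computed
// aggregates map use those values; all others fall back to live computation.
public class HybridReader {
    public static Map<String, Long> read(Collection<String> experimentIds,
                                         Map<String, Long> precomputed,
                                         Function<String, Long> liveCompute) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (String id : experimentIds) {
            Long agg = precomputed.get(id);                          // "aggregated" branch
            out.put(id, agg != null ? agg : liveCompute.apply(id));  // "raw" fallback branch
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> pre = Map.of("e1", 5L);
        System.out.println(read(List.of("e1", "e2"), pre, id -> 7L));
    }
}
```

In the real SQL this split is done with two SELECT branches joined by UNION ALL, but the per-experiment routing decision is the same.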

* [OPIK-4384] [BE] Fix missing zero_uuid binding and experiment_scores sort alias

- Bind zero_uuid parameter in getById, getByIds, and get(ExperimentStreamRequest)
  methods that use the FIND query; the UNION ALL refactor introduced an
  experiments_from_aggregates CTE that requires this parameter but only the
  main find() method was binding it, causing 500 errors on those paths
- Fix SortingQueryBuilder to reference the outer column alias experiment_scores_agg
  instead of es.experiment_scores; the ORDER BY sits outside the UNION ALL so the
  inner es alias is out of scope, while experiment_scores_agg is the consistent
  output alias exposed by both branches

* [OPIK-4384] [BE] Fix null row injection from LEFT JOIN miss in feedback_scores and comments aggregation

Pre-aggregate feedback_scores_final and comments_final into subqueries
(GROUP BY entity_id) before LEFT JOIN in DatasetItemVersionDAO.STREAM.
When a LEFT JOIN has no match against a pre-aggregated subquery the
joined columns are NULL, so any(NULL) returns NULL instead of a
default-valued row with epoch timestamps that caused Instant.parse()
failures.

Also adds a regression test covering the no-scores path in
ExperimentAggregatesIntegrationTest.

* [OPIK-4383] [BE] Remove DAO-level log.info from ExperimentAggregatesDAO methods

Move operational logging responsibility to the service layer, consistent
with earlier fixes for ExperimentItemDAO and SpanDAO in this PR.

* Remove accidentally committed doc files

These files were introduced during merge resolution but should
not be part of the branch.

* [OPIK-4383] [BE] refactor: extract triggerAggregation helper to centralize guard+publish flow

* [OPIK-4383] [BE] fix: restore TagOperations.tagUpdateFragment in SpanDAO BULK_UPDATE

Restores proper tag handling in SpanDAO.BULK_UPDATE query that was
regressed to a simple arrayConcat. Now uses TagOperations.tagUpdateFragment()
which provides arrayDistinct(), tag limit enforcement (max 50), and
tags_to_add/tags_to_remove support. Also adds the required
short_circuit_function_evaluation SETTINGS for throwIf evaluation.

* Adding InterruptableJob

* [OPIK-4383] [BE] Address PR review: expand safety valve, env var prefix

- Add batchSize-capped iteration counter to expand() to prevent
  infinite loops when ZSET entries fail to be removed
- Rename EXPERIMENT_DENORM_JOB_INTERVAL to OPIK_EXPERIMENT_DENORM_JOB_INTERVAL
  to follow the OPIK_ prefix convention

* [OPIK-4384] [BE] Add branch optimization and CTE split to experiment queries

Use pre-computed experiment_aggregates table to optimize query execution:
- Add has_aggregated/has_raw flags to skip unnecessary UNION ALL branches in FIND/FIND_COUNT
- Add getAggregationBranchCounts pre-query to determine which branches are needed
- Apply CTE split pattern to FIND_GROUPS and FIND_GROUPS_AGGREGATIONS
- Update getById to leverage branch optimization via single-ID branch count query
- Add <if(id)> filter to SELECT_AGGREGATED_EXPERIMENT_IDS for getById support

* [OPIK-4384] [BE] Add missing 7-arg overload for getDatasetItemsWithExperimentItems

Fix test compilation error from merge: the remote branch added callers
with (UUID, List, null, null, List<SortingField>, String, String) signature
which needs a bridge overload to the 9-arg method.

* [OPIK-4384] [BE] Add conditional LIMIT push-up, missing CTE, and fix test precision

- Add conditional LIMIT push-up in STREAM query: push LIMIT into CTE
  when only one branch (raw or aggregated) is active for performance
- Add missing experiment_item_aggr_trace_scope CTE for aggregated branch
- Add AggregatedExperimentCounts record for experiment-level branching
- Fix MultiValueFeedbackScoresE2ETest precision assertion: use isEqualTo
  instead of isEqualByComparingTo to respect custom BigDecimal comparator

* [OPIK-4384] [BE] Push OFFSET into top_dataset_items CTE and fix BigDecimal comparator in DatasetsResourceTest

* [OPIK-4384] [BE] Add pass rate aggregation to experiment aggregates

Add pass_rate, passed_count, and total_count columns to experiment_aggregates
table and compute them during aggregation. Update ExperimentDAO queries to
select these columns from both raw and aggregated paths, returning NULL for
non-evaluation-suite experiments.

* Fix get by id

* Fix mapping

* [OPIK-4384] [BE] Use pre-aggregated comments from aggregate tables with ISO 8601 date formatting

Update retrieval queries in ExperimentDAO, DatasetItemVersionDAO, and ExperimentAggregatesDAO
to read comments_array_agg as JSON String from aggregate tables instead of live-querying the
comments table. Ensure UNION ALL type compatibility by wrapping raw paths with toJSONString()
and formatting dates as ISO 8601 for proper Jackson deserialization.

* [OPIK-4384] [BE] Use parameterized binding for dynamic sort keys and add deterministic tiebreaker

- Replace literal string interpolation in getTopSortExpression with
  parameterized bind variables (sf.bindKey()) to prevent SQL injection
- Remove fieldMapping filter from bindDynamicKeys so all dynamic keys
  are bound, including those used in the top_sorting SELECT expression
- Add deterministic tiebreaker (id DESC / dataset_item_id DESC) to both
  the push-top-limit CTE and the main ORDER BY for consistent pagination
- Fix experiment_items deduplication: use FINAL where DISTINCT was used
  and vice versa for consistency across query branches

* [OPIK-4384] [BE] Add mixed-state aggregation test for UNION ALL hybrid

Test creates 3 experiments, aggregates only 1, and queries all 3 to
exercise the UNION ALL hybrid path where has_aggregated and has_raw
are both true simultaneously.

* [OPIK-4384] [BE] Add isNotEmpty assertions to parameterized filter tests

Ensure filter scenarios actually match data by asserting content()
is not empty before and after aggregation in all parameterized filter
tests (find, findGroups, findGroupsAggregations).

* [OPIK-4384] [BE] refactor: extract assertion helpers to remove duplication in ExperimentAggregatesIntegrationTest

* [OPIK-4384] [BE] refactor: rename parseFlexibleInstant to parseInstant in FeedbackScoreMapper

* [OPIK-4384] [BE] Make LIMIT unconditional in FIND query

The LIMIT clause was gated on filter/sort flags, so plain paged requests
(only limit/offset) at the outer query level would not emit LIMIT.
Simplify to always emit LIMIT when the limit parameter is provided.

* [OPIK-4384] [BE] Fix comment ordering assertion in tests

ClickHouse groupUniqArray does not guarantee ordering, so comment
assertions must use ignoringCollectionOrder to avoid flaky failures.

* [OPIK-4384] [BE] Add branch conditionals to FIND_GROUPS/FIND_GROUPS_AGGREGATIONS and revert unconditional LIMIT

- Wrap SELECT branches in FIND_GROUPS and FIND_GROUPS_AGGREGATIONS with
  <if(has_aggregated)>/<if(has_raw)> conditionals to skip unnecessary
  branches when all experiments are aggregated or all are raw
- Add no-args getAggregationBranchCounts() overload for workspace-only
  pre-query (used by group/aggregation queries that lack experiment IDs)
- Update executeQueryWithTargetProjects to run both pre-queries in
  parallel via Mono.zip
- Revert commit 215a3f9 (unconditional LIMIT) which caused double
  LIMIT/OFFSET bug: CTE-level LIMIT + outer LIMIT made page 2+ return
  0 results. The complex conditional is correct — outer LIMIT is only
  needed when post-CTE processing may alter the result set.
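The double LIMIT/OFFSET bug reverted above can be reproduced with plain list arithmetic: if a page-sized LIMIT is applied inside the CTE and then again (with OFFSET) outside, page 2 paginates over an already-truncated result set. A minimal stdlib sketch, not the actual query code:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Demonstrates why applying LIMIT both inside the CTE and in the outer
// query breaks pagination: the outer OFFSET skips past the truncated rows.
public class DoubleLimitBug {
    public static List<Integer> limitOffset(List<Integer> rows, int limit, int offset) {
        return rows.stream().skip(offset).limit(limit).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> all = IntStream.range(0, 25).boxed().collect(Collectors.toList());
        int pageSize = 10;

        // Correct: one LIMIT/OFFSET for page 2 -> rows 10..19.
        List<Integer> page2Correct = limitOffset(all, pageSize, pageSize);

        // Buggy: the CTE already applied LIMIT 10, then the outer query applies
        // LIMIT 10 OFFSET 10 on those 10 rows -> nothing left.
        List<Integer> cte = limitOffset(all, pageSize, 0);
        List<Integer> page2Buggy = limitOffset(cte, pageSize, pageSize);

        System.out.println(page2Correct.size() + " vs " + page2Buggy.size());
    }
}
```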

* [OPIK-4384] [BE] Add branch conditionals to SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_COUNT

Wrap the UNION ALL in the count query with <if(has_aggregated)>/<if(has_raw)>
conditionals to skip unnecessary branches. Pass branch flags through
getCountWithExperimentFilters from the existing pre-query results.

* [OPIK-4384] [BE] Fix ClickHouse column resolution in COUNT query

Alias dataset_item_id as di_id in the COUNT subquery branches
to avoid column name ambiguity when ClickHouse 25.3's query
analyzer resolves COUNT(DISTINCT dataset_item_id) through a
LEFT JOIN with dataset_items_resolved which also has that column.

* [OPIK-4384] [BE] Use pre-computed comments in STREAM query and fix UNION ALL type mismatch

Aggregated branch now reads comments_array_agg directly from experiment_item_aggregates
instead of doing an expensive JOIN to the comments table. Raw branch converts comments
to JSON String via toJSONString(CAST(...)) so both branches output compatible types.

* [OPIK-4384] [BE] Fix target_project_ids bind error in FIND_GROUPS aggregated branch

* [OPIK-4384] [BE] Unify aggregation branch counts with shared ExperimentAggregationSql

Extract SELECT_AGGREGATED_EXPERIMENT_IDS SQL and AggregatedExperimentCounts
into shared ExperimentAggregationSql utility class. Introduce
AggregationBranchCountsCriteria DTO to unify getAggregationBranchCounts
overloads across ExperimentDAO, ExperimentItemDAO, and DatasetItemVersionDAO.

* [OPIK-4384] [BE] Move getAggregationBranchCounts to ExperimentAggregatesDAO

Consolidate aggregation branch counting logic into ExperimentAggregatesDAO
instead of a separate utility class. Extract DTOs into their own files
in the experiments.aggregations package.

* [OPIK-4384] [BE] Deduplicate experiment_aggregates subquery with SELECT DISTINCT

Prevent inflated counts from ReplacingMergeTree pre-merge duplicates
in the aggregation branch counting query.
miguelgrc pushed a commit that referenced this pull request Mar 19, 2026
…ment by ID (#5579)

* [OPIK-4380] [BE] Add experiment aggregates for denormalized metrics

- Add experiment_aggregates and experiment_item_aggregates tables
- Implement ExperimentAggregatesDAO with population and query methods
- Add ExperimentAggregatesService for aggregation management
- Refactor DTOs into organized model classes:
  - ExperimentAggregatesModel: aggregation results
  - ExperimentEntityData: entity models
  - ExperimentSourceData: raw source data
  - ExperimentAggregatesUtils: utilities
- Add FEEDBACK_SCORES_AGGREGATED filter strategy for map-based filtering
- Add comprehensive integration tests (10/10 passing)
- Configure batch size and parallelism settings

* [OPIK-4380] [BE] Add MySQL deadlock retry mechanism for concurrent dataset operations

Problem:
- MySQL deadlock on dataset_version_tags composite PRIMARY KEY (workspace_id, dataset_id, tag)
- Occurred during parallel dataset creation in same workspace
- Multiple threads inserting "latest" tag for different datasets caused lock contention
- Experiments with parallel execution were failing with MySQLTransactionRollbackException

Solution:
- Add handleOnDeadLocks() method in RetryUtils with:
  - 5 retry attempts with exponential backoff (250ms to 2s)
  - 0.5 jitter to reduce thundering herd effect
  - Recursive isDatabaseDeadlock() detection for MySQLTransactionRollbackException
- Apply retry logic in DatasetItemService.setDatasetItemVersion()
- Enables concurrent dataset creation for same workspace

Impact:
- Supports parallel experiment execution with proper deadlock handling
- Test success rate improved from 0/10 to 10/10 in ExperimentAggregatesIntegrationTest
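The retry shape described above (5 attempts, exponential backoff from 250ms capped at 2s, 0.5 jitter) can be sketched in plain Java. This is a blocking stdlib sketch, not the project's actual RetryUtils, and the names are illustrative:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative retry helper: exponential backoff with jitter.
// Expects maxAttempts >= 1; the real implementation would also check that the
// caught exception is actually a (possibly nested) deadlock before retrying.
public class DeadlockRetry {
    public static <T> T withRetry(Callable<T> action, int maxAttempts,
                                  long baseDelayMs, long maxDelayMs,
                                  double jitter) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break;
                }
                Thread.sleep(backoffDelayMs(attempt, baseDelayMs, maxDelayMs, jitter));
            }
        }
        throw last;
    }

    // Exponential backoff: base * 2^(attempt-1), capped at maxDelayMs,
    // then scaled by a random factor in [1-jitter, 1+jitter) to avoid
    // a thundering herd of simultaneous retries.
    public static long backoffDelayMs(int attempt, long baseDelayMs, long maxDelayMs, double jitter) {
        long exp = Math.min(maxDelayMs, baseDelayMs * (1L << (attempt - 1)));
        double factor = jitter <= 0
                ? 1.0
                : 1.0 + ThreadLocalRandom.current().nextDouble(-jitter, jitter);
        return Math.max(0, (long) (exp * factor));
    }
}
```

With base 250ms and cap 2s, attempts 1..5 target roughly 250, 500, 1000, 2000, 2000 ms before jitter is applied.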

* Fix visibility

* [OPIK-4380] [BE] Address PR review comments for experiment aggregates

Fixed 11 automated review comments from baz-reviewer:

CRITICAL fixes:
- Prevent NPE on null span aggregations by adding coalesce() in SQL
- Handle multi-project experiments with LIMIT 1 in GET_PROJECT_ID
- Handle zero-item experiments with empty aggregation helpers
- Bind feedback_scores_percentiles map instead of empty CAST

HIGH priority fixes:
- Use toDecimal128(12) instead of toDecimal64(9) for cost percentiles
- Add null-safe tags handling with Optional.ofNullable()
- Include exception objects in retry logging for stack traces

MEDIUM priority fixes:
- Add missing log_comment to SELECT_EXPERIMENT_BY_ID query
- Add missing log_comment to GET_PROJECT_ID query

LOW priority fixes:
- Remove duplicate "id" binding in bindItemsParameters
- Enhance batchSize config documentation with details

All 11 integration tests passing.

* [OPIK-4382] [BE] Refactor experiment aggregates with import cleanup and Optional patterns

- Add missing imports for IntStream, ProjectStats, and other dependencies
- Replace fully-qualified class names with proper imports across DAO and Service classes
- Fix IS_NOT_EMPTY filter handling for FEEDBACK_SCORES_AGGREGATED strategies
- Refactor null checks to use Optional in mapping methods:
  - mapFeedbackScoreAggregations, mapExperimentFromAggregates
  - mapFeedbackScoreData, mapExperimentGroupAggregationItem
  - Batch insert preparation with Optional chains
- Improve code readability and maintainability with functional patterns

* [OPIK-4380] [BE] Fix table definition

* [OPIK-4380] [BE] Address PR comments and consolidate DatasetItemService methods

- Fix tags NPE in ExperimentAggregatesDAO with defaultIfNull
- Remove unnecessary FINAL clause from GET_EXPERIMENT_DATA query
- Fix test naming in ExperimentAggregatesIntegrationTest
- Consolidate 7 duplicate createVersionFromDelta methods into single canonical implementation
- Remove debug logger from config-test.yml

* [OPIK-4380] [BE] Fix missing log_comment and centralize search criteria binding

- Fix SELECT_EXPERIMENT_BY_ID to properly render log_comment metadata
  - Use getSTWithLogComment pattern in getExperimentFromAggregates
  - Ensures ClickHouse query logging populates workspace/experiment IDs

- Centralize ExperimentSearchCriteria binding logic
  - Create ExperimentSearchCriteriaBinder utility class
  - Parameterize filter strategies to support both DAO variants
  - Eliminate 29-line duplication between ExperimentDAO and ExperimentAggregatesDAO
  - Single source of truth prevents DAOs from getting out of sync

* [OPIK-4380] [BE] Fix createVersionFromDelta consolidation after rebase

- Update canonical method signature to include new parameters:
  - List<EvaluatorItem> evaluators
  - ExecutionPolicy executionPolicy
  - boolean clearExecutionPolicy

- Update all 5 caller sites to pass new parameters:
  - Use changes.evaluators(), changes.executionPolicy(), changes.clearExecutionPolicy() when available
  - Pass null/false for auto-generated versions that inherit from base

- Add imports for EvaluatorItem and ExecutionPolicy

Fixes compilation errors introduced by rebase with upstream changes to DatasetVersionService

* [OPIK-4380] [BE] Address PR review comments - fix type mismatch, extract constants, remove DAO logging

- Fixed BigDecimal[] to Double[] conversion for experiment_scores (matches ClickHouse Float64)
- Extracted FilterStrategy lists to static final constants to avoid repeated allocations
- Added @NonNull validation to populateExperimentAggregate parameter
- Removed DAO layer logging, keeping service-level logging only

* [OPIK-4380] [BE] Extract shared helper for experiment data pagination

Extract streamWithExperimentPagination() helper method to eliminate
duplication in getTracesData(), getSpansData(), and getFeedbackScoresData().

All three methods followed identical pattern:
- asyncTemplate.stream with connection
- getSTWithLogComment with cursor flag
- Bind workspace_id, experiment_id, project_id, limit
- Optional cursor binding
- Result mapping

Benefits:
- Single source of truth for pagination binding logic
- Prevents divergence when tweaking cursor/limit bindings
- Reduces code from ~20 lines to ~10 lines per method
- Type-safe generic implementation

Note: CTE redundancy (3x experiment_items scan) is intentional to avoid
passing large trace ID lists as parameters, which would cause performance
issues with 10K+ traces.
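The shared pagination shape can be sketched generically: one helper that applies an optional cursor, a limit, and a row mapper, instead of three near-identical copies. This stdlib sketch over an in-memory list is illustrative only; the real helper binds these parameters into a streaming R2DBC query:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Illustrative generic pagination helper: optional cursor + limit + row mapper.
public class PaginationHelper {
    public static <R, T> List<T> pageAfter(List<R> rows, Comparator<R> order,
                                           R cursor, int limit, Function<R, T> mapper) {
        List<R> sorted = new ArrayList<>(rows);
        sorted.sort(order);
        List<T> out = new ArrayList<>();
        for (R row : sorted) {
            if (cursor != null && order.compare(row, cursor) <= 0) {
                continue; // optional cursor binding: skip everything at or before the cursor
            }
            if (out.size() == limit) {
                break;    // limit binding
            }
            out.add(mapper.apply(row)); // result mapping
        }
        return out;
    }
}
```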

* [OPIK-4383] [BE] Add Redis stream subscriber for debounced experiment aggregates recomputation

- Add ExperimentDenormalizationConfig implementing StreamConfiguration with debounce, job lock, and per-experiment aggregation lock settings
- Add ExperimentAggregationMessage as stream message record
- Add ExperimentAggregatesSubscriber consuming from the denormalization stream; acquires a workspace-scoped distributed lock per experiment before calling populateAggregations()
- Add experimentDenormalizationEnabled feature flag to ServiceTogglesConfig and FeatureFlags
- Wire ExperimentDenormalizationConfig into OpikConfiguration
- Update config.yml and config-test.yml with full experimentDenormalization block (enabled for tests)
- Add ExperimentAggregatesSubscriberTest covering lifecycle gating and processEvent success/error paths

* Revision 2: Address PR comments - add config defaults, remove toggle, rename tests

- ExperimentDenormalizationConfig: add sensible defaults to all fields so
  Dropwizard validation doesn't fail when the config block is absent from
  old deployments (config.isEnabled()=false still gates the subscriber)
- Remove experimentDenormalizationEnabled service toggle from
  ServiceTogglesConfig, FeatureFlags, config.yml and config-test.yml -
  the infrastructure gate (config.isEnabled()) is the single control point
- Rename lifecycle test methods to camelCase per project conventions:
  startSkipsStartupWhenDisabled / stopSkipsShutdownWhenDisabled

* Revision 3: Add @Max(500) to consumerBatchSize and @NotNull to jobLockWaitTime

* [OPIK-4380] [BE] Address PR review comments - fix TYPE_REFERENCE visibility, redundant IN subquery, hardcoded context keys, Instant.now in loop, and inline defaultIfNull

* Revision 4: Address remaining JetoPistola review comments (#7, #8, #10)

- #7: Remove "Used for testing and verification" from getExperimentFromAggregates javadoc
- #8: Replace recursive flatMap with Mono.expand() in populateExperimentItemsInBatches
- #10: Remove unrelated subscribeOn addition from DatasetItemService.createVersionFromDelta

* Revision 3: Add switchIfEmpty fallback for deleted traces in populateExperimentAggregate

* Fix tests

* Revision 6: Move countTotal log from DAO to service layer

Operational logs belong in the service layer, not the DAO.

* Revision 7: Apply Spotless formatting

* Revision 8: Make populateAggregations(UUID, int) private

Removes the uncapped public batch size entry point. All callers now go
through the public no-arg overload which reads batchSize safely from config.

* [OPIK-4380] [BE] Add evaluation_method support to experiment_aggregates pipeline

- Add ClickHouse migration (000062) to add evaluation_method column to experiment_aggregates table
- Add evaluationMethod field to ExperimentData record
- Update GET_EXPERIMENT_DATA query to read evaluation_method from experiments
- Update INSERT_EXPERIMENT_AGGREGATE to write evaluation_method to experiment_aggregates
- Update SELECT_EXPERIMENT_BY_ID to read evaluation_method from experiment_aggregates
- Fix Experiment record constructor call: insert EvaluationMethod at correct position (10)

* [OPIK-4380] [BE] Extract shared helper for experiment aggregation queries

Reduce copy-paste in getTraceAggregations, getSpanAggregations, and
getFeedbackScoreAggregations by extracting queryExperimentAggregation,
which centralises the context-aware execution, workspace/experiment/project
parameter binding, and singleOrEmpty pattern shared by all three methods.

* [OPIK-4380] [BE] Enforce non-null contract on countTotal criteria parameter

Add @NonNull to ExperimentSearchCriteria in the interface and implementation
so that a null argument fails fast with an explicit NullPointerException at
the DAO boundary instead of crashing deep inside buildCountTemplate.

* [OPIK-4380] [BE] Fix countTotal ignoring target project IDs in normal path

target_project_ids was only applied inside the project_deleted LEFT JOIN
subquery; the main WHERE had no project restriction, so counts were
workspace-wide. Reuse has_target_projects in the main WHERE so
project_id IN :target_project_ids always takes effect. Also replace
manual null/empty checks with CollectionUtils.isNotEmpty.
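The fix above boils down to applying the same target-project restriction in the count as in the main query, with an empty or null target list meaning "no restriction". A minimal in-memory sketch (names and record illustrative, not the actual DAO types):

```java
import java.util.List;

// Illustrative: countTotal must honor target_project_ids in its main filter,
// not only inside the project_deleted subquery.
public class ExperimentCounter {
    public record Exp(String projectId, boolean deleted) {}

    public static long countTotal(List<Exp> experiments, List<String> targetProjectIds) {
        boolean hasTargets = targetProjectIds != null && !targetProjectIds.isEmpty();
        return experiments.stream()
                .filter(e -> !e.deleted())
                .filter(e -> !hasTargets || targetProjectIds.contains(e.projectId()))
                .count();
    }
}
```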

* [OPIK-4380] [BE] Apply Spotless formatting

* [OPIK-4382] [BE] Address PR review comments on experiment aggregates

- Fix :versionId → :version_id parameter naming in SQL templates and bindings
- Fix last_updated_at binding to use item.lastUpdatedAt() instead of Instant.now()
- Fix FEEDBACK_SCORES_AGGREGATED_IS_EMPTY filter: embed generated SQL into templates
  instead of hard-coded static condition, and add missing bind calls
- Fix RetryUtils log duplication (remove getMessage() + pass exception directly)
- Add batchSize = 1000 default in ExperimentAggregatesConfig
- Extract resolveVersionIdForCriteria helper to deduplicate version-id resolution
- Add null/blank/ClickHouse placeholder guards in extractUuidsFromGroupValues
- Extract loadEntityMap helper to deduplicate getEnrichInfoHolder enrichment logic
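The guard described for extractUuidsFromGroupValues (null, blank, and non-UUID placeholder values) can be sketched as follows; the class name and exact placeholder handling are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Illustrative: defensively extract UUIDs from group values, skipping
// null/blank entries and non-UUID placeholders instead of throwing.
public class GroupValueUuids {
    public static List<UUID> extractUuids(List<String> groupValues) {
        List<UUID> uuids = new ArrayList<>();
        if (groupValues == null) {
            return uuids;
        }
        for (String value : groupValues) {
            if (value == null || value.isBlank()) {
                continue; // null/blank guard
            }
            try {
                uuids.add(UUID.fromString(value.trim()));
            } catch (IllegalArgumentException ignored) {
                // e.g. a ClickHouse default placeholder rather than a real id
            }
        }
        return uuids;
    }
}
```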

* Revision 3: Address PR comments E, F, G, H

- Fix E: Extract shared template/bind helpers in ExperimentAggregatesDAO
- Fix F: Bind experiment_ids as UUID[] instead of String[]
- Fix G+H: Extract getEnrichInfoHolder logic into ExperimentGroupEnricher,
  eliminating duplication between ExperimentService and ExperimentAggregatesService
  without introducing a direct dependency between them

* Revision 4: Fix ExperimentServiceTest to include ExperimentGroupEnricher mock

* [OPIK-4382] [BE] Extract shared Row→ExperimentGroup mappers into ExperimentGroupMappers

Pull the duplicated Row→ExperimentGroupItem and Row→ExperimentGroupAggregationItem
conversion logic from ExperimentDAO and ExperimentAggregatesDAO into a shared
ExperimentGroupMappers utility class. Both DAOs now delegate to the same
toExperimentGroupItem / toExperimentGroupAggregationItem helpers, eliminating
the need to mirror DTO mapping changes in two places.

* [OPIK-4382] [BE] Deduplicate bindGroupCriteria into ExperimentGroupMappers

Moves the shared group-criteria binding logic out of ExperimentDAO and
ExperimentAggregatesDAO into ExperimentGroupMappers.bindGroupCriteria(),
following the same pattern as ExperimentSearchCriteriaBinder.  Adding or
fixing a criteria binding now only requires a change in one place.

* [OPIK-4382] [BE] Extract streamGroupQuery helper and fix null percentiles

- Deduplicate findGroups/findGroupsAggregations into a single private
  streamGroupQuery(queryTemplate, criteria, rowMapper) that differs only
  by the query constant and BiFunction row mapper.

- Fix convertToBigDecimal to return null for null/unsupported inputs so
  absent p50/p90/p99 entries in getDuration propagate as null rather than
  BigDecimal.ZERO, preserving the semantic-null that lets callers apply
  COALESCE/fallback logic correctly.
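The semantic-null fix can be sketched: a percentile entry converts to BigDecimal only when a numeric value is present, and absent or unsupported inputs become null rather than BigDecimal.ZERO so callers can apply their own fallback. Illustrative, not the actual mapper:

```java
import java.math.BigDecimal;
import java.util.Map;

// Illustrative: absent or unsupported percentile entries become null
// (semantic "no value"), never BigDecimal.ZERO.
public class PercentileMapper {
    public static BigDecimal convertToBigDecimal(Object value) {
        if (value instanceof BigDecimal) {
            return (BigDecimal) value;
        }
        if (value instanceof Number) {
            return BigDecimal.valueOf(((Number) value).doubleValue());
        }
        return null; // null or unsupported input -> semantic null, not ZERO
    }

    public static BigDecimal percentile(Map<String, ?> duration, String key) {
        return convertToBigDecimal(duration == null ? null : duration.get(key));
    }
}
```

Returning null here matters: ZERO would be indistinguishable from a real measured 0, which breaks downstream COALESCE-style fallbacks.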

* [OPIK-4382] [BE] Consolidate cost/duration helpers into ExperimentGroupMappers

Promote getCostValue and getDuration to public static in
ExperimentGroupMappers and delete the private copies in ExperimentDAO.
ExperimentDAO.mapToDto now delegates to the shared helpers, so any
future change to cost filtering, duration percentile extraction, or the
BigDecimal conversion only needs to be made in one place.

Side-effect: ExperimentDAO.mapToDto also picks up the null-percentile
fix (convertToBigDecimal returns null for absent/unsupported inputs)
that was previously applied only to ExperimentGroupMappers.

* [OPIK-4382] [BE] Fix pagination count and add criteria filter tests

- Remove count() OVER () window function from paged query (returned
  page-scoped count instead of full result-set count)
- Replace with dedicated count query + short-circuit: skip items query
  when count == 0, use DatasetItemPage.empty() for that case
- Extract DatasetItemResultMapper.buildItemFromRow as public static
  helper reused by ExperimentAggregatesDAO
- Add parameterized integration tests for ExperimentGroupCriteria
  filters (name, types, projectId, combined, empty-result) covering
  both findGroups and findGroupsAggregations aggregate paths

* [OPIK-4380] [BE] Extract shared filter helpers into FilterQueryBuilder

Add FilterStrategyParam record, applyFiltersToTemplate and bindFilters
static helpers to FilterQueryBuilder, then replace duplicated per-strategy
loops in DatasetItemVersionDAO and ExperimentAggregatesDAO with single
delegating calls backed by per-DAO strategy constants.

* [OPIK-4382] [BE] Consolidate filter helpers in getExperimentItemsStatsFromAggregates

Add EXPERIMENT_ITEMS_STATS_FILTER_STRATEGY_PARAMS and
EXPERIMENT_ITEMS_STATS_BIND_STRATEGIES constants and replace the
per-strategy toAnalyticsDbFilters/bind blocks in
getExperimentItemsStatsFromAggregates with single delegating calls to
FilterQueryBuilder.applyFiltersToTemplate and bindFilters.

* Revision 9: Extract shared helpers to eliminate duplication across DAOs

- Create DatasetItemSearchCriteriaMapper: centralizes filters + search flag
  wiring for DatasetItemSearchCriteria, shared by DatasetItemVersionDAO and
  ExperimentAggregatesDAO
- Add ExperimentGroupMappers.applyGroupCriteriaToTemplate: centralizes
  ExperimentGroupCriteria → ST template wiring, now shared by ExperimentDAO
  and ExperimentAggregatesDAO
- Update DatasetItemVersionDAO.addFiltersToTemplate and bindSearchAndFilters
  to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentAggregatesDAO.applyDatasetItemFiltersToTemplate and
  bindDatasetItemSearchParams to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentDAO.newGroupTemplate and ExperimentAggregatesDAO.newGroupTemplate
  to delegate to ExperimentGroupMappers.applyGroupCriteriaToTemplate

* [OPIK-4383] [BE] Add experiment aggregate event listener and no-op publisher

* Revision 2: Fix missing import for ExperimentAggregationPublisher

* [OPIK-4383] [BE] Add ExperimentAggregationPublisher, ExperimentDenormalizationJob and tests

- ExperimentAggregationPublisher: debounces experiment aggregation triggers
  by writing compound workspaceId:experimentId members to a Redis ZSET scored
  by expiry timestamp (now + debounceDelay), plus a hash storing the userName
  with TTL=2×debounceDelay to handle stale entries.
- ExperimentDenormalizationJob: @Every("5s") job that reads ZSET members with
  score <= now, publishes ExperimentAggregationMessage to the Redis stream,
  then cleans up the ZSET entry and hash bucket. Handles stale entries
  (expired hash) by removing the orphaned ZSET member without publishing.
- Fix processExperiment reactive chain: avoided double index.remove by
  returning Mono<Boolean> from flatMap branches so switchIfEmpty is only
  triggered when the bucket is truly empty.
- ExperimentAggregationPublisherTest: integration tests with real Redis
  container verifying ZSET membership, score, userName storage, TTL,
  workspace isolation, and debounce deduplication.
- ExperimentDenormalizationJobTest: unit tests with Mockito covering disabled
  config, lock not acquired, empty ZSET, happy path, stale entry, and batch.

* Fix tests setup

* [OPIK-4383] [BE] Address PR review: move DAO logs to service layer

* [OPIK-4383] [BE] Address PR review: extract shared DAO helper and fix log placement

* [OPIK-4383] [BE] Short-circuit deleteByTraceIds when no spans found

Skip delete, cascading operations, and SpansDeleted event when
getSpanIdsForTraces returns an empty set, preserving the original
no-op behaviour and avoiding the Preconditions.checkArgument failure
in SpanDAO.deleteByIds.

* [OPIK-4383] [BE] Fix cascade deletion failures after trace delete

Two bugs prevented spans and attachments from being deleted when a trace
was deleted via the event-driven cascade:

1. FeedbackScoreService.deleteByTraceIds/deleteBySpanIds had @NonNull on
   projectId which threw NPE when TracesDeleted.projectId() was null.
   EventInterceptor swallowed the NPE, stopping the entire cascade chain.
   Fix: remove @NonNull since the DAO already handles null safely via
   Optional.ofNullable(projectId).

2. SpanDAO.DELETE_BY_IDS had the wrong column (trace_id) and parameter
   name (span_ids) — the ClickHouse R2DBC driver could not resolve :span_ids
   as a named parameter in the DELETE statement. Fixed by using id IN :ids
   to match the working pattern in TraceDAO.DELETE_BY_ID.

* [OPIK-4383] [BE] Address PR review comments on ExperimentDenormalizationJob

- Centralize Redis constants (EXPERIMENT_KEY_PREFIX, USER_NAME_FIELD,
  MEMBER_SEPARATOR) in ExperimentDenormalizationConfig
- Change ExperimentAggregationPublisher.publish() to return Mono<Void>
  instead of void, so errors propagate to callers
- Make job interval configurable via jobs map in config.yml
- Fix onErrorContinue logging: remove getMessage() duplication
- Demote per-experiment logs from INFO to DEBUG
- Add ZSET pagination using expand() to avoid materializing entire range
- Update tests for all changes

* Fix @Every job interval config key casing and add jobs section to test config

The dropwizard-jobs framework uses WordUtils.uncapitalize(class.getSimpleName())
to look up the interval in the jobs map, so the key must be
'experimentDenormalizationJob' (lowercase first letter). Also adds the missing
jobs section and jobBatchSize to config-test.yml.
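The casing requirement follows directly from what uncapitalize does — it lowercases only the first character of the class's simple name. A stdlib re-implementation of the key derivation (the real framework uses commons-text's WordUtils.uncapitalize):

```java
// Sketch of how dropwizard-jobs derives the config lookup key from the
// job class name: only the first character is lowercased, so the YAML key
// must be 'experimentDenormalizationJob', not 'ExperimentDenormalizationJob'.
public class JobKey {
    static String uncapitalize(String s) {
        if (s == null || s.isEmpty()) return s;
        return Character.toLowerCase(s.charAt(0)) + s.substring(1);
    }
}
```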

* Replace @Every annotation with programmatic Quartz scheduling

Remove @Every from ExperimentDenormalizationJob and schedule it
programmatically in OpikGuiceyLifecycleEventListener, following the
same pattern as TraceThreadsClosingJob. Add jobInterval config field
to ExperimentDenormalizationConfig. Remove the jobs YAML section that
caused deserialization errors with JobConfiguration's immutable map.

* Add experiment context to error log and extract publishIfNotEmpty helper

- Include experimentId and workspaceId in onExperimentUpdated error log
- Extract publishIfNotEmpty helper to deduplicate filter+publish logic
  across triggerByExperimentIds, triggerByTraceIds, triggerBySpanIds

* Fix NPE in ExperimentAggregateEventListenerTest mock setup

Stub publisher.publish() to return Mono.empty() in setUp so
.subscribe() calls in production code don't NPE on null.

* [OPIK-4385] [BE] Use pre-computed aggregation tables for experiment endpoints

Apply UNION ALL hybrid pattern to ExperimentDAO (FIND, FIND_GROUPS,
FIND_GROUPS_AGGREGATIONS) and ExperimentItemDAO (STREAM,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_COUNT,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_STATS) so that
experiments present in experiment_aggregates / experiment_item_aggregates
use pre-computed values, while others fall back to live JOIN computation.

Add ExperimentAggregatesIntegrationTest covering all 7 affected queries
with parameterized filter, pagination, and consistency scenarios.

* [OPIK-4386] [BE] Trigger lazy aggregation via publisher on GET experiment by ID

When fetching an experiment by ID, if the experiment is in COMPLETED or
CANCELLED state and is not yet present in the experiment_aggregates table,
enqueue it for aggregation using ExperimentAggregationPublisher instead of
computing aggregations synchronously. The check and publish are performed
off the critical path via doOnEach, so the caller receives the experiment
immediately without waiting for the side effect to complete.

* [OPIK-4384] [BE] Fix missing zero_uuid binding and experiment_scores sort alias

- Bind zero_uuid parameter in getById, getByIds, and get(ExperimentStreamRequest)
  methods that use the FIND query; the UNION ALL refactor introduced an
  experiments_from_aggregates CTE that requires this parameter but only the
  main find() method was binding it, causing 500 errors on those paths
- Fix SortingQueryBuilder to reference the outer column alias experiment_scores_agg
  instead of es.experiment_scores; the ORDER BY sits outside the UNION ALL so the
  inner es alias is out of scope, while experiment_scores_agg is the consistent
  output alias exposed by both branches

* [OPIK-4384] [BE] Fix null row injection from LEFT JOIN miss in feedback_scores and comments aggregation

Pre-aggregate feedback_scores_final and comments_final into subqueries
(GROUP BY entity_id) before LEFT JOIN in DatasetItemVersionDAO.STREAM.
When a LEFT JOIN has no match against a pre-aggregated subquery the
joined columns are NULL, so any(NULL) returns NULL instead of a
default-valued row with epoch timestamps that caused Instant.parse()
failures.

Also adds a regression test covering the no-scores path in
ExperimentAggregatesIntegrationTest.

* [OPIK-4383] [BE] Remove DAO-level log.info from ExperimentAggregatesDAO methods

Move operational logging responsibility to the service layer, consistent
with earlier fixes for ExperimentItemDAO and SpanDAO in this PR.

* Remove accidentally committed doc files

These files were introduced during merge resolution but should
not be part of the branch.

* [OPIK-4383] [BE] refactor: extract triggerAggregation helper to centralize guard+publish flow

* [OPIK-4386] [BE] fix: demote lazy aggregation check log to DEBUG

* [OPIK-4383] [BE] fix: restore TagOperations.tagUpdateFragment in SpanDAO BULK_UPDATE

Restores proper tag handling in SpanDAO.BULK_UPDATE query that was
regressed to a simple arrayConcat. Now uses TagOperations.tagUpdateFragment()
which provides arrayDistinct(), tag limit enforcement (max 50), and
tags_to_add/tags_to_remove support. Also adds the required
short_circuit_function_evaluation SETTINGS for throwIf evaluation.

* Adding InterruptableJob

* [OPIK-4383] [BE] Address PR review: expand safety valve, env var prefix

- Add batchSize-capped iteration counter to expand() to prevent
  infinite loops when ZSET entries fail to be removed
- Rename EXPERIMENT_DENORM_JOB_INTERVAL to OPIK_EXPERIMENT_DENORM_JOB_INTERVAL
  to follow the OPIK_ prefix convention
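The safety valve can be sketched generically: page through a source until it reports empty, but bound the number of iterations so a page that never shrinks (entries that fail to be removed) cannot spin forever. Names like `fetchPage` and `maxIterations` are illustrative; the real code applies this cap inside Reactor's `expand()` over ZSET pages:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Bounded drain loop: terminates either when the source is empty (normal
// case) or when the iteration cap is hit (the safety valve against a
// source that keeps returning the same undeletable entries).
public class BoundedDrain {
    static <T> List<T> drain(Supplier<List<T>> fetchPage, int maxIterations) {
        List<T> all = new ArrayList<>();
        for (int i = 0; i < maxIterations; i++) {
            List<T> page = fetchPage.get();
            if (page.isEmpty()) break;   // source drained: stop normally
            all.addAll(page);
        }
        return all;                      // cap reached: give up rather than spin
    }
}
```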

* [OPIK-4384] [BE] Add branch optimization and CTE split to experiment queries

Use pre-computed experiment_aggregates table to optimize query execution:
- Add has_aggregated/has_raw flags to skip unnecessary UNION ALL branches in FIND/FIND_COUNT
- Add getAggregationBranchCounts pre-query to determine which branches are needed
- Apply CTE split pattern to FIND_GROUPS and FIND_GROUPS_AGGREGATIONS
- Update getById to leverage branch optimization via single-ID branch count query
- Add <if(id)> filter to SELECT_AGGREGATED_EXPERIMENT_IDS for getById support
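The pre-query decision reduces to comparing how many of the requested experiments already have aggregate rows against the total requested. A minimal sketch of that flag computation (names are illustrative; the real pre-query counts rows in experiment_aggregates):

```java
import java.util.Set;

// Sketch of the branch pre-query: if every requested experiment is already
// aggregated, the raw UNION ALL branch can be skipped entirely, and vice
// versa; only the mixed case needs both branches rendered.
public class BranchCounts {
    final boolean hasAggregated;
    final boolean hasRaw;

    BranchCounts(Set<String> requested, Set<String> aggregated) {
        long agg = requested.stream().filter(aggregated::contains).count();
        this.hasAggregated = agg > 0;
        this.hasRaw = agg < requested.size();
    }
}
```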

* [OPIK-4384] [BE] Add missing 7-arg overload for getDatasetItemsWithExperimentItems

Fix test compilation error from merge: the remote branch added callers
with (UUID, List, null, null, List<SortingField>, String, String) signature
which needs a bridge overload to the 9-arg method.

* [OPIK-4384] [BE] Add conditional LIMIT push-up, missing CTE, and fix test precision

- Add conditional LIMIT push-up in STREAM query: push LIMIT into CTE
  when only one branch (raw or aggregated) is active for performance
- Add missing experiment_item_aggr_trace_scope CTE for aggregated branch
- Add AggregatedExperimentCounts record for experiment-level branching
- Fix MultiValueFeedbackScoresE2ETest precision assertion: use isEqualTo
  instead of isEqualByComparingTo to respect custom BigDecimal comparator

* [OPIK-4384] [BE] Push OFFSET into top_dataset_items CTE and fix BigDecimal comparator in DatasetsResourceTest

* [OPIK-4384] [BE] Add pass rate aggregation to experiment aggregates

Add pass_rate, passed_count, and total_count columns to experiment_aggregates
table and compute them during aggregation. Update ExperimentDAO queries to
select these columns from both raw and aggregated paths, returning NULL for
non-evaluation-suite experiments.

* Fix format

* Fix get by id

* Fix mapping

* Fix mapping

* [OPIK-4384] [BE] Use pre-aggregated comments from aggregate tables with ISO 8601 date formatting

Update retrieval queries in ExperimentDAO, DatasetItemVersionDAO, and ExperimentAggregatesDAO
to read comments_array_agg as JSON String from aggregate tables instead of live-querying the
comments table. Ensure UNION ALL type compatibility by wrapping raw paths with toJSONString()
and formatting dates as ISO 8601 for proper Jackson deserialization.

* [OPIK-4386] [BE] Increase debounceDelay in test config to prevent race condition

The denormalization job was processing finished experiments during test
execution with incomplete ClickHouse data, causing stale aggregated
values to be returned instead of fresh raw computations.

* [OPIK-4384] [BE] Use parameterized binding for dynamic sort keys and add deterministic tiebreaker

- Replace literal string interpolation in getTopSortExpression with
  parameterized bind variables (sf.bindKey()) to prevent SQL injection
- Remove fieldMapping filter from bindDynamicKeys so all dynamic keys
  are bound, including those used in the top_sorting SELECT expression
- Add deterministic tiebreaker (id DESC / dataset_item_id DESC) to both
  the push-top-limit CTE and the main ORDER BY for consistent pagination
- Fix experiment_items deduplication: use FINAL where DISTINCT was used
  and vice versa for consistency across query branches

* [OPIK-4384] [BE] Add mixed-state aggregation test for UNION ALL hybrid

Test creates 3 experiments, aggregates only 1, and queries all 3 to
exercise the UNION ALL hybrid path where has_aggregated and has_raw
are both true simultaneously.

* [OPIK-4384] [BE] Add isNotEmpty assertions to parameterized filter tests

Ensure filter scenarios actually match data by asserting content()
is not empty before and after aggregation in all parameterized filter
tests (find, findGroups, findGroupsAggregations).

* [OPIK-4384] [BE] refactor: extract assertion helpers to remove duplication in ExperimentAggregatesIntegrationTest

* [OPIK-4384] [BE] refactor: rename parseFlexibleInstant to parseInstant in FeedbackScoreMapper

* [OPIK-4384] [BE] Make LIMIT unconditional in FIND query

The LIMIT clause was gated on filter/sort flags, so plain paged requests
(only limit/offset) at the outer query level would not emit LIMIT.
Simplify to always emit LIMIT when the limit parameter is provided.

* [OPIK-4384] [BE] Fix comment ordering assertion in tests

ClickHouse groupUniqArray does not guarantee ordering, so comment
assertions must use ignoringCollectionOrder to avoid flaky failures.

* [OPIK-4384] [BE] Add branch conditionals to FIND_GROUPS/FIND_GROUPS_AGGREGATIONS and revert unconditional LIMIT

- Wrap SELECT branches in FIND_GROUPS and FIND_GROUPS_AGGREGATIONS with
  <if(has_aggregated)>/<if(has_raw)> conditionals to skip unnecessary
  branches when all experiments are aggregated or all are raw
- Add no-args getAggregationBranchCounts() overload for workspace-only
  pre-query (used by group/aggregation queries that lack experiment IDs)
- Update executeQueryWithTargetProjects to run both pre-queries in
  parallel via Mono.zip
- Revert commit 215a3f9 (unconditional LIMIT) which caused double
  LIMIT/OFFSET bug: CTE-level LIMIT + outer LIMIT made page 2+ return
  0 results. The complex conditional is correct — outer LIMIT is only
  needed when post-CTE processing may alter the result set.

* [OPIK-4384] [BE] Add branch conditionals to SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_COUNT

Wrap the UNION ALL in the count query with <if(has_aggregated)>/<if(has_raw)>
conditionals to skip unnecessary branches. Pass branch flags through
getCountWithExperimentFilters from the existing pre-query results.

* [OPIK-4384] [BE] Fix ClickHouse column resolution in COUNT query

Alias dataset_item_id as di_id in the COUNT subquery branches
to avoid column name ambiguity when ClickHouse 25.3's query
analyzer resolves COUNT(DISTINCT dataset_item_id) through a
LEFT JOIN with dataset_items_resolved which also has that column.

* [OPIK-4384] [BE] Use pre-computed comments in STREAM query and fix UNION ALL type mismatch

Aggregated branch now reads comments_array_agg directly from experiment_item_aggregates
instead of doing an expensive JOIN to the comments table. Raw branch converts comments
to JSON String via toJSONString(CAST(...)) so both branches output compatible types.

* [OPIK-4384] [BE] Fix target_project_ids bind error in FIND_GROUPS aggregated branch

* Update config-test.yml

* Update ExperimentService.java

* Remove old unused query

* [OPIK-4386] [BE] Address PR review comments: demote log to debug, add try-catch for context safety, add getById lazy aggregation tests

* [OPIK-4386] [BE] Add workspaceId to lazy aggregation log messages
miguelgrc pushed a commit that referenced this pull request Mar 19, 2026
…nts endpoint (#5583)

* [OPIK-4380] [BE] Add experiment aggregates for denormalized metrics

- Add experiment_aggregates and experiment_item_aggregates tables
- Implement ExperimentAggregatesDAO with population and query methods
- Add ExperimentAggregatesService for aggregation management
- Refactor DTOs into organized model classes:
  - ExperimentAggregatesModel: aggregation results
  - ExperimentEntityData: entity models
  - ExperimentSourceData: raw source data
  - ExperimentAggregatesUtils: utilities
- Add FEEDBACK_SCORES_AGGREGATED filter strategy for map-based filtering
- Add comprehensive integration tests (10/10 passing)
- Configure batch size and parallelism settings

* [OPIK-4380] [BE] Add MySQL deadlock retry mechanism for concurrent dataset operations

Problem:
- MySQL deadlock on dataset_version_tags composite PRIMARY KEY (workspace_id, dataset_id, tag)
- Occurred during parallel dataset creation in same workspace
- Multiple threads inserting "latest" tag for different datasets caused lock contention
- Experiments with parallel execution were failing with MySQLTransactionRollbackException

Solution:
- Add handleOnDeadLocks() method in RetryUtils with:
  - 5 retry attempts with exponential backoff (250ms to 2s)
  - 0.5 jitter to reduce thundering herd effect
  - Recursive isDatabaseDeadlock() detection for MySQLTransactionRollbackException
- Apply retry logic in DatasetItemService.setDatasetItemVersion()
- Enables concurrent dataset creation for same workspace

Impact:
- Supports parallel experiment execution with proper deadlock handling
- Test success rate improved from 0/10 to 10/10 in ExperimentAggregatesIntegrationTest
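The retry schedule described above can be sketched as a pure delay computation: exponential backoff from 250ms doubling up to a 2s cap, with 0.5 jitter (each delay drawn uniformly from [0.5·d, 1.5·d)). Constants mirror the commit message; the actual retry loop lives in RetryUtils.handleOnDeadLocks():

```java
import java.util.Random;

// Sketch of the deadlock retry backoff: 250, 500, 1000, 2000, 2000, ... ms,
// each then scaled by a random factor in [0.5, 1.5) to avoid thundering herd.
public class DeadlockBackoff {
    static final long BASE_MS = 250;
    static final long CAP_MS = 2_000;

    static long baseDelay(int attempt) {            // attempt is 0-based
        long d = BASE_MS << Math.min(attempt, 30);  // doubles per attempt
        return Math.min(d, CAP_MS);                 // capped at 2s
    }

    static long jittered(int attempt, Random rnd) {
        double factor = 0.5 + rnd.nextDouble();     // uniform in [0.5, 1.5)
        return (long) (baseDelay(attempt) * factor);
    }
}
```

Because MySQL picks a deadlock victim and rolls it back immediately, the losing transaction can simply be retried after one of these delays; five attempts at this schedule spans roughly 0.6 to 8 seconds of cumulative wait.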

* Fix visibility

* [OPIK-4380] [BE] Address PR review comments for experiment aggregates

Fixed 11 automated review comments from baz-reviewer:

CRITICAL fixes:
- Prevent NPE on null span aggregations by adding coalesce() in SQL
- Handle multi-project experiments with LIMIT 1 in GET_PROJECT_ID
- Handle zero-item experiments with empty aggregation helpers
- Bind feedback_scores_percentiles map instead of empty CAST

HIGH priority fixes:
- Use toDecimal128(12) instead of toDecimal64(9) for cost percentiles
- Add null-safe tags handling with Optional.ofNullable()
- Include exception objects in retry logging for stack traces

MEDIUM priority fixes:
- Add missing log_comment to SELECT_EXPERIMENT_BY_ID query
- Add missing log_comment to GET_PROJECT_ID query

LOW priority fixes:
- Remove duplicate "id" binding in bindItemsParameters
- Enhance batchSize config documentation with details

All 11 integration tests passing.

* [OPIK-4382] [BE] Refactor experiment aggregates with import cleanup and Optional patterns

- Add missing imports for IntStream, ProjectStats, and other dependencies
- Replace fully-qualified class names with proper imports across DAO and Service classes
- Fix IS_NOT_EMPTY filter handling for FEEDBACK_SCORES_AGGREGATED strategies
- Refactor null checks to use Optional in mapping methods:
  - mapFeedbackScoreAggregations, mapExperimentFromAggregates
  - mapFeedbackScoreData, mapExperimentGroupAggregationItem
  - Batch insert preparation with Optional chains
- Improve code readability and maintainability with functional patterns

* [OPIK-4380] [BE] Fix table definition

* [OPIK-4380] [BE] Address PR comments and consolidate DatasetItemService methods

- Fix tags NPE in ExperimentAggregatesDAO with defaultIfNull
- Remove unnecessary FINAL clause from GET_EXPERIMENT_DATA query
- Fix test naming in ExperimentAggregatesIntegrationTest
- Consolidate 7 duplicate createVersionFromDelta methods into single canonical implementation
- Remove debug logger from config-test.yml

* [OPIK-4380] [BE] Fix missing log_comment and centralize search criteria binding

- Fix SELECT_EXPERIMENT_BY_ID to properly render log_comment metadata
  - Use getSTWithLogComment pattern in getExperimentFromAggregates
  - Ensures ClickHouse query logging populates workspace/experiment IDs

- Centralize ExperimentSearchCriteria binding logic
  - Create ExperimentSearchCriteriaBinder utility class
  - Parameterize filter strategies to support both DAO variants
  - Eliminate 29-line duplication between ExperimentDAO and ExperimentAggregatesDAO
  - Single source of truth prevents DAOs from getting out of sync

* [OPIK-4380] [BE] Fix createVersionFromDelta consolidation after rebase

- Update canonical method signature to include new parameters:
  - List<EvaluatorItem> evaluators
  - ExecutionPolicy executionPolicy
  - boolean clearExecutionPolicy

- Update all 5 caller sites to pass new parameters:
  - Use changes.evaluators(), changes.executionPolicy(), changes.clearExecutionPolicy() when available
  - Pass null/false for auto-generated versions that inherit from base

- Add imports for EvaluatorItem and ExecutionPolicy

Fixes compilation errors introduced by rebase with upstream changes to DatasetVersionService

* [OPIK-4380] [BE] Address PR review comments - fix type mismatch, extract constants, remove DAO logging

- Fixed BigDecimal[] to Double[] conversion for experiment_scores (matches ClickHouse Float64)
- Extracted FilterStrategy lists to static final constants to avoid repeated allocations
- Added @NonNull validation to populateExperimentAggregate parameter
- Removed DAO layer logging, keeping service-level logging only

* [OPIK-4380] [BE] Extract shared helper for experiment data pagination

Extract streamWithExperimentPagination() helper method to eliminate
duplication in getTracesData(), getSpansData(), and getFeedbackScoresData().

All three methods followed identical pattern:
- asyncTemplate.stream with connection
- getSTWithLogComment with cursor flag
- Bind workspace_id, experiment_id, project_id, limit
- Optional cursor binding
- Result mapping

Benefits:
- Single source of truth for pagination binding logic
- Prevents divergence when tweaking cursor/limit bindings
- Reduces code from ~20 lines to ~10 lines per method
- Type-safe generic implementation

Note: CTE redundancy (3x experiment_items scan) is intentional to avoid
passing large trace ID lists as parameters, which would cause performance
issues with 10K+ traces.

* [OPIK-4383] [BE] Add Redis stream subscriber for debounced experiment aggregates recomputation

- Add ExperimentDenormalizationConfig implementing StreamConfiguration with debounce, job lock, and per-experiment aggregation lock settings
- Add ExperimentAggregationMessage as stream message record
- Add ExperimentAggregatesSubscriber consuming from the denormalization stream; acquires a workspace-scoped distributed lock per experiment before calling populateAggregations()
- Add experimentDenormalizationEnabled feature flag to ServiceTogglesConfig and FeatureFlags
- Wire ExperimentDenormalizationConfig into OpikConfiguration
- Update config.yml and config-test.yml with full experimentDenormalization block (enabled for tests)
- Add ExperimentAggregatesSubscriberTest covering lifecycle gating and processEvent success/error paths

* Revision 2: Address PR comments - add config defaults, remove toggle, rename tests

- ExperimentDenormalizationConfig: add sensible defaults to all fields so
  Dropwizard validation doesn't fail when the config block is absent from
  old deployments (config.isEnabled()=false still gates the subscriber)
- Remove experimentDenormalizationEnabled service toggle from
  ServiceTogglesConfig, FeatureFlags, config.yml and config-test.yml -
  the infrastructure gate (config.isEnabled()) is the single control point
- Rename lifecycle test methods to camelCase per project conventions:
  startSkipsStartupWhenDisabled / stopSkipsShutdownWhenDisabled

* Revision 3: Add @Max(500) to consumerBatchSize and @NotNull to jobLockWaitTime

* [OPIK-4380] [BE] Address PR review comments - fix TYPE_REFERENCE visibility, redundant IN subquery, hardcoded context keys, Instant.now in loop, and inline defaultIfNull

* Revision 4: Address remaining JetoPistola review comments (#7, #8, #10)

- #7: Remove "Used for testing and verification" from getExperimentFromAggregates javadoc
- #8: Replace recursive flatMap with Mono.expand() in populateExperimentItemsInBatches
- #10: Remove unrelated subscribeOn addition from DatasetItemService.createVersionFromDelta

* Revision 3: Add switchIfEmpty fallback for deleted traces in populateExperimentAggregate

* Fix tests

* Revision 6: Move countTotal log from DAO to service layer

Operational logs belong in the service layer, not the DAO.

* Revision 7: Apply Spotless formatting

* Revision 8: Make populateAggregations(UUID, int) private

Removes the uncapped public batch size entry point. All callers now go
through the public no-arg overload which reads batchSize safely from config.

* [OPIK-4380] [BE] Add evaluation_method support to experiment_aggregates pipeline

- Add ClickHouse migration (000062) to add evaluation_method column to experiment_aggregates table
- Add evaluationMethod field to ExperimentData record
- Update GET_EXPERIMENT_DATA query to read evaluation_method from experiments
- Update INSERT_EXPERIMENT_AGGREGATE to write evaluation_method to experiment_aggregates
- Update SELECT_EXPERIMENT_BY_ID to read evaluation_method from experiment_aggregates
- Fix Experiment record constructor call: insert EvaluationMethod at correct position (10)

* [OPIK-4380] [BE] Extract shared helper for experiment aggregation queries

Reduce copy-paste in getTraceAggregations, getSpanAggregations, and
getFeedbackScoreAggregations by extracting queryExperimentAggregation,
which centralises the context-aware execution, workspace/experiment/project
parameter binding, and singleOrEmpty pattern shared by all three methods.

* [OPIK-4380] [BE] Enforce non-null contract on countTotal criteria parameter

Add @NonNull to ExperimentSearchCriteria in the interface and implementation
so that a null argument fails fast with an explicit NullPointerException at
the DAO boundary instead of crashing deep inside buildCountTemplate.

* [OPIK-4380] [BE] Fix countTotal ignoring target project IDs in normal path

target_project_ids was only applied inside the project_deleted LEFT JOIN
subquery; the main WHERE had no project restriction, so counts were
workspace-wide. Reuse has_target_projects in the main WHERE so
project_id IN :target_project_ids always takes effect. Also replace
manual null/empty checks with CollectionUtils.isNotEmpty.

* [OPIK-4380] [BE] Apply Spotless formatting

* [OPIK-4382] [BE] Address PR review comments on experiment aggregates

- Fix :versionId → :version_id parameter naming in SQL templates and bindings
- Fix last_updated_at binding to use item.lastUpdatedAt() instead of Instant.now()
- Fix FEEDBACK_SCORES_AGGREGATED_IS_EMPTY filter: embed generated SQL into templates
  instead of hard-coded static condition, and add missing bind calls
- Fix RetryUtils log duplication (remove getMessage() + pass exception directly)
- Add batchSize = 1000 default in ExperimentAggregatesConfig
- Extract resolveVersionIdForCriteria helper to deduplicate version-id resolution
- Add null/blank/ClickHouse placeholder guards in extractUuidsFromGroupValues
- Extract loadEntityMap helper to deduplicate getEnrichInfoHolder enrichment logic

* Revision 3: Address PR comments E, F, G, H

- Fix E: Extract shared template/bind helpers in ExperimentAggregatesDAO
- Fix F: Bind experiment_ids as UUID[] instead of String[]
- Fix G+H: Extract getEnrichInfoHolder logic into ExperimentGroupEnricher,
  eliminating duplication between ExperimentService and ExperimentAggregatesService
  without introducing a direct dependency between them

* Revision 4: Fix ExperimentServiceTest to include ExperimentGroupEnricher mock

* [OPIK-4382] [BE] Extract shared Row→ExperimentGroup mappers into ExperimentGroupMappers

Pull the duplicated Row→ExperimentGroupItem and Row→ExperimentGroupAggregationItem
conversion logic from ExperimentDAO and ExperimentAggregatesDAO into a shared
ExperimentGroupMappers utility class. Both DAOs now delegate to the same
toExperimentGroupItem / toExperimentGroupAggregationItem helpers, eliminating
the need to mirror DTO mapping changes in two places.

* [OPIK-4382] [BE] Deduplicate bindGroupCriteria into ExperimentGroupMappers

Moves the shared group-criteria binding logic out of ExperimentDAO and
ExperimentAggregatesDAO into ExperimentGroupMappers.bindGroupCriteria(),
following the same pattern as ExperimentSearchCriteriaBinder.  Adding or
fixing a criteria binding now only requires a change in one place.

* [OPIK-4382] [BE] Extract streamGroupQuery helper and fix null percentiles

- Deduplicate findGroups/findGroupsAggregations into a single private
  streamGroupQuery(queryTemplate, criteria, rowMapper) that differs only
  by the query constant and BiFunction row mapper.

- Fix convertToBigDecimal to return null for null/unsupported inputs so
  absent p50/p90/p99 entries in getDuration propagate as null rather than
  BigDecimal.ZERO, preserving the semantic-null that lets callers apply
  COALESCE/fallback logic correctly.

* [OPIK-4382] [BE] Consolidate cost/duration helpers into ExperimentGroupMappers

Promote getCostValue and getDuration to public static in
ExperimentGroupMappers and delete the private copies in ExperimentDAO.
ExperimentDAO.mapToDto now delegates to the shared helpers, so any
future change to cost filtering, duration percentile extraction, or the
BigDecimal conversion only needs to be made in one place.

Side-effect: ExperimentDAO.mapToDto also picks up the null-percentile
fix (convertToBigDecimal returns null for absent/unsupported inputs)
that was previously applied only to ExperimentGroupMappers.
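The null-percentile semantics can be sketched in isolation: absent or unsupported inputs map to null rather than BigDecimal.ZERO, so "no data" stays distinguishable from a genuine zero. The supported input types below are illustrative, not necessarily the exact set the shared helper handles:

```java
import java.math.BigDecimal;

// Sketch of the null-preserving conversion: a missing p50/p90/p99 entry
// propagates as null so callers can apply COALESCE/fallback logic, instead
// of being silently coerced to ZERO.
public class PercentileConversion {
    static BigDecimal convertToBigDecimal(Object value) {
        if (value instanceof BigDecimal bd) return bd;
        if (value instanceof Double d) return BigDecimal.valueOf(d);
        if (value instanceof Long l) return BigDecimal.valueOf(l);
        return null; // null or unsupported type: semantic null, not ZERO
    }
}
```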

* [OPIK-4382] [BE] Fix pagination count and add criteria filter tests

- Remove count() OVER () window function from paged query (returned
  page-scoped count instead of full result-set count)
- Replace with dedicated count query + short-circuit: skip items query
  when count == 0, use DatasetItemPage.empty() for that case
- Extract DatasetItemResultMapper.buildItemFromRow as public static
  helper reused by ExperimentAggregatesDAO
- Add parameterized integration tests for ExperimentGroupCriteria
  filters (name, types, projectId, combined, empty-result) covering
  both findGroups and findGroupsAggregations aggregate paths

* [OPIK-4380] [BE] Extract shared filter helpers into FilterQueryBuilder

Add FilterStrategyParam record, applyFiltersToTemplate and bindFilters
static helpers to FilterQueryBuilder, then replace duplicated per-strategy
loops in DatasetItemVersionDAO and ExperimentAggregatesDAO with single
delegating calls backed by per-DAO strategy constants.

* [OPIK-4382] [BE] Consolidate filter helpers in getExperimentItemsStatsFromAggregates

Add EXPERIMENT_ITEMS_STATS_FILTER_STRATEGY_PARAMS and
EXPERIMENT_ITEMS_STATS_BIND_STRATEGIES constants and replace the
per-strategy toAnalyticsDbFilters/bind blocks in
getExperimentItemsStatsFromAggregates with single delegating calls to
FilterQueryBuilder.applyFiltersToTemplate and bindFilters.

* Revision 9: Extract shared helpers to eliminate duplication across DAOs

- Create DatasetItemSearchCriteriaMapper: centralizes filters + search flag
  wiring for DatasetItemSearchCriteria, shared by DatasetItemVersionDAO and
  ExperimentAggregatesDAO
- Add ExperimentGroupMappers.applyGroupCriteriaToTemplate: centralizes
  ExperimentGroupCriteria → ST template wiring, now shared by ExperimentDAO
  and ExperimentAggregatesDAO
- Update DatasetItemVersionDAO.addFiltersToTemplate and bindSearchAndFilters
  to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentAggregatesDAO.applyDatasetItemFiltersToTemplate and
  bindDatasetItemSearchParams to delegate to DatasetItemSearchCriteriaMapper
- Update ExperimentDAO.newGroupTemplate and ExperimentAggregatesDAO.newGroupTemplate
  to delegate to ExperimentGroupMappers.applyGroupCriteriaToTemplate

* [OPIK-4383] [BE] Add experiment aggregate event listener and no-op publisher

* Revision 2: Fix missing import for ExperimentAggregationPublisher

* [OPIK-4383] [BE] Add ExperimentAggregationPublisher, ExperimentDenormalizationJob and tests

- ExperimentAggregationPublisher: debounces experiment aggregation triggers
  by writing compound workspaceId:experimentId members to a Redis ZSET scored
  by expiry timestamp (now + debounceDelay), plus a hash storing the userName
  with TTL=2×debounceDelay to handle stale entries.
- ExperimentDenormalizationJob: @Every("5s") job that reads ZSET members with
  score <= now, publishes ExperimentAggregationMessage to the Redis stream,
  then cleans up the ZSET entry and hash bucket. Handles stale entries
  (expired hash) by removing the orphaned ZSET member without publishing.
- Fix processExperiment reactive chain: avoid double index.remove by
  returning Mono<Boolean> from flatMap branches so switchIfEmpty is only
  triggered when the bucket is truly empty.
- ExperimentAggregationPublisherTest: integration tests with real Redis
  container verifying ZSET membership, score, userName storage, TTL,
  workspace isolation, and debounce deduplication.
- ExperimentDenormalizationJobTest: unit tests with Mockito covering disabled
  config, lock not acquired, empty ZSET, happy path, stale entry, and batch.

* Fix tests setup

* [OPIK-4383] [BE] Address PR review: move DAO logs to service layer

* [OPIK-4383] [BE] Address PR review: extract shared DAO helper and fix log placement

* [OPIK-4383] [BE] Short-circuit deleteByTraceIds when no spans found

Skip delete, cascading operations, and SpansDeleted event when
getSpanIdsForTraces returns an empty set, preserving the original
no-op behaviour and avoiding the Preconditions.checkArgument failure
in SpanDAO.deleteByIds.

* [OPIK-4383] [BE] Fix cascade deletion failures after trace delete

Two bugs prevented spans and attachments from being deleted when a trace
was deleted via the event-driven cascade:

1. FeedbackScoreService.deleteByTraceIds/deleteBySpanIds had @NonNull on
   projectId which threw NPE when TracesDeleted.projectId() was null.
   EventInterceptor swallowed the NPE, stopping the entire cascade chain.
   Fix: remove @NonNull since the DAO already handles null safely via
   Optional.ofNullable(projectId).

2. SpanDAO.DELETE_BY_IDS had the wrong column (trace_id) and parameter
   name (span_ids) — the ClickHouse R2DBC driver could not resolve :span_ids
   as a named parameter in the DELETE statement. Fixed by using id IN :ids
   to match the working pattern in TraceDAO.DELETE_BY_ID.

* [OPIK-4383] [BE] Address PR review comments on ExperimentDenormalizationJob

- Centralize Redis constants (EXPERIMENT_KEY_PREFIX, USER_NAME_FIELD,
  MEMBER_SEPARATOR) in ExperimentDenormalizationConfig
- Change ExperimentAggregationPublisher.publish() to return Mono<Void>
  instead of void, so errors propagate to callers
- Make job interval configurable via jobs map in config.yml
- Fix onErrorContinue logging: remove getMessage() duplication
- Demote per-experiment logs from INFO to DEBUG
- Add ZSET pagination using expand() to avoid materializing entire range
- Update tests for all changes

* Fix @Every job interval config key casing and add jobs section to test config

The dropwizard-jobs framework uses WordUtils.uncapitalize(class.getSimpleName())
to look up the interval in the jobs map, so the key must be
'experimentDenormalizationJob' (lowercase first letter). Also adds the missing
jobs section and jobBatchSize to config-test.yml.

* Replace @Every annotation with programmatic Quartz scheduling

Remove @Every from ExperimentDenormalizationJob and schedule it
programmatically in OpikGuiceyLifecycleEventListener, following the
same pattern as TraceThreadsClosingJob. Add jobInterval config field
to ExperimentDenormalizationConfig. Remove the jobs YAML section that
caused deserialization errors with JobConfiguration's immutable map.

* Add experiment context to error log and extract publishIfNotEmpty helper

- Include experimentId and workspaceId in onExperimentUpdated error log
- Extract publishIfNotEmpty helper to deduplicate filter+publish logic
  across triggerByExperimentIds, triggerByTraceIds, triggerBySpanIds

* Fix NPE in ExperimentAggregateEventListenerTest mock setup

Stub publisher.publish() to return Mono.empty() in setUp so
.subscribe() calls in production code don't NPE on null.

* [OPIK-4385] [BE] Use pre-computed aggregation tables for experiment endpoints

Apply UNION ALL hybrid pattern to ExperimentDAO (FIND, FIND_GROUPS,
FIND_GROUPS_AGGREGATIONS) and ExperimentItemDAO (STREAM,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_COUNT,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS,
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_STATS) so that
experiments present in experiment_aggregates / experiment_item_aggregates
use pre-computed values, while others fall back to live JOIN computation.

Add ExperimentAggregatesIntegrationTest covering all 7 affected queries
with parameterized filter, pagination, and consistency scenarios.

* [OPIK-4386] [BE] Trigger lazy aggregation via publisher on GET experiment by ID

When fetching an experiment by ID, if the experiment is in COMPLETED or
CANCELLED state and is not yet present in the experiment_aggregates table,
enqueue it for aggregation using ExperimentAggregationPublisher instead of
computing aggregations synchronously. The check and publish are performed
off the critical path via doOnEach, so the caller receives the experiment
immediately without waiting for the side effect to complete.

* [OPIK-4384] [BE] Fix missing zero_uuid binding and experiment_scores sort alias

- Bind zero_uuid parameter in getById, getByIds, and get(ExperimentStreamRequest)
  methods that use the FIND query; the UNION ALL refactor introduced an
  experiments_from_aggregates CTE that requires this parameter but only the
  main find() method was binding it, causing 500 errors on those paths
- Fix SortingQueryBuilder to reference the outer column alias experiment_scores_agg
  instead of es.experiment_scores; the ORDER BY sits outside the UNION ALL so the
  inner es alias is out of scope, while experiment_scores_agg is the consistent
  output alias exposed by both branches

* [OPIK-4384] [BE] Fix null row injection from LEFT JOIN miss in feedback_scores and comments aggregation

Pre-aggregate feedback_scores_final and comments_final into subqueries
(GROUP BY entity_id) before LEFT JOIN in DatasetItemVersionDAO.STREAM.
When a LEFT JOIN has no match against a pre-aggregated subquery the
joined columns are NULL, so any(NULL) returns NULL instead of a
default-valued row with epoch timestamps that caused Instant.parse()
failures.

Also adds a regression test covering the no-scores path in
ExperimentAggregatesIntegrationTest.
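The shape of the fix can be illustrated outside SQL: aggregate per key first, then look up, so a miss stays absent instead of producing a default-valued row. A hedged Java analogue (record and method names are invented for illustration):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PreAggregateJoin {
    record Score(String entityId, double value) {}

    // Analogue of GROUP BY entity_id before the LEFT JOIN: aggregate first,
    // then look up per entity. A missing entity yields an absent entry
    // rather than a default-valued row with epoch timestamps.
    static Map<String, Double> aggregateByEntity(List<Score> scores) {
        return scores.stream().collect(Collectors.groupingBy(
                Score::entityId, Collectors.summingDouble(Score::value)));
    }

    public static void main(String[] args) {
        Map<String, Double> agg = aggregateByEntity(
                List.of(new Score("a", 1.0), new Score("a", 2.0)));
        System.out.println(agg.get("a"));       // 3.0
        System.out.println(agg.get("missing")); // null -> caller can skip it
    }
}
```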

* [OPIK-4383] [BE] Remove DAO-level log.info from ExperimentAggregatesDAO methods

Move operational logging responsibility to the service layer, consistent
with earlier fixes for ExperimentItemDAO and SpanDAO in this PR.

* Remove accidentally committed doc files

These files were introduced during merge resolution but should
not be part of the branch.

* [OPIK-4383] [BE] refactor: extract triggerAggregation helper to centralize guard+publish flow

* [OPIK-4386] [BE] fix: demote lazy aggregation check log to DEBUG

* [OPIK-4387] [BE] feat: wire aggregation publisher into finishExperiments endpoint

Chain experimentAggregationPublisher.publish() after AlertEvent in
finishExperiments() so experiments finished via POST /v1/private/experiments/finish
are published to Redis for aggregation computation.

* [OPIK-4383] [BE] fix: restore TagOperations.tagUpdateFragment in SpanDAO BULK_UPDATE

Restores proper tag handling in SpanDAO.BULK_UPDATE query that was
regressed to a simple arrayConcat. Now uses TagOperations.tagUpdateFragment()
which provides arrayDistinct(), tag limit enforcement (max 50), and
tags_to_add/tags_to_remove support. Also adds the required
short_circuit_function_evaluation SETTINGS for throwIf evaluation.

* [OPIK-4387] [BE] feat: add stream trimming to experiment denormalization XADD

Add streamMaxLen and streamTrimLimit configuration to bound Redis stream
growth on the experiment denormalization producer (ExperimentDenormalizationJob).
Uses Redisson's trimNonStrict().maxLen().limit() API for approximate trimming.

* [OPIK-4387] [BE] fix: make aggregation publish best-effort in finishExperiments

Swallow and log Redis/publish errors so finishExperiments returns 204
even when Redis is down. Aggregation will be retried by the lazy trigger
or next job cycle.

* [OPIK-4387] [BE] refactor: centralize Redis stream XADD trimming in RedisStreamUtils

Extract duplicate StreamAddArgs.entry().trimNonStrict().maxLen().limit()
into RedisStreamUtils.buildAddArgs() so stream trimming settings live in
one place. Updates all 5 producers.

* [OPIK-4387] [BE] fix: defer aggregation publish and update test for best-effort behavior

Wrap aggregation publisher in Mono.defer() so it subscribes only after
upstream completes, and update unit test to expect completion instead of
error propagation.
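The deferral semantics are easy to show with a plain Supplier as a stand-in for Mono.defer(); this is an analogy with no Reactor dependency, not the production code:

```java
import java.util.function.Supplier;

public class DeferSketch {
    static int publishes = 0;

    // Analogy for Mono.defer(): wrapping the work in a Supplier means
    // nothing runs at assembly time, only when the result is requested
    // (in Reactor terms: at subscription, after upstream completes).
    static Supplier<Integer> deferredPublish() {
        return () -> { publishes++; return 204; };
    }

    public static void main(String[] args) {
        Supplier<Integer> pipeline = deferredPublish();
        System.out.println(publishes);      // 0 -> assembling ran nothing
        System.out.println(pipeline.get()); // 204
        System.out.println(publishes);      // 1
    }
}
```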

* Adding InterruptableJob

* [OPIK-4383] [BE] Address PR review: expand safety valve, env var prefix

- Add batchSize-capped iteration counter to expand() to prevent
  infinite loops when ZSET entries fail to be removed
- Rename EXPERIMENT_DENORM_JOB_INTERVAL to OPIK_EXPERIMENT_DENORM_JOB_INTERVAL
  to follow the OPIK_ prefix convention
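The safety valve is the standard bounded-loop pattern: even if the underlying ZSET range never shrinks, the loop terminates after a fixed number of iterations. An illustrative sketch with invented names:

```java
import java.util.function.IntSupplier;

public class BoundedExpand {
    // Safety valve: cap iterations at maxIterations so a range that never
    // shrinks (e.g. removals silently failing) cannot loop forever.
    static int drain(IntSupplier remainingEntries, int maxIterations) {
        int iterations = 0;
        while (remainingEntries.getAsInt() > 0 && iterations < maxIterations) {
            iterations++; // in the real job: fetch a page, process, remove
        }
        return iterations;
    }

    public static void main(String[] args) {
        // A "stuck" range reports 5 entries forever; the cap still wins.
        System.out.println(drain(() -> 5, 10)); // 10
        // An empty range exits immediately.
        System.out.println(drain(() -> 0, 10)); // 0
    }
}
```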

* [OPIK-4384] [BE] Add branch optimization and CTE split to experiment queries

Use pre-computed experiment_aggregates table to optimize query execution:
- Add has_aggregated/has_raw flags to skip unnecessary UNION ALL branches in FIND/FIND_COUNT
- Add getAggregationBranchCounts pre-query to determine which branches are needed
- Apply CTE split pattern to FIND_GROUPS and FIND_GROUPS_AGGREGATIONS
- Update getById to leverage branch optimization via single-ID branch count query
- Add <if(id)> filter to SELECT_AGGREGATED_EXPERIMENT_IDS for getById support

* [OPIK-4384] [BE] Add missing 7-arg overload for getDatasetItemsWithExperimentItems

Fix test compilation error from merge: the remote branch added callers
with (UUID, List, null, null, List<SortingField>, String, String) signature
which needs a bridge overload to the 9-arg method.

* [OPIK-4384] [BE] Add conditional LIMIT push-up, missing CTE, and fix test precision

- Add conditional LIMIT push-up in STREAM query: push LIMIT into CTE
  when only one branch (raw or aggregated) is active for performance
- Add missing experiment_item_aggr_trace_scope CTE for aggregated branch
- Add AggregatedExperimentCounts record for experiment-level branching
- Fix MultiValueFeedbackScoresE2ETest precision assertion: use isEqualTo
  instead of isEqualByComparingTo to respect custom BigDecimal comparator

* [OPIK-4384] [BE] Push OFFSET into top_dataset_items CTE and fix BigDecimal comparator in DatasetsResourceTest

* [OPIK-4384] [BE] Add pass rate aggregation to experiment aggregates

Add pass_rate, passed_count, and total_count columns to experiment_aggregates
table and compute them during aggregation. Update ExperimentDAO queries to
select these columns from both raw and aggregated paths, returning NULL for
non-evaluation-suite experiments.

* Fix format

* Fix get by id

* Fix mapping

* Fix mapping

* [OPIK-4384] [BE] Use pre-aggregated comments from aggregate tables with ISO 8601 date formatting

Update retrieval queries in ExperimentDAO, DatasetItemVersionDAO, and ExperimentAggregatesDAO
to read comments_array_agg as JSON String from aggregate tables instead of live-querying the
comments table. Ensure UNION ALL type compatibility by wrapping raw paths with toJSONString()
and formatting dates as ISO 8601 for proper Jackson deserialization.

* [OPIK-4386] [BE] Increase debounceDelay in test config to prevent race condition

The denormalization job was processing finished experiments during test
execution with incomplete ClickHouse data, causing stale aggregated
values to be returned instead of fresh raw computations.

* [OPIK-4384] [BE] Use parameterized binding for dynamic sort keys and add deterministic tiebreaker

- Replace literal string interpolation in getTopSortExpression with
  parameterized bind variables (sf.bindKey()) to prevent SQL injection
- Remove fieldMapping filter from bindDynamicKeys so all dynamic keys
  are bound, including those used in the top_sorting SELECT expression
- Add deterministic tiebreaker (id DESC / dataset_item_id DESC) to both
  the push-top-limit CTE and the main ORDER BY for consistent pagination
- Fix experiment_items deduplication: use FINAL where DISTINCT was used
  and vice versa for consistency across query branches

* [OPIK-4384] [BE] Add mixed-state aggregation test for UNION ALL hybrid

Test creates 3 experiments, aggregates only 1, and queries all 3 to
exercise the UNION ALL hybrid path where has_aggregated and has_raw
are both true simultaneously.

* [OPIK-4384] [BE] Add isNotEmpty assertions to parameterized filter tests

Ensure filter scenarios actually match data by asserting content()
is not empty before and after aggregation in all parameterized filter
tests (find, findGroups, findGroupsAggregations).

* [OPIK-4384] [BE] refactor: extract assertion helpers to remove duplication in ExperimentAggregatesIntegrationTest

* [OPIK-4384] [BE] refactor: rename parseFlexibleInstant to parseInstant in FeedbackScoreMapper

* [OPIK-4384] [BE] Make LIMIT unconditional in FIND query

The LIMIT clause was gated on filter/sort flags, so plain paged requests
(only limit/offset) at the outer query level would not emit LIMIT.
Simplify to always emit LIMIT when the limit parameter is provided.

* [OPIK-4384] [BE] Fix comment ordering assertion in tests

ClickHouse groupUniqArray does not guarantee ordering, so comment
assertions must use ignoringCollectionOrder to avoid flaky failures.

* [OPIK-4384] [BE] Add branch conditionals to FIND_GROUPS/FIND_GROUPS_AGGREGATIONS and revert unconditional LIMIT

- Wrap SELECT branches in FIND_GROUPS and FIND_GROUPS_AGGREGATIONS with
  <if(has_aggregated)>/<if(has_raw)> conditionals to skip unnecessary
  branches when all experiments are aggregated or all are raw
- Add no-args getAggregationBranchCounts() overload for workspace-only
  pre-query (used by group/aggregation queries that lack experiment IDs)
- Update executeQueryWithTargetProjects to run both pre-queries in
  parallel via Mono.zip
- Revert commit 215a3f9 (unconditional LIMIT) which caused double
  LIMIT/OFFSET bug: CTE-level LIMIT + outer LIMIT made page 2+ return
  0 results. The complex conditional is correct — outer LIMIT is only
  needed when post-CTE processing may alter the result set.

* [OPIK-4384] [BE] Add branch conditionals to SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_COUNT

Wrap the UNION ALL in the count query with <if(has_aggregated)>/<if(has_raw)>
conditionals to skip unnecessary branches. Pass branch flags through
getCountWithExperimentFilters from the existing pre-query results.

* [OPIK-4384] [BE] Fix ClickHouse column resolution in COUNT query

Alias dataset_item_id as di_id in the COUNT subquery branches
to avoid column name ambiguity when ClickHouse 25.3's query
analyzer resolves COUNT(DISTINCT dataset_item_id) through a
LEFT JOIN with dataset_items_resolved which also has that column.

* [OPIK-4384] [BE] Use pre-computed comments in STREAM query and fix UNION ALL type mismatch

Aggregated branch now reads comments_array_agg directly from experiment_item_aggregates
instead of doing an expensive JOIN to the comments table. Raw branch converts comments
to JSON String via toJSONString(CAST(...)) so both branches output compatible types.

* [OPIK-4384] [BE] Fix target_project_ids bind error in FIND_GROUPS aggregated branch

* Fix issues

* [OPIK-4387] [BE] Fix missing closing brace in ExperimentServiceTest
ldaugusto added a commit that referenced this pull request Mar 26, 2026
* [OPIK-4891] [BE] Catch-up job for apply-to-past retention rules

Progressive historical data deletion for rules with applyToPast=true.
Estimates workspace span velocity at rule creation to triage into
small/medium/large tiers with appropriate chunk sizes.

Schema:
- Add catch_up_velocity, catch_up_cursor, catch_up_done columns
- Add idx_catch_up_pending composite index for catch-up queries

Velocity estimation:
- ClickHouse query: uniq(id) / weeks_active for spans below cutoff
- Handles TOO_MANY_ROWS (code 158) by defaulting to 1M/week
- Handles empty tables gracefully

Catch-up tiers (configurable thresholds):
- Small (<10K/week): batch up to 200, one-shot delete entire range
- Medium (10K-100K/week): 10 most outdated, 7-day chunks each
- Large (>100K/week): 1 most outdated, 2-day chunks
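The triage above reduces to a simple threshold function. The thresholds below are the ones named in this commit (configurable in the real job); note that a follow-up commit later reduced the large-tier chunk from 2 days to 1:

```java
public class CatchUpTier {
    // Thresholds from the commit message; configurable in the real job.
    static String tier(long spansPerWeek) {
        if (spansPerWeek < 10_000) return "small";    // one-shot full-range delete
        if (spansPerWeek <= 100_000) return "medium"; // 7-day chunks
        return "large";                               // day-sized chunks
    }

    public static void main(String[] args) {
        System.out.println(tier(5_000));   // small
        System.out.println(tier(50_000));  // medium
        System.out.println(tier(500_000)); // large
    }
}
```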

Execution:
- Runs after regular sliding-window pass in RetentionPolicyJob
- Priority: small first (quick wins), then medium, then large
- Cursor advances oldest→newest, marks done when reaching sliding window

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix large workspace chunk size: 2 days to 1 day

Large workspaces (>100K spans/week) process one day per catch-up
cycle, so each execution handles a manageable amount of data.

* Address PR review: fix type cast, null safety, error handling

- Fix Float64→Long ClassCastException: wrap velocity query with toUInt64()
- Fix null cursor NPE in deleteSmallBatch: filter nulls before min()
- Fix catch-up marking done on delete failure: remove onErrorResume,
  propagate errors so cursor/done only advances on success
- Make markDone/updateCursor non-blocking: wrap in Mono.fromRunnable
  on boundedElastic to avoid blocking Reactor threads

* Add config comments for catch-up settings

* Return oldest span time from velocity estimation, add scouting

- Velocity query now returns both spans_per_week and oldest_span_time
- Cursor starts at the actual oldest data, not service start date
- For huge workspaces (TOO_MANY_ROWS), scout month by month on traces
  table to find first day with data, avoiding months of no-op deletes
- If a monthly scout also hits row limit, use that month start as cursor

* Replace SQL string concatenation with @BindList in markCatchUpDoneBatch

Avoids fragile raw SQL construction pattern. Uses JDBI's @BindList
for parameterized IN clause, consistent with other DAOs in the codebase.

* Address review: rename vars for clarity, hide internal fields, add safety comment

- Rename upperBound/lowerBound to cutoffId/fromId in deleteSmallBatch
  for consistency with deleteOneChunk and DAO signatures
- Hide catchUpVelocity and catchUpCursor from API response (internal);
  only catchUpDone remains public as user-facing progress indicator
- Add comment explaining NULL cursor safety in catch-up DAO queries

* Guard cursor >= upperBound, isolate catch-up errors, expose cursor in API

- Skip delete and mark done if cursor already past sliding window boundary
- Wrap catch-up cycle in onErrorResume so failures don't kill regular retention
- Re-expose catchUpCursor in API (useful for users to see cleanup progress);
  catchUpVelocity remains hidden (internal implementation detail)

* Revert scouting to simple blocking loop, improve schema comments

- Revert scoutFirstDataCursor from Flux back to blocking while-loop.
  Rule creation is a rare admin op; reactive complexity not justified.
- Improve catch_up_cursor and catch_up_done column comments to
  document cursor semantics (data before cursor has been deleted).

* Add unit tests for TOO_MANY_ROWS velocity estimation fallback

- RetentionRuleServiceVelocityTest: 6 tests covering the code 158
  exception path with mocked SpanDAO/TraceDAO. Tests scouting
  month-by-month, dense month fallback, service start date fallback,
  and non-158 exception rethrow.
- Remove large workspace integration test (max_rows_to_read profile
  setting also blocks normal inserts/deletes, making it impossible
  to trigger TOO_MANY_ROWS only on the estimation query)
- Keep small workspace catch-up integration test and applyToPast=false
  test in RetentionPolicyServiceTest
- Make estimateVelocity/scoutFirstDataCursor package-visible for testing

* Mark catch-up done when scouting finds no historical data

When the velocity estimation hits TOO_MANY_ROWS and scouting scans
every month without finding data, return velocity=0 with null cursor
so the rule is created with catchUpDone=true. Prevents hundreds of
empty 1-day chunk DELETE cycles.
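The scouting loop can be sketched with java.time: walk month by month, return the first month with data, or null when nothing is found (which maps to catchUpDone=true with no cursor). Names and shape are illustrative, not the production code:

```java
import java.time.YearMonth;
import java.util.function.Predicate;

public class ScoutSketch {
    // Walk forward month by month; null means no historical data at all,
    // so the rule is created with catchUpDone=true and no cursor.
    static YearMonth scoutFirstMonthWithData(
            YearMonth start, YearMonth end, Predicate<YearMonth> hasData) {
        for (YearMonth m = start; !m.isAfter(end); m = m.plusMonths(1)) {
            if (hasData.test(m)) return m;
        }
        return null;
    }

    public static void main(String[] args) {
        YearMonth found = scoutFirstMonthWithData(
                YearMonth.of(2024, 1), YearMonth.of(2024, 12),
                m -> !m.isBefore(YearMonth.of(2024, 6)));
        System.out.println(found); // 2024-06
        System.out.println(scoutFirstMonthWithData(
                YearMonth.of(2024, 1), YearMonth.of(2024, 3), m -> false)); // null
    }
}
```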

* Bump migration to 000061, simplify index, split rollback

- Rename migration from 000060 to 000061 (main advanced past 000060)
- Simplify index to (catch_up_done, catch_up_velocity) since
  catch_up_done=false already implies enabled=true and apply_to_past=true
- Split rollback into individual DROP COLUMN statements

* Address review comments from thiagohora and baz

- Use per-workspace cursors in deleteSmallBatch via deleteForRetentionBounded
  instead of collapsing to min(cursor) across all workspaces (#1)
- Add @NonNull on executeCatchUpCycle(now) parameter (#3)
- Log when catch-up is disabled (#3)
- Run all three tiers independently per cycle via Flux.concat instead of
  switchIfEmpty chain to prevent medium/large starvation (#4)
- Return null cursor when velocity=0, marking catch-up done immediately (#5)
- Preserve Instant directly instead of UUID round-trip in deleteOneChunk (#7)
- Hoist computeSlidingWindowStart out of per-rule loop (#8)
- Centralize extractInstant/compareUUID into RetentionUtils (#9)
- Remove unnecessary @UseStringTemplateEngine from catch-up queries (#10)
- Add explicit IS NOT NULL guard on catch_up_velocity queries (#11)
- Drop unused cnt column from scout query (#14)
- Fix Javadoc: 'oldest span ID' → 'oldest span time' in SpanDAO

* Remove scripts/.gitignore, lower disabled log to DEBUG

- Remove unnecessary .gitignore in scripts/ (test CSVs are local only)
- Lower catch-up disabled log from INFO to DEBUG to avoid 48 noisy
  log lines per day when catch-up is off

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>