[OPIK-5270] [BE] perf: optimize feedback score CTE pipeline to reduce memory overhead#6107

Merged
andrescrz merged 2 commits into main from andrescrz/OPIK-5270-optimize-feedback-score-ctes
Apr 7, 2026

Conversation


@andrescrz andrescrz commented Apr 7, 2026

Details

Replace ROW_NUMBER window functions with ClickHouse-native LIMIT 1 BY for deduplication, and collapse 9 parallel groupArray calls into a single groupArray(tuple(...)) per feedback score chain. This eliminates window function buffer allocation and reduces array materialization from 9 arrays to 1 tuple array — the two biggest contributors to the 12+ GiB pipeline overhead that caused OOM on large projects.
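The shape of the change can be sketched as follows (table and column names here are illustrative, not the exact DAO templates):

```sql
-- Before (illustrative): ROW_NUMBER buffers each partition in a window
-- function state, and every groupArray call materializes its own array.
SELECT entity_id,
       groupArray(name)  AS names,
       groupArray(value) AS values
       -- ...7 more parallel groupArray calls in the real templates
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY entity_id, name
               ORDER BY last_updated_at DESC
           ) AS rn
    FROM feedback_scores
)
WHERE rn = 1
GROUP BY entity_id;

-- After (illustrative): LIMIT 1 BY keeps the first row per (entity_id, name)
-- after the ORDER BY, without window-function state, and a single
-- groupArray(tuple(...)) materializes one array of tuples instead of nine arrays.
SELECT entity_id,
       groupArray(tuple(name, value)) AS scores
FROM (
    SELECT *
    FROM feedback_scores
    ORDER BY last_updated_at DESC
    LIMIT 1 BY entity_id, name
)
GROUP BY entity_id;
```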

Additionally, remove the redundant FINAL keyword from all feedback score table reads. Since LIMIT 1 BY already deduplicates to the latest row per partition key, FINAL (which forces a merge of all data parts at read time) is unnecessary and adds overhead. This aligns the remaining 5 DAOs with the 7 DAOs that already omitted FINAL.
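In query terms (a sketch; the actual filter columns and dedup keys differ per DAO):

```sql
-- Before (illustrative): FINAL forces ClickHouse to merge all data parts of
-- the ReplacingMergeTree at read time just to collapse duplicate rows.
SELECT * FROM feedback_scores FINAL
WHERE workspace_id = :workspace_id;

-- After (illustrative): the downstream LIMIT 1 BY already keeps only the
-- latest row per dedup key, so the raw (un-merged) read is sufficient.
SELECT * FROM feedback_scores
WHERE workspace_id = :workspace_id
ORDER BY last_updated_at DESC
LIMIT 1 BY entity_id, name;
```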

Production benchmark on customer data (17M rows):

  • Before: 27.07 GiB peak memory (OOM crash at 29 GiB limit)
  • After: 5.10 GiB peak memory (81% reduction, query completes)

Optimization 1 — LIMIT 1 BY + tuple groupArray (12 files, 33 templates):

  • TraceDAO: 4 templates (trace + span feedback chains)
  • SpanDAO: 4 templates
  • ThreadDAO: 4 templates
  • KpiCardDAO: 3 templates
  • ProjectMetricsDAO: 3 templates
  • ExperimentDAO: 3 templates + 2 assertion_results templates
  • DatasetItemVersionDAO: 3 templates
  • DatasetItemDAO: 2 templates
  • ExperimentItemDAO: 1 template
  • ExperimentAggregatesDAO: 1 template + 1 assertion_results template
  • OptimizationDAO: 1 template
  • AnnotationQueueDAO: 1 template

Optimization 2 — Remove redundant FINAL (5 files, 24 occurrences):

  • KpiCardDAO: 6 occurrences
  • DatasetItemDAO: 6 occurrences
  • DatasetItemVersionDAO: 6 occurrences
  • ExperimentAggregatesDAO: 4 occurrences
  • AnnotationQueueDAO: 2 occurrences

No ClickHouse ROW_NUMBER dedup patterns, arrayEnumerate index-based recombination, or feedback_scores FINAL / authored_feedback_scores FINAL reads remain anywhere in the codebase.

Change checklist

  • User facing
  • Documentation update

Issues

  • OPIK-5270

AI-WATERMARK

AI-WATERMARK: yes

  • Tools: Claude Code
  • Model(s): Claude Opus 4.6
  • Scope: full implementation
  • Human verification: code review + production ClickHouse benchmark (27 GiB OOM → 5 GiB success)

Testing

  • Compilation: mvn compile -DskipTests — clean
  • Spotless: mvn spotless:check — clean
  • Production benchmark: Ran exact OOM query from customer's ClickHouse against the same workspace/project (17M rows). Old query crashed at 27.07 GiB, new query completed at 5.10 GiB.
  • Existing test coverage: All 12 modified DAOs are covered by integration tests running against real ClickHouse:
    • MultiValueFeedbackScoresE2ETest: multi-author dedup, value averaging, valueByAuthor map, reason concatenation — covers traces, spans, threads, experiments, optimizations
    • GetTracesByProjectResourceTest: 6+ filter operators, sorting, exclude + filter interaction
    • FindTraceThreadsResourceTest, ExperimentsResourceTest, KpiCardsResourceTest, ProjectMetricsResourceTest, DatasetsResourceTest, AnnotationQueuesResourceTest, OptimizationsResourceTest, ExperimentAggregatesIntegrationTest
  • Tests not run locally: Full mvn test suite requires Docker infrastructure. To be validated by CI.

Documentation

N/A

@github-actions github-actions bot added java Pull requests that update Java code Backend labels Apr 7, 2026
@andrescrz andrescrz marked this pull request as ready for review April 7, 2026 15:03
@andrescrz andrescrz requested a review from a team as a code owner April 7, 2026 15:03
LIMIT 1 BY already deduplicates to the latest row per partition key,
making FINAL (which forces a merge of all data parts at read time)
redundant and expensive. Remove it from the 5 remaining DAOs to match
the pattern already used by TraceDAO, SpanDAO, ThreadDAO, ExperimentDAO,
ExperimentItemDAO, OptimizationDAO, and ProjectMetricsDAO.

Files: KpiCardDAO, DatasetItemDAO, DatasetItemVersionDAO,
AnnotationQueueDAO, ExperimentAggregatesDAO (24 occurrences removed).
@andrescrz andrescrz force-pushed the andrescrz/OPIK-5270-optimize-feedback-score-ctes branch from 1561914 to 3c6c336 Compare April 7, 2026 15:04

github-actions bot commented Apr 7, 2026

Backend Tests - Integration Group 16

 15 files   15 suites   5m 27s ⏱️
228 tests 225 ✅ 2 💤 0 ❌ 1 🔥
227 runs  225 ✅ 2 💤 0 ❌


Results for commit 3c6c336.

@thiagohora thiagohora left a comment

Good job!

@andrescrz andrescrz merged commit ff0a7d1 into main Apr 7, 2026
76 checks passed
@andrescrz andrescrz deleted the andrescrz/OPIK-5270-optimize-feedback-score-ctes branch April 7, 2026 15:29
andrescrz added a commit that referenced this pull request Apr 7, 2026
… memory overhead (#6107)

Labels

Backend, java (Pull requests that update Java code)


2 participants