Conversation
[OPIK-5270] [BE] perf: optimize feedback score CTE pipeline to reduce memory overhead

Replace ROW_NUMBER window functions with ClickHouse-native LIMIT 1 BY for deduplication, and collapse 9 parallel groupArray calls into a single groupArray(tuple(...)) per feedback score chain. This eliminates window function buffer allocation and reduces array materialization from 9 arrays to 1 tuple array — the two biggest contributors to the 12+ GiB pipeline overhead that caused OOM on large projects.

Production benchmark on customer data (17M rows):

- Before: 27.07 GiB peak memory (OOM crash at 29 GiB limit)
- After: 5.10 GiB peak memory (81% reduction, query completes)

Applied across all 12 DAO files (33 templates total):

- TraceDAO: 4 templates (trace + span feedback chains)
- SpanDAO: 4 templates
- ThreadDAO: 4 templates
- KpiCardDAO: 3 templates
- ProjectMetricsDAO: 3 templates
- ExperimentDAO: 3 templates + 2 assertion_results templates
- DatasetItemVersionDAO: 3 templates
- DatasetItemDAO: 2 templates
- ExperimentItemDAO: 1 template
- ExperimentAggregatesDAO: 1 template + 1 assertion_results template
- OptimizationDAO: 1 template
- AnnotationQueueDAO: 1 template

Zero remaining ClickHouse ROW_NUMBER dedup patterns or arrayEnumerate index-based recombination across the entire codebase.
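The dedup rewrite above can be sketched as follows. This is a minimal illustration, not the actual Opik templates: the column names (`entity_id`, `name`, `value`, `last_updated_at`) are simplified stand-ins for the real schema.

```sql
-- Before: ROW_NUMBER must buffer every partition in memory
-- before the outer filter can discard rows.
SELECT entity_id, name, value
FROM (
    SELECT
        entity_id, name, value,
        ROW_NUMBER() OVER (
            PARTITION BY entity_id, name
            ORDER BY last_updated_at DESC
        ) AS rn
    FROM feedback_scores
)
WHERE rn = 1;

-- After: LIMIT 1 BY keeps the first row per (entity_id, name)
-- from the sorted stream, with no window-function buffer.
SELECT entity_id, name, value
FROM feedback_scores
ORDER BY entity_id, name, last_updated_at DESC
LIMIT 1 BY entity_id, name;
```

Both forms return the latest row per key; the second avoids materializing the full window state, which is where the memory savings come from.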
perf: remove redundant FINAL keyword from feedback score queries

LIMIT 1 BY already deduplicates to the latest row per partition key, making FINAL (which forces a merge of all data parts at read time) redundant and expensive. Remove it from the 5 remaining DAOs to match the pattern already used by TraceDAO, SpanDAO, ThreadDAO, ExperimentDAO, ExperimentItemDAO, OptimizationDAO, and ProjectMetricsDAO.

Files: KpiCardDAO, DatasetItemDAO, DatasetItemVersionDAO, AnnotationQueueDAO, ExperimentAggregatesDAO (24 occurrences removed).
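The FINAL removal amounts to the following change, again with simplified stand-in column names rather than the real schema:

```sql
-- Before: FINAL forces ClickHouse to merge all data parts of the
-- table at read time, even though LIMIT 1 BY will pick the latest
-- row per key anyway.
SELECT entity_id, name, value
FROM feedback_scores FINAL
ORDER BY entity_id, name, last_updated_at DESC
LIMIT 1 BY entity_id, name;

-- After: same result, without paying for the read-time merge.
SELECT entity_id, name, value
FROM feedback_scores
ORDER BY entity_id, name, last_updated_at DESC
LIMIT 1 BY entity_id, name;
```

Since the sort on `last_updated_at DESC` plus `LIMIT 1 BY` already resolves duplicates across unmerged parts, the explicit merge that FINAL triggers does no additional work that the query needs.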
1561914 to 3c6c336
Backend Tests - Integration Group 16: 15 files, 15 suites, 5m 27s ⏱️ For more details on these errors, see this check. Results for commit 3c6c336.
andrescrz added a commit that referenced this pull request Apr 7, 2026
[OPIK-5270] [BE] perf: optimize feedback score CTE pipeline to reduce memory overhead (#6107)

* [OPIK-5270] [BE] perf: optimize feedback score CTE pipeline to reduce memory overhead

* perf: remove redundant FINAL keyword from feedback score queries
Replace ROW_NUMBER window functions with ClickHouse-native LIMIT 1 BY for deduplication, and collapse 9 parallel groupArray calls into a single groupArray(tuple(...)) per feedback score chain. This eliminates window function buffer allocation and reduces array materialization from 9 arrays to 1 tuple array — the two biggest contributors to the 12+ GiB pipeline overhead that caused OOM on large projects.
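The groupArray collapse can be sketched like this. The column names and the `deduped_scores` CTE name are hypothetical illustrations, not the actual templates (which carry nine columns rather than the three shown):

```sql
-- Before: one materialized array per column, nine in total.
SELECT
    entity_id,
    groupArray(name)   AS names,
    groupArray(value)  AS values,
    groupArray(reason) AS reasons
    -- ... six more parallel groupArray(...) calls
FROM deduped_scores
GROUP BY entity_id;

-- After: a single array of tuples; each element carries all the
-- columns for one score together, so only one array is materialized.
SELECT
    entity_id,
    groupArray(tuple(name, value, reason /* , ... */)) AS scores
FROM deduped_scores
GROUP BY entity_id;
```

The tuple form also removes the need for arrayEnumerate index-based recombination downstream, since each tuple element already keeps its fields aligned.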
Additionally, remove the redundant `FINAL` keyword from all feedback score table reads. Since `LIMIT 1 BY` already deduplicates to the latest row per partition key, `FINAL` (which forces a merge of all data parts at read time) is unnecessary and adds overhead. This aligns the remaining 5 DAOs with the 7 DAOs that already omitted `FINAL`.

Production benchmark on customer data (17M rows):

- Before: 27.07 GiB peak memory (OOM crash at 29 GiB limit)
- After: 5.10 GiB peak memory (81% reduction, query completes)
Optimization 1 — LIMIT 1 BY + tuple groupArray (12 files, 33 templates):
Optimization 2 — Remove redundant FINAL (5 files, 24 occurrences):
Zero remaining ClickHouse ROW_NUMBER dedup patterns, arrayEnumerate index-based recombination, or `feedback_scores FINAL` / `authored_feedback_scores FINAL` reads across the entire codebase.

Change checklist
Issues
Testing
- `mvn compile -DskipTests`: clean
- `mvn spotless:check`: clean
- `MultiValueFeedbackScoresE2ETest`: multi-author dedup, value averaging, valueByAuthor map, reason concatenation; covers traces, spans, threads, experiments, optimizations
- `GetTracesByProjectResourceTest`: 6+ filter operators, sorting, exclude + filter interaction
- `FindTraceThreadsResourceTest`, `ExperimentsResourceTest`, `KpiCardsResourceTest`, `ProjectMetricsResourceTest`, `DatasetsResourceTest`, `AnnotationQueuesResourceTest`, `OptimizationsResourceTest`, `ExperimentAggregatesIntegrationTest`
- The full `mvn test` suite requires Docker infrastructure; to be validated by CI.

Documentation
N/A