[OPIK-5270] [BE] perf: optimize feedback score CTE pipeline to reduce memory overhead#6107

Merged
andrescrz merged 2 commits into main from andrescrz/OPIK-5270-optimize-feedback-score-ctes
Apr 7, 2026

Conversation


@andrescrz andrescrz commented Apr 7, 2026

Details

Replace ROW_NUMBER window functions with ClickHouse-native LIMIT 1 BY for deduplication, and collapse 9 parallel groupArray calls into a single groupArray(tuple(...)) per feedback score chain. This eliminates window function buffer allocation and reduces array materialization from 9 arrays to 1 tuple array — the two biggest contributors to the 12+ GiB pipeline overhead that caused OOM on large projects.
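The shape of the change can be sketched as follows (table and column names here are illustrative, not the exact DAO templates):

```sql
-- Before (illustrative): ROW_NUMBER buffers each partition in a window
-- function state, and every groupArray call materializes its own array.
SELECT entity_id,
       groupArray(name)  AS names,
       groupArray(value) AS values
       -- ...7 more parallel groupArray calls in the real templates
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY entity_id, name
               ORDER BY last_updated_at DESC
           ) AS rn
    FROM feedback_scores
)
WHERE rn = 1
GROUP BY entity_id;

-- After (illustrative): LIMIT 1 BY keeps the first row per (entity_id, name)
-- after the ORDER BY, without window-function state, and a single
-- groupArray(tuple(...)) materializes one array of tuples instead of nine arrays.
SELECT entity_id,
       groupArray(tuple(name, value)) AS scores
FROM (
    SELECT *
    FROM feedback_scores
    ORDER BY last_updated_at DESC
    LIMIT 1 BY entity_id, name
)
GROUP BY entity_id;
```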

Additionally, remove the redundant FINAL keyword from all feedback score table reads. Since LIMIT 1 BY already deduplicates to the latest row per partition key, FINAL (which forces a merge of all data parts at read time) is unnecessary and adds overhead. This aligns the remaining 5 DAOs with the 7 DAOs that already omitted FINAL.
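In query terms (a sketch; the actual filter columns and dedup keys differ per DAO):

```sql
-- Before (illustrative): FINAL forces ClickHouse to merge all data parts of
-- the ReplacingMergeTree at read time just to collapse duplicate rows.
SELECT * FROM feedback_scores FINAL
WHERE workspace_id = :workspace_id;

-- After (illustrative): the downstream LIMIT 1 BY already keeps only the
-- latest row per dedup key, so the raw (un-merged) read is sufficient.
SELECT * FROM feedback_scores
WHERE workspace_id = :workspace_id
ORDER BY last_updated_at DESC
LIMIT 1 BY entity_id, name;
```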

Production benchmark on customer data (17M rows):

  • Before: 27.07 GiB peak memory (OOM crash at 29 GiB limit)
  • After: 5.10 GiB peak memory (81% reduction, query completes)

Optimization 1 — LIMIT 1 BY + tuple groupArray (12 files, 33 templates):

  • TraceDAO: 4 templates (trace + span feedback chains)
  • SpanDAO: 4 templates
  • ThreadDAO: 4 templates
  • KpiCardDAO: 3 templates
  • ProjectMetricsDAO: 3 templates
  • ExperimentDAO: 3 templates + 2 assertion_results templates
  • DatasetItemVersionDAO: 3 templates
  • DatasetItemDAO: 2 templates
  • ExperimentItemDAO: 1 template
  • ExperimentAggregatesDAO: 1 template + 1 assertion_results template
  • OptimizationDAO: 1 template
  • AnnotationQueueDAO: 1 template

Optimization 2 — Remove redundant FINAL (5 files, 24 occurrences):

  • KpiCardDAO: 6 occurrences
  • DatasetItemDAO: 6 occurrences
  • DatasetItemVersionDAO: 6 occurrences
  • ExperimentAggregatesDAO: 4 occurrences
  • AnnotationQueueDAO: 2 occurrences

No ClickHouse ROW_NUMBER dedup patterns, arrayEnumerate index-based recombination, or feedback_scores FINAL / authored_feedback_scores FINAL reads remain anywhere in the codebase.

Change checklist

  • User facing
  • Documentation update

Issues

  • OPIK-5270

AI-WATERMARK

AI-WATERMARK: yes

  • Tools: Claude Code
  • Model(s): Claude Opus 4.6
  • Scope: full implementation
  • Human verification: code review + production ClickHouse benchmark (27 GiB OOM → 5 GiB success)

Testing

  • Compilation: mvn compile -DskipTests — clean
  • Spotless: mvn spotless:check — clean
  • Production benchmark: Ran exact OOM query from customer's ClickHouse against the same workspace/project (17M rows). Old query crashed at 27.07 GiB, new query completed at 5.10 GiB.
  • Existing test coverage: All 12 modified DAOs are covered by integration tests running against real ClickHouse:
    • MultiValueFeedbackScoresE2ETest: multi-author dedup, value averaging, valueByAuthor map, reason concatenation — covers traces, spans, threads, experiments, optimizations
    • GetTracesByProjectResourceTest: 6+ filter operators, sorting, exclude + filter interaction
    • FindTraceThreadsResourceTest, ExperimentsResourceTest, KpiCardsResourceTest, ProjectMetricsResourceTest, DatasetsResourceTest, AnnotationQueuesResourceTest, OptimizationsResourceTest, ExperimentAggregatesIntegrationTest
  • Tests not run locally: Full mvn test suite requires Docker infrastructure. To be validated by CI.

Documentation

N/A

@github-actions github-actions bot added java Pull requests that update Java code Backend labels Apr 7, 2026
@andrescrz andrescrz marked this pull request as ready for review April 7, 2026 15:03
@andrescrz andrescrz requested a review from a team as a code owner April 7, 2026 15:03
LIMIT 1 BY already deduplicates to the latest row per partition key,
making FINAL (which forces a merge of all data parts at read time)
redundant and expensive. Remove it from the 5 remaining DAOs to match
the pattern already used by TraceDAO, SpanDAO, ThreadDAO, ExperimentDAO,
ExperimentItemDAO, OptimizationDAO, and ProjectMetricsDAO.

Files: KpiCardDAO, DatasetItemDAO, DatasetItemVersionDAO,
AnnotationQueueDAO, ExperimentAggregatesDAO (24 occurrences removed).
@andrescrz andrescrz force-pushed the andrescrz/OPIK-5270-optimize-feedback-score-ctes branch from 1561914 to 3c6c336 Compare April 7, 2026 15:04

github-actions bot commented Apr 7, 2026

Backend Tests - Integration Group 16

 15 files   15 suites   5m 27s ⏱️
228 tests 225 ✅ 2 💤 0 ❌ 1 🔥
227 runs  225 ✅ 2 💤 0 ❌


Results for commit 3c6c336.

@thiagohora thiagohora left a comment

Good job!

@andrescrz andrescrz merged commit ff0a7d1 into main Apr 7, 2026
76 checks passed
@andrescrz andrescrz deleted the andrescrz/OPIK-5270-optimize-feedback-score-ctes branch April 7, 2026 15:29
andrescrz added a commit that referenced this pull request Apr 7, 2026
… memory overhead (#6107)

Labels

Backend, java (Pull requests that update Java code)


2 participants