[OPIK-4828] [BE] Fix thread view shows no data message but then eventually loads#5954
Conversation
Backend Tests - Integration Group 8 34 files 34 suites 9m 29s ⏱️ Results for commit 097d696. ♻️ This comment has been updated with latest results. |
...esources/liquibase/db-app-analytics/migrations/000077_add_thread_id_skip_index_to_traces.sql
Show resolved
Hide resolved
Backend Tests - Integration Group 6256 tests 256 ✅ 3m 49s ⏱️ Results for commit 2fe994b. ♻️ This comment has been updated with latest results. |
andrescrz
left a comment
There was a problem hiding this comment.
Let's double check some comments before moving forward.
apps/opik-backend/src/main/java/com/comet/opik/domain/ThreadDAO.java
Outdated
Show resolved
Hide resolved
apps/opik-backend/src/main/java/com/comet/opik/domain/ThreadDAO.java
Outdated
Show resolved
Hide resolved
apps/opik-backend/src/main/java/com/comet/opik/domain/ThreadDAO.java
Outdated
Show resolved
Hide resolved
apps/opik-backend/src/main/java/com/comet/opik/domain/ThreadDAO.java
Outdated
Show resolved
Hide resolved
…ThreadDetailsPanel - Remove keepPreviousData from useThreadById and useTracesList to avoid showing the previous thread's data when switching between threads - Change && to || in renderContent loading check so that a Loader is shown whenever either query is pending, not only when both are pending simultaneously (which caused No Data to appear if traces resolved first)
OPIK-4828: thread_id had no skip index, causing full partition scans when filtering traces by thread in the thread view. The bloom filter at 1% false positive rate allows ClickHouse to skip ~99% of irrelevant granules for thread_id lookups.
Add SETTINGS use_skip_indexes_if_final = 1 so the bloom filter on traces.thread_id is applied even when reading with FINAL, allowing ClickHouse to skip irrelevant granules during thread view loading.
…ables in thread queries Remove FINAL modifier from feedback_scores and authored_feedback_scores reads across all thread DAO queries. authored_feedback_scores is append-only so FINAL is unnecessary overhead; feedback_scores reads are deduplicated via ROW_NUMBER() window function downstream so FINAL is also redundant. Also optimise SELECT_TRACES_THREAD_BY_ID with a two-phase CTE approach: narrow to matching trace IDs first (with FINAL), then read full rows without FINAL using LIMIT 1 BY for deduplication. Benchmarked ~5x faster on production for threads with large trace counts.
… global config, fix redundant dedup
9805cff to
097d696
Compare
…0077_add_thread_id_skip_index_to_traces.sql
Backend Tests - Integration Group 15 26 files ±0 26 suites ±0 3m 15s ⏱️ -12s For more details on these errors, see this check. Results for commit 6fbbcef. ± Comparison against base commit a24607c. |
andrescrz
left a comment
There was a problem hiding this comment.
LGTM, one comment of something that I don't know if it was left behind intentionally or not.
| created_at, | ||
| last_updated_at | ||
| FROM trace_threads final | ||
| FROM trace_threads FINAL |
There was a problem hiding this comment.
Any particular reason to not to change this to dedupe with ORDER BY + LIMIT BY like you changed above?
There was a problem hiding this comment.
I didn't change this one because the tests showed that it was slower. I didn't understand exactly why, but there were times when LIMIT + ORDER was actually slower than FINAL
The other makes sense because it filters by thread_id, reducing the result set that needs sorting.
Details
Backend
FINALfromfeedback_scoresandauthored_feedback_scoresreads in all thread queries —authored_feedback_scoresis append-only soFINALis unnecessary;feedback_scoresdeduplication is handled downstream byROW_NUMBER()window functionSELECT_TRACES_THREAD_BY_IDwith two-phase CTE: narrow to matching trace IDs first (withFINAL), then read full rows withoutFINALusingLIMIT 1 BYfor deduplication. Benchmarked ~5× faster on production for threads with large trace countsuse_skip_indexes_if_finalsetting on queries that still useFINALscansFrontend
keepPreviousDatafrom both thread detail and traces queries inThreadDetailsPanel— this was causing stale data to be shown when switching between threads, giving the impression of a "No Data" flash before the previous thread's content was replaced&&to||so the loader is shown while either the thread or its traces are still fetching, preventing a partial render with no contentChange checklist
Issues
https://comet-ml.atlassian.net/browse/OPIK-4828
Testing
Benchmarked on production ClickHouse (
opik_prod) against a project with ~28k traces in a single thread:SELECT_TRACES_THREAD_BY_IDSELECT_TRACES_THREADS_BY_PROJECT_IDSThe list query (
SELECT_TRACES_THREADS_BY_PROJECT_IDS) shows no wall-clock improvement from removingFINALon feedback tables (noise-level ~2%), which is expected — the bottleneck is trace aggregation. TheFINALremovals reduce unnecessary merge overhead in the pipeline without correctness risk.Frontend: manually verified that switching between threads no longer shows stale content or a "No Data" flash.
Documentation
No documentation changes required.