@mahadzaryab1 (Collaborator) commented Dec 30, 2025

Description of the changes

  • Use the AggregatingMergeTree engine for deduplicating services and operations
  • Use the GROUP BY clause to deduplicate at query time instead of DISTINCT

How was this change tested?

  • CI

Checklist

Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
@mahadzaryab1 added the changelog:experimental label (Change to an experimental part of the code) Dec 30, 2025
@mahadzaryab1 requested a review from a team as a code owner December 30, 2025 21:21
@mahadzaryab1 changed the title from [fix][clickhouse] Use FINAL Modifier For Performing Merge to [fix][clickhouse] Use FINAL Modifier For Performing Merge Dec 30, 2025
@codecov bot commented Dec 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.53%. Comparing base (a6e3492) to head (25ecd42).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7808      +/-   ##
==========================================
+ Coverage   95.47%   95.53%   +0.06%     
==========================================
  Files         307      307              
  Lines       15892    15911      +19     
==========================================
+ Hits        15173    15201      +28     
+ Misses        564      558       -6     
+ Partials      155      152       -3     
Flag Coverage Δ
badger_v1 9.18% <ø> (-0.02%) ⬇️
badger_v2 1.93% <ø> (-0.01%) ⬇️
cassandra-4.x-v1-manual 13.58% <ø> (-0.03%) ⬇️
cassandra-4.x-v2-auto 1.92% <ø> (-0.01%) ⬇️
cassandra-4.x-v2-manual 1.92% <ø> (-0.01%) ⬇️
cassandra-5.x-v1-manual 13.58% <ø> (-0.03%) ⬇️
cassandra-5.x-v2-auto 1.92% <ø> (-0.01%) ⬇️
cassandra-5.x-v2-manual 1.92% <ø> (-0.01%) ⬇️
clickhouse 1.97% <ø> (+0.11%) ⬆️
elasticsearch-6.x-v1 17.54% <ø> (-0.04%) ⬇️
elasticsearch-7.x-v1 17.57% <ø> (-0.04%) ⬇️
elasticsearch-8.x-v1 17.73% <ø> (-0.04%) ⬇️
elasticsearch-8.x-v2 1.93% <ø> (-0.01%) ⬇️
elasticsearch-9.x-v2 1.93% <ø> (-0.01%) ⬇️
grpc_v1 8.84% <ø> (-0.02%) ⬇️
grpc_v2 1.93% <ø> (-0.01%) ⬇️
kafka-3.x-v2 1.93% <ø> (-0.01%) ⬇️
memory_v2 1.93% <ø> (-0.01%) ⬇️
opensearch-1.x-v1 17.62% <ø> (-0.04%) ⬇️
opensearch-2.x-v1 17.62% <ø> (-0.04%) ⬇️
opensearch-2.x-v2 1.93% <ø> (-0.01%) ⬇️
opensearch-3.x-v2 1.93% <ø> (-0.01%) ⬇️
query 1.93% <ø> (-0.01%) ⬇️
tailsampling-processor 0.55% <ø> (-0.01%) ⬇️
unittests 94.16% <ø> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown.

@@ -224,28 +224,28 @@ LEFT JOIN trace_id_timestamps t ON s.trace_id = t.trace_id
WHERE 1=1`

const SelectServices = `
@yurishkuro (Member) commented:

Suggested change
const SelectServices = `
// We use FINAL to ensure ClickHouse fully merges the data before returning the result.
// See https://clickhouse.com/docs/sql-reference/statements/select/from#final-modifier
const SelectServices = `

However, according to Gemini, we should be using DISTINCT here, which is faster than forcing the merge logic.

Also, why are you using ReplacingMergeTree and not AggregatingMergeTree? The latter is what we need here, and the query should use GROUP BY serviceName.

@mahadzaryab1 (Collaborator, Author):

@yurishkuro I believe that SELECT DISTINCT keeps every unique row, whereas FINAL leverages the MergeTree engine's deduplication (keeping the latest version). In this case they end up being the same, but I thought it made more sense to leverage the engine.
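The DISTINCT-versus-FINAL distinction above can be illustrated with a small in-memory Go model. It is a sketch under assumptions: the `Version` field stands in for an optional ReplacingMergeTree version column (the `services` table shown in this thread only has `name`), and the functions mimic result semantics only, not how ClickHouse executes either query.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical stored row: Name is the ORDER BY key and Version
// stands in for a ReplacingMergeTree version column (an assumption).
type row struct {
	Name    string
	Version int
}

// distinct mimics SELECT DISTINCT: it keeps every row whose full set of
// column values is unique, so two versions of one key both survive.
func distinct(rows []row) []row {
	seen := make(map[row]bool)
	var out []row
	for _, r := range rows {
		if !seen[r] {
			seen[r] = true
			out = append(out, r)
		}
	}
	return out
}

// replacingFinal mimics ReplacingMergeTree with FINAL: one row per sorting key,
// the one with the highest version; on a tie the later row wins, mirroring
// ClickHouse keeping the last inserted row.
func replacingFinal(rows []row) []row {
	latest := make(map[string]row)
	for _, r := range rows {
		if cur, ok := latest[r.Name]; !ok || r.Version >= cur.Version {
			latest[r.Name] = r
		}
	}
	out := make([]row, 0, len(latest))
	for _, r := range latest {
		out = append(out, r)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	rows := []row{{"frontend", 1}, {"frontend", 2}, {"user-service", 1}, {"user-service", 1}}
	fmt.Println(len(distinct(rows)), len(replacingFinal(rows))) // prints: 3 2
}
```

With name-only rows (every `Version` equal), both functions return one row per name, which matches the point that the two approaches coincide for this table.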

@mahadzaryab1 (Collaborator, Author):

Same with AggregatingMergeTree: the effect in this case is the same. AggregatingMergeTree is more for aggregation-based workflows like sum, avg, etc., whereas ReplacingMergeTree is for deduplication. In either case, we need to use the FINAL keyword to perform the merge. The snippet below shows what the queries return with AggregatingMergeTree.

Mahads-MacBook-Air.local :) select * from services

SELECT *
FROM services

Query id: 790fbbe2-d30a-46b4-b24d-350991fe64d2

   ┌─name──────────┐
1. │ frontend      │
2. │ order-service │
3. │ user-service  │
   └───────────────┘
   ┌─name──────────┐
4. │ frontend      │
5. │ order-service │
6. │ user-service  │
   └───────────────┘

6 rows in set. Elapsed: 0.005 sec. 

Mahads-MacBook-Air.local :) select * from services final

SELECT *
FROM services
FINAL

Query id: 3fa57a0e-83e7-4127-9d6e-ca23069b30b4

   ┌─name──────────┐
1. │ frontend      │
2. │ order-service │
3. │ user-service  │
   └───────────────┘

3 rows in set. Elapsed: 0.008 sec. 
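To make the session above concrete: the two result blocks in the plain SELECT suggest two unmerged data parts, each holding the same three names, and rows from both are returned until a merge happens. Per the PR description, the final change deduplicates with GROUP BY at query time rather than FINAL. The Go model below is a hypothetical illustration of result semantics only (`parts` mirrors the output shown); it says nothing about ClickHouse internals or performance.

```go
package main

import (
	"fmt"
	"sort"
)

// parts models two unmerged data parts holding the same three service names.
var parts = [][]string{
	{"frontend", "order-service", "user-service"},
	{"frontend", "order-service", "user-service"},
}

// selectAll mimics a plain SELECT: rows from every part are returned as-is.
func selectAll(parts [][]string) []string {
	var out []string
	for _, p := range parts {
		out = append(out, p...)
	}
	return out
}

// groupByName mimics SELECT name ... GROUP BY name: duplicates across parts
// collapse into one row per key, whether or not the parts have been merged.
func groupByName(parts [][]string) []string {
	seen := make(map[string]struct{})
	for _, name := range selectAll(parts) {
		seen[name] = struct{}{}
	}
	out := make([]string, 0, len(seen))
	for name := range seen {
		out = append(out, name)
	}
	sort.Strings(out) // ClickHouse gives no ordering guarantee without ORDER BY
	return out
}

func main() {
	fmt.Println(len(selectAll(parts)), len(groupByName(parts))) // prints: 6 3
}
```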

@yurishkuro (Member):

> also, why are you using ReplacingMergeTree and not AggregatingMergeTree? The latter is what we need here, and the query should use GROUP BY serviceName

This is the more important question. Gemini says it's more efficient (image not preserved).

@mahadzaryab1 (Collaborator, Author):

Done!

@yurishkuro (Member) replied with two images (not preserved).

mahadzaryab1 and others added 4 commits December 30, 2025 16:50
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
@mahadzaryab1 changed the title from [fix][clickhouse] Use FINAL Modifier For Performing Merge to [fix][clickhouse] Optimize AggregatingMergeTree Queries Dec 31, 2025
@yurishkuro enabled auto-merge December 31, 2025 16:08
@mahadzaryab1 changed the title from [fix][clickhouse] Optimize AggregatingMergeTree Queries to [fix][clickhouse] Optimize Service and Operation Retrieval Queries Dec 31, 2025
@yurishkuro added this pull request to the merge queue Dec 31, 2025
Merged via the queue into jaegertracing:main with commit b243cda Dec 31, 2025
59 checks passed

Labels

area/storage, changelog:experimental (Change to an experimental part of the code), enhancement

2 participants