Add metrics for remote cluster state metadata transfer timeouts #20809
zheliu2 wants to merge 2 commits
…search-project#10687)

Add index_metadata_upload_timeout_count and metadata_upload_timeout_count counters to RemoteUploadStats to track when metadata uploads to the remote store time out. These metrics are emitted from writeMetadataInParallel when the latch await exceeds the configured timeout, distinguishing between index metadata and global metadata (coordination, settings, templates, etc.) transfer timeouts.

Signed-off-by: zheliu2 <770120041@qq.com>
ce37bb9 to 8e5d76c
Persistent review updated to latest commit 8e5d76c

❌ Gradle check result for 8e5d76c: null
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: zheliu2 <770120041@qq.com>
Persistent review updated to latest commit 8fc1f76
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@             Coverage Diff              @@
##               main   #20809      +/-   ##
============================================
+ Coverage     73.27%   73.30%   +0.02%
- Complexity    72125    72200      +75
============================================
  Files          5794     5794
  Lines        329826   329854      +28
  Branches      47596    47600       +4
============================================
+ Hits         241686   241791     +105
+ Misses        68723    68707      -16
+ Partials      19417    19356      -61
@shwetathareja Can you help find someone to review this change? I'm curious if we want to keep adding data like this to the stats API instead of leveraging the telemetry framework.
This PR is stalled because it has been open for 30 days with no activity. |
Summary
- Add index_metadata_upload_timeout_count and metadata_upload_timeout_count counters to RemoteUploadStats to emit metrics when metadata uploads to the remote store time out
- In writeMetadataInParallel, the code now inspects which upload tasks did not complete and increments the appropriate counter (index metadata vs global metadata)

Test plan
- testTimeoutWhileWritingMetadata test updated with assertions verifying the new metric counters are incremented correctly on timeout
- remote_upload extended fields
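
For reviewers who want the shape of the change without opening the diff, here is a minimal, self-contained sketch of the pattern described above: wait on a latch with a timeout, and if it expires, check which tasks are still pending and bump a counter per category. It is not the code from this PR. `UploadTimeoutSketch`, `TimeoutStats`, `UploadTask`, and `writeInParallel` are hypothetical names invented for illustration, plain threads stand in for the real async uploads, and incrementing each counter at most once per timed-out call is an assumption (the PR may count differently).

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class UploadTimeoutSketch {

    /** Hypothetical stand-in for the two counters this PR adds to RemoteUploadStats. */
    public static final class TimeoutStats {
        public final AtomicLong indexMetadataUploadTimeoutCount = new AtomicLong();
        public final AtomicLong metadataUploadTimeoutCount = new AtomicLong();
    }

    /** Hypothetical upload task: a name, whether it carries index metadata, and the work itself. */
    public static final class UploadTask {
        final String name;
        final boolean indexMetadata;
        final Runnable work;

        public UploadTask(String name, boolean indexMetadata, Runnable work) {
            this.name = name;
            this.indexMetadata = indexMetadata;
            this.work = work;
        }
    }

    /** Returns true if every task finished before the timeout; otherwise records which kind timed out. */
    public static boolean writeInParallel(List<UploadTask> tasks, long timeoutMillis, TimeoutStats stats) {
        CountDownLatch latch = new CountDownLatch(tasks.size());
        Set<String> finished = ConcurrentHashMap.newKeySet();
        for (UploadTask task : tasks) {
            Thread worker = new Thread(() -> {
                try {
                    task.work.run();
                } finally {
                    finished.add(task.name); // mark as finished before releasing the latch
                    latch.countDown();
                }
            });
            worker.setDaemon(true); // a stuck upload must not keep the JVM alive in this sketch
            worker.start();
        }
        try {
            if (latch.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false; // interrupted rather than timed out: no counter is touched
        }
        // Timed out: take a snapshot of which categories still have pending tasks.
        boolean indexPending = false;
        boolean globalPending = false;
        for (UploadTask task : tasks) {
            if (!finished.contains(task.name)) {
                if (task.indexMetadata) {
                    indexPending = true;
                } else {
                    globalPending = true;
                }
            }
        }
        // Assumption: one increment per category per timed-out call.
        if (indexPending) {
            stats.indexMetadataUploadTimeoutCount.incrementAndGet();
        }
        if (globalPending) {
            stats.metadataUploadTimeoutCount.incrementAndGet();
        }
        return false;
    }

    public static void main(String[] args) {
        Runnable fast = () -> {};
        Runnable slow = () -> {
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        TimeoutStats stats = new TimeoutStats();
        boolean completed = writeInParallel(
            List.of(new UploadTask("index-a", true, slow), new UploadTask("settings", false, fast)),
            300,
            stats
        );
        System.out.println("completed=" + completed
            + " index_timeouts=" + stats.indexMetadataUploadTimeoutCount.get()
            + " global_timeouts=" + stats.metadataUploadTimeoutCount.get());
    }
}
```

Tracking finished task names in a concurrent set keeps the pending check a simple set difference after the latch expires. A task that completes between the timeout and the snapshot is treated as finished, so the counters reflect only what was still outstanding when the check ran.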