[Draft] Identify stats for remote store feature #6789
Description
This is a work in progress; we will keep adding more stats and metrics for remote store as we identify them.
Goal
Get visibility into remote store related operations. These stats would help in debugging issues and in monitoring the cluster for potential problems. As a user ingesting data into a remote store backed index, I would like to know whether the segment and translog files are being uploaded successfully to the configured remote store, whether there are any failures, whether the remote store is lagging, and so on.
Changes to existing APIs
- Index Stats API response should provide `remote_store` and `remote_translog` stats similar to `store` and `translog` stats
- Cat Segments API should take a query parameter to provide details of segments in remote store
- Index Segments API should take a query parameter to provide details of segments in remote store
- Cat Recovery API should provide details on the recovery from remote store and remote translog
New APIs
Cat Remote Store
- Query Parameters
  - Index Name - required
  - Shard ID - optional
Remote Segment Store Stats
- number of segment files that are uploaded to remote segment store
  - Provides number of uploaded segments at the time of the API call
  - This metric will not consider inactive segments
- remote segment store lag with respect to local store
  - number of segments
    - Provides diff between number of segments on local and remote
    - This will be used to understand if remote store is in sync with local or not
  - size in bytes
    - Provides diff between size of segments on local and those uploaded to remote.
  - time in millis
    - diff between creation time of last file created on local vs max creation time of file uploaded to remote store
  - number of refresh checkpoints since the last successful upload
- timestamp of last successful file upload
- time taken to upload a segment file (total, avg, max, min, P90)
- time taken to delete a segment file (total, avg, max, min, P90)
- size of a segment file in bytes (avg, max, min, P90)
- total upload failures
- live/current upload failures
- total delete failures
- live/current delete failures
- total successful uploads
- total successful deletes
- time spent in remote store uploads during refresh (total, avg, max, min, P90)
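To make the three lag metrics concrete, here is a minimal sketch of how they could be derived from two file listings. It is not the OpenSearch implementation: `SegmentFile`, `RemoteStoreLag`, and `compute_lag` are hypothetical names. It also assumes one reading of "diff", namely that the segment and byte lag count only active local files that have not yet been uploaded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentFile:
    # Hypothetical record for one segment file; not an OpenSearch type.
    name: str
    size_bytes: int
    created_millis: int


@dataclass(frozen=True)
class RemoteStoreLag:
    segments: int     # local segment files not yet present in remote
    size_bytes: int   # combined size of those files
    time_millis: int  # newest local creation time minus newest uploaded one


def compute_lag(local, uploaded):
    """Compare active local segment files with those already uploaded."""
    uploaded_names = {f.name for f in uploaded}
    # Assumption: lag is measured over local files missing from remote.
    pending = [f for f in local if f.name not in uploaded_names]
    last_local = max((f.created_millis for f in local), default=0)
    last_uploaded = max((f.created_millis for f in uploaded), default=0)
    return RemoteStoreLag(
        segments=len(pending),
        size_bytes=sum(f.size_bytes for f in pending),
        time_millis=max(0, last_local - last_uploaded),
    )
```

With three local files and two of them uploaded, the result reports one pending segment, its size, and the gap between the newest local file and the newest uploaded one. A lag of zero on all three fields would indicate that the remote store is in sync with the local store.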
Remote Translog stats
- Mostly the same as above (translog-specific stats will be added below)
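Since the translog stats are expected to mirror the segment stats, a single tracker could plausibly serve both. The sketch below shows one way to produce the (total, avg, max, min, P90) summaries and the failure counters. It is illustrative only: `UploadStats` is a hypothetical name, P90 uses the nearest-rank method, and "live/current failures" is assumed to mean failures since the last successful upload, which the issue does not define.

```python
class UploadStats:
    """Tracks upload durations and failures for one shard (illustrative only)."""

    def __init__(self):
        self.durations_millis = []
        self.total_successes = 0
        self.total_failures = 0
        # Assumption: "live/current" failures are those since the last success.
        self.current_failures = 0

    def record_success(self, duration_millis):
        self.durations_millis.append(duration_millis)
        self.total_successes += 1
        self.current_failures = 0

    def record_failure(self):
        self.total_failures += 1
        self.current_failures += 1

    def summary(self):
        """Return total, avg, max, min and P90 of successful upload times."""
        if not self.durations_millis:
            return None
        ordered = sorted(self.durations_millis)
        n = len(ordered)
        # Nearest-rank P90: ceil(0.9 * n), computed with integer arithmetic.
        rank = (9 * n + 9) // 10
        return {
            "total": sum(ordered),
            "avg": sum(ordered) / n,
            "max": ordered[-1],
            "min": ordered[0],
            "p90": ordered[rank - 1],
        }
```

Keeping every sample is only reasonable for a sketch; a production version would more likely use a bounded window or a histogram so that memory use stays constant.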