Hey @tdunning,
This is a re-posting of the now closed #55. I'm interested in re-opening the exploration into how to properly decay a t-digest. The t-digest is attractive due to its high precision and relatively compact serialization with no range configuration. It is a perfect fit for approximate quantile estimation of large data sets. It seems to me that it could additionally be a good data structure to track activity on a server.
The common data structure people reach for in the context of server monitoring is the HDR histogram. The especially nice thing about the histogram in this setting is that it is robust to a non-uniform sampling rate. A common monitoring architecture today (using something like https://prometheus.io/) periodically collects a snapshot of values from servers, where the collecting server chooses the timestamp. Histograms can report cumulative values, and a consumer can then subtract the previous timestamp's bucket counts from the current one's to get a view of the data between the two.
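To make the subtraction idea concrete, here is a minimal sketch (names and bucket layout are illustrative, not Prometheus-specific): each scrape sees lifetime-cumulative per-bucket counts, and differencing two scrapes recovers the counts for just the interval between them.

```python
# Sketch of cumulative-histogram subtraction between two scrapes.
# Each scrape returns per-bucket counts accumulated over the process
# lifetime; subtracting the earlier scrape from the later one yields
# the counts observed in the window between the two scrapes.

def window_counts(prev_scrape, curr_scrape):
    """Per-bucket counts observed between two cumulative scrapes."""
    return [curr - prev for prev, curr in zip(prev_scrape, curr_scrape)]

# Two scrapes of the same cumulative histogram (one count per bucket):
t0 = [10, 25, 40, 42]   # scrape at time t0
t1 = [14, 33, 61, 70]   # scrape at time t1

print(window_counts(t0, t1))  # -> [4, 8, 21, 28], the (t0, t1] window
```

This works no matter how irregular the scrape interval is, and two independent collectors can each difference their own scrape pairs without interfering, which is exactly the property a resettable digest lacks.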
The t-digest doesn't offer this convenient subtraction mechanism, making it more difficult to use in this sort of setting. One solution people sometimes offer is to use collection as a trigger to reset the t-digest. This is problematic in the face of more than one collector (as is recommended for HA Prometheus), since each collector's reset destroys data the others have not yet seen.
An alternative to only reporting the cumulative distribution would be to sample from a distribution that represents the previous trailing period. I've experimented with this a bit and it seems roughly reasonable, though with some unexpected behavior when few new points are recorded and the count decays away.
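A toy illustration of the failure mode mentioned above, assuming a simple exponential decay of centroid weights (this is deliberately not the t-digest's compression logic, just the decay behavior in isolation; the class name and half-life parameter are made up for the sketch):

```python
# Minimal sketch of trailing-window decay: every tick multiplies each
# centroid's weight by a constant factor so that weight halves every
# `half_life_ticks` ticks. With no new points, total weight decays
# toward zero and quantile estimates rest on vanishing evidence.

class DecayingCentroids:
    def __init__(self, half_life_ticks=10.0):
        self.factor = 0.5 ** (1.0 / half_life_ticks)
        self.centroids = []  # list of [mean, weight] pairs

    def add(self, x):
        self.centroids.append([x, 1.0])

    def tick(self):
        # Decay all weights; drop centroids that are effectively empty.
        for c in self.centroids:
            c[1] *= self.factor
        self.centroids = [c for c in self.centroids if c[1] > 1e-6]

    def total_weight(self):
        return sum(w for _, w in self.centroids)


d = DecayingCentroids(half_life_ticks=10.0)
d.add(42.0)
for _ in range(10):
    d.tick()
print(d.total_weight())  # ~0.5 after one half-life with no new points
```

After a few half-lives with no fresh data, the total weight is a small fraction of one point, which matches the "count decays away" behavior described above.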
The approach in the blog post linked from #55 feels high in overhead, and it is not completely obvious to me how to map that idea onto the t-digest. Have you given this topic more thought? Do you have a hunch about good approaches?