Time decay of t-digest #127

@ajwerner

Hey @tdunning,

This is a re-posting of the now-closed #55. I'm interested in re-opening the exploration into how to properly decay a t-digest. The t-digest is attractive due to its high precision and relatively compact serialization with no range configuration. It is a perfect fit for approximate quantile estimation of large data sets. It seems to me that it could additionally be a good data structure for tracking activity on a server.

The common data structure people reach for in the context of server monitoring is the HDR histogram. The especially nice thing about the histogram in this setting is that it is robust to a non-uniform sampling rate. A common monitoring architecture today (using something like https://prometheus.io/) will periodically collect a snapshot of values from servers, with the collecting server choosing the timestamp. Histograms can report cumulative values, and the previous snapshot's bucket counts can then be subtracted from the current ones to get a view of the data recorded between the two.
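To make the subtraction concrete, here is a minimal sketch. It assumes each snapshot is a plain mapping from bucket upper bound to a count that only ever grows over the server's lifetime; `window_counts` and the sample values are hypothetical names and data for illustration, not part of any monitoring library.

```python
from typing import Dict


def window_counts(previous: Dict[float, int], current: Dict[float, int]) -> Dict[float, int]:
    """Return per-bucket counts recorded between two cumulative snapshots."""
    # A bucket missing from the earlier snapshot is treated as having count 0.
    return {bound: count - previous.get(bound, 0) for bound, count in current.items()}


# Hypothetical cumulative counts keyed by bucket upper bound (milliseconds).
snap_t0 = {10.0: 5, 100.0: 8, 1000.0: 9}
snap_t1 = {10.0: 7, 100.0: 15, 1000.0: 16}

delta = window_counts(snap_t0, snap_t1)
print(delta)
```

Because each collector only needs its own previous snapshot, the result does not depend on how often, or by how many collectors, the server is scraped.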

The t-digest doesn't offer this convenient subtraction mechanism, which makes it more difficult to use in this sort of setting. One solution people sometimes offer is to use collection as a trigger to reset the t-digest. This is problematic when there is more than one collector (as is recommended for HA Prometheus), because each collector's scrape would reset the digest out from under the others.

An alternative to only reporting the cumulative distribution would be to sample a distribution that represents the previous trailing period. I've experimented with this a bit and it seems roughly reasonable, though with some unexpected behavior when few new points are recorded and the count decays away.
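One way a trailing-period view could be approximated is to scale every centroid weight down exponentially as time passes. The sketch below is an assumption for illustration only: it is not necessarily the experiment described above, and `Centroid`, `decay`, and `half_life_s` are hypothetical names rather than anything from the t-digest library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Centroid:
    mean: float
    weight: float


def decay(centroids: List[Centroid], elapsed_s: float, half_life_s: float) -> List[Centroid]:
    """Scale every centroid weight by 0.5 ** (elapsed / half-life)."""
    factor = 0.5 ** (elapsed_s / half_life_s)
    return [Centroid(c.mean, c.weight * factor) for c in centroids]


def total_weight(centroids: List[Centroid]) -> float:
    return sum(c.weight for c in centroids)


# Hypothetical digest state: two centroids with a total weight of 16.
digest = [Centroid(1.0, 4.0), Centroid(10.0, 12.0)]

# After exactly one half-life, every weight is halved.
decayed = decay(digest, elapsed_s=60.0, half_life_s=60.0)
print(total_weight(decayed))
```

Uniform scaling leaves the relative weights, and therefore the estimated quantiles, unchanged until new points arrive. It also shows how the total count shrinks toward zero when few new points are recorded.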

The approach in the blog post linked to #55 feels high in overhead, and it is not obvious to me exactly how to map that idea onto the t-digest. Have you given this topic more thought? Do you have a hunch about good approaches?
