Making tracing SDK metrics aware #381
Description
Making Tracing API metrics aware
One interesting aspect of OpenTelemetry is the goal of providing tracing, metrics, and later a logging API. Right now these APIs are fairly separate, so as an end user or library owner I would need to write code to add a span and some more code to add a metric. This is no different from what I would need to do today with OpenCensus, or with OpenTracing and a metrics library (Prometheus, DropWizard, etc.), but I think OpenTelemetry would allow this to be simplified due to its all-in-one nature.
With the metrics API taking a Measurement, it feels like the integration with Spans would come fairly naturally.
Why
Single code point extension, get metrics for "free".
As an end user I want to write as little code as possible. Instead of a couple of lines for metrics, a few more for tracing, and one for logging, it would be helpful if I could get it all done in a single place.
Increased uptake of metrics library
Easily getting metrics from spans would be one more path by which we could increase the uptake of the metrics library. Switching from OpenCensus or OpenTracing to OpenTelemetry is pretty much required for tracing, but metrics doesn't have the same forced migration and will face a much larger and more mature ecosystem.
Span exploration
Currently, the naming of Spans is rather close to putting labels in metric names: it ends up as an explosion of names that each have a different path, instead of a single HTTP request span that you can easily filter by the properties you are interested in.
Filtering and sampling of Traces and Spans using metric data
One of the more interesting longer-term aspects of this closer relationship is that metrics could be used by spans/traces to decide whether a span should be kept after the trace is complete (tail-based sampling). Since the metric name is known (or is the same as the span name), as are the labels used for the metric, you would be able to easily query a metrics DB to get, for example, the 99th percentile for the last 24 hours, and explicitly store all of the traces in that range as well as any randomly sampled traces.
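A minimal sketch of that tail-based decision, assuming the recent durations for a span name and label set have already been fetched from a metrics store. The names `percentile` and `should_keep`, the "at or above the 99th percentile" rule, and the 1% random sample rate are all illustrative assumptions, not part of any OpenTelemetry API.

```python
import math
import random


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct is an integer 1..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def should_keep(duration, recent_durations, sample_rate=0.01, rng=random.random):
    """Decide, after a trace completes, whether it should be stored.

    Assumption: a trace is kept when its duration is at or above the 99th
    percentile of recent durations for the same metric name and labels;
    otherwise it is kept with probability `sample_rate`.
    """
    if recent_durations and duration >= percentile(recent_durations, 99):
        return True
    return rng() < sample_rate
```

In a real system `recent_durations` would come from a query against the metrics backend (for example, the last 24 hours for the matching name and labels) rather than from an in-memory list.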
How
Others will probably have better suggestions, but an initial idea would be to have the constructor/builder pattern simply allow generateMetric(boolean) as part of building the Span.
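To make the idea concrete, here is a rough sketch of what such a builder could look like. Everything in it is hypothetical: `SpanBuilder`, `Span`, `set_label`, and the in-memory `RECORDED_DURATIONS` store are stand-ins for illustration, and `generate_metric(enabled)` simply mirrors the proposed generateMetric(boolean) in Python naming.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical in-memory metric store: (span name, sorted labels) -> durations.
RECORDED_DURATIONS = defaultdict(list)


@dataclass
class Span:
    name: str
    labels: dict
    generate_metric: bool
    start: float = field(default_factory=time.monotonic)
    duration: Optional[float] = None

    def end(self):
        """Finish the span and, if requested, record its duration as a metric."""
        self.duration = time.monotonic() - self.start
        if self.generate_metric:
            key = (self.name, tuple(sorted(self.labels.items())))
            RECORDED_DURATIONS[key].append(self.duration)


class SpanBuilder:
    def __init__(self, name):
        self._name = name
        self._labels = {}
        self._generate_metric = False

    def set_label(self, key, value):
        self._labels[key] = value
        return self

    def generate_metric(self, enabled):
        # Python-style counterpart of the proposed generateMetric(boolean).
        self._generate_metric = enabled
        return self

    def start(self):
        return Span(self._name, dict(self._labels), self._generate_metric)
```

With this shape, a single call chain such as `SpanBuilder("http_request").set_label("method", "GET").generate_metric(True).start()` followed by `span.end()` yields both a span and a duration measurement, which is the "single code point" the proposal is after.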
As for labels, I think it would be interesting to discuss whether labels should be part of the span. One could think of it as different layers of contextual data for a span: Resources (process-level labels) at the top, then Labels (general request labels), then Attributes (request-unique or high-cardinality labels). Metrics and spans would then be exported with these as appropriate.
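One possible reading of "exported as appropriate" is sketched below: metrics get only the low-cardinality layers, while spans get all three. The class `SpanContextData` and its method names are hypothetical, and the choice to drop Attributes from metrics is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SpanContextData:
    resources: dict = field(default_factory=dict)   # process-level labels
    labels: dict = field(default_factory=dict)      # general request labels
    attributes: dict = field(default_factory=dict)  # request-unique / high-cardinality

    def metric_dimensions(self):
        """Assumed metric export: Resources and Labels only."""
        return {**self.resources, **self.labels}

    def span_fields(self):
        """Assumed span export: all three layers."""
        return {**self.resources, **self.labels, **self.attributes}
```

Keeping request-unique values out of `metric_dimensions` avoids creating a new time series per request, while the span still carries the full detail.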
Span names could be the same as the metric name for easy association and lookup, but for backwards compatibility we could allow an Operation Name to be constructed through a templated string that reads labels, such as {{label.method}}:{{label.parameterized_url}} for an HTTP request span.
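The templating could be as small as the sketch below. The function name `render_operation_name` is hypothetical, and the exact placeholder grammar is an assumption: this version accepts either `.` or `:` between `label` and the key, and raises `KeyError` when a referenced label is missing.

```python
import re

# Matches {{label.key}} or {{label:key}}, with optional inner whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*label[.:](\w+)\s*\}\}")


def render_operation_name(template, labels):
    """Replace each label placeholder in `template` with its value from `labels`."""
    def substitute(match):
        return str(labels[match.group(1)])

    return _PLACEHOLDER.sub(substitute, template)


name = render_operation_name(
    "{{label.method}}:{{label.parameterized_url}}",
    {"method": "GET", "parameterized_url": "/users/{id}"},
)
print(name)  # GET:/users/{id}
```

Because the name is derived from labels rather than hand-written, the span and the metric can share one base name while older backends still receive a familiar operation name.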