Add SDK span processors spec#205
Conversation
| @@ -0,0 +1,61 @@ | |||
| # Span processor | |||
There was a problem hiding this comment.
How to register a Span processor?
Do we allow multiple processors? (is there any guarantee on the ordering).
There was a problem hiding this comment.
I think we could guarantee the order. I have added this:
Span processors can be registered via a method on SDK Tracer. The registered processors are invoked in the same order as they were registered.
There was a problem hiding this comment.
Thanks @pavolloffay for answering the 2nd question.
Regarding the 1st question "How to register a Span processor", do you intend to cover it in a separate PR for the Tracer API change? Do we expect both register and unregister (or only allow to register during the Tracer initialization/ctor)?
There was a problem hiding this comment.
Currently I implemented this with only Register and can be done at any moment.
There was a problem hiding this comment.
I have already documented the registration step. Here is the copy but please comment on the code if something should be fixed.
Span processors can be registered directly on SDK Tracer. The processors are invoked in the same order as they were registered.
specification/sdk-span-processors.md
Outdated
|
|
||
| * `exporter` - the exporter where the spans are pushed. | ||
| * `maxQueueSize` - the maximum size of the queue. After the size is reached spans are dropped. | ||
| * `scheduledDelayMilllis` - the delay interval between two consecutive exports. |
There was a problem hiding this comment.
Another option is exportInterval, actualDelayMilllis = max(0, exportInterval - timeTakenToExport).
Wish to discuss here and see which one is preferred.
There was a problem hiding this comment.
One scenario is "I want to export traces/logs/metrics every 5 seconds" and "I don't expect to export data every 7 seconds due to the fact that export operation itself is using 2 seconds in environment XYZ".
There was a problem hiding this comment.
One scenario is "I want to export traces/logs/metrics every 5 seconds" and "I don't expect to export data every 7 seconds due to the fact that export operation itself is using 2 seconds in environment XYZ".
The export should not be called concurrently, this is from exporter spec:
Export() will never be called concurrently for the same exporter instance. Export() can be called again only after the current call returns.
|
|
||
| **Configurable parameters:** | ||
|
|
||
| * `exporter` - the exporter where the spans are pushed. |
There was a problem hiding this comment.
How to register multiple exporters?
There was a problem hiding this comment.
At least in Java there is MultiSpanExporter exporter implementation which exports data to multiple exporters.
There was a problem hiding this comment.
@bogdandrutu shall we add MultiSpanExporter to the spec? I think it's an implementation detail.
There was a problem hiding this comment.
I agree, let's not be too restrictive about the implementation.
specification/sdk-span-processors.md
Outdated
|
|
||
| ## Interface definition | ||
|
|
||
| ### OnStart(SpanData) |
There was a problem hiding this comment.
The main point it to make clear that passed object exposes data of the Span.
tracing-api doc references it https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/api-tracing.md#spandata. It's a readable span object.
There was a problem hiding this comment.
The span processors are invoked only on sampled spans.
Although this is what the exporters defined by the SDK will do, I don't feel is really a requirement - after all, sampling exists as a hint (as of now).
specification/sdk-span-processors.md
Outdated
| **Configurable parameters:** | ||
|
|
||
| * `exporter` - the exporter where the spans are pushed. | ||
| * `maxQueueSize` - the maximum size of the queue. After the size is reached spans are dropped. |
There was a problem hiding this comment.
Do we have any default values for these params?
There was a problem hiding this comment.
There are defaults in Java https://github.com/open-telemetry/opentelemetry-java/blob/master/sdk/src/main/java/io/opentelemetry/sdk/trace/export/BatchSampledSpansProcessor.java#L92. I think we could use the same across languages.
There was a problem hiding this comment.
Does it make sense to add those here?
There was a problem hiding this comment.
That seems like the wrong level of abstraction for the spec. Most other defaults and constants aren't documented here unless they're defined in other places, e.g. the W3C header length limits.
|
|
||
| ### Simple processor | ||
|
|
||
| The implementation of `SpanProcessor` that passes ended span directly to the configured `SpanExporter`. |
There was a problem hiding this comment.
What's the relationship between processors and exporters? It seems like we're using processors to (1) expose span start and end hooks to vendors and (2) batch spans to send to the exporter. Why use the same component for both?
If processors can't modify the spans that they pass to the exporters, why not call the processors and exporters in parallel?
There was a problem hiding this comment.
If processors can't modify the spans that they pass to the exporters, why not call the processors and exporters in parallel?
From exporter spec:
Allow implementing helpers as composable components that use the same chainable Exporter interface. SDK authors are encouraged to implement common functionality such as queuing, batching, tagging, etc. as helpers. This functionality will be applicable regardless of what protocol exporter is used.
The processor can be considered as a helper for the exporter. Perhaps we should document this in the first section. The exporter should only export and do not batch.
There was a problem hiding this comment.
In my mind the model was this:
SDK -> SpanProcessor (an implementation that batches all Spans) -> SpanExporter(List<SpanData/SpanProto>).
SpanExporter is one of the functionality that we can implement with the SpanProcessor, but for example some implementation may implement a SpanProcessor that adds extra attributes onStart.
There was a problem hiding this comment.
@bogdandrutu Maybe it would be worth adding a note here, to mention that (custom) SpanProcessors can actually modify the received Span?
specification/sdk-span-processors.md
Outdated
| The implementation of the `SpanProcessor` that batches ended spans and pushes them to the configured `SpanExporter`. | ||
|
|
||
| First the spans are added to a synchronized queue, then exported to the exporter pipeline in batches. | ||
| When the queue gets half full a preemptive notification is sent to the worker thread to wake up and start a new export cycle. |
There was a problem hiding this comment.
Does this mean we export faster than the scheduled interval when the queue is half full?
In any case this sounds to me like documentation for the java implementation. What about something like "The implementation is responsible for managing the span queue and sending batches of spans to the exporters."?
There was a problem hiding this comment.
Yes, this text is originally from javadoc. We can make it more generic and give the implementors more freedom how the queue is managed and when it is flushed.
specification/sdk-span-processors.md
Outdated
|
|
||
| **Parameters:** | ||
|
|
||
| * `SpanData` - a readable span object. |
There was a problem hiding this comment.
I don't know if it's kosher to change the spec to suit pending RFCs, but open-telemetry/oteps#8 suggests this might have to be changed to "span" soon.
There was a problem hiding this comment.
open-telemetry/oteps#8 It talks about removing it from the API, it might happen that it will be moved to SDK spec no?
There was a problem hiding this comment.
But in that case it sounds like we shouldn't reference it from the spec.
There was a problem hiding this comment.
something like SpanData can exists at the SDK level, there needs to be some sort of common concrete type that exporters can use. Having it at the API though added a more surface area and complexity though.
There was a problem hiding this comment.
At this moment (in Java) we actually get a low-level interface that returns the (read-only) protobuf representation of a Span. Wondering if that is, well, low-level ;)
|
We recently discussed batched span processors in the javascript sig and some concerns came up. Batching can have a lot of parameters (backoff, retry, dropping, etc.), so our worry is that trying to provide an out of the box batching solution won't work for some vendors, requiring the user to do more upfront, vendor specific configuration. We decided in our sig meeting to send spans to all exporters on end, and span processors are a place where users/vendors can hook into for extra sauce. I personally don't mind making vendors write more code if it means users will be able to write less. I'd really like to learn more about the thought process and previous discussion since #155 doesn't have any information. |
|
The (backoff, retry, dropping, etc.) seems more related to the retry mechanism than the batching. I think the point here is to document something generic enough which will work across most languages.
+1 perhaps @bogdandrutu or @SergeyKanzhelev could provide some details of what was discussed before. |
That makes me wonder whether different languages might actually need to provide different strategies here, depending on the thread/execution model... |
|
I know that we are trying to document the current SDK behavior. But I really think we need to do more work to define the requirements for the SDK. It's hard to judge a design proposal such as this, without understanding the intended use cases. |
bogdandrutu
left a comment
There was a problem hiding this comment.
Overall it looks good, just some more thoughts that I have about this.
specification/sdk-span-processors.md
Outdated
|
|
||
| ## Interface definition | ||
|
|
||
| ### OnStart(SpanData) |
There was a problem hiding this comment.
In order to allow performance improvements and lazy initialized the SpanData I would suggest we go with a ReadableSpan which may expose some properties directly and a toSpanData or toSpanProto. The reason is that you don't want to construct that object all the time unless someone really needs it. Maybe a LazySpanData or something similar achieves the same thing.
Other fields that I have in mind may be information like isOutOfBand added in open-telemetry/oteps#8. So I think we need to expose an interface here that allows us to also expose other data.
There was a problem hiding this comment.
Also I heard that some implementation of the SpanProcessor may want to add attributes to the Span itself or events. So it may be interesting to here if the Span itself should be exposed (maybe the SdkSpan that can have extra capabilities).
There was a problem hiding this comment.
I didn't use ReadableSpan because it is not defined anywhere, whereas SpanData is defined in tracing spec.
I will rename SpanData to just Span. The comment in the parameters will says:
Span- a readable span object.
|
|
||
| ### Simple processor | ||
|
|
||
| The implementation of `SpanProcessor` that passes ended span directly to the configured `SpanExporter`. |
There was a problem hiding this comment.
In my mind the model was this:
SDK -> SpanProcessor (an implementation that batches all Spans) -> SpanExporter(List<SpanData/SpanProto>).
SpanExporter is one of the functionality that we can implement with the SpanProcessor, but for example some implementation may implement a SpanProcessor that adds extra attributes onStart.
|
|
||
| First the spans are added to a synchronized queue, then exported to the exporter pipeline in batches. | ||
| The implementation is responsible for managing the span queue and sending batches of spans to the exporters. | ||
| This processor can cause high contention in a very high traffic service. |
There was a problem hiding this comment.
Would like to understand more about high contention. Is this because of the queue insertion operation? There are lock-free queue implementations which doesn't have high contention even under heavy traffic.
Another question - do we expect the queue to guarantee FIFO?
There was a problem hiding this comment.
even a lock-free queue can cause false sharing which is also very bad and hard to quantify.
batching does not require to guarantee fifo, and also can drop spans when high traffic (based on a config). For an example of this see https://github.com/open-telemetry/opentelemetry-java/blob/master/sdk/src/main/java/io/opentelemetry/sdk/trace/export/BatchSampledSpansProcessor.java
We also execute the export on a different thread.
specification/sdk-span-processors.md
Outdated
| Span processor is an interface which allows hooks for span start and end method invocations. | ||
| The span processors are invoked only on sampled spans. This interface can be used as a helper for span exporter to batch and convert spans see [sdk-exporter-spec](sdk-exporter.md). | ||
|
|
||
| Span processors can be registered directly on SDK Tracer. The processors are invoked in the same order as they were registered. |
There was a problem hiding this comment.
Where is this interface defined?
There was a problem hiding this comment.
I would want to see something like Tracer.RegisterSpanProcessor covered in this PR.
There was a problem hiding this comment.
I think that should belong to the tracer SDK spec, which we don't have at the moment.
6a4b4e2 to
47b4bbb
Compare
|
@bogdandrutu @reyang @c24t I have updated the PR. The only change was to remove |
|
|
||
| **Configurable parameters:** | ||
|
|
||
| * `exporter` - the exporter where the spans are pushed. |
There was a problem hiding this comment.
I agree, let's not be too restrictive about the implementation.
| @@ -0,0 +1,80 @@ | |||
| # Span processor | |||
|
|
|||
| Span processor is an interface which allows hooks for span start and end method invocations. | |||
There was a problem hiding this comment.
Maybe an answer:
(a) if I must make a copy of the data, it's a Processor
(b) if I can take a reference to the data, it's an Exporter
In my understanding, it's rather the other way round: A Span-processor gets a mutable reference to a single Span synchronously. A exporter on the other hand may get one or more read-only Spans (or some data transfer object to which the Span was converted) delayed, asynchronously or never (queue full).
|
|
||
| **Parameters:** | ||
|
|
||
| * `Span` - a readable span object. |
There was a problem hiding this comment.
I would rather like the Span processor to get a mutable reference to the same span object that is started/ended. I imagine a common use-case for span processors would be to attach context information like thread/coroutine ID, etc.
Although it is true that the Processor (at least in OnEnd) should also be able to read the data, which may or may not be possible with a sdk.Span object. Maybe we must specify an additional Span interface with getters?
There was a problem hiding this comment.
Maybe we must specify an additional Span interface with getters?
The SpanData could be considered as a readable interface, however it is going to be removed #215
There was a problem hiding this comment.
I believe we eliminated SpanData from the user-facing API. I could believe it's still meant as a part of the SDK processor API.
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
9a0d656 to
de3a388
Compare
|
Can anybody merge this API or pinpoint what should be improved or what is missing. cc) @open-telemetry/specs-approvers @bogdandrutu @yurishkuro @SergeyKanzhelev |
* Add SDK span processors spec Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Fixes Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Update review comments Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Small fixes Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Remove span data Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Remove sampled Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Add is recording events Signed-off-by: Pavol Loffay <ploffay@redhat.com> * change style of the links Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Add ascii diagram Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Smaller improvements based on review comments Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Fix typo Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Caught by: -Wpessimizing-move
* Add SDK span processors spec Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Fixes Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Update review comments Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Small fixes Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Remove span data Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Remove sampled Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Add is recording events Signed-off-by: Pavol Loffay <ploffay@redhat.com> * change style of the links Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Add ascii diagram Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Smaller improvements based on review comments Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Fix typo Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Co-authored-by: Joao Grassi <joao@joaograssi.com> Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com> Co-authored-by: Armin Ruech <armin.ruech@dynatrace.com>
Resolves #155
Resolves #54
Block when the queue is full processorwill be added in a separate PR.Signed-off-by: Pavol Loffay ploffay@redhat.com