
Commit 2fb8d06

Authored by Lms24, sentrivana, and buenaflor
feat(develop/span-first): Add implementation guidelines page (#15717)
This PR adds a new "Implementation Guidelines" page to the Spans SDK develop section. Two purposes:

- Provide a guideline on how to approach implementing span-first
- Document (temporary) decisions and open questions

Anyone working on span first is encouraged to update this doc at any time!

---------

Co-authored-by: Ivana Kellyer <[email protected]>
Co-authored-by: Giancarlo Buenaflor <[email protected]>
1 parent 8beb5cf commit 2fb8d06

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
---
title: Implementation Guidelines
sidebar_order: 10
---

<Alert level="warning">
🚧 This document is a work in progress.
The steps and suggestions here primarily record what SDKs have done so far when implementing Span-First.
This page also serves as a place to document (temporary) decisions, trade-offs, considerations, etc.
</Alert>

<Alert>
This document uses key words such as "MUST", "SHOULD", and "MAY" as defined in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt) to indicate requirement levels.
</Alert>

This document provides guidelines for implementing Span-First in SDKs. This is purposefully NOT a full specification. For exact specifications, refer to the other pages under [Spans](..).

## How To Approach Span-First in SDKs

If you're implementing Span-First (as a PoC) in your SDK, take an iterative approach and implement the functionality incrementally. Here's a rough suggestion for the iterations:

1. Add the Span v2 Envelope (type), serialization logic and any utilities necessary to support sending a new envelope. See [Span Protocol](../span-protocol) for more details.
2. Add the top-level `traceLifecycle` (or `trace_lifecycle`) SDK init option, which controls whether traces are sent as transactions or as spans (v2).
   - The allowed values for this option MUST be `'static'` and `'stream'`.
   - By default, the SDK MUST send traces as transactions (`'static'`). Span-First MUST be an opt-in feature.
   - Continue with adding Span-First logic, which MUST only be applied if `traceLifecycle` is set to `'stream'`.
3. As an initial PoC, leave your current transaction APIs in place and convert the transaction event to a v2 spans array to be sent in the new envelope.
   - At this point, you can already start sending spans in batches (i.e. in multiple envelopes), which makes it possible to send more than 1000 spans at once. The maximum number of spans per envelope MUST be limited to 1000 and an envelope MUST only contain spans from one trace (as the trace envelope header is shared).
4. If applicable to your SDK, add new Span APIs to start spans. See [Span API](../span-api) for more details.
   - Most importantly, add the simplest possible `start_span` API that leaves much control to users.
   - Follow up with optional, more convenient APIs later.
   - This new API MUST only be used in conjunction with the new `traceLifecycle` option and therefore MUST only emit new spans (no transactions).
   - This new API MUST NOT expose any old transaction properties or concepts such as `op`, `description`, `tags`, etc.
   - TBD: Some SDKs already have `startSpan` or similar APIs. The migration path is still TBD but a decision can be made at a later stage.
5. Implement the `captureSpan` [single-span processing pipeline](#single-span-processing-pipeline).
   - Either reuse existing heuristics (e.g. flush when segment span ends) or build a simple span buffer to flush spans (e.g. similar to the existing buffers for logs or metrics).
   - Implementing the more complex [Telemetry Buffer](./../../telemetry-buffer) can happen at a later stage.
6. Achieve data parity with the existing transaction events.
   - Ensure that the data added by SDK integrations, event processors, etc. to transaction events is also added to the spans (see [Event Processors](#tbd-event-processors)).
   - Most additional data MUST only be added to the segment span. See [Common Attributes](../span-protocol/#common-attribute-keys) for attributes that MUST be added to every span.
   - Mental model: All data our SDKs _automatically_ add to a transaction MUST also be added to the segment span.
7. Implement the span telemetry buffer for proper, weighted span flushing. See [Span Buffer](#span-buffer) for more details.
8. (Optional) Depending on necessity, drop support for sending traces as transactions in the next major release. From this point on, the SDK will send spans (v2) only by default and will therefore no longer be compatible with current self-hosted Sentry installations.

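
To make steps 2 and 3 more concrete, here is a minimal TypeScript sketch of resolving the `traceLifecycle` option and splitting spans into per-envelope batches. The names `SpanV2`, `resolveTraceLifecycle` and `batchSpans` are hypothetical and not part of any SDK. Falling back to `'static'` for unknown values is an assumption of this sketch; the guidelines only require `'static'` to be the default.

```typescript
// Hypothetical types and helpers for illustration only.
type TraceLifecycle = "static" | "stream";

interface SpanV2 {
  trace_id: string;
  span_id: string;
  name: string;
}

// Limit from the guidelines: at most 1000 spans per envelope.
const MAX_SPANS_PER_ENVELOPE = 1000;

// Span-First stays opt-in: anything other than 'stream' resolves to 'static'.
// (Assumption: invalid values fall back silently instead of raising an error.)
function resolveTraceLifecycle(value: unknown): TraceLifecycle {
  return value === "stream" ? "stream" : "static";
}

// Groups spans by trace, then splits every group into chunks of at most
// `maxPerEnvelope` spans. Each returned batch maps to exactly one envelope,
// so no envelope mixes spans from different traces.
function batchSpans(
  spans: SpanV2[],
  maxPerEnvelope: number = MAX_SPANS_PER_ENVELOPE
): SpanV2[][] {
  const byTrace: Record<string, SpanV2[]> = {};
  for (const span of spans) {
    const group = byTrace[span.trace_id];
    if (group) {
      group.push(span);
    } else {
      byTrace[span.trace_id] = [span];
    }
  }

  const batches: SpanV2[][] = [];
  for (const traceId of Object.keys(byTrace)) {
    const group = byTrace[traceId] || [];
    for (let i = 0; i < group.length; i += maxPerEnvelope) {
      batches.push(group.slice(i, i + maxPerEnvelope));
    }
  }
  return batches;
}
```

With this approach, a converted transaction with 2500 spans from one trace would be sent as three envelopes (1000, 1000 and 500 spans).
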

## Span APIs

To do: This section needs a few guidelines and implementation hints, including:

- how to set a span active and remove it from the scope once it ends
- languages having to deal with async context management
- edge cases (e.g. adding a span with an explicit parent span that already ended)

## Single-Span Processing Pipeline

SDKs MUST expose a `captureSpan` API that takes a single span once it ends, and then processes and enqueues it into the span buffer. In most cases, this API SHOULD be exposed as a method on the `Client`. SDKs (e.g. JS Browser) MAY choose a different location if necessary.

Here's a rough overview of what `captureSpan` should do, in order:

1. Accept any span that already ended (i.e. has an `end_timestamp`).
2. Obtain the current, isolation and global scopes and merge the scope data.
3. Apply [common span attributes](../span-protocol/#common-attribute-keys) from the client and the merged scope data to every span.
4. Apply the merged scope data (including scope attributes) to the span if and only if it is a segment span.
5. Apply any span processing hooks (i.e. event processor replacements) to the span.
6. Apply the `before_send_span` callback to the span.
7. Enqueue the span into the span buffer.

The `captureSpan` pipeline MUST NOT:

- drop any span
- buffer spans before enqueuing them

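
The ordering of the steps above can be sketched as follows. This is a simplified TypeScript illustration, not a reference implementation. All type and function names (`EndedSpan`, `ScopeData`, `PipelineOptions`, `mergeScopes`, `applyMissing`) are hypothetical. The sketch also assumes that scopes are merged in the order global, isolation, current (later scopes win) and that attributes already set on the span are not overwritten; neither is specified on this page.

```typescript
// Hypothetical types; they only illustrate the order of the pipeline steps.
type Attributes = Record<string, unknown>;

interface EndedSpan {
  name: string;
  is_segment: boolean;
  end_timestamp: number; // step 1: the type only admits spans that have ended
  attributes: Attributes;
}

interface ScopeData {
  commonAttributes: Attributes; // belong on every span (step 3)
  attributes: Attributes; // only applied to segment spans (step 4)
}

interface PipelineOptions {
  clientCommonAttributes: Attributes;
  scopes: ScopeData[]; // assumed merge order: global, isolation, current
  hooks: Array<(span: EndedSpan) => void>;
  beforeSendSpan?: (span: EndedSpan) => EndedSpan;
  enqueue: (span: EndedSpan) => void;
}

// Step 2: merge scope data; later scopes override earlier ones (assumption).
function mergeScopes(scopes: ScopeData[]): ScopeData {
  let merged: ScopeData = { commonAttributes: {}, attributes: {} };
  for (const scope of scopes) {
    merged = {
      commonAttributes: { ...merged.commonAttributes, ...scope.commonAttributes },
      attributes: { ...merged.attributes, ...scope.attributes },
    };
  }
  return merged;
}

// Copies keys from source that are not yet present on target (assumption:
// values set directly on the span take precedence).
function applyMissing(target: Attributes, source: Attributes): void {
  for (const key of Object.keys(source)) {
    if (!(key in target)) {
      target[key] = source[key];
    }
  }
}

function captureSpan(span: EndedSpan, options: PipelineOptions): void {
  const merged = mergeScopes(options.scopes); // step 2
  applyMissing(span.attributes, options.clientCommonAttributes); // step 3
  applyMissing(span.attributes, merged.commonAttributes); // step 3
  if (span.is_segment) {
    applyMissing(span.attributes, merged.attributes); // step 4
  }
  for (const hook of options.hooks) {
    hook(span); // step 5
  }
  const processed = options.beforeSendSpan ? options.beforeSendSpan(span) : span; // step 6
  options.enqueue(processed); // step 7: never dropped, never buffered here
}
```

Note that `beforeSendSpan` in this sketch always returns a span, which reflects the requirement that the pipeline does not drop spans.
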
### [TMP solution] Span Filtering

For the moment, we settled on `ignore_spans` being applied prior to span start. This means that the `captureSpan` pipeline doesn't have to handle filtering spans. However, there are some drawbacks with this approach, most prominently:

- Not being able to filter on span names or data that is added/updated post span start
- Not being able to filter entire segments (e.g. `http.server` segments for bot requests resulting in 404 errors)

We might revisit this, which could require changes to the single-span processing pipeline.

For now, though, this means:

- Whenever `ignore_spans` is applied, SDKs MUST NOT start an actual span. Instead, they SHOULD start a no-op ("non-recording") span, which has no influence on the trace hierarchy.
- SDKs MUST record client outcomes for ignored spans.
- SDKs MUST apply `ignore_spans` to every span if at all possible (POTel SDKs are exempt, but encouraged to do so as well).

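
A rough TypeScript sketch of applying `ignore_spans` before span start could look like the following. The names `SpanHandle`, `StartSpanOptions`, `shouldIgnoreSpan` and `recordIgnoredOutcome` are hypothetical, and matching strings exactly (rather than as substrings) is an assumption of this sketch. The point it illustrates is that an ignored span becomes a non-recording span that forwards its own parent, so children attach to the nearest recording ancestor.

```typescript
// Hypothetical types and helpers for illustration only.
type IgnoreSpanPattern = string | RegExp;

interface SpanHandle {
  recording: boolean;
  spanId: string | undefined; // undefined for no-op spans
  parentSpanId: string | undefined;
}

interface StartSpanOptions {
  ignoreSpans: IgnoreSpanPattern[];
  // Placeholder for recording a client outcome for the ignored span.
  recordIgnoredOutcome: () => void;
}

// Assumption: strings match the full span name, regular expressions are tested.
function shouldIgnoreSpan(name: string, patterns: IgnoreSpanPattern[]): boolean {
  return patterns.some((pattern) =>
    typeof pattern === "string" ? pattern === name : pattern.test(name)
  );
}

let nextSpanId = 0;

function startSpan(
  name: string,
  parent: SpanHandle | undefined,
  options: StartSpanOptions
): SpanHandle {
  // A no-op parent forwards its own parent id, so it never appears in the
  // trace hierarchy.
  let parentSpanId: string | undefined;
  if (parent) {
    parentSpanId = parent.recording ? parent.spanId : parent.parentSpanId;
  }

  if (shouldIgnoreSpan(name, options.ignoreSpans)) {
    options.recordIgnoredOutcome();
    return { recording: false, spanId: undefined, parentSpanId };
  }

  nextSpanId += 1;
  return { recording: true, spanId: `span-${nextSpanId}`, parentSpanId };
}
```

Because the decision is made from the name available at start time, this sketch also shows the first drawback listed above: a name that is updated after span start is never checked against `ignore_spans`.
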
### [TBD] Event Processors

Given that spans are no longer events (as opposed to transactions), they don't go through our event processors, which are extensively used throughout the SDKs (clients, integrations) but also by users.
Instead, we need to find another way for users or integrations to enrich and mutate spans.

For user-facing migration, we should try to solve every use case with `ignore_spans` (for filtering) and `before_send_span` (for enrichment, data scrubbing and span mutation).

For SDK-internal processing, we're still evaluating the preferred approach but there are two main options:

1. Expose new APIs for integrations (and secondarily users) to process a span.
   For example via SDK lifecycle hooks (implemented in the JS SDK).
   Every integration would have to listen to this hook and apply its logic to spans.
   SDKs need to add a subscriber to the hook wherever they currently add an event processor.
   - Pro: Clear separation and semantics
   - Pro: Easy to implement and maintain
   - Con: Leads to a lot of duplication whenever event processors apply to more than transaction events (these we can eventually drop once span-first becomes the default)
   - Con: Users have to rewrite their event processors or perhaps their integrations. Not many users write their own processors but they definitely exist. Also 3rd party published integrations would be affected.
2. Construct a pseudo-event from the span and invoke event processors during `captureSpan`.
   Once the processors have been applied, back-merge the modified pseudo-event into the span.
   - Pro: Less duplication of code
   - Pro: No/less need to rewrite existing instrumentations/integrations to support span-first
   - Con: Because of the single-span processing approach, we cannot add child spans to the pseudo-event. Even if we somehow made this possible, we have no guarantee that the entire span tree would be present. This is similar to the [span filtering implications](#tmp-solution-span-filtering).
   - Con: Back-merging is complex and might not be able to cover every aspect
   - Con: Very obscure behaviour (to us and users) and contradicts our commitment to move away from events in the future.

SDK authors working on Span-First are encouraged to evaluate both options, try them out and provide perspective as well as better solutions.

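
As a starting point for evaluating option 1, here is a minimal TypeScript sketch of a span processing hook that integrations could subscribe to. `SpanHookRegistry`, `on` and `emit` are hypothetical names chosen for this sketch and do not describe the actual lifecycle hook API of the JS SDK or any other SDK.

```typescript
// Hypothetical hook registry for illustration only.
interface ProcessableSpan {
  name: string;
  attributes: Record<string, unknown>;
}

type SpanProcessor = (span: ProcessableSpan) => void;

class SpanHookRegistry {
  private processors: SpanProcessor[] = [];

  // Registers a processor and returns a function that unsubscribes it again.
  on(processor: SpanProcessor): () => void {
    this.processors.push(processor);
    return () => {
      const index = this.processors.indexOf(processor);
      if (index !== -1) {
        this.processors.splice(index, 1);
      }
    };
  }

  // Would be called from the captureSpan pipeline (step 5). Iterates over a
  // copy so processors can unsubscribe while the hook is being emitted.
  emit(span: ProcessableSpan): void {
    for (const processor of this.processors.slice()) {
      processor(span);
    }
  }
}
```

An integration that currently adds an event processor would register an equivalent subscriber via `on(...)`, which is where the duplication mentioned in the first "Con" comes from.
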
## Span Buffer

This section is intentionally short because all buffering specifications are being added to the [Telemetry Buffer](../../telemetry-buffer) page.

Some rough pointers:

- Given that SDKs SHOULD materialize and freeze the DSC as late as possible, the span buffer SHOULD enqueue span instances and serialize them to JSON at _flush time_.
  Before serialization, the span buffer SHOULD materialize and freeze the DSC on the segment span if this has not already been done.
  This ensures that the `trace` envelope header has the most up-to-date data from the DSC (e.g. relevant for `transaction` names in the DSC).
- SDKs SHOULD follow one of the backend, mobile or browser telemetry buffer specifications.
- It is expected and fine to implement the proper, weighted buffering logic as a final step in the Span-First project.
  Intermediate buffers MAY be simpler, for example by disregarding the priority logic and just buffering until a certain span count, size or time interval is reached.
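
A simple intermediate buffer of this kind could be sketched in TypeScript as shown below. `SimpleSpanBuffer`, `BufferedSpan` and the `send` callback are hypothetical names. The sketch flushes when a span count is reached and leaves interval-based flushing to a caller-owned timer that invokes `flush()`. It does not implement priorities, per-trace envelopes or DSC handling.

```typescript
// Hypothetical buffer for illustration only.
interface BufferedSpan {
  toJSON(): Record<string, unknown>;
}

class SimpleSpanBuffer {
  private spans: BufferedSpan[] = [];
  private readonly maxSpans: number;
  private readonly send: (payload: string) => void;

  constructor(maxSpans: number, send: (payload: string) => void) {
    this.maxSpans = maxSpans;
    this.send = send;
  }

  // Stores the span instance itself; nothing is serialized yet.
  add(span: BufferedSpan): void {
    this.spans.push(span);
    if (this.spans.length >= this.maxSpans) {
      this.flush();
    }
  }

  // Serializes at flush time. A real implementation would materialize and
  // freeze the DSC on the segment span here, before serialization.
  flush(): void {
    if (this.spans.length === 0) {
      return;
    }
    const pending = this.spans;
    this.spans = [];
    this.send(JSON.stringify(pending.map((span) => span.toJSON())));
  }
}
```

Because the buffer holds span instances, data that changes between enqueueing and flushing is still reflected in the serialized payload.
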

## Release

The initial PoC implementation of Span-First **SHOULD** be released in a **minor version** of the SDK.

- This feature is entirely opt-in via `traceLifecycle = 'stream'` and therefore does **not** introduce breaking changes for existing users.
- The default tracing behavior (transaction-based) MUST remain unchanged until Span-First becomes the default in a future major release.
- Release notes and user-facing documentation SHOULD clearly describe:
  - the availability of Span-First behind the opt-in flag
  - any known limitations
