Conversation
|
|
||
| | Field | Type | Description | | ||
| | ----- | ---- | ----------- | | ||
| | resource_profiles | repeated [`ResourceProfiles`](#message-resourceprofiles) | An array of `ResourceProfiles`.<br><br>For data coming from an SDK profiler this array will typically contain one element. Host-level profilers will usually create one `ResourceProfiles` per container, as well as an additional one grouping all samples from non-containerized processes. Other resource groupings are possible as well and clarified via [`Resource`](../resource/README.md) attributes and semantic conventions.<br><br>Tools that visualize profiles SHOULD prefer displaying `resource_profiles[0].scope_profiles[0].profiles[0]` by default. | |
There was a problem hiding this comment.
I tried to implement a consistent scheme with linking that avoids multiple hyperlinks to the same message in the same sentence (redundancy) but otherwise strives for complete coverage.
| | period | int64 | The number of events between sampled occurrences. | | ||
| | profile_id | bytes | A globally unique identifier for the profile (16-byte array) that MAY be used for deduplication and signal correlation. An ID with all zeroes is considered invalid. This field is optional; an ID may be assigned to an ID-less profile in a later step. | | ||
| | dropped_attributes_count | uint32 | The number of attributes that were discarded. Attributes can be discarded because their keys are too long or because there are too many attributes. If 0, no attributes were dropped. | | ||
| | original_payload_format | string | The original payload format. See [Known values](./README.md#known-values) for allowed formats. It MUST be set together with `original_payload` or both left unset. [optional] | |
There was a problem hiding this comment.
Regarding '[optional]', I'm only using it to mark fields that are otherwise not described as optional.
Co-authored-by: Roger Coll <roger.coll@elastic.co>
Co-authored-by: Fabrizio Ferri-Benedetti <fabri.ferribenedetti@elastic.co>
| The protocol specification is defined in the | ||
| [profiles.proto](https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/profiles/v1development/profiles.proto) | ||
| protobuf file and is based on the [pprof](https://github.com/google/pprof/tree/main/proto) format. | ||
| This means that pprof and other established profiling formats can, in most cases, |
There was a problem hiding this comment.
RE "in most cases" - I think for pprof we agreed to have a specific guarantee that it's "always".
| Sample -. "1-n" .-> KeyValueAndUnit | ||
| Sample -. "n-1" .-> Link | ||
|
|
||
| Stack -. "1-n" .-> Location |
There was a problem hiding this comment.
We probably discussed this before, but should "1-n" here be "n-n"? Otherwise I find it inconsistent:
- For Sample -> Link we say "n-1" meaning that multiple samples could point to the same link object in the dictionary.
- But the dictionary part is true for Stack -> Location too, plus the stack message contains multiple locations.
There was a problem hiding this comment.
I updated the Sample -> KeyValueAndUnit, Location -> KeyValueAndUnit, Mapping -> KeyValueAndUnit relationships to be n-n too.
|
|
||
| ### Message referencing | ||
|
|
||
| Most OpenTelemetry signals use direct embedding: a span in a trace embeds |
There was a problem hiding this comment.
"by value" vs "by reference" might be a useful metaphor.
aalexand
left a comment
There was a problem hiding this comment.
This looks great, thanks for this work!
|
|
||
| ## Notable differences compared to other signals | ||
|
|
||
| ### Message referencing |
There was a problem hiding this comment.
I wonder if it would make sense to move some of the motivation description from the dictionary section below to this section so that the "why" is clearer earlier.
There was a problem hiding this comment.
I'll condense some of it down to 2-3 lines as an introduction and guiding principle.
| with the profiles-specific extension of string reference fields for keys and values | ||
| (see [next section](#dictionary-use-in-keyvalue)). | ||
|
|
||
| 2. **[`KeyValueAndUnit`](#message-keyvalueandunit) attributes**: |
There was a problem hiding this comment.
Do we want to mention that we might consider making the unit available as a part of the regular KeyValue type in OTel? Maybe not since no decision was made.
There was a problem hiding this comment.
I think we shouldn't mention it here yet to avoid creating expectations.
There was a problem hiding this comment.
I agree, let's avoid mentioning it, since it is highly speculative at this point. It will create more questions than provide answers.
| stacks, locations, functions and associated metadata. It modifies and annotates | ||
| pprof Profile with OpenTelemetry specific fields. | ||
|
|
||
| Note that whilst some fields in this message retain the name and field id from pprof |
There was a problem hiding this comment.
Hmm, I found this paragraph confusing and I think we should remove it.
I think we deleted the ID fields so this part is out of date.
Some field names are maybe indeed inherited from pprof but we also fixed many of them such as make plural compared to pprof, and where we do reuse names I think it's because we think they make sense not because we are trying to map pprof names exactly.
|
|
||
| Measurements encoded in this format should follow the following conventions: | ||
|
|
||
| - Consumers should treat unset optional fields as if they had been |
There was a problem hiding this comment.
I would mention specifically something like "we discourage using the protobuf presence semantics, even if available in the given instantiation of the protobuf generated API".
|
|
||
| - Consumers should treat unset optional fields as if they had been | ||
| set with their default value. | ||
| - When possible, measurements should be stored in "unsampled" form |
There was a problem hiding this comment.
Do we want to state this stronger? Client-side unsampling is error-prone in case the sampling rate changes.
| - When possible, measurements should be stored in "unsampled" form | ||
| that is most useful to humans. There should be enough | ||
| information present to determine the original sampled values. | ||
| - The profile is represented as a set of samples, where each sample |
There was a problem hiding this comment.
This bullet doesn't quite belong to the "conventions" list as it's more about the data model than a convention?
Similarly for the two bullets below. I'd try to separate maybe the data model clarifications from conventions that are less captured in the protobuf definition.
| | period_type | [`ValueType`](#message-valuetype) | The kind of events between sampled occurrences. For example `["cpu", "cycles"]` or `["heap", "bytes"]`. | | ||
| | period | int64 | The number of events between sampled occurrences. | | ||
| | profile_id | bytes | A globally unique identifier for the profile (16-byte array) that MAY be used for deduplication and signal correlation. An ID with all zeroes is considered invalid. This field is optional; an ID may be assigned to an ID-less profile in a later step. | | ||
| | dropped_attributes_count | uint32 | The number of attributes that were discarded. Attributes can be discarded because their keys are too long or because there are too many attributes. If 0, no attributes were dropped. | |
There was a problem hiding this comment.
FWIW this is a field that I'm still unsure about whether we need it. I don't know of a practical use case for it. I think we largely inherited it from other signals.
There was a problem hiding this comment.
I'll leave it in for now, we can update if we reconsider it.
There was a problem hiding this comment.
This is normally used when limits are imposed at the producer (or a processor) that causes attributes to be dropped (removed) to make sure the number of attributes does not exceed the limit. I would imagine a profile producer/processor can also have the same logic, although at the Profile level this may not have much practical application.
|
|
||
| ### Message `Stack` | ||
|
|
||
| A stack trace encoded as a list of locations (leaf first). |
There was a problem hiding this comment.
Maybe we could give an example for the "leaf first" order nuance.
| | Field | Type | Description | | ||
| | ----- | ---- | ----------- | | ||
| | function_index | int32 | Reference to a [`Function`](#message-function) in [`ProfilesDictionary.function_table`](#message-profilesdictionary). | | ||
| | line | int64 | Line number in source code. 0 means unset. | |
There was a problem hiding this comment.
Maybe clarify "1-based"? It can be implied from "0 means unset" but I'd state it more explicitly.
|
|
||
| ### From other signals to profiles | ||
|
|
||
| Other signals can reference a profile using the `profile_id` field on the |
There was a problem hiding this comment.
profile_id is optional though, maybe make a note?
Co-authored-by: Alexey Alexandrov <aalexand@users.noreply.github.com>
|
@tigrannajaryan Can you please take another look? It'd be great to have this (and the parent PR here) merged before or close to the Alpha announcement. |
tigrannajaryan
left a comment
There was a problem hiding this comment.
None of my comments are blocking.
| allows other `_index` fields pointing into the dictionary to use `0` to | ||
| indicate "null" / "not set". |
There was a problem hiding this comment.
I think this is a bit confusing. Using index 0 does not mean the value is "not set", it means the value is set to the corresponding "zero value".
There was a problem hiding this comment.
Yes, but the semantics is "not set", this is the main purpose of the convention.
| with the profiles-specific extension of string reference fields for keys and values | ||
| (see [next section](#dictionary-use-in-keyvalue)). | ||
|
|
||
| 2. **[`KeyValueAndUnit`](#message-keyvalueandunit) attributes**: |
There was a problem hiding this comment.
I agree, let's avoid mentioning it, since it is highly speculative at this point. It will create more questions than provide answers.
| (see [next section](#dictionary-use-in-keyvalue)). | ||
|
|
||
| 2. **[`KeyValueAndUnit`](#message-keyvalueandunit) attributes**: | ||
| a Profiles-specific encoding inherited from pprof (where it is known |
There was a problem hiding this comment.
I am not sure we need to reference pprof everywhere. It is important to acknowledge it was an inspiration and give credit once, but I don't know if it helps with explanation to have references to corresponding pprof parts in this doc.
Perhaps we should have a separate Otel<->Pprof correspondence doc (like we have for example Otel<->Prometheus), where we can go into significantly more detail if necessary.
There was a problem hiding this comment.
I'll take it out as the references to pprof are indeed numerous. I previously proposed having an appendix document where we go into more detail around historical decisions but this could also live in the existing pprof.md document we have.
| and referenced by index from [`Profile`](#message-profile), | ||
| [`Sample`](#message-sample), [`Mapping`](#message-mapping) and [`Location`](#message-location) | ||
| messages. In addition to a key and value they carry an optional unit | ||
| field, allowing attributes such as `"allocation_size": 128 bytes` to |
There was a problem hiding this comment.
Nit: we are using UCUM for units, right? So that would be "allocation_size": 128 By to be more precise.
There was a problem hiding this comment.
I think the example we give here is meant to focus/clarify being able to split out / refer to the unit explicitly (instead of storing it inline or not storing it at all and relying on semconv). From that POV, it's best seen as an abstract case, not really suggesting a particular encoding related to the unit. Leaving it as "bytes" I think is more clarifying relating to this purpose.
On the other hand it's confusing given the UCUM reference later, so I'll switch it and add a link to clarify.
|
|
||
| | Field | Type | Description | | ||
| | ----- | ---- | ----------- | | ||
| | resource_profiles | repeated [`ResourceProfiles`](#message-resourceprofiles) | An array of `ResourceProfiles`.<br><br>For data coming from an SDK profiler this array will typically contain one element. Host-level profilers will usually create one `ResourceProfiles` per container, as well as an additional one grouping all samples from non-containerized processes. Other resource groupings are possible as well and clarified via [`Resource`](../resource/README.md) attributes and semantic conventions.<br><br>Tools that visualize profiles SHOULD prefer displaying `resource_profiles[0].scope_profiles[0].profiles[0]` by default. | |
There was a problem hiding this comment.
SHOULD prefer displaying
resource_profiles[0].scope_profiles[0].profiles[0]by default
Can you clarify why this preference?
There was a problem hiding this comment.
I think this comes from pprof, maybe @aalexand can clarify
There was a problem hiding this comment.
pprof has a different convention - there is a dedicated default_sample_type field and if it's not set the last sample type is the default.
For the Profiles OTLP we thought that this "last by default" convention is too non-obvious and there needs to be some recommended default so that the producers can know where to put the most important / valuable profile, so we said it's first.
Some more details were previously written down here where we talked about the interoperability of this nuance with pprof.
| | period_type | [`ValueType`](#message-valuetype) | The kind of events between sampled occurrences. For example `["cpu", "cycles"]` or `["heap", "bytes"]`. | | ||
| | period | int64 | The number of events between sampled occurrences. | | ||
| | profile_id | bytes | A globally unique identifier for the profile (16-byte array) that MAY be used for deduplication and signal correlation. An ID with all zeroes is considered invalid. This field is optional; an ID may be assigned to an ID-less profile in a later step. | | ||
| | dropped_attributes_count | uint32 | The number of attributes that were discarded. Attributes can be discarded because their keys are too long or because there are too many attributes. If 0, no attributes were dropped. | |
There was a problem hiding this comment.
This is normally used when limits are imposed at the producer (or a processor) that causes attributes to be dropped (removed) to make sure the number of attributes does not exceed the limit. I would imagine a profile producer/processor can also have the same logic, although at the Profile level this may not have much practical application.
| | profile_id | bytes | A globally unique identifier for the profile (16-byte array) that MAY be used for deduplication and signal correlation. An ID with all zeroes is considered invalid. This field is optional; an ID may be assigned to an ID-less profile in a later step. | | ||
| | dropped_attributes_count | uint32 | The number of attributes that were discarded. Attributes can be discarded because their keys are too long or because there are too many attributes. If 0, no attributes were dropped. | | ||
| | original_payload_format | string | The original payload format. See [Known values](./README.md#known-values) for allowed formats. It MUST be set together with `original_payload` or both left unset. [optional] | | ||
| | original_payload | bytes | The original payload bytes. It MUST be set together with `original_payload_format` or both left unset. [optional] | |
There was a problem hiding this comment.
Do we want to explain how an original payload can end up here?
There was a problem hiding this comment.
Good catch, I'll add a few lines to clarify.
| | ----- | ---- | ----------- | | ||
| | key_strindex | int32 | Index into [`ProfilesDictionary.string_table`](#message-profilesdictionary). | | ||
| | value | [`AnyValue`](../common/README.md) | The value of the attribute. | | ||
| | unit_strindex | int32 | Index into [`ProfilesDictionary.string_table`](#message-profilesdictionary). 0 indicates implicit (by semantic convention) or undefined unit. | |
There was a problem hiding this comment.
Do we want to say the string is in UCUM format?
There was a problem hiding this comment.
As UCUM is the standard across OTel, I'll make an explicit reference to it here.
CC: @open-telemetry/profiling-maintainers @open-telemetry/profiling-approvers
Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
|
@christos68k can you update PR title/description to use "data format" instead of "data model". |
Done |
|
@jsuereth please take a look. |
|
@open-telemetry/specs-approvers PTAL |
Changes
Adds a data format description for OpenTelemetry profiles.
I took some content from the old OTEP but decided to leave all of the historical information out, both for clarity/focus and to improve readability (a technical specification doesn't read very well by nature).
We could add additional references and expand on design decisions in a future
data-format-appendix.mddocument.Also see: #4932 (This PR is a follow-up)
CC: @theletterf @tigrannajaryan @open-telemetry/profiling-maintainers @open-telemetry/profiling-approvers
For non-trivial changes, follow the change proposal process.
CHANGELOG.mdfile updated for non-trivial changes[chore]in the PR title to skip the changelog check