Add profiles data format by christos68k · Pull Request #4965 · open-telemetry/opentelemetry-specification

christos68k · 2026-03-19T13:14:52Z

Changes

Adds a data format description for OpenTelemetry profiles.

I took some content from the old OTEP but decided to leave all of the historical information out, both for clarity/focus and to improve readability (a technical specification doesn't read very well by nature).

We could add additional references and expand on design decisions in a future data-format-appendix.md document.

Also see: #4932 (This PR is a follow-up)

CC: @theletterf @tigrannajaryan @open-telemetry/profiling-maintainers @open-telemetry/profiling-approvers

For non-trivial changes, follow the change proposal process.

Related issues: #7874
CHANGELOG.md file updated for non-trivial changes
- For trivial changes, include [chore] in the PR title to skip the changelog check

christos68k · 2026-03-19T13:54:28Z

specification/profiles/data-format.md

+
+| Field | Type | Description |
+| ----- | ---- | ----------- |
+| resource_profiles | repeated [`ResourceProfiles`](#message-resourceprofiles) | An array of `ResourceProfiles`.<br><br>For data coming from an SDK profiler this array will typically contain one element. Host-level profilers will usually create one `ResourceProfiles` per container, as well as an additional one grouping all samples from non-containerized processes. Other resource groupings are possible as well and clarified via [`Resource`](../resource/README.md) attributes and semantic conventions.<br><br>Tools that visualize profiles SHOULD prefer displaying `resource_profiles[0].scope_profiles[0].profiles[0]` by default. |


I tried to implement a consistent scheme with linking that avoids multiple hyperlinks to the same message in the same sentence (redundancy) but otherwise strives for complete coverage.

christos68k · 2026-03-19T13:55:14Z

specification/profiles/data-format.md

+| period | int64 | The number of events between sampled occurrences. |
+| profile_id | bytes | A globally unique identifier for the profile (16-byte array) that MAY be used for deduplication and signal correlation. An ID with all zeroes is considered invalid. This field is optional; an ID may be assigned to an ID-less profile in a later step. |
+| dropped_attributes_count | uint32 | The number of attributes that were discarded. Attributes can be discarded because their keys are too long or because there are too many attributes. If 0, no attributes were dropped. |
+| original_payload_format | string | The original payload format. See [Known values](./README.md#known-values) for allowed formats. It MUST be set together with `original_payload` or both left unset. [optional] |


Regarding '[optional]', I'm only using it to mark fields that are otherwise not described as optional.

specification/profiles/data-model.md

specification/profiles/data-format.md

specification/profiles/data-model.md

Co-authored-by: Roger Coll <roger.coll@elastic.co>

Co-authored-by: Fabrizio Ferri-Benedetti <fabri.ferribenedetti@elastic.co>

aalexand · 2026-03-20T21:24:51Z

specification/profiles/data-format.md

+The protocol specification is defined in the
+[profiles.proto](https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/profiles/v1development/profiles.proto)
+protobuf file and is based on the [pprof](https://github.com/google/pprof/tree/main/proto) format.
+This means that pprof and other established profiling formats can, in most cases,


RE "in most cases" - I think for pprof we agreed to have a specific guarantee that it's "always".

aalexand · 2026-03-20T21:28:35Z

specification/profiles/data-format.md

+    Sample -. "1-n" .-> KeyValueAndUnit
+    Sample -. "n-1" .-> Link
+
+    Stack -. "1-n" .-> Location


We probably discussed this before, but should "1-n" here be "n-n"? Otherwise I find it inconsistent:

For Sample -> Link we say "n-1" meaning that multiple samples could point to the same link object in the dictionary.

But the dictionary part is true for Stack -> Location too, plus the stack message contains multiple locations.

I updated the Sample -> KeyValueAndUnit, Location -> KeyValueAndUnit, Mapping -> KeyValueAndUnit relationships to be n-n too.

aalexand · 2026-03-20T21:29:33Z

specification/profiles/data-format.md

+
+### Message referencing
+
+Most OpenTelemetry signals use direct embedding: a span in a trace embeds


"by value" vs "by reference" might be a useful metaphor.

aalexand

This looks great, thanks for this work!

aalexand · 2026-03-20T21:30:32Z

specification/profiles/data-format.md

+
+## Notable differences compared to other signals
+
+### Message referencing


I wonder if it would make sense to move some of the motivation description from the dictionary section below to this section so that the "why" is clearer earlier.

I'll condense some of it down to 2-3 lines as an introduction and guiding principle.

aalexand · 2026-03-20T21:32:38Z

specification/profiles/data-format.md

+   with the profiles-specific extension of string reference fields for keys and values
+   (see [next section](#dictionary-use-in-keyvalue)).
+
+2. **[`KeyValueAndUnit`](#message-keyvalueandunit) attributes**:


Do we want to mention that we might consider making the unit available as a part of the regular KeyValue type in OTel? Maybe not since no decision was made.

I think we shouldn't mention it here yet to avoid creating expectations.

I agree, let's avoid mentioning it, since it is highly speculative at this point. It will create more questions than provide answers.

Sounds good.

aalexand · 2026-03-20T21:37:26Z

specification/profiles/data-format.md

+stacks, locations, functions and associated metadata. It modifies and annotates
+pprof Profile with OpenTelemetry specific fields.
+
+Note that whilst some fields in this message retain the name and field id from pprof


Hmm, I found this paragraph confusing and I think we should remove it.

I think we deleted the ID fields so this part is out of date.

Some field names are maybe indeed inherited from pprof but we also fixed many of them such as make plural compared to pprof, and where we do reuse names I think it's because we think they make sense not because we are trying to map pprof names exactly.

aalexand · 2026-03-20T21:38:37Z

specification/profiles/data-format.md

+
+Measurements encoded in this format should follow the following conventions:
+
+- Consumers should treat unset optional fields as if they had been


I would mention specifically something like "we discourage using the protobuf presence semantics, even if available in the given instantiation of the protobuf generated API".

aalexand · 2026-03-20T21:39:57Z

specification/profiles/data-format.md

+
+- Consumers should treat unset optional fields as if they had been
+  set with their default value.
+- When possible, measurements should be stored in "unsampled" form


Do we want to state this stronger? Client-side unsampling is error-prone in case the sampling rate changes.

aalexand · 2026-03-20T21:40:54Z

specification/profiles/data-format.md

+- When possible, measurements should be stored in "unsampled" form
+  that is most useful to humans.  There should be enough
+  information present to determine the original sampled values.
+- The profile is represented as a set of samples, where each sample


This bullet doesn't quite belong to the "conventions" list as it's more about the data model than a convention?

Similarly for the two bullets below. I'd try to separate maybe the data model clarifications from conventions that are less captured in the protobuf definition.

aalexand · 2026-03-20T21:43:07Z

specification/profiles/data-format.md

+| period_type | [`ValueType`](#message-valuetype) | The kind of events between sampled occurrences. For example `["cpu", "cycles"]` or `["heap", "bytes"]`. |
+| period | int64 | The number of events between sampled occurrences. |
+| profile_id | bytes | A globally unique identifier for the profile (16-byte array) that MAY be used for deduplication and signal correlation. An ID with all zeroes is considered invalid. This field is optional; an ID may be assigned to an ID-less profile in a later step. |
+| dropped_attributes_count | uint32 | The number of attributes that were discarded. Attributes can be discarded because their keys are too long or because there are too many attributes. If 0, no attributes were dropped. |


FWIW this is a field that I'm still unsure about whether we need it. I don't know of a practical use case for it. I think we largely inherited it from other signals.

I'll leave it in for now, we can update if we reconsider it.

This is normally used when limits are imposed at the producer (or a processor) that causes attributes to be dropped (removed) to make sure the number of attributes does not exceed the limit. I would imagine a profile producer/processor can also have the same logic, although at the Profile level this may not have much practical application.

aalexand · 2026-03-20T21:45:49Z

specification/profiles/data-format.md

+
+### Message `Stack`
+
+A stack trace encoded as a list of locations (leaf first).


Maybe we could give an example for the "leaf first" order nuance.

aalexand · 2026-03-20T21:46:49Z

specification/profiles/data-format.md

+| Field | Type | Description |
+| ----- | ---- | ----------- |
+| function_index | int32 | Reference to a [`Function`](#message-function) in [`ProfilesDictionary.function_table`](#message-profilesdictionary). |
+| line | int64 | Line number in source code. 0 means unset. |


Maybe clarify "1-based"? It can be implied from "0 means unset" but I'd state it more explicitly.

aalexand · 2026-03-20T21:48:22Z

specification/profiles/data-format.md

+
+### From other signals to profiles
+
+Other signals can reference a profile using the `profile_id` field on the


profile_id is optional though, maybe make a note?

Co-authored-by: Alexey Alexandrov <aalexand@users.noreply.github.com>

christos68k · 2026-03-24T16:30:30Z

@tigrannajaryan Can you please take another look? It'd be great to have this (and the parent PR here) merged before or close to the Alpha announcement.

tigrannajaryan

None of my comments are blocking.

tigrannajaryan · 2026-03-24T17:52:36Z

specification/profiles/data-format.md

+  allows other `_index` fields pointing into the dictionary to use `0` to
+  indicate "null" / "not set".


I think this is a bit confusing. Using index 0 does not mean the value is "not set", it means the value is set to the corresponding "zero value".

Yes, but the semantics is "not set", this is the main purpose of the convention.

specification/profiles/data-format.md

tigrannajaryan · 2026-03-24T17:55:26Z

specification/profiles/data-format.md

+   with the profiles-specific extension of string reference fields for keys and values
+   (see [next section](#dictionary-use-in-keyvalue)).
+
+2. **[`KeyValueAndUnit`](#message-keyvalueandunit) attributes**:


I agree, let's avoid mentioning it, since it is highly speculative at this point. It will create more questions than provide answers.

tigrannajaryan · 2026-03-24T17:58:07Z

specification/profiles/data-format.md

+   (see [next section](#dictionary-use-in-keyvalue)).
+
+2. **[`KeyValueAndUnit`](#message-keyvalueandunit) attributes**:
+   a Profiles-specific encoding inherited from pprof (where it is known


I am not sure we need to reference pprof everywhere. It is important to acknowledge it was an inspiration and give credit once, but I don't know if it helps with explanation to have references to corresponding pprof parts in this doc.

Perhaps we should have a separate Otel<->Pprof correspondence doc (like we have for example Otel<->Prometheus), where we can go into significantly more detail if necessary.

I'll take it out as the references to pprof are indeed numerous. I previously proposed having an appendix document where we go into more detail around historical decisions but this could also live in the existing pprof.md document we have.

tigrannajaryan · 2026-03-24T17:58:47Z

specification/profiles/data-format.md

+   and referenced by index from [`Profile`](#message-profile),
+   [`Sample`](#message-sample), [`Mapping`](#message-mapping) and [`Location`](#message-location)
+   messages. In addition to a key and value they carry an optional unit
+   field, allowing attributes such as `"allocation_size": 128 bytes` to


Nit: we are using UCUM for units, right? So that would be "allocation_size": 128 By to be more precise.

I think the example we give here is meant to focus/clarify being able to split out / refer to the unit explicitly (instead of storing it inline or not storing it at all and relying on semconv). From that POV, it's best seen as an abstract case, not really suggesting a particular encoding related to the unit. Leaving it as "bytes" I think is more clarifying relating to this purpose.

On the other hand it's confusing given the UCUM reference later, so I'll switch it and add a link to clarify.

tigrannajaryan · 2026-03-24T18:00:52Z

specification/profiles/data-format.md

+
+| Field | Type | Description |
+| ----- | ---- | ----------- |
+| resource_profiles | repeated [`ResourceProfiles`](#message-resourceprofiles) | An array of `ResourceProfiles`.<br><br>For data coming from an SDK profiler this array will typically contain one element. Host-level profilers will usually create one `ResourceProfiles` per container, as well as an additional one grouping all samples from non-containerized processes. Other resource groupings are possible as well and clarified via [`Resource`](../resource/README.md) attributes and semantic conventions.<br><br>Tools that visualize profiles SHOULD prefer displaying `resource_profiles[0].scope_profiles[0].profiles[0]` by default. |


SHOULD prefer displaying resource_profiles[0].scope_profiles[0].profiles[0] by default

Can you clarify why this preference?

I think this comes from pprof, maybe @aalexand can clarify

pprof has a different convention - there is a dedicated default_sample_type field and if it's not set the last sample type is the default.

For the Profiles OTLP we thought that this "last by default" convention is too non-obvious and there needs to be some recommended default so that the producers can know where to put the most important / valuable profile, so we said it's first.

Some more details were previously written down here where we talked about the interoperability of this nuance with pprof.

tigrannajaryan · 2026-03-24T18:06:09Z

specification/profiles/data-format.md

+| period_type | [`ValueType`](#message-valuetype) | The kind of events between sampled occurrences. For example `["cpu", "cycles"]` or `["heap", "bytes"]`. |
+| period | int64 | The number of events between sampled occurrences. |
+| profile_id | bytes | A globally unique identifier for the profile (16-byte array) that MAY be used for deduplication and signal correlation. An ID with all zeroes is considered invalid. This field is optional; an ID may be assigned to an ID-less profile in a later step. |
+| dropped_attributes_count | uint32 | The number of attributes that were discarded. Attributes can be discarded because their keys are too long or because there are too many attributes. If 0, no attributes were dropped. |


This is normally used when limits are imposed at the producer (or a processor) that causes attributes to be dropped (removed) to make sure the number of attributes does not exceed the limit. I would imagine a profile producer/processor can also have the same logic, although at the Profile level this may not have much practical application.

tigrannajaryan · 2026-03-24T18:06:53Z

specification/profiles/data-format.md

+| profile_id | bytes | A globally unique identifier for the profile (16-byte array) that MAY be used for deduplication and signal correlation. An ID with all zeroes is considered invalid. This field is optional; an ID may be assigned to an ID-less profile in a later step. |
+| dropped_attributes_count | uint32 | The number of attributes that were discarded. Attributes can be discarded because their keys are too long or because there are too many attributes. If 0, no attributes were dropped. |
+| original_payload_format | string | The original payload format. See [Known values](./README.md#known-values) for allowed formats. It MUST be set together with `original_payload` or both left unset. [optional] |
+| original_payload | bytes | The original payload bytes. It MUST be set together with `original_payload_format` or both left unset. [optional] |


Do we want to explain how an original payload can end up here?

Good catch, I'll add a few lines to clarify.

tigrannajaryan · 2026-03-24T18:11:44Z

specification/profiles/data-format.md

+| ----- | ---- | ----------- |
+| key_strindex | int32 | Index into [`ProfilesDictionary.string_table`](#message-profilesdictionary). |
+| value | [`AnyValue`](../common/README.md) | The value of the attribute. |
+| unit_strindex | int32 | Index into [`ProfilesDictionary.string_table`](#message-profilesdictionary). 0 indicates implicit (by semantic convention) or undefined unit. |


Do we want to say the string is in UCUM format?

As UCUM is the standard across OTel, I'll make an explicit reference to it here.

CC: @open-telemetry/profiling-maintainers @open-telemetry/profiling-approvers

Ack, this makes sense.

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>

tigrannajaryan · 2026-03-25T15:08:45Z

@christos68k can you update PR title/description to use "data format" instead of "data model".

christos68k · 2026-03-25T15:26:25Z

@christos68k can you update PR title/description to use "data format" instead of "data model".

Done

tigrannajaryan · 2026-04-01T15:07:36Z

@jsuereth please take a look.

tigrannajaryan · 2026-04-02T15:03:06Z

@open-telemetry/specs-approvers PTAL

Add profiles data model

89258ef

christos68k requested review from a team as code owners March 19, 2026 13:14

This was referenced Mar 19, 2026

Update profiles specification README.md and pprof.md #4932

Open

Add documentation for Profiling open-telemetry/opentelemetry.io#7874

Closed

christos68k added 2 commits March 19, 2026 09:25

Update changelog

4bb9343

Update optional fields

a8c0996

christos68k commented Mar 19, 2026

View reviewed changes

Consistent capitalization when referring to OTLP Profiles

66b117d

tigrannajaryan reviewed Mar 19, 2026

View reviewed changes

specification/profiles/data-model.md Outdated Show resolved Hide resolved

theletterf approved these changes Mar 20, 2026

View reviewed changes

specification/profiles/data-format.md Outdated Show resolved Hide resolved

rogercoll approved these changes Mar 20, 2026

View reviewed changes

specification/profiles/data-model.md Outdated Show resolved Hide resolved

specification/profiles/data-model.md Outdated Show resolved Hide resolved

christos68k and others added 6 commits March 20, 2026 17:52

Apply suggestions from code review

0df9bb5

Co-authored-by: Roger Coll <roger.coll@elastic.co>

Rename data-model.md to data-format.md

9b20157

Rename 'data model' to 'data format'

95665dc

Minor update to wording

4bb8781

Update specification/profiles/data-format.md

68d0152

Co-authored-by: Fabrizio Ferri-Benedetti <fabri.ferribenedetti@elastic.co>

Update data-format.md link in mappings.md

f737e3e

aalexand reviewed Mar 20, 2026

View reviewed changes

aalexand approved these changes Mar 20, 2026

View reviewed changes

Address review feedback

841ad4d

Co-authored-by: Alexey Alexandrov <aalexand@users.noreply.github.com>

tigrannajaryan approved these changes Mar 24, 2026

View reviewed changes

christos68k and others added 2 commits March 24, 2026 14:57

Address review feedback

19759d2

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>

Add clarification for original_payload

2ce2e77

christos68k requested a review from jsuereth March 24, 2026 19:06

christos68k added 2 commits March 24, 2026 15:09

Fix wording

32f4328

Updating wording according to texlint

aea0a06

tigrannajaryan self-assigned this Mar 25, 2026

christos68k changed the title ~~Add profiles data model~~ Add profiles data format Mar 25, 2026

Minor update

2e1deb9

christos68k mentioned this pull request Mar 25, 2026

Minor documentation updates to profiles.proto open-telemetry/opentelemetry-proto#780

Open

aalexand approved these changes Apr 3, 2026

View reviewed changes


		### Message referencing

		Most OpenTelemetry signals use direct embedding: a span in a trace embeds


		## Notable differences compared to other signals

		### Message referencing


		Measurements encoded in this format should follow the following conventions:

		- Consumers should treat unset optional fields as if they had been


		### Message `Stack`

		A stack trace encoded as a list of locations (leaf first).


		### From other signals to profiles

		Other signals can reference a profile using the `profile_id` field on the

		allows other `_index` fields pointing into the dictionary to use `0` to
		indicate "null" / "not set".

Conversation

christos68k commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Uh oh!

christos68k Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aalexand left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

christos68k Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

christos68k commented Mar 24, 2026

Uh oh!

tigrannajaryan left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

christos68k Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

christos68k commented Mar 19, 2026 •

edited

Loading

christos68k Mar 19, 2026 •

edited

Loading

christos68k Mar 21, 2026 •

edited

Loading

christos68k Mar 24, 2026 •

edited

Loading

christos68k Mar 24, 2026 •

edited

Loading

christos68k Mar 24, 2026 •

edited

Loading