feat(occurrences): New ingest by thetruecpaul · Pull Request #109180 · getsentry/sentry

thetruecpaul · 2026-02-24T11:05:20Z

This PR updates the ingest flow for Occurrences on EAP.

CHANGES:

Moves from an "everything in event_data" model to an allowlist model.
Unifies tags & contexts into attrs.
Processes exceptions as a series of related arrays, rather than as a thicket of nested mappings and arrays.

SEE ALSO:

Search issues processor: https://github.com/getsentry/snuba/blob/master/snuba/datasets/processors/search_issues_processor.py
Errors processor: https://github.com/getsentry/snuba/blob/bc66a617af47de8e0fe573c2ac17c9f378c523c1/rust_snuba/src/processors/errors.rs#L32

I tried to match the existing behavior of those processors as closely as possible.

TEST PLAN:

pytest -s tests/snuba/search/test_eap_occurrences.py

src/sentry/eventstream/item_helpers.py

shashjar

Mostly reviewed the high-level approach so far, and left some comments. I think the main point of open discussion I have is what to do with None values - I'm leaning towards omitting rather than stringifying them, which would also require some updates in the processors (mainly validating whether fields/values exist before encoding them into the result data). After we discuss these points more I can come back to review the processor changes.

shashjar · 2026-02-24T18:05:59Z

src/sentry/eventstream/item_helpers.py

+    encoded_data: dict[str, AnyValue] = {}
+
+    # 1: ALLOWLIST OF DIRECT COPIES
+    simple_allowlist = {


Should we define a top-level constant for this allowlist?

Also, event_id was previously in the ignore list — is there a reason we're including it now?
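
A minimal sketch of what hoisting the allowlist into a top-level constant could look like. The names `SIMPLE_ALLOWLIST` and `copy_allowlisted` are hypothetical, the four keys are only the ones visible in the diff excerpts on this page, and plain Python values stand in for `AnyValue`.

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical module-level constant; only keys visible in this PR's excerpts are listed.
SIMPLE_ALLOWLIST: frozenset[str] = frozenset({"level", "resource_id", "message", "release"})


def copy_allowlisted(event_data: Mapping[str, Any]) -> dict[str, Any]:
    """Copy only allowlisted keys that are present; every other key is dropped."""
    return {key: event_data[key] for key in SIMPLE_ALLOWLIST if key in event_data}


event_data = {"level": "error", "message": "boom", "spans": [1, 2, 3]}
copied = copy_allowlisted(event_data)
```

With an allowlist, a key such as `spans` no longer needs an explicit ignore entry, because anything not listed is never copied.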

shashjar · 2026-02-24T18:09:00Z

src/sentry/eventstream/item_helpers.py

+def _encode_attributes(
+    event: Event | GroupEvent,
+    event_data: Mapping[str, Any],
+    ignore_fields: set[str] | None = None,


Do we actually still need the ability to explicitly ignore some fields by using this parameter? If not, we may be able to simplify this and just remove ignore_fields altogether and update the tests to match

shashjar · 2026-02-24T18:11:44Z

src/sentry/eventstream/item_helpers.py

        attributes=_encode_attributes(
-            event, event_data, ignore_fields={"event_id", "timestamp", "tags", "spans", "'spans'"}
+            event,
+            event_data,


nit: could we move _encode_attributes / _build_occurrence_attributes / _encode_value / _encode_value_recursive above all the extract helpers in this file? up to you but i think that's more readable to me

shashjar · 2026-02-24T18:13:16Z

src/sentry/eventstream/item_helpers.py

-def _encode_attributes(
-    event: Event | GroupEvent, event_data: Mapping[str, Any], ignore_fields: set[str] | None = None
+def _extract_from_event(
+    event: Event | GroupEvent, event_data: Mapping[str, Any]


event_data param not used in the _extract_from_event function, can be removed

shashjar · 2026-02-24T18:16:45Z

src/sentry/eventstream/item_helpers.py

+            encoded_data[key] = _encode_value(event_data[key])
+
+    # 2: SIMPLE RENAMES FROM EXISTING DATA
+    renames: tuple[tuple[str, str], ...] = (  # tuple of old, new pairs


this might be good to extract out into a constant as well

also: to confirm my understanding, these 3 keys are also effectively in the allowlist, they're just processed separately since they need the renaming step?
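
One way to read the rename step is as an allowlisted copy under a different output key. The sketch below assumes that reading; `RENAMES`, `apply_renames`, and the `old_key`/`new_key` pair are placeholders, since the actual three pairs are not shown in this thread.

```python
from collections.abc import Mapping
from typing import Any

# Placeholder (old, new) pair; the real rename pairs are not visible in this excerpt.
RENAMES: tuple[tuple[str, str], ...] = (("old_key", "new_key"),)


def apply_renames(
    event_data: Mapping[str, Any],
    renames: tuple[tuple[str, str], ...] = RENAMES,
) -> dict[str, Any]:
    """Copy each renamed key that is present, storing it under its new name."""
    return {new: event_data[old] for old, new in renames if old in event_data}


renamed = apply_renames({"old_key": 1, "unrelated": 2})
```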

shashjar · 2026-02-24T18:20:25Z

src/sentry/eventstream/item_helpers.py

        )
+    elif value is None:
+        return AnyValue(
+            string_value="None"


Based on Pierre's response that EAP will not natively support null values, I wonder if we should just prevent None values from being encoded across the board

So, the problem here is primarily the exception arrays. Exceptions in the errors Snuba table today are, essentially, maps of standard keys to nullable values.

We can't do maps in EAP, so I thought it would be good to encode these as equally sized arrays instead (so the item at index N is the same frame across all the arrays).

... but that runs into a problem when we can't encode a null value.

Shouldn't we just skip the key entirely if the value is none?
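
To make the tension concrete, here is a rough sketch of the parallel-array encoding. The field names in `EXCEPTION_FIELDS` and the empty-string sentinel `MISSING` are assumptions for illustration, not what the PR uses. The point is that a null entry inside an array has to be padded rather than skipped, otherwise index N stops referring to the same exception across arrays.

```python
from collections.abc import Mapping, Sequence
from typing import Any

# Hypothetical field names and sentinel; the real exception keys are not shown here.
EXCEPTION_FIELDS = ("type", "value", "module")
MISSING = ""  # assumption: an empty string stands in for a null entry


def to_parallel_arrays(exceptions: Sequence[Mapping[str, Any]]) -> dict[str, list[str]]:
    """Turn a list of exception mappings into equally sized arrays, one per field."""
    arrays: dict[str, list[str]] = {field: [] for field in EXCEPTION_FIELDS}
    for exc in exceptions:
        for field in EXCEPTION_FIELDS:
            value = exc.get(field)
            # Dropping None here would shift later items; pad with the sentinel instead.
            arrays[field].append(MISSING if value is None else str(value))
    return arrays


arrays = to_parallel_arrays(
    [
        {"type": "ValueError", "value": "bad input", "module": None},
        {"type": "KeyError", "value": None, "module": "app.views"},
    ]
)
```

Skipping a key entirely still works for top-level scalar attributes; it is only inside these aligned arrays that some placeholder is needed.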

shashjar · 2026-02-24T18:24:32Z

src/sentry/eventstream/item_helpers.py

@@ -1,6 +1,8 @@
-from collections.abc import Mapping
+import ipaddress


Looks like some tests in tests/sentry/eventstream/test_item_helpers.py need to be updated, and is it possible to add some additional coverage for _encode_value_recursive and the processors?

Yup, that's what I'm working on this morning.

shashjar · 2026-02-24T18:26:58Z

src/sentry/eventstream/item_helpers.py

+    if isinstance(value, Mapping):
+        out: dict[str, AnyValue] = {}
+        for subkey, subvalue in value.items():
+            out.update(_encode_value_recursive(".".join([key, subkey]), subvalue))


Same as warden comment: looks like we call this function as _encode_value_recursive("", contexts) from _extract_tags_and_contexts. So there will end up being a leading dot since the outermost key is considered to be ""
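
A possible fix, sketched with plain values instead of `AnyValue`: only join with a dot when the prefix is non-empty. The function name `flatten` is a stand-in for `_encode_value_recursive`, not the PR's code.

```python
from collections.abc import Mapping
from typing import Any


def flatten(key: str, value: Any) -> dict[str, Any]:
    """Flatten nested mappings into dotted keys, omitting the dot when the prefix is empty."""
    if isinstance(value, Mapping):
        out: dict[str, Any] = {}
        for subkey, subvalue in value.items():
            child = f"{key}.{subkey}" if key else str(subkey)
            out.update(flatten(child, subvalue))
        return out
    return {key: value}


flat = flatten("", {"monitor": {"id": "abc", "config": {"schedule": "daily"}}})
```

With the original `".".join([key, subkey])`, the same call would produce keys such as `.monitor.id`.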

shashjar · 2026-02-24T18:28:39Z

src/sentry/eventstream/item_helpers.py

If we decide we want to consider None a real value and not ignore it, I wonder if we should do that across the board and remove this filter in the list/tuple case and also below in the dict case

shashjar · 2026-02-24T18:36:06Z

src/sentry/eventstream/item_helpers.py

+    out = {}
+    if event.group_id:
+        out["group_id"] = _encode_value(event.group_id)
+    if isinstance(event, GroupEvent):


Should we verify that event.group.first_seen.timestamp() exists here as well?

shashjar · 2026-02-24T19:25:37Z

src/sentry/eventstream/item_helpers.py

+def _extract_time_data(event_data: Mapping[str, Any]) -> Mapping[str, AnyValue]:
+    if "timestamp" not in event_data:
+        return {}
+    return {"timestamp_ms": _encode_value(event_data["timestamp"] * 1000)}


Is the timestamp field an int or double in EAP? Do we need to do any casting here?

shashjar · 2026-02-24T19:26:41Z

src/sentry/eventstream/item_helpers.py

+def _extract_tags_and_contexts(event_data: Mapping[str, Any]) -> Mapping[str, AnyValue]:
+    # These may be overwritten by promoted tags.
+    out = {
+        "release": _encode_value(event_data.get("release")),


I see "release" is both here and in the simple allowlist - I think one can be removed?

shashjar · 2026-02-24T19:35:08Z

src/sentry/eventstream/item_helpers.py

+                pass
+
+        out["user_email"] = _encode_value(user_data["email"])
+        out["user_id"] = _encode_value(user_data["user_id"])


Are you sure this key is "user_id"? I see it defined as "id" in src/sentry/interfaces/user.py

Based off of https://github.com/getsentry/snuba/blob/master/snuba/datasets/processors/search_issues_processor.py#L127

src/sentry/eventstream/item_helpers.py

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

src/sentry/eventstream/item_helpers.py

cursor · 2026-02-27T17:38:03Z

src/sentry/eventstream/item_helpers.py

+        "level",
+        "resource_id",
+        "message",
+        "release",


Duplicate "release" in allowlist and processor

Low Severity

"release" appears in the simple_allowlist at step 1 and is also set by _extract_tags_and_contexts at step 3. The processor's value always overwrites the allowlist value via encoded_data.update(), making the allowlist entry redundant. One of these two sources for release can be removed.

Additional Locations (1)

src/sentry/eventstream/item_helpers.py#L164-L165
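
The overwrite described above is ordinary `dict.update()` behavior. A tiny illustration, with made-up values standing in for the allowlist copy and the processor output:

```python
# Step 1: value copied directly via the allowlist (illustrative value).
encoded_data = {"release": "from-allowlist", "level": "error"}

# Step 3: hypothetical output of a processor that also sets "release".
processor_output = {"release": "from-processor"}

# update() replaces existing keys, so the processor's value wins.
encoded_data.update(processor_output)
```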

src/sentry/eventstream/item_helpers.py

shashjar

I think some of the bot review comments are valid but other than that LGTM!

wedamija

I want to make sure that this change won't break any attributes sent from here:

sentry/src/sentry/monitors/logic/incident_occurrence.py

Lines 203 to 221 in 278feca

    event_data = {
        "contexts": {"monitor": get_monitor_environment_context(monitor_env)},
        "environment": monitor_env.get_environment().name,
        "event_id": occurrence.event_id,
        "fingerprint": [incident.grouphash],
        "platform": "other",
        "project_id": monitor_env.monitor.project_id,
        # This is typically the time that the checkin that triggered the
        # occurrence was written to relay, otherwise it is when we detected a
        # missed or timeout.
        "received": received.isoformat(),
        "sdk": None,
        "tags": {
            "monitor.id": str(monitor_env.monitor.guid),
            "monitor.slug": str(monitor_env.monitor.slug),
            "monitor.incident": str(incident.id),
        },
        "timestamp": current_timestamp.isoformat(),
    }

wedamija · 2026-03-02T21:00:15Z

src/sentry/eventstream/item_helpers.py

    if event.group_id:
-        attributes["group_id"] = AnyValue(int_value=event.group_id)
+        out["group_id"] = event.group_id
+    if isinstance(event, GroupEvent):


Both Event and GroupEvent should have group available

wedamija · 2026-03-02T21:07:05Z

src/sentry/eventstream/item_helpers.py

+    promotions = {
+        "sentry:release": "release",
+        "environment": "environment",
+        "sentry:user": "user",
+        "sentry:dist": "dist",
+        "profile.profile_id": "profile_id",
+        "replay.replay_id": "replay_id",
    }


These used to be explicit columns in the errors table, how is this promotion concept handled in EAP?
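
This does not answer how EAP treats promoted attributes, but as a sketch of the ingest side, the mapping from the diff could be applied roughly like this. `promote_tags` is a hypothetical helper and tags are assumed to arrive as a flat string mapping.

```python
from collections.abc import Mapping

# Mapping copied from the diff excerpt above: tag key -> top-level attribute name.
PROMOTIONS = {
    "sentry:release": "release",
    "environment": "environment",
    "sentry:user": "user",
    "sentry:dist": "dist",
    "profile.profile_id": "profile_id",
    "replay.replay_id": "replay_id",
}


def promote_tags(tags: Mapping[str, str]) -> dict[str, str]:
    """Return top-level attributes for every promoted tag that is present."""
    return {attr: tags[tag] for tag, attr in PROMOTIONS.items() if tag in tags}


promoted = promote_tags({"sentry:release": "1.2.3", "monitor.slug": "nightly"})
```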

wedamija · 2026-03-02T21:14:55Z

src/sentry/eventstream/item_helpers.py

        )
+    elif value is None:
+        return AnyValue(
+            string_value="None"


Shouldn't we just skip the key entirely if the value is none?
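
For top-level scalar attributes, the two options under discussion differ as sketched below. `drop_none_values` is a hypothetical helper, and the sample keys are taken from the monitors `event_data` quoted earlier in the review.

```python
from collections.abc import Mapping
from typing import Any


def drop_none_values(data: Mapping[str, Any]) -> dict[str, Any]:
    """Return a copy of data without keys whose value is None."""
    return {key: value for key, value in data.items() if value is not None}


event_data = {"platform": "other", "sdk": None}

# Option in the diff: None becomes the literal string "None".
stringified = {key: str(value) for key, value in event_data.items()}

# Option suggested in review: the key is omitted entirely.
omitted = drop_none_values(event_data)
```

Omitting keeps a query for a missing `sdk` distinct from a query for the string `"None"`.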

src/sentry/eventstream/item_helpers.py

github-actions · 2026-03-10T15:58:27Z

Backend Test Failures

Failures on 68a3d2b in this run:

tests/sentry/profiles/test_task.py::DeobfuscationViaSymbolicator::test_inline_resolving — log

tests/sentry/profiles/test_task.py:683: in test_inline_resolving
    assert android_profile["profile"]["methods"] == [
E   AssertionError: assert [{'class_name...andler', ...}] == [{'class_name...andler', ...}]
E     
E     At index 0 diff: {'class_name': 'io.sentry.sample.-$$Lambda$r3Avcbztes2hicEObh02jjhQqd4', 'name': 'onClick', 'signature': '()', 'source_file': '-.java', 'source_line': 2, 'data': {'deobfuscation_status': 'deobfuscated'}} != {'class_name': 'io.sentry.sample.-$$Lambda$r3Avcbztes2hicEObh02jjhQqd4', 'data': {'deobfuscation_status': 'deobfuscated'}, 'name': 'onClick', 'signature': '()', 'source_file': None, 'source_line': 2}
E     
E     Full diff:
E       [
E           {
E               'class_name': 'io.sentry.sample.-$$Lambda$r3Avcbztes2hicEObh02jjhQqd4',
E               'data': {
E                   'deobfuscation_status': 'deobfuscated',
E               },
E               'name': 'onClick',
E               'signature': '()',
E     -         'source_file': None,
E     ?                        ^^^^
E     +         'source_file': '-.java',
E     ?                        ^^^^^^^^
E               'source_line': 2,
E           },
E           {
E               'class_name': 'io.sentry.sample.MainActivity',
E               'data': {
E                   'deobfuscation_status': 'deobfuscated',
E               },
E               'inline_frames': [
E                   {
E                       'class_name': 'io.sentry.sample.MainActivity',
E                       'data': {
E                           'deobfuscation_status': 'deobfuscated',
E                       },
E                       'name': 'onClickHandler',
E                       'signature': '()',
E                       'source_file': 'MainActivity.java',
E                       'source_line': 40,
E                   },
E                   {
E                       'class_name': 'io.sentry.sample.MainActivity',
E                       'data': {
E                           'deobfuscation_status': 'deobfuscated',
E                       },
E                       'name': 'foo',
E                       'signature': '()',
E                       'source_file': 'MainActivity.java',
E                       'source_line': 44,
E                   },
E                   {
E                       'class_name': 'io.sentry.sample.MainActivity',
E                       'data': {
... (14 more lines)

tests/sentry/profiles/test_task.py::DeobfuscationViaSymbolicator::test_basic_resolving — log

tests/sentry/profiles/test_task.py:627: in test_basic_resolving
    assert android_profile["profile"]["methods"] == [
E   AssertionError: assert [{'class_name...oolean', ...}] == [{'class_name...oolean', ...}]
E     
E     At index 0 diff: {'class_name': 'org.slf4j.helpers.Util$ClassContextSecurityManager', 'name': 'getClassContext', 'signature': '()', 'source_file': 'Util.java', 'source_line': 67, 'data': {'deobfuscation_status': 'deobfuscated'}} != {'data': {'deobfuscation_status': 'deobfuscated'}, 'name': 'getClassContext', 'class_name': 'org.slf4j.helpers.Util$ClassContextSecurityManager', 'signature': '()', 'source_file': 'Something.java', 'source_line': 67}
E     
E     Full diff:
E       [
E           {
E               'class_name': 'org.slf4j.helpers.Util$ClassContextSecurityManager',
E               'data': {
E                   'deobfuscation_status': 'deobfuscated',
E               },
E               'name': 'getClassContext',
E               'signature': '()',
E     -         'source_file': 'Something.java',
E     ?                         ^^^^ - ^^
E     +         'source_file': 'Util.java',
E     ?                         ^  ^
E               'source_line': 67,
E           },
E           {
E               'class_name': 'org.slf4j.helpers.Util$ClassContextSecurityManager',
E               'data': {
E                   'deobfuscation_status': 'deobfuscated',
E               },
E               'name': 'getExtraClassContext',
E               'signature': '(): boolean',
E     -         'source_file': 'Else.java',
E     ?                         ^ --
E     +         'source_file': 'Util.java',
E     ?                         ^^^
E               'source_line': 69,
E           },
E       ]


thetruecpaul requested a review from a team February 24, 2026 11:05

thetruecpaul requested a review from a team as a code owner February 24, 2026 11:05

github-actions bot added the "Scope: Backend" label (automatically applied to PRs that change backend components) Feb 24, 2026

vercel bot deployed to Preview February 24, 2026 11:07

sentry bot reviewed Feb 24, 2026

sentry-warden bot reviewed Feb 24, 2026

cursor bot reviewed Feb 24, 2026

shashjar reviewed Feb 24, 2026

thetruecpaul force-pushed the cpaul/eap_occurrences/ingest_v_2 branch from 2b79279 to bd77a8f February 27, 2026 17:29

thetruecpaul requested a review from a team as a code owner February 27, 2026 17:29

vercel bot deployed to Preview February 27, 2026 17:32

sentry bot reviewed Feb 27, 2026

cursor bot reviewed Feb 27, 2026

thetruecpaul force-pushed the cpaul/eap_occurrences/ingest_v_2 branch 2 times, most recently from 846aeec to aaf7a95 February 27, 2026 17:49

vercel bot deployed to Preview February 27, 2026 17:52

sentry-warden bot reviewed Feb 27, 2026

shashjar approved these changes Feb 27, 2026

wedamija reviewed Mar 2, 2026

thetruecpaul force-pushed the cpaul/eap_occurrences/ingest_v_2 branch from aaf7a95 to 5f70eb0 March 10, 2026 15:35

vercel bot deployed to Preview March 10, 2026 15:38

sentry bot reviewed Mar 10, 2026

thetruecpaul force-pushed the cpaul/eap_occurrences/ingest_v_2 branch from 5f70eb0 to fabc1bd March 10, 2026 17:45

vercel bot deployed to Preview March 10, 2026 17:49

thetruecpaul force-pushed the cpaul/eap_occurrences/ingest_v_2 branch from fabc1bd to a9a11cd March 10, 2026 18:02

vercel bot deployed to Preview March 10, 2026 18:05

thetruecpaul merged commit e2b45a5 into master Mar 10, 2026
76 checks passed

thetruecpaul deleted the cpaul/eap_occurrences/ingest_v_2 branch March 10, 2026 18:32

sentry-release-bot bot mentioned this pull request Mar 15, 2026

publish: getsentry/sentry@26.3.0 getsentry/publish#7450

