Conversation
shashjar
left a comment
There was a problem hiding this comment.
Mostly reviewed the high-level approach so far, and left some comments. I think the main point of open discussion I have is what to do with None values - I'm leaning towards omitting rather than stringifying them, which would also require some updates in the processors (mainly validating whether fields/values exist before encoding them into the result data). After we discuss these points more I can come back to review the processor changes.
| encoded_data: dict[str, AnyValue] = {} | ||
|
|
||
| # 1: ALLOWLIST OF DIRECT COPIES | ||
| simple_allowlist = { |
There was a problem hiding this comment.
Should we define a top-level constant for this allowlist?
Also, event_id was previously in the ignore list — is there a reason we're including it now?
| def _encode_attributes( | ||
| event: Event | GroupEvent, | ||
| event_data: Mapping[str, Any], | ||
| ignore_fields: set[str] | None = None, |
There was a problem hiding this comment.
Do we actually still need the ability to explicitly ignore some fields by using this parameter? If not, we may be able to simplify this and just remove ignore_fields altogether and update the tests to match
| attributes=_encode_attributes( | ||
| event, event_data, ignore_fields={"event_id", "timestamp", "tags", "spans", "'spans'"} | ||
| event, | ||
| event_data, |
There was a problem hiding this comment.
nit: could we move _encode_attributes / _build_occurrence_attributes / _encode_value / _encode_value_recursive above all the extract helpers in this file? up to you but i think that's more readable to me
| def _encode_attributes( | ||
| event: Event | GroupEvent, event_data: Mapping[str, Any], ignore_fields: set[str] | None = None | ||
| def _extract_from_event( | ||
| event: Event | GroupEvent, event_data: Mapping[str, Any] |
There was a problem hiding this comment.
event_data param not used in the _extract_from_event function, can be removed
| encoded_data[key] = _encode_value(event_data[key]) | ||
|
|
||
| # 2: SIMPLE RENAMES FROM EXISTING DATA | ||
| renames: tuple[tuple[str, str], ...] = ( # tuple of old, new pairs |
There was a problem hiding this comment.
this might be good to extract out into a constant as well
There was a problem hiding this comment.
also: to confirm my understanding, these 3 keys are also effectively in the allowlist, they're just processed separately since they need the renaming step?
| ) | ||
| elif value is None: | ||
| return AnyValue( | ||
| string_value="None" |
There was a problem hiding this comment.
Based on Pierre's response that EAP will not natively support null values, I wonder if we should just prevent None values from being encoded across the board
There was a problem hiding this comment.
So, the problem here is primarily the exception arrays. Exceptions in the errors Snuba table today (link) are — essentially — maps of standard keys to nullable values.
We can't do maps in EAP. So thought it would be good to instead encode these as equally-sized arrays (so the item at index N is the same frame across all the arrays).
... but that runs into a problem when we can't encode a null value.
There was a problem hiding this comment.
Shouldn't we just skip the key entirely if the value is none?
| @@ -1,6 +1,8 @@ | |||
| from collections.abc import Mapping | |||
| import ipaddress | |||
There was a problem hiding this comment.
Looks like some tests in tests/sentry/eventstream/test_item_helpers.py need to be updated, and is it possible to add some additional coverage for _encode_value_recursive and the processors?
There was a problem hiding this comment.
Yup, that's what I'm working on this morning.
| if isinstance(value, Mapping): | ||
| out: dict[str, AnyValue] = {} | ||
| for subkey, subvalue in value.items(): | ||
| out.update(_encode_value_recursive(".".join([key, subkey]), subvalue)) |
There was a problem hiding this comment.
Same as warden comment: looks like we call this function as _encode_value_recursive("", contexts) from _extract_tags_and_contexts. So there will end up being a leading dot since the outermost key is considered to be ""
There was a problem hiding this comment.
If we decide we want to consider None a real value and not ignore it, I wonder if we should do that across the board and remove this filter in the list/tuple case and also below in the dict case
| out = {} | ||
| if event.group_id: | ||
| out["group_id"] = _encode_value(event.group_id) | ||
| if isinstance(event, GroupEvent): |
There was a problem hiding this comment.
Should we verify that event.group.first_seen.timestamp() exists here as well?
| def _extract_time_data(event_data: Mapping[str, Any]) -> Mapping[str, AnyValue]: | ||
| if "timestamp" not in event_data: | ||
| return {} | ||
| return {"timestamp_ms": _encode_value(event_data["timestamp"] * 1000)} |
There was a problem hiding this comment.
Is the timestamp field an int or double in EAP? Do we need to do any casting here?
| def _extract_tags_and_contexts(event_data: Mapping[str, Any]) -> Mapping[str, AnyValue]: | ||
| # These may be overwritten by promoted tags. | ||
| out = { | ||
| "release": _encode_value(event_data.get("release")), |
There was a problem hiding this comment.
I see "release" is both here and in the simple allowlist - I think one can be removed?
| pass | ||
|
|
||
| out["user_email"] = _encode_value(user_data["email"]) | ||
| out["user_id"] = _encode_value(user_data["user_id"]) |
There was a problem hiding this comment.
Are you sure this key is "user_id"? I see it defined as "id" in src/sentry/interfaces/user.py
There was a problem hiding this comment.
2b79279 to
bd77a8f
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| "level", | ||
| "resource_id", | ||
| "message", | ||
| "release", |
There was a problem hiding this comment.
Duplicate "release" in allowlist and processor
Low Severity
"release" appears in the simple_allowlist at step 1 and is also set by _extract_tags_and_contexts at step 3. The processor's value always overwrites the allowlist value via encoded_data.update(), making the allowlist entry redundant. One of these two sources for release can be removed.
Additional Locations (1)
846aeec to
aaf7a95
Compare
shashjar
left a comment
There was a problem hiding this comment.
I think some of the bot review comments are valid but other than that LGTM!
wedamija
left a comment
There was a problem hiding this comment.
I want to make sure that this change won't break any attributes sent from here:
sentry/src/sentry/monitors/logic/incident_occurrence.py
Lines 203 to 221 in 278feca
| if event.group_id: | ||
| attributes["group_id"] = AnyValue(int_value=event.group_id) | ||
| out["group_id"] = event.group_id | ||
| if isinstance(event, GroupEvent): |
There was a problem hiding this comment.
Both Event and GroupEvent should have group available
| promotions = { | ||
| "sentry:release": "release", | ||
| "environment": "environment", | ||
| "sentry:user": "user", | ||
| "sentry:dist": "dist", | ||
| "profile.profile_id": "profile_id", | ||
| "replay.replay_id": "replay_id", | ||
| } |
There was a problem hiding this comment.
These used to be explicit columns in the errors table, how is this promotion concept handled in EAP?
| ) | ||
| elif value is None: | ||
| return AnyValue( | ||
| string_value="None" |
There was a problem hiding this comment.
Shouldn't we just skip the key entirely if the value is none?
aaf7a95 to
5f70eb0
Compare
Backend Test FailuresFailures on
|
5f70eb0 to
fabc1bd
Compare
This PR updates the ingest flow for Occurrences on EAP. CHANGES: * Moves from a "everything in event_data" model to an allowlist model. * Unifies tags & contexts into attrs. * Processes exceptions in a series of related arrays, rather than as a thicket of nested mappings and arrays SEE ALSO: * Search issues processor: https://github.com/getsentry/snuba/blob/master/snuba/datasets/processors/search_issues_processor.py * Errors processor: https://github.com/getsentry/snuba/blob/bc66a617af47de8e0fe573c2ac17c9f378c523c1/rust_snuba/src/processors/errors.rs#L32 I tried to match the existing behavior of those processors as closely as possible. TEST PLAN: ``` pytest -s tests/snuba/search/test_eap_occurrences.py ```
fabc1bd to
a9a11cd
Compare


This PR updates the ingest flow for Occurrences on EAP.
CHANGES:
SEE ALSO:
I tried to match the existing behavior of those processors as closely as possible.
TEST PLAN: