feat(preprod): Track distribution state on PreprodArtifact#109062

Merged
runningcode merged 9 commits into master from no/eme-842-distribution-state-tracking
Feb 26, 2026

Conversation

@runningcode runningcode commented Feb 23, 2026

Mirror the existing size analysis NOT_RAN pattern for build distribution. When the update endpoint evaluates distribution eligibility, it records why distribution was skipped (quota, feature disabled, filtered) so the frontend can display the reason in build details.

Changes:

  • Add installable_app_error_code and installable_app_error_message fields to PreprodArtifact
  • Add InstallableAppErrorCode enum (UNKNOWN, NO_QUOTA, SKIPPED, PROCESSING_ERROR)
  • Update endpoint evaluates should_run_distribution and records the skip reason (mirroring the should_run_size pattern)
  • DistributionInfo in the build details API exposes error_code and error_message
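
As a rough illustration of how the pieces in this list fit together, here is a standalone sketch. It is not the Django model: the integer enum values, the `ArtifactSketch` dataclass, the `distribution_info` helper, and the exact payload keys are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class InstallableAppErrorCode(IntEnum):
    # Member names come from the PR description; the integer values are assumed.
    UNKNOWN = 0
    NO_QUOTA = 1
    SKIPPED = 2
    PROCESSING_ERROR = 3


@dataclass
class ArtifactSketch:
    """Hypothetical stand-in for the relevant PreprodArtifact columns."""

    installable_app_file_id: Optional[int] = None
    installable_app_error_code: Optional[InstallableAppErrorCode] = None
    installable_app_error_message: Optional[str] = None


def distribution_info(artifact: ArtifactSketch) -> dict:
    """Hypothetical shape of the DistributionInfo payload in build details."""
    code = artifact.installable_app_error_code
    return {
        # Assumption: installability is reduced to "a file id is present".
        "is_installable": artifact.installable_app_file_id is not None,
        "error_code": code.name if code is not None else None,
        "error_message": artifact.installable_app_error_message,
    }
```

With this shape, a skipped build carries a reason the frontend can render, while a distributed build carries neither error field.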

Scoped out (follow-up PRs):

  • Listing endpoint filters for error-code artifacts
  • Dedicated endpoint for launchpad to report distribution errors (mirroring ProjectPreprodSizeEndpoint)

Refs EME-842

linear bot commented Feb 23, 2026

@github-actions github-actions bot added the Scope: Backend label (automatically applied to PRs that change backend components) Feb 23, 2026
@github-actions

This PR has a migration; here is the generated SQL for src/sentry/preprod/migrations/0027_add_distribution_state_fields.py

for 0027_add_distribution_state_fields in preprod

--
-- Add field distribution_state to preprodartifact
--
ALTER TABLE "sentry_preprodartifact" ADD COLUMN "distribution_state" integer NULL CHECK ("distribution_state" >= 0);
--
-- Add field distribution_skip_reason to preprodartifact
--
ALTER TABLE "sentry_preprodartifact" ADD COLUMN "distribution_skip_reason" varchar(32) NULL;

@runningcode runningcode marked this pull request as ready for review February 23, 2026 12:43
@runningcode runningcode requested a review from a team as a code owner February 23, 2026 12:43
@runningcode runningcode force-pushed the no/eme-842-distribution-state-tracking branch from 0986852 to ca28903 Compare February 23, 2026 15:42
@runningcode runningcode changed the base branch from master to no/eme-842-distribution-state-model February 23, 2026 15:42
runningcode added a commit that referenced this pull request Feb 24, 2026
…842) (#109075)

## Summary

- Add `installable_app_error_code` and `installable_app_error_message`
columns to `PreprodArtifact`
- Add `InstallableAppErrorCode` enum (UNKNOWN, NO_QUOTA, SKIPPED)
following the existing `ErrorCode` pattern on `PreprodArtifact`
- Migration: `0027_add_distribution_state_fields`

Uses an error-code model instead of a state field so that
`installable_app_file_id` implicitly encodes success, avoiding ambiguous
state combinations:

- `file_id` set + no error → success
- no `file_id` + error code set → skipped/failed
- no `file_id` + no error → pending / not yet determined
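
The three combinations above can be read as a small decision function. The sketch below is only an illustration: `DistributionStatus` and `derive_status` are hypothetical names, and treating "file id set and error code set" as an error is an assumption about the fourth, unlisted combination.

```python
from enum import Enum
from typing import Optional


class DistributionStatus(Enum):
    # Hypothetical labels for the three combinations listed above.
    SUCCESS = "success"
    SKIPPED_OR_FAILED = "skipped_or_failed"
    PENDING = "pending"


def derive_status(
    file_id: Optional[int], error_code: Optional[int]
) -> DistributionStatus:
    """Derive a distribution status from the two stored fields."""
    if file_id is not None and error_code is None:
        return DistributionStatus.SUCCESS
    if file_id is None and error_code is not None:
        return DistributionStatus.SKIPPED_OR_FAILED
    if file_id is None and error_code is None:
        return DistributionStatus.PENDING
    # Both set: not one of the listed combinations, so flag it (assumption).
    raise ValueError("inconsistent state: file_id and error code are both set")
```

Comparing against `None` rather than relying on truthiness matters here, since an error code whose integer value is 0 would otherwise be read as "no error".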

Split out from #109062 to land the schema change independently.

EME-842
Base automatically changed from no/eme-842-distribution-state-model to master February 24, 2026 13:47
Record PENDING/NOT_RAN/COMPLETED in the update endpoint instead of
discarding the skip reason. Set COMPLETED when installable file is
assembled. Expose state/skip_reason in DistributionInfo API response.
Exclude NOT_RAN builds from list-builds and builds endpoints. Only
transition to COMPLETED from PENDING and guard against retry overwrites.
@runningcode runningcode force-pushed the no/eme-842-distribution-state-tracking branch from ca28903 to e6abc19 Compare February 24, 2026 13:50
…(EME-842)

Replace the distribution state machine (PENDING/COMPLETED/NOT_RAN) with
error-only tracking via InstallableAppErrorCode. Success is now implicit
(installable_app_file_id is set), and we only record why distribution
was skipped (NO_QUOTA, SKIPPED, PROCESSING_ERROR).
wedamija pushed a commit that referenced this pull request Feb 24, 2026
…842) (#109075)

… test (EME-842)

Clear installable_app_error_code and installable_app_error_message in
reset_artifact_data so reruns can re-evaluate distribution eligibility.

Move error fields test from the builds list endpoint (which excludes
error-code artifacts) to the build details endpoint where the fields
are actually returned.
…842)

Restructure the distribution guard so explicit error codes from
launchpad always overwrite the current state (enabling reruns to
update a previous decision), while the implicit should_run_distribution
check only runs when no decision has been made yet.

Revert the reset_artifact_data change since launchpad can now
overwrite state directly on rerun callbacks.
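
A minimal sketch of the guard ordering this commit message describes (a later commit in this PR removed the explicit error-code handling again). The function name, the string error codes, and the callable standing in for `should_run_distribution` are all assumptions for illustration.

```python
from typing import Callable, Optional, Tuple


def resolve_distribution_error(
    current_error: Optional[str],
    payload: dict,
    should_run_distribution: Callable[[], Tuple[bool, Optional[str]]],
) -> Optional[str]:
    """Return the error value to store after an update call (hypothetical)."""
    # 1. An explicit value sent by launchpad always overwrites the stored one,
    #    so a rerun can replace an earlier decision.
    if "installable_app_error_code" in payload:
        return payload["installable_app_error_code"]
    # 2. A decision was already recorded: keep it, skip the implicit check.
    if current_error is not None:
        return current_error
    # 3. No decision yet: run the implicit eligibility check.
    can_run, skip_reason = should_run_distribution()
    return None if can_run else skip_reason
```

Note that step 3 also runs again whenever an earlier call allowed distribution, because "allowed" and "not yet decided" are both stored as no error.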
"installable_app_error_message",
"date_updated",
]
)

can_run_distro, distro_skip_reason = and below seems correct.

requested_features.append(PreprodFeature.BUILD_DISTRIBUTION)
# Always accept explicit distribution state from launchpad so reruns
# can overwrite a previous decision.
if "installable_app_error_code" in data:

I don't think this is the correct endpoint to handle this.
The size one for example is handled here:
https://github.com/getsentry/sentry/blob/5205f1bcbc54415de4a8c371bea9cd754c3321e8/src/sentry/preprod/api/endpoints/project_preprod_size.py

project_preprod_artifact_update() is for the initial processing, not for uploading size/distro data or for updating their errors.

The flow is like:

monolith                                      launchpad

assemble()
task()
                                kafka ----->  initial processing
project_preprod_artifact_update       <-----  (returns req features)
                                              compute_size
                                      <-----  upload size json
                                              compute_distro
                                      <-----  upload distro build

for success and:

monolith                                      launchpad

assemble()
task()
                                kafka ----->  initial processing
project_preprod_artifact_update       <-----  (returns req features)
                                              compute_size
ProjectPreprodSizeEndpoint            <-----  set size error
                                              compute_distro
                                      <-----  ?

for failure.

My diagram looks like line noise but hopefully that makes some sense, if not let's talk.

)
.prefetch_related("preprodartifactsizemetrics_set")
.filter(project_id__in=project_ids, date_added__gte=cutoff)
.exclude(installable_app_error_code__isnull=False)

Same here - I'm not sure we want to hide these

@chromy chromy left a comment

Left some comments, maybe we could start just with NOT_RAN and leave the other two parts:

  • updating list builds
  • adding an endpoint for updating the distro error_code / error_message

to follow up PRs? I think keeping the PRs smaller might help.

@runningcode

Thanks for the diagrams! I will just do the NOT_RAN part here and then I'll DM you if I have questions about the diagrams.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

"installable_app_error_message",
"date_updated",
]
)

Retry guard incomplete, may set erroneous error code

Medium Severity

The guard if head_artifact.installable_app_error_code is None: only prevents re-evaluation when distribution was previously skipped (error code set). When distribution was previously allowed, installable_app_error_code remains None, so on a launchpad retry the check re-runs. If conditions changed between calls (e.g., quota consumed by the first distribution request), an error code like NO_QUOTA gets written to an artifact that already had distribution initiated — creating an inconsistent state where the build details API shows both a successful distribution (is_installable=True) and an error code simultaneously.
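
One way to picture the gap described above is a guard that also looks at the installable file, sketched below. The helper name is hypothetical, and this covers only the case where the installable file has already been assembled; a distribution that is still in flight has neither field set, so it would need some additional marker that this sketch does not model.

```python
from typing import Optional


def should_evaluate_distribution(
    installable_app_error_code: Optional[int],
    installable_app_file_id: Optional[int],
) -> bool:
    """Decide whether the eligibility check may (re)run (hypothetical)."""
    # A skip reason was already recorded: do not re-evaluate.
    if installable_app_error_code is not None:
        return False
    # An installable file already exists: writing an error code now would
    # produce the "succeeded and failed at once" state.
    if installable_app_file_id is not None:
        return False
    return True
```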

Comment on lines +439 to +448
if distro_skip_reason == "quota":
    error_code = PreprodArtifact.InstallableAppErrorCode.NO_QUOTA
    error_message = "Distribution quota exceeded"
elif distro_skip_reason == "disabled":
    error_code = PreprodArtifact.InstallableAppErrorCode.SKIPPED
    error_message = "Distribution disabled for this project"
else:
    error_code = PreprodArtifact.InstallableAppErrorCode.SKIPPED
    error_message = "Distribution filtered out by project settings"
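
The same mapping could also be written as a lookup table, which keeps each skip reason and its message on one line. This is a standalone sketch, not the code in the PR: the enum here is a local stand-in with assumed integer values, and `error_for_skip_reason` is a hypothetical helper name.

```python
from enum import IntEnum
from typing import Dict, Optional, Tuple


class InstallableAppErrorCode(IntEnum):
    # Local stand-in for PreprodArtifact.InstallableAppErrorCode; values assumed.
    UNKNOWN = 0
    NO_QUOTA = 1
    SKIPPED = 2
    PROCESSING_ERROR = 3


ErrorInfo = Tuple[InstallableAppErrorCode, str]

# Skip reasons handled explicitly in the snippet above.
_SKIP_REASONS: Dict[str, ErrorInfo] = {
    "quota": (InstallableAppErrorCode.NO_QUOTA, "Distribution quota exceeded"),
    "disabled": (
        InstallableAppErrorCode.SKIPPED,
        "Distribution disabled for this project",
    ),
}

# Fallback matching the snippet's else branch.
_DEFAULT: ErrorInfo = (
    InstallableAppErrorCode.SKIPPED,
    "Distribution filtered out by project settings",
)


def error_for_skip_reason(reason: Optional[str]) -> ErrorInfo:
    """Map a skip reason string to an (error_code, error_message) pair."""
    return _SKIP_REASONS.get(reason, _DEFAULT)
```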

Bug: When an artifact is rerun, the reset_artifact_data function does not clear distribution-specific error fields, leaving stale error data after a successful retry.
Severity: MEDIUM

Suggested Fix

Update the reset_artifact_data function in preprod_artifact_rerun_analysis.py to also set installable_app_error_code and installable_app_error_message to None. This will ensure that stale error information is cleared before a rerun attempt.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/sentry/preprod/api/endpoints/project_preprod_artifact_update.py#L439-L448

Potential issue: When an artifact distribution fails and is subsequently rerun, the
function `reset_artifact_data` is called. This function clears general artifact error
fields like `error_code` and `error_message`, but it fails to clear
distribution-specific error fields, namely `installable_app_error_code` and
`installable_app_error_message`. If the rerun is successful, the distribution task sets
`installable_app_file_id`, but the stale error fields from the previous failed attempt
persist. This results in an inconsistent state where the artifact appears to have both
succeeded and failed simultaneously.

Yes, might be good to add this to:

def reset_artifact_data(preprod_artifact: PreprodArtifact) -> None:
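
A rough sketch of what that suggestion amounts to, using a plain dataclass in place of the Django model. `ArtifactSketch` is hypothetical, and the real `reset_artifact_data` presumably resets more than the four fields shown here; only the `error_code` / `error_message` clearing is taken from the bot's description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArtifactSketch:
    """Hypothetical stand-in for the PreprodArtifact fields involved."""

    error_code: Optional[int] = None
    error_message: Optional[str] = None
    installable_app_error_code: Optional[int] = None
    installable_app_error_message: Optional[str] = None


def reset_artifact_data(preprod_artifact: ArtifactSketch) -> None:
    # General artifact errors, which the bot comment says are already cleared.
    preprod_artifact.error_code = None
    preprod_artifact.error_message = None
    # Distribution-specific errors, so a rerun starts from "not yet determined".
    preprod_artifact.installable_app_error_code = None
    preprod_artifact.installable_app_error_message = None
```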

Remove listing endpoint filters and explicit error code handling
from the update endpoint. The update endpoint now only evaluates
distribution eligibility (should_run_distribution) and records
skip reasons. Error reporting from launchpad and listing filters
will be added in follow-up PRs via a dedicated endpoint.

@chromy chromy left a comment

lgtm

@runningcode runningcode merged commit 35f2b20 into master Feb 26, 2026
57 checks passed
@runningcode runningcode deleted the no/eme-842-distribution-state-tracking branch February 26, 2026 15:11
runningcode added a commit that referenced this pull request Mar 5, 2026
Add a dedicated endpoint (`PUT
.../files/preprodartifacts/{id}/distribution/`) for launchpad to report
distribution processing errors back to the monolith. This mirrors the
existing `ProjectPreprodSizeEndpoint` pattern.

When launchpad encounters a distribution failure (unsupported artifact
type, invalid code signature, simulator build), it needs a way to set
`installable_app_error_code` and `installable_app_error_message` on the
artifact so the frontend can display the reason. Previously, the only
option was the general `update` endpoint which marks the entire artifact
as failed — but distribution errors shouldn't affect the artifact's
overall state.

Follow-up to #109062. The launchpad side that calls this endpoint is in
getsentry/launchpad#567.

Refs EME-842, EME-422

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: getsantry[bot] <66042841+getsantry[bot]@users.noreply.github.com>
JonasBa pushed a commit that referenced this pull request Mar 5, 2026
@github-actions github-actions bot locked and limited conversation to collaborators Mar 14, 2026