
fix(custom-resources): waiter state machine retry fails with ExecutionAlreadyExists #35988

Merged
mergify[bot] merged 23 commits into aws:main from newlinedeveloper:fix/custom-resources-waiter-retry-execution-name
Dec 11, 2025

Conversation

@newlinedeveloper
Contributor

Description

Fixes an issue where retrying a CloudFormation deployment that uses a custom resource with an async waiter fails with an ExecutionAlreadyExists error.

Root Cause

The custom resource provider framework uses CloudFormation's RequestId as the Step Functions execution name when starting the waiter state machine. When CloudFormation retries a failed deployment, it reuses the same RequestId. Since Step Functions execution names must be unique for 90 days, subsequent retry attempts fail with ExecutionAlreadyExists.
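The collision described above can be reproduced with a small in-memory model. This is only a sketch: `FakeStateMachine` and `tryStart` are hypothetical names invented for illustration, and the model covers only the "a name may not be reused" rule, not the real service's 90-day window or its other behaviors.

```typescript
// Hypothetical in-memory stand-in for a state machine; NOT the real Step Functions API.
// It models only the rule described above: an execution name may not be reused.
class FakeStateMachine {
  private readonly usedNames = new Set<string>();
  private autoCounter = 0;

  startExecution(params: { name?: string | undefined; input: string }): string {
    // When no name is supplied, generate one (the real service generates a UUID).
    const name = params.name ?? `auto-${++this.autoCounter}`;
    if (this.usedNames.has(name)) {
      const err = new Error(`Execution already exists: '${name}'`);
      err.name = 'ExecutionAlreadyExists';
      throw err;
    }
    this.usedNames.add(name);
    return name;
  }
}

// Attempt a start and report the error name, or 'ok' on success.
function tryStart(sm: FakeStateMachine, name?: string): string {
  try {
    sm.startExecution({ name, input: '{}' });
    return 'ok';
  } catch (e) {
    return (e as Error).name;
  }
}
```

Calling `tryStart` twice with the same RequestId-derived name succeeds once and then reports `ExecutionAlreadyExists`, while calls without a name keep succeeding.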

Solution

Removed the name parameter from the startExecution call, allowing Step Functions to auto-generate unique execution names. This is the recommended approach per the AWS Step Functions StartExecution API Reference, where the name parameter is optional and Step Functions will automatically generate a universally unique identifier (UUID) as the execution name if not provided.
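As a rough illustration of the change, the request passed to startExecution might look like the following before and after the fix. The interfaces and function names here (`ResourceEventLike`, `WaiterRequest`, `buildWaiterRequestBefore`, `buildWaiterRequestAfter`) are assumptions made for this sketch and are not taken from framework.ts; only the `stateMachineArn`, `name`, and `input` fields mirror the StartExecution parameters.

```typescript
// Hypothetical, trimmed-down shapes; the real framework.ts and SDK types have more fields.
interface ResourceEventLike {
  RequestId: string;
  [key: string]: unknown;
}

interface WaiterRequest {
  stateMachineArn: string;
  name?: string | undefined; // optional in StartExecution; omitted after the fix
  input: string;
}

// Before the fix: the execution name is pinned to CloudFormation's RequestId.
function buildWaiterRequestBefore(event: ResourceEventLike, stateMachineArn: string): WaiterRequest {
  return { stateMachineArn, name: event.RequestId, input: JSON.stringify(event) };
}

// After the fix: no name is sent, so the service generates a unique one per execution.
function buildWaiterRequestAfter(event: ResourceEventLike, stateMachineArn: string): WaiterRequest {
  return { stateMachineArn, input: JSON.stringify(event) };
}
```

The event, including its RequestId, still travels in `input`, so the waiter can correlate the execution with the CloudFormation request without relying on the execution name.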

Changes

  • Removed name: resourceEvent.RequestId from the waiter state machine execution call in framework.ts
  • Updated log statement to remove the name field
  • Added unit test to verify that name is not included in the startExecution call

Testing

  • Added unit test "waiter state machine execution does not include name field (allows retries)" to verify the fix
  • All existing unit tests pass
  • Verified that the mock assertion checks for name being undefined
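The kind of assertion described in this list can be sketched with a hand-rolled recording stub. Everything below is hypothetical: `createRecordingStub`, `startWaiter`, and `calledOnceWithoutName` are illustrative names, the PR's actual test uses the repository's own mocks, and the real startExecution call is asynchronous whereas this sketch is synchronous for brevity.

```typescript
// Hypothetical stand-ins: a recording stub for startExecution and a simplified
// version of the code under test. Not the repository's actual test code.
type StartExecutionParams = { stateMachineArn: string; name?: string | undefined; input: string };

function createRecordingStub() {
  const calls: StartExecutionParams[] = [];
  return {
    calls,
    // Record every call so the test can inspect the arguments afterwards.
    startExecution(params: StartExecutionParams): void {
      calls.push(params);
    },
  };
}

// Simplified code under test: starts the waiter without an execution name.
function startWaiter(
  client: { startExecution(params: StartExecutionParams): void },
  event: { RequestId: string },
  stateMachineArn: string,
): void {
  client.startExecution({ stateMachineArn, input: JSON.stringify(event) });
}

// The check itself: true when exactly one call was made and it carried no name.
function calledOnceWithoutName(calls: StartExecutionParams[]): boolean {
  return calls.length === 1 && calls[0]?.name === undefined;
}
```

Checking for `undefined` rather than for a specific generated value keeps the test independent of how the service names executions.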

Related Issue

Fixes #35957

Verification

The fix was verified by:

  1. Running unit tests to ensure the name field is not included
  2. Confirming that existing tests continue to pass
  3. Confirming that the change aligns with AWS Step Functions best practices for execution naming

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions bot added beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK bug This issue is a bug. effort/medium Medium work item – several days of effort p1 labels Nov 8, 2025
@aws-cdk-automation aws-cdk-automation requested a review from a team November 8, 2025 11:14
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment


(This review is outdated)

@newlinedeveloper
Contributor Author

Exemption Request

This fix is in runtime code (Lambda function execution) and does not change CloudFormation templates or infrastructure. The existing integration tests verify infrastructure creation, which is unaffected by this change. Unit tests provide comprehensive coverage of the runtime behavior change.

@aws-cdk-automation aws-cdk-automation added the pr-linter/exemption-requested The contributor has requested an exemption to the PR Linter feedback. label Nov 8, 2025
@newlinedeveloper newlinedeveloper force-pushed the fix/custom-resources-waiter-retry-execution-name branch from 2a0d935 to 6d329d8 Compare November 8, 2025 13:29
@vvigilante

alternatively we could forward the request id from the lambda. That should never repeat.
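For comparison, the alternative suggested in this comment might look roughly like the sketch below. It was not what the merged change did (the description says the name was removed instead). The function and interface names are hypothetical; `awsRequestId` is the per-invocation ID that the Node.js Lambda runtime exposes on the handler's context object. Whether that ID truly never repeats is the commenter's claim and is not verified here.

```typescript
// Minimal, hypothetical shapes for illustration only.
interface LambdaContextLike {
  awsRequestId: string; // per-invocation ID provided by the Lambda runtime
}

interface CfnEventLike {
  RequestId: string; // CloudFormation's request ID, reused on deployment retries
  [key: string]: unknown;
}

// Alternative sketch: name the execution after the Lambda invocation,
// not after the CloudFormation request.
function buildWaiterRequestFromLambda(
  event: CfnEventLike,
  context: LambdaContextLike,
  stateMachineArn: string,
): { stateMachineArn: string; name: string; input: string } {
  return {
    stateMachineArn,
    name: context.awsRequestId,
    input: JSON.stringify(event),
  };
}
```

One possible upside of this variant is that the execution name can be traced back to a specific Lambda invocation in the logs, at the cost of still depending on an externally supplied ID being unique.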

@Abogical Abogical self-assigned this Nov 12, 2025
@Abogical
Member

Abogical commented Nov 13, 2025

I have confirmed that this PR fixes the issue.

@Abogical Abogical added pr-linter/exempt-integ-test The PR linter will not require integ test changes pr-linter/exemption-requested The contributor has requested an exemption to the PR Linter feedback. and removed pr-linter/exemption-requested The contributor has requested an exemption to the PR Linter feedback. labels Nov 13, 2025
@aws-cdk-automation aws-cdk-automation dismissed their stale review November 13, 2025 14:57

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

@mrgrain
Contributor

mrgrain commented Nov 17, 2025

Integration test failures are expected due to the changed asset. They are not caused by the new integ-runner engine. You'll need to work with your PR reviewer to update all snapshots. For framework changes like this, I'd typically recommend that a CDK team member does this for you.

@newlinedeveloper
Contributor Author

Hi @Abogical @pahud, this PR needs review and approval so that it can be closed. Thanks.

Member

@Abogical Abogical left a comment


the linter has passed but you'll need to update the integ test snapshots. See https://github.com/aws/aws-cdk/blob/main/INTEGRATION_TESTS.md#running-integration-tests

yarn integ-runner --directory packages/@aws-cdk --update-on-failed

@Abogical Abogical had a problem deploying to deployment-integ-test December 10, 2025 12:49 — with GitHub Actions Failure
@Abogical
Member

I've updated the snapshots to your PR branch directly.

@Abogical
Member

@newlinedeveloper
There are other snapshots to be uploaded but I don't have access to push LFS files to your fork. Can you merge this PR which will push the changes to your fork? newlinedeveloper#1

@aws-cdk-automation aws-cdk-automation added the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Dec 10, 2025
@Abogical Abogical had a problem deploying to deployment-integ-test December 10, 2025 16:07 — with GitHub Actions Failure
Abogical previously approved these changes Dec 10, 2025
@aws-cdk-automation aws-cdk-automation removed the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Dec 10, 2025
@mergify mergify bot dismissed Abogical’s stale review December 11, 2025 10:05

Pull request has been modified.

@aws-cdk-automation aws-cdk-automation added the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Dec 11, 2025
@Abogical Abogical removed the pr/needs-integration-tests-deployment Requires the PR to deploy the integration test snapshots. label Dec 11, 2025
Member

@Abogical Abogical left a comment


For the record, only 2 snapshots failed to deploy with the new snapshot changes in the GitHub workflow. Deploying them locally, however, works:

  • packages/@aws-cdk-testing/framework-integ/test/aws-codebuild/test/integ.project-fleet.js
  • packages/@aws-cdk-testing/framework-integ/test/aws-dynamodb/test/integ.global.js

@aws-cdk-automation aws-cdk-automation removed the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Dec 11, 2025
@mergify
Contributor

mergify bot commented Dec 11, 2025

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit 36ea606 into aws:main Dec 11, 2025
40 of 46 checks passed
@mergify
Contributor

mergify bot commented Dec 11, 2025

Merge Queue Status

✅ The pull request has been merged at 551220d

This pull request spent 5 seconds in the queue, with no time running CI.
The checks were run in-place.


@github-actions
Contributor

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 11, 2025

Labels

beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK bug This issue is a bug. effort/medium Medium work item – several days of effort p1 pr-linter/exempt-integ-test The PR linter will not require integ test changes pr-linter/exemption-requested The contributor has requested an exemption to the PR Linter feedback.

Development

Successfully merging this pull request may close these issues.

CustomResource Provider: WaiterStateMachine can't start when stack deployment is retried

6 participants