fix(custom-resources): waiter state machine retry fails with ExecutionAlreadyExists #35988
Conversation
Exemption Request: This fix is in runtime code (Lambda function execution) and does not change CloudFormation templates or infrastructure. The existing integration tests verify infrastructure creation, which is unaffected by this change. Unit tests provide comprehensive coverage of the runtime behavior change.
2a0d935 to 6d329d8
Alternatively, we could forward the request ID from the Lambda. That should never repeat.
packages/aws-cdk-lib/custom-resources/test/provider-framework/runtime.test.ts
I have confirmed that this PR fixes the issue.
✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.
…ustom-resources-waiter-retry-execution-name
Pull request has been modified.
Integration test failures are expected due to the changed asset. They are not caused by the new integ-runner engine. You'll need to work with your PR reviewer to update all snapshots. For framework changes like this, I'd typically recommend that a CDK team member does this for you.
…ustom-resources-waiter-retry-execution-name
Abogical left a comment
The linter has passed, but you'll need to update the integ test snapshots. See https://github.com/aws/aws-cdk/blob/main/INTEGRATION_TESTS.md#running-integration-tests
yarn integ-runner --directory packages/@aws-cdk --update-on-failed
I've pushed the updated snapshots directly to your PR branch.
@newlinedeveloper
update snapshots again
Pull request has been modified.
For the record, only 2 snapshots failed to deploy with the new snapshot changes in the GitHub workflow. Deploying them locally works, however:
packages/@aws-cdk-testing/framework-integ/test/aws-codebuild/test/integ.project-fleet.js
packages/@aws-cdk-testing/framework-integ/test/aws-dynamodb/test/integ.global.js
Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).
Merge Queue Status: ✅ The pull request has been merged at 551220d. This pull request spent 5 seconds in the queue, with no time running CI.
Comments on closed issues and PRs are hard for our team to see.
Description
Fixes an issue where retrying a CloudFormation deployment that uses a custom resource with an async waiter fails with an `ExecutionAlreadyExists` error.
Root Cause
The custom resource provider framework uses CloudFormation's `RequestId` as the Step Functions execution name when starting the waiter state machine. When CloudFormation retries a failed deployment, it reuses the same `RequestId`. Since Step Functions execution names must be unique for 90 days, subsequent retry attempts fail with `ExecutionAlreadyExists`.
Solution
Removed the `name` parameter from the `startExecution` call, allowing Step Functions to auto-generate unique execution names. This is the recommended approach per the AWS Step Functions StartExecution API Reference, where the `name` parameter is optional and Step Functions automatically generates a universally unique identifier (UUID) as the execution name if none is provided.
Changes
- Removed `name: resourceEvent.RequestId` from the waiter state machine execution call in `framework.ts`
- … `name` field
- … `name` is not included in the `startExecution` call
Testing
- … `waiter state machine execution does not include name field (allows retries)` to verify the fix
- … `name` being undefined
Related Issue
Fixes #35957
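The root cause and the fix described above can be sketched roughly as follows. This is a simplified illustration, not the actual `framework.ts` code: the `ResourceEvent` and `StartExecutionInput` shapes, the two `buildInput*` helpers, and the `FakeStepFunctions` class are all hypothetical names introduced here. The fake only models the "name must be unique" rule; the real service also takes the input and the execution status into account.

```typescript
// Hypothetical, simplified shapes; the real framework types differ.
interface ResourceEvent {
  RequestId: string;
  LogicalResourceId: string;
}

interface StartExecutionInput {
  stateMachineArn: string;
  name?: string; // optional in the StartExecution API
  input: string;
}

// Before the fix: the execution name is tied to CloudFormation's RequestId,
// so a retry that reuses the RequestId asks for a name that already exists.
function buildInputBefore(arn: string, event: ResourceEvent): StartExecutionInput {
  return { stateMachineArn: arn, name: event.RequestId, input: JSON.stringify(event) };
}

// After the fix: no name is sent, so the service picks a unique one.
function buildInputAfter(arn: string, event: ResourceEvent): StartExecutionInput {
  return { stateMachineArn: arn, input: JSON.stringify(event) };
}

// In-memory stand-in for the name-uniqueness rule (assumption: not the real service).
class FakeStepFunctions {
  private readonly names = new Set<string>();
  private counter = 0;

  startExecution(req: StartExecutionInput): string {
    // Generate a name only when the caller did not supply one.
    const name = req.name ?? `generated-${++this.counter}`;
    if (this.names.has(name)) {
      throw new Error('ExecutionAlreadyExists');
    }
    this.names.add(name);
    return name;
  }
}
```

With this sketch, calling `startExecution` twice with `buildInputBefore` and the same event throws on the second call, while two calls with `buildInputAfter` both succeed under different generated names.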
Verification
The fix was verified by:
- … `name` field is not included
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license
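A check along the lines of the unit test named under Testing could look something like the sketch below. It is not the test from `runtime.test.ts`: `startWaiter`, `recordingStub`, and `checkNameIsOmitted` are hypothetical names, the ARN is a placeholder, and a hand-rolled recording stub stands in for whatever mocking the real suite uses.

```typescript
// Hypothetical request shape and function names; not the actual CDK test code.
interface StartExecutionRequest {
  stateMachineArn: string;
  name?: string;
  input: string;
}

type StartExecutionFn = (req: StartExecutionRequest) => Promise<void>;

// Stand-in for the framework code that starts the waiter state machine.
async function startWaiter(
  startExecution: StartExecutionFn,
  waiterArn: string,
  event: { RequestId: string },
): Promise<void> {
  // No `name` key is passed, so the service is left to generate the execution name.
  await startExecution({ stateMachineArn: waiterArn, input: JSON.stringify(event) });
}

// Records every request so the check can inspect what was sent.
function recordingStub(): { calls: StartExecutionRequest[]; fn: StartExecutionFn } {
  const calls: StartExecutionRequest[] = [];
  const fn: StartExecutionFn = async (req) => {
    calls.push(req);
  };
  return { calls, fn };
}

// Simulates an initial attempt and a retry that reuses the same RequestId.
async function checkNameIsOmitted(): Promise<boolean> {
  const stub = recordingStub();
  const event = { RequestId: 'same-id-on-retry' };
  const arn = 'arn:aws:states:us-east-1:123456789012:stateMachine:placeholder';
  await startWaiter(stub.fn, arn, event);
  await startWaiter(stub.fn, arn, event);
  return stub.calls.length === 2 && stub.calls.every((call) => call.name === undefined);
}
```

The point of asserting on the recorded requests, rather than on a return value, is that the regression being guarded against is the presence of the `name` key itself.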