aws-s3-deployment: unexpected file tampering where a multiline JSON file gets manipulated to a single line after being uploaded to S3 #35050
Description
Describe the bug
We have a CDK package which leverages aws-s3-deployment to upload local files (each of which has multiline JSON representing an EMR cluster template) to a predefined S3 location. For each uploaded file, we expect that the content in S3 is identical to what's on the local disk.
What's the problem
We compute the MD5 value of each local file and compare it against the ETag of the corresponding object that aws-s3-deployment uploads to S3. We never saw a mismatch in the past 3 years, until we upgraded to a recent version (v2.185.0) of aws-cdk-lib.
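The check described above can be sketched in TypeScript roughly as follows. This is only an illustration, not the code we actually run: `md5Hex` and `matchesETag` are hypothetical helper names, and the comparison is only meaningful when the object was uploaded in a single part and is not encrypted with SSE-KMS or SSE-C, since only then does S3 report the MD5 of the content as the ETag.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded MD5 digest of a file's raw content.
function md5Hex(content: Buffer | string): string {
  return createHash("md5").update(content).digest("hex");
}

// S3 returns the ETag wrapped in double quotes, so strip them before comparing.
// Assumption: single-part upload without SSE-KMS/SSE-C, so ETag == MD5(content).
function matchesETag(content: Buffer | string, etag: string): boolean {
  const normalized = etag.replace(/"/g, "").toLowerCase();
  return md5Hex(content) === normalized;
}
```

In practice the first argument would be the bytes read from the local file and the second the ETag reported for the uploaded object. Any change to the bytes, including whitespace, makes the check fail.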
Root cause
In 2025, backward-incompatible changes were made to aws-s3-deployment/bucket-deployment-handler/index.py to address #22661.
However, these changes introduced a new bug: a multiline JSON file is rewritten as a single line when it is uploaded to S3. As a result, the MD5 value we compute for the local file no longer matches the ETag of the corresponding remote file in S3.
Regression Issue
- Select this option if this issue appears to be a regression.
Last Known Working CDK Library Version
v2.184.1
Expected Behavior
We have a CDK package which leverages aws-s3-deployment to upload local files (each containing multiline JSON that represents an EMR cluster template) to a predefined S3 location. For each uploaded file, we expect the content in S3 to be identical to what's on the local disk. Specifically, we compute the MD5 value of a local file and compare it against the ETag of the corresponding remote file uploaded to S3 by aws-s3-deployment. We expect the two values to match, as they have for the past 3 years.
Current Behavior
We observe unexpected file tampering: a multiline JSON file is collapsed to a single line after being uploaded to S3. As a result, the MD5 value computed from the local file doesn't match the ETag of the corresponding remote file that's uploaded to S3 by aws-s3-deployment.
Reproduction Steps
- CDK code
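import * as s3 from "aws-cdk-lib/aws-s3";
import * as s3Deployment from "aws-cdk-lib/aws-s3-deployment";

const localPath = "lib/assets/cluster-templates";
const remotePath = "my/remote/path";
// destinationBucket expects an IBucket, so look up the existing bucket by name.
const bucket = s3.Bucket.fromBucketName(this, "DestinationBucket", "my.s3.bucket");
// `index` is defined in the surrounding code (not shown).
new s3Deployment.BucketDeployment(this, `UploadAssets-${index}`, {
  sources: [s3Deployment.Source.asset(localPath)],
  destinationBucket: bucket,
  destinationKeyPrefix: remotePath,
});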
- Local file
$ md5 lib/assets/cluster-templates/RegularCluster.json
MD5 (lib/assets/cluster-templates/RegularCluster.json) = 1fe2c0c8b4e999eef7a5e4bcf341f8a5
- Remote file downloaded from S3: the following MD5 matches the ETag provided by S3
$ md5 ~/Downloads/RegularCluster.json
MD5 (~/Downloads/RegularCluster.json) = 77d7275f8d5aa6d089fb2e55c0ef5628
- Trimmed local file: I have to manipulate the content as follows in order to match the above ETag
$ cat lib/assets/cluster-templates/RegularCluster.json | jq -c | sed 's/,\([^ ]\)/, \1/g; s/:\([^/ ]\)/: \1/g; s/sys: /sys:/g' | tr -d '\n' | md5
77d7275f8d5aa6d089fb2e55c0ef5628
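The jq/sed/tr pipeline above produces single-line JSON with ", " between items and ": " between keys and values. That happens to be the layout Python's `json.dumps` emits with its default separators, which would be consistent with the handler parsing and re-serializing the file, although that is an inference from the output and not something confirmed here. A minimal TypeScript sketch of the same transformation, done on the parsed value instead of with regexes (so that colons and commas inside strings are left alone, which is presumably why the sed command needs the extra `sys:` patch), could look like this. `dumpsSingleLine` is a hypothetical helper name.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize a parsed JSON value on one line, using ", " between items and
// ": " between keys and values (the layout observed in the uploaded file).
function dumpsSingleLine(value: Json): string {
  if (Array.isArray(value)) {
    return "[" + value.map(dumpsSingleLine).join(", ") + "]";
  }
  if (value !== null && typeof value === "object") {
    const members = Object.entries(value).map(
      ([key, member]) => JSON.stringify(key) + ": " + dumpsSingleLine(member),
    );
    return "{" + members.join(", ") + "}";
  }
  // Strings, numbers, booleans and null use the standard JSON encoding.
  return JSON.stringify(value);
}

const multiline = '{\n  "Name": "RegularCluster",\n  "Tags": ["a:b", "c,d"]\n}';
const singleLine = dumpsSingleLine(JSON.parse(multiline) as Json);
// singleLine is '{"Name": "RegularCluster", "Tags": ["a:b", "c,d"]}'
```

This sketch is not guaranteed to be byte-identical to the handler's output for every file: JavaScript reorders integer-like object keys, and non-ASCII characters or numbers such as `1.0` may be encoded differently. It is only meant to show why the MD5 of the uploaded content differs from the MD5 of the multiline original.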
Possible Solution
Consider introducing a flag in the BucketDeployment construct to support the old behavior, i.e., the file content should NOT be tampered with when we leverage aws-s3-deployment to upload files from a local path to a remote S3 location.
Additional Information/Context
No response
AWS CDK Library version (aws-cdk-lib)
2.186.0
AWS CDK CLI version
2.1020.2
Node.js Version
23.1.0
OS
mac 15.5 (Sequoia)
Language
TypeScript
Language Version
No response
Other information
No response