-
Notifications
You must be signed in to change notification settings - Fork 3k
Aws: Add Iceberg version to UserAgent in S3 requests #9963
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @nastra, I've seen you reviewing PRs in the AWS space before, I was wondering if you could take a look at this (or suggest other reviewers who review S3-related code changes?). Thanks! |
| config -> | ||
| config.putAdvancedOption( | ||
| SdkAdvancedClientOption.USER_AGENT_PREFIX, | ||
| "s3fileio " + IcebergBuild.fullVersion())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might also need to update
| .build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also maybe worth changing the user agent to "s3fileio/" + EnvironmentContext.get(). That way if you're running Spark/Flink, you'll also get information about that engine and its version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, these are great callouts!
I factored this lambda out into a separate "mutator" like it is done for the other configs and included a unit test that at least calls the code. I tried to introduce some assertions to check if the correct String is set as User Agent but it felt like I was just re-implementing the original code path with captors so I abandoned that effort.
| } | ||
| } | ||
|
|
||
| private static final String S3_FILE_IO_SIGNATURE = "s3fileio/" + EnvironmentContext.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be defined at the top of the file (after all the other public static final vars) . Also signature seems rather confusing to me. Why not just use S3_FILE_IO_USER_AGENT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, thanks! In my mind signature/fingerprint a software leaves after it touched something mean the same thing, but the term signature also has uses in cryptography and AuthN so I get the confusion.
I agree that user agent will be more descriptive here, changed it now.
This allows developers to monitor which version of Iceberg they have deployed to a cluster (for example, S3 Access Logs contain the user agent field).
|
Looks like the link checker got throttled (HTTP 429s from Medium). Is this a known failure mode? |
This isn't currently a known issue. Could you open an issue for this? I've also seen other PRs fail on this unfortunately (cc @Fokko) |
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I think this is a helpful feature to add, and the changes look good to me.
|
Opened #10038 to track. |
This allows developers to monitor which version of Iceberg they have deployed to a cluster (for example, S3 Access Logs contain the user agent field). Co-authored-by: Geza Csenger <gccsenge@amazon.com>
This allows developers to monitor which version of Iceberg they have deployed to a cluster (for example, through the S3 Access Logs, which contain the user agent field).