Skip to content

Conversation

@CsengerG
Copy link
Contributor

This allows developers to monitor which version of Iceberg they have deployed to a cluster (for example, through the S3 Access Logs, which contain the user agent field).

@github-actions github-actions bot added the AWS label Mar 15, 2024
@CsengerG
Copy link
Contributor Author

Hi @nastra, I've seen you reviewing PRs in the AWS space before, I was wondering if you could take a look at this (or suggest other reviewers who review S3-related code changes?). Thanks!

config ->
config.putAdvancedOption(
SdkAdvancedClientOption.USER_AGENT_PREFIX,
"s3fileio " + IcebergBuild.fullVersion()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might also need to update

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also maybe worth changing the user agent to "s3fileio/" + EnvironmentContext.get(). That way if you're running Spark/Flink, you'll also get information about that engine and its version

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, these are great callouts!

I factored this lambda out into a separate "mutator" like it is done for the other configs and included a unit test that at least calls the code. I tried to introduce some assertions to check if the correct String is set as User Agent but it felt like I was just re-implementing the original code path with captors so I abandoned that effort.

}
}

private static final String S3_FILE_IO_SIGNATURE = "s3fileio/" + EnvironmentContext.get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be defined at the top of the file (after all the other public static final vars) . Also signature seems rather confusing to me. Why not just use S3_FILE_IO_USER_AGENT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, thanks! In my mind signature/fingerprint a software leaves after it touched something mean the same thing, but the term signature also has uses in cryptography and AuthN so I get the confusion.

I agree that user agent will be more descriptive here, changed it now.

This allows developers to monitor which version of Iceberg they have
deployed to a cluster (for example, S3 Access Logs contain the user agent
field).
@nastra nastra requested a review from amogh-jahagirdar March 25, 2024 14:38
@CsengerG
Copy link
Contributor Author

Looks like the link checker got throttled (HTTP 429s from Medium). Is this a known failure mode?

@nastra
Copy link
Contributor

nastra commented Mar 25, 2024

Looks like the link checker got throttled (HTTP 429s from Medium). Is this a known failure mode?

This isn't currently a known issue. Could you open an issue for this? I've also seen other PRs fail on this unfortunately (cc @Fokko)

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I think this is a helpful feature to add, and the changes look good to me.

@amogh-jahagirdar
Copy link
Contributor

Since the failures were trottling errors from another service, I retried and they succeeded. If this is a consistent problem now in our CI, let's create a separate issue to track. Merging the change. Thanks for the contribution @CsengerG and the review @nastra

@CsengerG
Copy link
Contributor Author

Opened #10038 to track.

@CsengerG CsengerG deleted the user-agent branch March 25, 2024 20:20
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
This allows developers to monitor which version of Iceberg they have
deployed to a cluster (for example, S3 Access Logs contain the user agent
field).

Co-authored-by: Geza Csenger <gccsenge@amazon.com>
parthchandra pushed a commit to parthchandra/iceberg that referenced this pull request Oct 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants