Skip to content

Conversation

@munendrasn
Copy link
Contributor

When ResolvingFileIo is invoked without conf set (which can happen when catalog or fileIo is explicitly created), it throws NPE.

java.lang.NullPointerException
	at org.apache.iceberg.io.ResolvingFileIO.lambda$io$1(ResolvingFileIO.java:179)
	at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
	at org.apache.iceberg.io.ResolvingFileIO.io(ResolvingFileIO.java:176)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.iceberg.common.DynMethods$UnboundMethod.invokeChecked(DynMethods.java:62)
	at org.apache.iceberg.common.DynMethods$UnboundMethod.invoke(DynMethods.java:74)
	at org.apache.iceberg.common.DynMethods$BoundMethod.invoke(DynMethods.java:171)

The workaround is set to Configuration always.
For example, while creating the RESTCatalog like below but the order need to be maintained

RESTCatalog catalog = new RESTCatalog();
catalog.setConf(new Configuration());
catalog.initialize("rest", Map.of());

Exception HadoopFileIO, conf is not required or unused for other FileIo impl hence, handling with soft check, other option was to have strict check and fail if it is not set here

@munendrasn
Copy link
Contributor Author

@nastra FYI

impl,
key -> {
Configuration conf = hadoopConf.get();
Configuration conf = Optional.ofNullable(hadoopConf).map(Supplier::get).orElse(null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that ResolvingFileIO is implementing Configurable from Hadoop it actually requires the conf to be set via setConf but I agree that this is a bit problematic for FileIO instances that don't need this, like S3FileIO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do feel like it would be better to add this null handling to getConf() directly and then just call getConf() here

Copy link
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one minor comment to move the null handling code. @amogh-jahagirdar can you double-check the changes here as well please?

@munendrasn
Copy link
Contributor Author

@nastra Thank you for the review, please let me know if I should moved the null check to getConf or should we create new Configuration if not set?

@nastra
Copy link
Contributor

nastra commented Aug 19, 2024

@nastra Thank you for the review, please let me know if I should moved the null check to getConf or should we create new Configuration if not set?

@munendrasn yes I think it makes sense to move the null handling code into getConf(see also my other comment on this)

@munendrasn
Copy link
Contributor Author

@nastra
Updated, reverted the last commit in the PR fe256b4

@amogh-jahagirdar amogh-jahagirdar changed the title Fix NPE in Resolving FileIo when conf is not set Core,AWS: Fix NPE in ResolvingFileIO when HadoopConf is not set Aug 22, 2024
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this change, I think it's the right behavior to just set a null conf instead of a strict failure if a HadoopConf is not set. The fact that ResolvingFileIO implements HadoopConfigurable does not mean a HadoopConf must be set, it just means that option is available and will be applied if it's specified.

@amogh-jahagirdar
Copy link
Contributor

THanks @munendrasn and thanks @nastra for the review.

impl,
key -> {
Configuration conf = hadoopConf.get();
Configuration conf = getConf();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked that the CatalogUtil.loadFileIO calls below are null protected as well, the s3fileIO test will exercise that API so that gives confidence too.

@amogh-jahagirdar amogh-jahagirdar merged commit cbd71eb into apache:main Aug 22, 2024
@munendrasn
Copy link
Contributor Author

Thank you @nastra and @amogh-jahagirdar for the review

@munendrasn munendrasn deleted the resolving-file-io-npe branch August 22, 2024 04:52
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
zhongyujiang pushed a commit to zhongyujiang/iceberg that referenced this pull request Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants