Skip to content

Conversation

@aokolnychyi
Copy link
Contributor

@aokolnychyi aokolnychyi commented Nov 1, 2024

This PR adds referenced_data_file to our delete metadata as per the design doc for improving position deletes.

This PR cannot be merged until the vote is closed. This work is part of #11122.

@aokolnychyi aokolnychyi changed the title API, Core: Add data file reference to DeleteFile [WIP] API, Core: Add data file reference to DeleteFile Nov 1, 2024
case 15:
return wrapped.sortOrderId();
case 16:
if (wrapped instanceof DeleteFile) {
Copy link
Contributor Author

@aokolnychyi aokolnychyi Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a big fan of this instanceof given that it is called in a tight loop. We need it as IndexedDataFile is used for both data and deletes. At the same time, I am not sure it is reasonable to add IndexedDeleteFile for the sake of handling 3 different fields.

Any thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would define referencedDataFile on ContentFile and have DataFile always return null. That's how we handle content in v1/v2. DataFile returns a constant value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to avoid unnecessary methods in DataFile. The same goes for content offset and size, which will only be applicable to deletes. Let me think.

Copy link
Contributor Author

@aokolnychyi aokolnychyi Nov 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better not to expose these methods on DataFile:

  • Avoids reliance on validation outside of this block to ensure these fields are null for data files. We had bugs before that allowed persisting equality field IDs for data files.
  • Avoids unnecessary methods that can be invoked but have no meaning, which we have to explain.

case 17:
this.referencedDataFile = value != null ? value.toString() : null;
return;
case 18:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the ordinal of fileOrdinal. This normally doesn't matter as we read with a projection. We also changed it a couple of times before. Anything I missed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as both reads and writes are updated, it should be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming!

@aokolnychyi aokolnychyi changed the title [WIP] API, Core: Add data file reference to DeleteFile API, Core: Add data file reference to DeleteFile Nov 2, 2024
@aokolnychyi aokolnychyi merged commit d9b9768 into apache:main Nov 2, 2024
@aokolnychyi
Copy link
Contributor Author

Thanks for reviewing, @rdblue @amogh-jahagirdar!

zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
cogwirrel pushed a commit to cogwirrel/iceberg that referenced this pull request Aug 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants