@zeroshade

Rationale for this change

Addresses the comments in #278 by allowing reads to be optimized: entire pages can be skipped, using the offset index if it exists.
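For readers unfamiliar with the offset index, here is a minimal sketch of why it makes page skipping cheap. In the Parquet format, each offset index entry records a page's byte offset, its compressed size, and the index of its first row, so the page holding a given row can be found with a binary search instead of decoding every preceding page. The `pageLocation` type and `pageForRow` function below are hypothetical stand-ins written for illustration; they are not the types or functions added in this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// pageLocation is a local, hypothetical stand-in that mirrors the fields of a
// Parquet offset index entry. It is not the library's own type.
type pageLocation struct {
	Offset             int64 // byte offset of the page in the file
	CompressedPageSize int32 // size of the page in bytes, including its header
	FirstRowIndex      int64 // first row of the page, relative to the row group
}

// pageForRow returns the index of the page that contains row, or -1 if row is
// outside [0, numRows) or there are no pages. pages must be sorted by
// FirstRowIndex, which is how the offset index stores them.
func pageForRow(pages []pageLocation, numRows, row int64) int {
	if len(pages) == 0 || row < 0 || row >= numRows {
		return -1
	}
	// Find the first page that starts after row; the page before it holds row.
	i := sort.Search(len(pages), func(i int) bool {
		return pages[i].FirstRowIndex > row
	})
	return i - 1
}

func main() {
	// Made-up page layout for a column chunk with 400 rows.
	pages := []pageLocation{
		{Offset: 4, CompressedPageSize: 1000, FirstRowIndex: 0},
		{Offset: 1004, CompressedPageSize: 2000, FirstRowIndex: 100},
		{Offset: 3004, CompressedPageSize: 1500, FirstRowIndex: 300},
	}
	idx := pageForRow(pages, 400, 250)
	fmt.Printf("row 250 is in page %d at byte offset %d\n", idx, pages[idx].Offset)
}
```

With the byte offset in hand, a reader can seek straight to that page and never touch the bytes of the pages it skipped.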

What changes are included in this PR?

Deprecates the old NewColumnChunkReader and NewPageReader methods: they aren't safe to use outside of the package and have proved difficult to evolve without breaking changes. Users should instead rely on the RowGroupReader to create column readers and page readers, which is what most consumers already do.

Adds a SeekToRow method on the ColumnChunkReader to allow skipping to a particular row in the column chunk (which also allows quickly resetting back to the beginning of a column!), along with a SeekToPageWithRow method on the page reader. Also updates the Skip method so that it properly skips rows in a repeated column, not just values.
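The distinction between rows and values in a repeated column can be illustrated with a short sketch. In Parquet, a repetition level of 0 marks the start of a new row, so a single row may span several level entries. The `valuesForRows` helper below is hypothetical and only shows the counting idea; it is not the Skip implementation from this PR.

```go
package main

import "fmt"

// valuesForRows reports how many repetition-level entries make up the first
// `rows` rows. A level of 0 starts a new row; any other level continues the
// current one. This is an illustrative helper, not the library's Skip logic.
func valuesForRows(repLevels []int16, rows int) int {
	if rows <= 0 {
		return 0
	}
	seen := 0 // number of row starts encountered so far
	for i, lvl := range repLevels {
		if lvl == 0 {
			if seen == rows {
				// Entry i begins row number rows+1, so the first
				// `rows` rows occupy entries [0, i).
				return i
			}
			seen++
		}
	}
	// Fewer than rows+1 row starts: everything belongs to the skipped rows.
	return len(repLevels)
}

func main() {
	// Three rows: the first has 3 entries, the second 1, the third 2.
	repLevels := []int16{0, 1, 1, 0, 0, 1}
	fmt.Println(valuesForRows(repLevels, 1)) // entries to skip for 1 row
	fmt.Println(valuesForRows(repLevels, 2)) // entries to skip for 2 rows
}
```

Skipping "1" in this example has to advance past three entries, which is why a value-based skip gives the wrong position on repeated columns.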

Are these changes tested?

Yes, tests are included.

Are there any user-facing changes?

Just the new methods. The deprecated methods have not been removed yet.

@zeroshade merged commit 6dc6926 into apache:main on Feb 20, 2025
23 checks passed
@zeroshade deleted the seek-to-row branch on February 20, 2025 16:39