Skip to content

GH-48234: [C++][Parquet] Fix overly strict check for BIT_PACKED levels byte size#48235

Merged
pitrou merged 1 commit intoapache:mainfrom
pitrou:gh48234-fix-pq-check
Nov 25, 2025
Merged

GH-48234: [C++][Parquet] Fix overly strict check for BIT_PACKED levels byte size#48235
pitrou merged 1 commit intoapache:mainfrom
pitrou:gh48234-fix-pq-check

Conversation

@pitrou
Copy link
Copy Markdown
Member

@pitrou pitrou commented Nov 24, 2025

Rationale for this change

The original change from data_size to data_size - 4 was made by me in #6848 and it was probably based on a misunderstanding, or just blindly copying the similar check for Encoding::RLE (which, unlike Encoding::BIT_PACKED, does have a 4-byte length header).

Are these changes tested?

Yes, by existing CI tests.

Are there any user-facing changes?

No, just a potential bugfix for an unlikely case.

… levels byte size

The original change from data_size to data_size - 4 was made by me in apache#6848 and it was probably based on a misunderstanding, or just blindly copying the similar check for Encoding::RLE (which, unlike Encoding::BIT_PACKED, does have a 4-byte length header).
@pitrou pitrou marked this pull request as ready for review November 24, 2025 09:35
@pitrou pitrou requested a review from wgtmac as a code owner November 24, 2025 09:35
@pitrou pitrou requested a review from mapleFU November 24, 2025 09:35
@mapleFU
Copy link
Copy Markdown
Member

mapleFU commented Nov 24, 2025

How was this found? Does someone meet this or we have came across this?

@pitrou
Copy link
Copy Markdown
Member Author

pitrou commented Nov 24, 2025

Yes, I forgot to mention: this was found out in this PR: https://github.com/apache/arrow/pull/48205/files#r2553142118

@pitrou
Copy link
Copy Markdown
Member Author

pitrou commented Nov 24, 2025

@github-actions crossbow submit -g cpp

@mapleFU
Copy link
Copy Markdown
Member

mapleFU commented Nov 24, 2025

I give a quick glance and this is ok to me, but I may need some time to find out why it doesn't trigger any bug previously...

@github-actions
Copy link
Copy Markdown

Revision: 0390720

Submitted crossbow builds: ursacomputing/crossbow @ actions-d4511141a8

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-42-cpp GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@pitrou
Copy link
Copy Markdown
Member Author

pitrou commented Nov 24, 2025

I give a quick glance and this is ok to me, but I may need some time to find out why it doesn't trigger any bug previously...

Probably because data_size is the entire page size, and it also contains the page values. It's rare for page values to take less than 4 bytes.

@pitrou
Copy link
Copy Markdown
Member Author

pitrou commented Nov 25, 2025

I'll merge now.

@pitrou pitrou merged commit 52d258b into apache:main Nov 25, 2025
39 of 41 checks passed
@pitrou pitrou removed the awaiting review Awaiting review label Nov 25, 2025
@pitrou pitrou deleted the gh48234-fix-pq-check branch November 25, 2025 08:39
Copy link
Copy Markdown
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to approve yesterday night, I've tried to set the encoder to BIT_PACKED but it failed to encode. Perhaps we don't have testing data for this, and currently writer would merely produce file like this. So the patch LGTM

@github-actions github-actions bot added the awaiting committer review Awaiting committer review label Nov 25, 2025
@conbench-apache-arrow
Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 52d258b.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants