GH-48234: [C++][Parquet] Fix overly strict check for BIT_PACKED levels byte size#48235
GH-48234: [C++][Parquet] Fix overly strict check for BIT_PACKED levels byte size#48235pitrou merged 1 commit intoapache:mainfrom
Conversation
… levels byte size The original change from data_size to data_size - 4 was made by me in apache#6848 and it was probably based on a misunderstanding, or just blindly copying the similar check for Encoding::RLE (which, unlike Encoding::BIT_PACKED, does have a 4-byte length header).
|
How was this found? Does someone meet this or we have came across this? |
|
Yes, I forgot to mention: this was found out in this PR: https://github.com/apache/arrow/pull/48205/files#r2553142118 |
|
@github-actions crossbow submit -g cpp |
|
I give a quick glance and this is ok to me, but I may need some time to find out why it doesn't trigger any bug previously... |
|
Revision: 0390720 Submitted crossbow builds: ursacomputing/crossbow @ actions-d4511141a8 |
Probably because |
|
I'll merge now. |
mapleFU
left a comment
There was a problem hiding this comment.
I forgot to approve yesterday night, I've tried to set the encoder to BIT_PACKED but it failed to encode. Perhaps we don't have testing data for this, and currently writer would merely produce file like this. So the patch LGTM
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 52d258b. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
The original change from
data_sizetodata_size - 4was made by me in #6848 and it was probably based on a misunderstanding, or just blindly copying the similar check forEncoding::RLE(which, unlikeEncoding::BIT_PACKED, does have a 4-byte length header).Are these changes tested?
Yes, by existing CI tests.
Are there any user-facing changes?
No, just a potential bugfix for an unlikely case.
LevelDecoder::SetData#48234