GH-48210: [C++][Parquet] Fix Bloom Filter logic to enable Parquet DB support on s390x #48211
Vishwanatha-HD wants to merge 1 commit into apache:main
Bloom filters are one of those parts of Parquet where tiny byte-order details end up mattering far more than you'd expect, so it's good to see attention landing here. Something I ran into on s390x is that the xxhash input/output stays much more predictable if the bitset words are kept in a single canonical order (LE in our case) and the reader and writer treat them as such. In my own experiments I normalized the bitset once at the boundary and let the rest of the logic operate on native values. In this patch, the per-word […]

Not calling this out as a problem: the behavior you're targeting here lines up with what I've seen on s390x, especially around making sure the mask checks behave the same across BE/LE hosts. Just sharing observations in case it's useful while these pieces get polished.
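For illustration, here is a minimal sketch of what "normalize once at the boundary" could look like. It is not taken from `bloom_filter.cc` or from this patch: the helper names (`LoadLE32`, `StoreLE32`, `DecodeBitset`, `EncodeBitset`, `AllMaskBitsSet`) are hypothetical, and the only assumption is that the serialized bitset is a sequence of 32-bit little-endian words. Because the helpers assemble each word from individual bytes, they should give the same result on BE and LE hosts without any endianness detection.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an Arrow API): read a 32-bit word that is stored
// little-endian in a byte buffer, independent of the host byte order.
inline uint32_t LoadLE32(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Hypothetical helper: write a native 32-bit word as little-endian bytes.
inline void StoreLE32(uint8_t* p, uint32_t v) {
  p[0] = static_cast<uint8_t>(v);
  p[1] = static_cast<uint8_t>(v >> 8);
  p[2] = static_cast<uint8_t>(v >> 16);
  p[3] = static_cast<uint8_t>(v >> 24);
}

// Boundary, read side: decode the serialized bitset into native words once.
// Assumes num_bytes is a multiple of 4; trailing bytes are ignored otherwise.
inline std::vector<uint32_t> DecodeBitset(const uint8_t* bytes, size_t num_bytes) {
  std::vector<uint32_t> words(num_bytes / 4);
  for (size_t i = 0; i < words.size(); ++i) {
    words[i] = LoadLE32(bytes + 4 * i);
  }
  return words;
}

// Boundary, write side: encode native words back into the canonical LE layout.
inline std::vector<uint8_t> EncodeBitset(const std::vector<uint32_t>& words) {
  std::vector<uint8_t> bytes(words.size() * 4);
  for (size_t i = 0; i < words.size(); ++i) {
    StoreLE32(bytes.data() + 4 * i, words[i]);
  }
  return bytes;
}

// Interior logic works on native values only, so the mask check itself
// contains no byte-order handling.
inline bool AllMaskBitsSet(uint32_t word, uint32_t mask) {
  return (word & mask) == mask;
}
```

With this shape, the insert and find paths would only ever see native `uint32_t` values, and the byte-order handling lives in two small functions. The trade-off is an extra pass over the bitset at read and write time, compared with swapping each word at the point of use.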
Rationale for this change
This PR enables Parquet DB support on big-endian (s390x) systems by fixing the Bloom filter logic.
What changes are included in this PR?
The fix changes the following file:
cpp/src/parquet/bloom_filter.cc
Are these changes tested?
Yes. The changes were tested on s390x to confirm they work as intended, and on x86 to confirm that no regression was introduced.
Are there any user-facing changes?
No
GitHub main issue link: #48151