GH-48210: [C++][Parquet] Fix Bloom Filter logic to enable Parquet DB support on s390x by Vishwanatha-HD · Pull Request #48211 · apache/arrow

Vishwanatha-HD · 2025-11-21T15:38:18Z

Rationale for this change

This PR is intended to enable Parquet DB support on big-endian (s390x) systems. It does so by fixing the Bloom filter logic.
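To illustrate why byte order matters here: a Bloom filter bitset written by a little-endian host stores each 32-bit word least-significant byte first, so reinterpreting those bytes in host order gives a different word on s390x. The sketch below assumes little-endian words on disk (as the reviewer comment later in this thread also describes) and contrasts a host-order load with an explicit little-endian decode. `LoadLE32`, `StoreLE32`, and `LoadNative32` are hypothetical helper names used only for illustration, not Arrow APIs.

```cpp
#include <cstdint>
#include <cstring>

// Decode a 32-bit word stored little-endian at p. Produces the same value on
// any host, because it assembles the word from individual bytes.
uint32_t LoadLE32(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Encode v at p in little-endian byte order, independent of host order.
void StoreLE32(uint8_t* p, uint32_t v) {
  for (int b = 0; b < 4; ++b) {
    p[b] = static_cast<uint8_t>(v >> (8 * b));
  }
}

// Naive load: copies the bytes in host order. Correct for little-endian data
// only on little-endian hosts; on s390x the bytes end up reversed.
uint32_t LoadNative32(const uint8_t* p) {
  uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}
```

For the bytes `{0x01, 0x00, 0x00, 0x00}` (bit 0 of the word set), `LoadLE32` returns `1` on every host, while `LoadNative32` returns `1` on x86 but `0x01000000` on a big-endian host, so a mask test against that word would check the wrong bit.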

What changes are included in this PR?

The fix includes changes to the following file:
cpp/src/parquet/bloom_filter.cc

Are these changes tested?

Yes. The changes were tested on s390x to verify correct behavior, and on x86 to confirm that no regressions were introduced.

Are there any user-facing changes?

No

GitHub main Issue link: #48151

GitHub Issue: [C++][Parquet] Fix Bloom Filter logic to enable Parquet DB support on Big-Endian (s390x) systems #48210

github-actions · 2025-11-21T15:38:48Z

⚠️ GitHub issue #48210 has been automatically assigned in GitHub to PR creator.

k8ika0s · 2025-11-23T22:20:42Z

@Vishwanatha-HD

Bloom filters are one of those parts of Parquet where tiny byte-order details end up mattering way more than you’d expect, so it’s good to see attention landing here.

Something I ran into on s390x is that the xxhash input/output tends to stay a lot more predictable if the bitset words are kept in a single canonical order (LE in our case) and the reader/writer treat them as such. In my own experiments I normalized the bitset once at the boundary and let the rest of the logic operate on native values.
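A minimal sketch of the normalize-at-the-boundary approach described above, assuming the serialized bitset is an array of little-endian 32-bit words. The whole bitset is converted to native words once on read and back to little-endian bytes once on write, so the insert and find logic can operate on plain `uint32_t` values. `DecodeBitset` and `EncodeBitset` are hypothetical names, not functions from Arrow or Parquet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read boundary: turn little-endian serialized bytes into native-order words.
std::vector<uint32_t> DecodeBitset(const std::vector<uint8_t>& bytes) {
  assert(bytes.size() % 4 == 0);  // the bitset is a whole number of 32-bit words
  std::vector<uint32_t> words(bytes.size() / 4);
  for (std::size_t i = 0; i < words.size(); ++i) {
    const uint8_t* p = bytes.data() + 4 * i;
    words[i] = static_cast<uint32_t>(p[0]) |
               (static_cast<uint32_t>(p[1]) << 8) |
               (static_cast<uint32_t>(p[2]) << 16) |
               (static_cast<uint32_t>(p[3]) << 24);
  }
  return words;
}

// Write boundary: turn native-order words back into little-endian bytes.
std::vector<uint8_t> EncodeBitset(const std::vector<uint32_t>& words) {
  std::vector<uint8_t> bytes(words.size() * 4);
  for (std::size_t i = 0; i < words.size(); ++i) {
    for (int b = 0; b < 4; ++b) {
      bytes[4 * i + b] = static_cast<uint8_t>(words[i] >> (8 * b));
    }
  }
  return bytes;
}
```

The trade-off is an extra pass (and, in this sketch, an extra buffer) at load and store time in exchange for conversion-free inner loops; whether that wins depends on how many lookups each filter serves.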

In this patch, the per-word FromLittleEndian/ToLittleEndian inside the find/insert loops definitely keeps things correct, though it does create a slightly tighter coupling between the hashing logic and the byte-swapping. I only mention it because it can sometimes show up in profiling when bloom filters are exercised heavily over wide row groups.
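For comparison, here is a sketch of the per-word approach, where each 32-bit word is converted as it is read or written inside the insert and find loops. It follows the split block Bloom filter layout from the Parquet format specification (32-byte blocks of eight 32-bit words, the upper 32 bits of the hash selecting the block, the lower 32 bits multiplied by eight salt constants to pick one bit per word). The caller supplies the 64-bit hash (xxHash64 in Parquet), which is not implemented here. Portable shift-based helpers stand in for the `FromLittleEndian`/`ToLittleEndian` calls the patch uses, and all function names are hypothetical, not Arrow's actual code.

```cpp
#include <cstddef>
#include <cstdint>

// Salt constants from the Parquet split block Bloom filter specification.
constexpr uint32_t kSalt[8] = {0x47b6137bU, 0x44974d91U, 0x8824ad5bU, 0xa2b7289dU,
                               0x705495c7U, 0x2df1424bU, 0x9efc4947U, 0x5c6bfb31U};

uint32_t LoadLE32(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

void StoreLE32(uint8_t* p, uint32_t v) {
  for (int b = 0; b < 4; ++b) p[b] = static_cast<uint8_t>(v >> (8 * b));
}

// Upper 32 bits of the hash, scaled into [0, num_blocks).
std::size_t BlockIndex(uint64_t hash, std::size_t num_blocks) {
  return static_cast<std::size_t>(((hash >> 32) * num_blocks) >> 32);
}

// One bit (0..31) per word, derived from the lower 32 bits of the hash.
uint32_t WordMask(uint32_t key, int i) { return 1U << ((key * kSalt[i]) >> 27); }

// Set one bit in each of the 8 little-endian words of the selected block.
void InsertHash(uint8_t* bitset, std::size_t num_blocks, uint64_t hash) {
  uint8_t* block = bitset + 32 * BlockIndex(hash, num_blocks);
  const uint32_t key = static_cast<uint32_t>(hash);
  for (int i = 0; i < 8; ++i) {
    // Convert to native order, OR in the mask, convert back to little-endian.
    StoreLE32(block + 4 * i, LoadLE32(block + 4 * i) | WordMask(key, i));
  }
}

// The value may be present only if every word has its masked bit set.
bool FindHash(const uint8_t* bitset, std::size_t num_blocks, uint64_t hash) {
  const uint8_t* block = bitset + 32 * BlockIndex(hash, num_blocks);
  const uint32_t key = static_cast<uint32_t>(hash);
  for (int i = 0; i < 8; ++i) {
    if ((LoadLE32(block + 4 * i) & WordMask(key, i)) == 0) return false;
  }
  return true;
}
```

The mask arithmetic itself is pure integer math and gives the same result on any host; only the conversion between stored bytes and words depends on byte order. Doing that conversion per word, as here, keeps the change local, while the cost the comment mentions comes from repeating it on every probe.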

Not calling this out as a problem — the behavior you’re targeting here lines up with what I’ve seen on s390x, especially around making sure the mask checks behave the same across BE/LE hosts. Just sharing observations in case it’s useful while these pieces get polished.


Vishwanatha-HD requested a review from wgtmac as a code owner November 21, 2025 15:38

Vishwanatha-HD mentioned this pull request Nov 21, 2025

[C++][Parquet] Fix Bloom Filter logic to enable Parquet DB support on Big-Endian (s390x) systems #48210

Open

github-actions bot added the Component: Parquet, Component: C++, and awaiting review labels Nov 21, 2025

k8ika0s mentioned this pull request Nov 21, 2025

GH-48213: [C++][Parquet] Fix endianness and test failures on s390x (big-endian) (supersedes partial fixes) #48212

Closed

Vishwanatha-HD mentioned this pull request Nov 21, 2025

[C++][Parquet] Enable Parquet DB support on Big Endian (IBM Z) systems #48151

Open

Vishwanatha-HD force-pushed the fixBloomFilters branch from 6f6199c to 7dbe358 November 22, 2025 05:03

kou changed the title ~~GH-48210 Fix Bloom Filter logic to enable Parquet DB support on s390x~~ GH-48210: [C++][Parquet] Fix Bloom Filter logic to enable Parquet DB support on s390x Nov 22, 2025

Commit 70dd0c1: apacheGH-48210 Fix Bloom Filter logic to enable Parquet DB support on s390x

Vishwanatha-HD force-pushed the fixBloomFilters branch from 7dbe358 to 70dd0c1 November 29, 2025 13:15

Vishwanatha-HD wants to merge 1 commit into apache:main from Vishwanatha-HD:fixBloomFilters