Enable the disabled portion of the test_parquet_bloom_filters pytest, which aborts due to an Arrow v19 incompatibility issue with stats (apache/arrow#45283).
The Arrow bug has been fixed by PR apache/arrow#45285, but no Arrow release contains that PR yet. Once one is available and cuDF is bumped to it, we can revert the pytest to the following:

    def test_parquet_bloom_filters(
        datadir, stats_fname, bloom_filter_fname, predicate, expected_len
    ):
        fname_bf = datadir / bloom_filter_fname
        df_bf = cudf.read_parquet(fname_bf, filters=predicate).reset_index(
            drop=True
        )
        fname_stats = datadir / stats_fname
        df_stats = cudf.read_parquet(fname_stats, filters=predicate).reset_index(
            drop=True
        )
        # Check if tables equal
        assert_eq(
            df_stats,
            df_bf,
        )
        # Check for table length
        assert_eq(
            len(df_bf),
            expected_len,
        )

See comment by @mhaseeb123 in #17422 (comment) for more details.
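
Until the bump lands, one possible way to keep the stats comparison guarded is a version check. The sketch below is only an illustration under stated assumptions: `release_tuple` and `arrow_has_fix` are hypothetical helpers, not part of cuDF or Arrow, and the first Arrow release containing the fix is not known yet, so the caller has to supply it.

```python
import re


def release_tuple(version: str) -> tuple[int, ...]:
    """Return the leading numeric components of a version string.

    Suffixes such as ".dev5" are ignored, so a development build is
    treated like the release it precedes (a simplification).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(part) for part in match.group(0).split("."))
    # Pad to at least three components so "19.1" compares equal to "19.1.0".
    return parts + (0,) * (3 - len(parts))


def arrow_has_fix(installed: str, fixed: str) -> bool:
    """Return True if `installed` is at or beyond `fixed`.

    `fixed` is the first Arrow release that ships the fix; it is an
    assumption the caller must fill in once that release exists.
    """
    return release_tuple(installed) >= release_tuple(fixed)
```

Assuming the installed pyarrow version reflects the Arrow build in use, `arrow_has_fix(pyarrow.__version__, <fixed release>)` could then drive a `pytest.mark.skipif` condition on the stats portion of the test.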