Enable the predicate pushdown pytest affected by Arrow v19 stats incompatibility bug. #17806

@mhaseeb123

Re-enable the disabled portion of the test_parquet_bloom_filters pytest, which currently aborts because of an Arrow v19 stats incompatibility issue (apache/arrow#45283).

The Arrow bug has been fixed by PR apache/arrow#45285, but no Arrow release contains that fix yet. Once such a release is available and cuDF is bumped to it, we can revert the pytest to the following:

def test_parquet_bloom_filters(
    datadir, stats_fname, bloom_filter_fname, predicate, expected_len
):
    fname_bf = datadir / bloom_filter_fname
    df_bf = cudf.read_parquet(fname_bf, filters=predicate).reset_index(
        drop=True
    )

    fname_stats = datadir / stats_fname
    df_stats = cudf.read_parquet(fname_stats, filters=predicate).reset_index(
        drop=True
    )

    # Check that the two tables are equal
    assert_eq(df_stats, df_bf)
    # Check the table length
    assert_eq(len(df_bf), expected_len)

See the comment by @mhaseeb123 in #17422 for more details.
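Until the bump happens, one option is to gate the stats comparison on the installed Arrow version instead of disabling it outright. The following is only a minimal sketch of such a version check: `parse_version` and `has_stats_fix` are hypothetical helpers, not part of cuDF or Arrow, and the first fixed version is left as a parameter because no release containing the fix exists yet.

```python
import re


def parse_version(version: str) -> tuple[int, ...]:
    """Extract the leading numeric components, e.g. '1.2.3.dev4' -> (1, 2, 3).

    Hypothetical helper: pre-release suffixes are ignored, so a dev build
    of a release is treated the same as that release.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def has_stats_fix(installed: str, first_fixed: str) -> bool:
    """Return True if `installed` is at or after `first_fixed`.

    `first_fixed` is an assumption supplied by the caller; the issue does
    not name the release that will carry the fix.
    """
    a = parse_version(installed)
    b = parse_version(first_fixed)
    # Pad with zeros so that (19, 0) compares equal to (19, 0, 0).
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

In a test module this could feed `pytest.mark.skipif(not has_stats_fix(pyarrow.__version__, first_fixed), reason=...)`, with `first_fixed` filled in once the release is known. If the `packaging` library is available, its `Version` class handles pre-release ordering more carefully than this sketch does.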
