Push filter predicates down to ExternalTableProvider scans #589

@jonmmease

After #587, resolve_table accepts a filters parameter for pushed-down filter predicates. In practice, filters is never populated, because VegaFusion's _vf_order window (added by with_index() in DataUrlTask::eval and elsewhere) sits between the scan and the user's filter transforms.

The current plan structure:

scan -> with_index(_vf_order window) -> datetime processing -> filter transform -> ...

DataFusion's PushDownFilter optimizer rule won't push filters past Window nodes, so ExternalTableProvider.scan.filters is always empty.
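To make the blocking behavior concrete, here is a toy model of filter pushdown in plain Python. It is not DataFusion's implementation: the Node class, push_down, and scan_filters are hypothetical names invented for this sketch, the window is treated as a hard barrier, and the datetime processing step is assumed to be a projection that the predicate can pass through unchanged.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical, minimal logical plan node (not a DataFusion type)."""
    kind: str                                   # "scan", "window", "projection", "filter"
    child: Optional["Node"] = None
    predicate: Optional[str] = None             # only set on filter nodes
    pushed_filters: Optional[List[str]] = None  # only set on optimized scan nodes


def push_down(node, pending=()):
    """Carry predicates downward until they hit a scan or a window barrier."""
    pending = list(pending)
    if node.kind == "filter":
        # Absorb the filter and try to place its predicate lower in the plan.
        return push_down(node.child, pending + [node.predicate])
    if node.kind == "scan":
        # Predicates that reach the scan become pushed-down filters.
        return Node("scan", pushed_filters=pending)
    if node.kind == "window":
        # Barrier: nothing passes below; pending predicates stay above the window.
        result = Node("window", child=push_down(node.child))
        for pred in pending:
            result = Node("filter", child=result, predicate=pred)
        return result
    # Assumption: other nodes (e.g. projection) are transparent to the predicate.
    return Node(node.kind, child=push_down(node.child, pending))


def scan_filters(node):
    """Walk to the leaf scan and return the filters that reached it."""
    while node.kind != "scan":
        node = node.child
    return node.pushed_filters


# scan -> window -> projection -> filter (the current structure)
current_plan = Node("filter", Node("projection", Node("window", Node("scan"))), "x > 5")
# scan -> projection -> filter -> window (the restructured plan)
proposed_plan = Node("window", Node("filter", Node("projection", Node("scan")), "x > 5"))

print(scan_filters(push_down(current_plan)))   # []
print(scan_filters(push_down(proposed_plan)))  # ['x > 5']
```

In this model the only difference between the two plans is where the window sits relative to the filter, which is enough to decide whether the scan sees any predicates.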

One potential approach: restructure with_index() so it's applied after filter transforms (or after all transforms that don't depend on row ordering). The filter doesn't reference _vf_order, so it should be safe to apply before the window:

scan -> datetime processing -> filter transform -> with_index(_vf_order window) -> ...

This would let DataFusion naturally push filters into the scan, enabling resolvers (e.g., Delta Lake, Parquet) to skip data at read time.
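One thing worth checking before reordering is what happens to the _vf_order values themselves. The sketch below uses plain Python lists as a stand-in for the DataFrame; with_index and keep are hypothetical helpers, and the zero-based numbering is an arbitrary choice for the example rather than a claim about what VegaFusion's with_index() emits.

```python
rows = [{"x": 3}, {"x": 8}, {"x": 1}, {"x": 9}]


def with_index(rows):
    # Stand-in for the row-number window: tag each row with its position.
    return [{**row, "_vf_order": i} for i, row in enumerate(rows)]


def keep(row):
    # Example user filter; it does not reference _vf_order.
    return row["x"] > 5


# Current structure: index first, then filter.
current = [row for row in with_index(rows) if keep(row)]
# Restructured plan: filter first, then index.
proposed = with_index([row for row in rows if keep(row)])

print([row["x"] for row in current])           # [8, 9]
print([row["x"] for row in proposed])          # [8, 9]
print([row["_vf_order"] for row in current])   # [1, 3]
print([row["_vf_order"] for row in proposed])  # [0, 1]
```

The surviving rows and their relative order are the same either way, but the index values differ: filtering after indexing leaves gaps, while indexing after filtering yields contiguous values. Anything downstream that only sorts by _vf_order should be unaffected; anything that relies on the absolute values would see a change.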

Ref: vegafusion-runtime/src/data/tasks.rs line 202 (df.with_index()?)
Test: test_resolve_table_with_filter_transform in test_plan_resolver.py asserts captured_filters == [] with a TODO
