After #587, resolve_table accepts a filters parameter for pushed-down filter predicates, but that parameter is never populated: VegaFusion's _vf_order window (added by with_index() in DataUrlTask::eval and elsewhere) sits between the scan and the user's filter transforms.
The current plan structure:
scan -> with_index(_vf_order window) -> datetime processing -> filter transform -> ...
DataFusion's PushDownFilter optimizer rule won't push filters past Window nodes, so ExternalTableProvider.scan.filters is always empty.
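To make the blocking behavior concrete, here is a toy Python model of filter pushdown over a linear plan. The plan representation and push_down_filters are hypothetical and much simpler than DataFusion's actual PushDownFilter rule. The model only encodes the rule stated above, that a filter cannot move below a Window node, and assumes the filter references only scan columns.

```python
# Toy model: a plan is a list of (operator, argument) pairs ordered bottom-up,
# so plan[0] is the scan. This is NOT DataFusion's LogicalPlan; it only encodes
# the rule that a Filter cannot be pushed below a Window node.
def push_down_filters(plan):
    """Return (filters pushed into the scan, operators left in the plan)."""
    scan_filters = []
    remaining = [plan[0]]
    window_below = False
    for op, arg in plan[1:]:
        if op == "Window":
            window_below = True
        if op == "Filter" and not window_below:
            # Assumes the predicate only references scan columns.
            scan_filters.append(arg)
        else:
            remaining.append((op, arg))
    return scan_filters, remaining


# Current shape: the _vf_order window sits between the scan and the filter.
current = [("Scan", "t"), ("Window", "_vf_order"),
           ("Projection", "datetime"), ("Filter", "x > 5")]
# Hypothetical reordered shape: the window is applied after the filter.
reordered = [("Scan", "t"), ("Projection", "datetime"),
             ("Filter", "x > 5"), ("Window", "_vf_order")]

print(push_down_filters(current)[0])    # []
print(push_down_filters(reordered)[0])  # ['x > 5']
```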
One potential approach: restructure with_index() so it's applied after filter transforms (or after all transforms that don't depend on row ordering). The filter doesn't reference _vf_order, so it should be safe to apply it before the window:
scan -> datetime processing -> filter transform -> with_index(_vf_order window) -> ...
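Why this reordering should be safe can be sketched in Python by modeling with_index() as a row-number assignment. The with_index and keep functions below are hypothetical stand-ins, not VegaFusion's Rust implementation. Filtering before or after numbering keeps the same rows in the same relative order. Only the absolute _vf_order values change, so the sketch assumes nothing downstream relies on those exact values.

```python
def with_index(rows):
    """Toy stand-in for with_index(): attach a 0-based _vf_order to each row."""
    return [dict(r, _vf_order=i) for i, r in enumerate(rows)]


def keep(row):
    """Hypothetical user filter; it does not reference _vf_order."""
    return row["x"] > 5


rows = [{"x": 7}, {"x": 2}, {"x": 9}, {"x": 6}]

# Current order: number every row, then filter.
index_then_filter = [r for r in with_index(rows) if keep(r)]
# Proposed order: filter first (pushable into the scan), then number.
filter_then_index = with_index([r for r in rows if keep(r)])


def ordered_values(rs):
    """Sort by _vf_order and return only the user-visible column."""
    return [r["x"] for r in sorted(rs, key=lambda r: r["_vf_order"])]


print(ordered_values(index_then_filter))  # [7, 9, 6]
print(ordered_values(filter_then_index))  # [7, 9, 6]
```

The two orders agree on which rows survive and how they are ordered, while the raw _vf_order values differ (0, 2, 3 versus 0, 1, 2).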
This would let DataFusion naturally push filters into the scan, enabling resolvers (e.g., Delta Lake, Parquet) to skip data at read time.
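As an illustration of how a resolver could use pushed-down filters to skip data, the sketch below prunes chunks using min/max statistics, in the spirit of Parquet row-group statistics. RowGroup, prune_gt, and scan_with_filter are hypothetical, and the filter is reduced to a single "x > threshold" comparison. The actual predicate format passed to resolve_table is not assumed here.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    # Hypothetical chunk of a column with precomputed min/max statistics.
    min_x: int
    max_x: int
    rows: list


def prune_gt(row_groups, threshold):
    """Keep only the groups that could contain a value with x > threshold."""
    return [g for g in row_groups if g.max_x > threshold]


def scan_with_filter(row_groups, threshold):
    """Skip whole groups via statistics, then apply the exact predicate to survivors."""
    return [x for g in prune_gt(row_groups, threshold) for x in g.rows if x > threshold]


groups = [
    RowGroup(min_x=0, max_x=4, rows=[1, 4, 3]),  # max 4: pruned for x > 5, rows never read
    RowGroup(min_x=5, max_x=9, rows=[5, 9, 7]),
    RowGroup(min_x=2, max_x=6, rows=[2, 6]),
]
print(len(prune_gt(groups, 5)))     # 2
print(scan_with_filter(groups, 5))  # [9, 7, 6]
```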
Ref: vegafusion-runtime/src/data/tasks.rs line 202 (df.with_index()?)
Test: test_resolve_table_with_filter_transform in test_plan_resolver.py currently asserts captured_filters == [] and carries a TODO.