[FEA] Use post_traversal to populate "base" column statistics #19390
Closed
Labels
Python (affects Python cuDF API), cudf-polars (issues specific to cudf-polars), feature request (new feature or request)
Description
Implement a post_traversal pass over the un-lowered IR graph to populate the dict[IR, dict[str, ColumnStats]] and dict[IR, RowCount] data structures with base (i.e. source) statistics. The necessary statistics classes were added in #19276.
This traversal will not update the ColumnStats.unique_stats attribute for each column yet. The goal of this traversal is to make sure DataSourceInfo and source-based row-count estimates are fully propagated.
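To make the shape of this pass concrete, here is a minimal sketch of a post-order walk that fills the two mappings. Everything in it is an illustrative assumption: `Node`, the simplified `ColumnStats`, `post_order`, and `collect_base_stats` are hypothetical stand-ins, not the cudf-polars IR classes, the classes from #19276, or the real `post_traversal` helper. The row-count rule (take the largest child estimate) is likewise just a placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for an IR node (not the cudf-polars IR class)."""

    name: str
    schema: tuple[str, ...]
    children: tuple[Node, ...] = ()
    source_rows: int | None = None  # only set on leaf (scan-like) nodes


@dataclass
class ColumnStats:
    """Simplified stand-in: only records which source a column came from."""

    name: str
    source: str | None = None


def post_order(root: Node):
    """Yield each node once, after all of its children."""
    seen: set[Node] = set()
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            yield node
        elif node not in seen:
            seen.add(node)
            stack.append((node, True))
            # Reverse so the first child is popped (and visited) first.
            stack.extend((child, False) for child in reversed(node.children))


def collect_base_stats(root: Node):
    """Populate per-node column statistics and row counts from the sources."""
    column_stats: dict[Node, dict[str, ColumnStats]] = {}
    row_count: dict[Node, int | None] = {}
    for node in post_order(root):
        if not node.children:
            # Leaf: statistics come straight from the data source.
            column_stats[node] = {
                c: ColumnStats(c, source=node.name) for c in node.schema
            }
            row_count[node] = node.source_rows
        else:
            # Children were already visited, so their entries exist.
            inherited: dict[str, ColumnStats] = {}
            for child in node.children:
                inherited.update(column_stats[child])
            column_stats[node] = {
                c: inherited.get(c, ColumnStats(c)) for c in node.schema
            }
            known = [
                row_count[child]
                for child in node.children
                if row_count[child] is not None
            ]
            # Assumed placeholder rule: keep the largest source-based estimate.
            row_count[node] = max(known) if known else None
    return column_stats, row_count


scan = Node("scan", ("a", "b"), source_rows=100)
filt = Node("filter", ("a", "b"), (scan,))
select = Node("select", ("a",), (filt,))
column_stats, row_count = collect_base_stats(select)
```

Because children are always visited before their parent, each non-leaf node can read its children's entries directly, which is what lets source information propagate all the way to the root in a single pass.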
We can also use this traversal to call add_unique_stats_column for known GroupBy and Distinct key columns. This way, the first call to DataSourceInfo.unique_stats(*) (expected during a later IR-graph traversal) will collect row-group information for all known GroupBy/Distinct keys.
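The benefit of registering key columns up front can be illustrated with a toy model. The class below is a hypothetical stand-in, not the real `DataSourceInfo`: only the method names `add_unique_stats_column` and `unique_stats` come from this issue, while the signatures, the in-memory table, and the `sample_calls` counter are assumptions made for the sketch.

```python
class ToyDataSourceInfo:
    """Toy stand-in for a data source that samples unique-value stats lazily."""

    def __init__(self, table: dict[str, list]):
        self._table = table
        self._key_columns: set[str] = set()
        self._unique_stats: dict[str, int] = {}
        self.sample_calls = 0  # counts (assumed expensive) sampling passes

    def add_unique_stats_column(self, column: str) -> None:
        """Register a column so the next sampling pass includes it."""
        self._key_columns.add(column)

    def unique_stats(self, column: str) -> int:
        """Return the unique count, sampling all pending columns at once."""
        if column not in self._unique_stats:
            self._key_columns.add(column)
            pending = self._key_columns - set(self._unique_stats)
            self._sample(pending)
        return self._unique_stats[column]

    def _sample(self, columns: set[str]) -> None:
        # One pass over the source covers every pending column.
        self.sample_calls += 1
        for name in columns:
            self._unique_stats[name] = len(set(self._table[name]))


info = ToyDataSourceInfo({"k1": [1, 1, 2], "k2": [1, 2, 3], "v": [0, 0, 0]})
# Registered during the base-statistics traversal (GroupBy/Distinct keys).
info.add_unique_stats_column("k1")
info.add_unique_stats_column("k2")
# A later traversal asks for one key; both keys are sampled in a single pass.
first = info.unique_stats("k1")
second = info.unique_stats("k2")
```

In this toy model, the second lookup is served from the cache because `k2` was already registered when the first lookup triggered sampling.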
Status
Done