Add post_traversal API to cudf-polars#19258

Merged
rapids-bot[bot] merged 8 commits into rapidsai:branch-25.08 from rjzamora:post-traversal
Jul 3, 2025

Conversation

@rjzamora
Member

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@rjzamora rjzamora requested a review from a team as a code owner June 30, 2025 22:11
@rjzamora rjzamora requested review from Matt711 and bdice June 30, 2025 22:11
@rjzamora added the feature request, 2 - In Progress, non-breaking, and cudf-polars labels Jun 30, 2025
@github-actions bot added the Python label Jun 30, 2025
@GPUtester GPUtester moved this to In Progress in cuDF Python Jun 30, 2025
Contributor

@TomAugspurger TomAugspurger left a comment

Looks good, thanks.

My only question is about test coverage. It's probably worth having a test with multiple nodes to ensure that we go through the expressions in the right order (I suspect that's important). I'd suggest something, but I'm not sure offhand whether it goes through the first or last node first 😓

@TomAugspurger
Contributor

This maybe?

diff --git a/python/cudf_polars/tests/dsl/test_traversal.py b/python/cudf_polars/tests/dsl/test_traversal.py
index 7b7312fdd0..1b95e04b36 100644
--- a/python/cudf_polars/tests/dsl/test_traversal.py
+++ b/python/cudf_polars/tests/dsl/test_traversal.py
@@ -71,6 +71,24 @@ def test_post_traversal_unique():
     assert unique_exprs == [expr.Col(dt, "b"), expr.Col(dt, "a"), e3]
 
 
+def test_post_traversal_multi():
+    dt = DataType(pl.datatypes.Int8())
+
+    e1 = make_expr(dt, "a", "a")
+    e2 = make_expr(dt, "a", "b")
+    e3 = make_expr(dt, "b", "a")
+
+    unique_exprs = list(post_traversal([e1, e2, e3]))
+    assert len(unique_exprs) == 5
+    assert unique_exprs == [
+        expr.Col(dt, "b"),
+        expr.Col(dt, "a"),
+        e3,
+        e2,
+        e1,
+    ]
+
+

That passes. Hopefully it's correct.
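
For readers following the ordering question, here is a minimal standalone sketch of a unique post-order traversal over several root nodes. It is not the cudf-polars implementation: `Node` and `post_order_unique` are hypothetical stand-ins for the expression nodes and `post_traversal`, and it assumes `make_expr(dt, x, y)` builds a binary node whose children are the columns `x` and `y`, in that order. Under those assumptions it yields the same order the test above asserts: the last root is handled first, and shared children are emitted only once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical hashable expression node (stand-in for expr.Col / a binary op)."""

    name: str
    children: tuple["Node", ...] = ()


def post_order_unique(roots):
    """Yield each reachable node once, children before parents.

    Sketch only: assumes the inputs form a DAG (no cycles).
    """
    emitted = set()
    # The last root ends up on top of the stack, so it is processed first.
    stack = [(root, False) for root in roots]
    while stack:
        node, children_done = stack.pop()
        if node in emitted:
            continue  # already yielded via another parent or a duplicate root
        if children_done:
            emitted.add(node)
            yield node
        else:
            # Revisit this node after its children have been handled.
            stack.append((node, True))
            # Push in reverse so the first child is popped (and yielded) first.
            for child in reversed(node.children):
                if child not in emitted:
                    stack.append((child, False))


# Mirror the shape of the test: three binary nodes over columns "a" and "b".
a, b = Node("a"), Node("b")
e1 = Node("mul", (a, a))
e2 = Node("mul", (a, b))
e3 = Node("mul", (b, a))

order = list(post_order_unique([e1, e2, e3]))
print([n.name if not n.children else tuple(c.name for c in n.children) for n in order])
```

Tracking emitted nodes rather than merely pushed ones keeps the children-before-parents guarantee even when a root is also a child of a later root.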

@rjzamora
Member Author

rjzamora commented Jul 2, 2025

> That passes. Hopefully it's correct.

Thanks @TomAugspurger ! Seems reasonable to me (added).

@TomAugspurger
Contributor

Fixed the merge conflict from #19135.

@TomAugspurger
Contributor

/merge

@rapids-bot rapids-bot bot merged commit 1c0c45d into rapidsai:branch-25.08 Jul 3, 2025
91 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in cuDF Python Jul 3, 2025
rapids-bot bot pushed a commit that referenced this pull request Jul 16, 2025
Probably supersedes #19130

The goal of this PR is to define the classes needed to store column statistics for an `IR` node. Some criteria:

- We need the statistics for a column to contain a reference to the underlying datasource information (e.g. unique-value statistics, row-count, and average storage/file size). 
- We want caching for each datasource and column.
- We want the option to perform metadata/data sampling lazily on the datasource.
- We want our Parquet partitioning logic to use the same infrastructure (to avoid redundant sampling).
- We want to record when a specific statistic is "exact" (rather than estimated).
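
One way to picture the criteria above is the rough sketch below. Every name in it (`StatValue`, `DataSourceInfo`, `ColumnStats`, `sampler`) is hypothetical and chosen for illustration; none is taken from the cudf-polars code. It only shows the general shape: column statistics hold a reference to a shared datasource object, sampling happens lazily on first access and is cached per column, and each value carries an "exact" flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatValue:
    """Hypothetical statistic: a value plus whether it is exact or estimated."""

    value: float
    exact: bool = False


class DataSourceInfo:
    """Hypothetical per-datasource holder that samples lazily and caches results."""

    def __init__(self, sampler):
        self._sampler = sampler  # assumed callable: column name -> StatValue
        self._unique_cache = {}
        self.sample_calls = 0  # counts how often the sampler actually ran

    def unique_count(self, column):
        # Sample only on first request for this column, then reuse the result.
        if column not in self._unique_cache:
            self.sample_calls += 1
            self._unique_cache[column] = self._sampler(column)
        return self._unique_cache[column]


@dataclass
class ColumnStats:
    """Hypothetical column statistics that refer back to the shared datasource."""

    name: str
    source: DataSourceInfo

    @property
    def unique_count(self):
        return self.source.unique_count(self.name)


# Two statistics objects for the same column share one datasource,
# so the (pretend) sampling step runs only once.
source = DataSourceInfo(lambda column: StatValue(value=100.0, exact=False))
stats_in_scan = ColumnStats("a", source)
stats_in_join = ColumnStats("a", source)
first = stats_in_scan.unique_count
second = stats_in_join.unique_count
```

Sharing one datasource object between consumers is what would let a separate piece of logic, such as partitioning, reuse the same samples rather than repeat them.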

Also related:
- #19258
- #19273

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
  - Tom Augspurger (https://github.com/TomAugspurger)
  - Matthew Murray (https://github.com/Matt711)

URL: #19276