Add post_traversal API to cudf-polars #19258
Merged
rapids-bot[bot] merged 8 commits into rapidsai:branch-25.08 on Jul 3, 2025
mroeschke
reviewed
Jun 30, 2025
mroeschke
reviewed
Jun 30, 2025
TomAugspurger
approved these changes
Jul 1, 2025
Contributor
Looks good, thanks.
Only question is around test coverage. It's probably worth having a test with multiple nodes to ensure that we go through the expressions in the right order (I suspect that's important). I'd suggest something, but I'm not sure offhand whether it goes through the first or last node first 😓
Contributor
This maybe?

diff --git a/python/cudf_polars/tests/dsl/test_traversal.py b/python/cudf_polars/tests/dsl/test_traversal.py
index 7b7312fdd0..1b95e04b36 100644
--- a/python/cudf_polars/tests/dsl/test_traversal.py
+++ b/python/cudf_polars/tests/dsl/test_traversal.py
@@ -71,6 +71,24 @@ def test_post_traversal_unique():
     assert unique_exprs == [expr.Col(dt, "b"), expr.Col(dt, "a"), e3]
+def test_post_traversal_multi():
+    dt = DataType(pl.datatypes.Int8())
+
+    e1 = make_expr(dt, "a", "a")
+    e2 = make_expr(dt, "a", "b")
+    e3 = make_expr(dt, "b", "a")
+
+    unique_exprs = list(post_traversal([e1, e2, e3]))
+    assert len(unique_exprs) == 5
+    assert unique_exprs == [
+        expr.Col(dt, "b"),
+        expr.Col(dt, "a"),
+        e3,
+        e2,
+        e1,
+    ]
+
+

That passes. Hopefully it's correct.
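For readers wondering why the last root comes out first, here is a minimal, self-contained sketch of a stack-based unique post-order traversal. It is not the cudf-polars implementation: `Node` and `post_traversal_sketch` are hypothetical stand-ins, and the only assumptions carried over are that nodes are hashable, compare by value, and expose a `children` tuple. With roots pushed onto a stack in the given order, the last root is expanded first, which is consistent with the order asserted in the test above.

```python
from __future__ import annotations

from collections.abc import Iterator, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for an expression node (hashable, value equality)."""

    name: str
    children: tuple[Node, ...] = ()


def post_traversal_sketch(nodes: Sequence[Node]) -> Iterator[Node]:
    """Yield each unique node once, always after all of its children."""
    expanded: set[Node] = set()  # nodes whose children have been pushed
    emitted: set[Node] = set()   # nodes that have already been yielded
    stack = list(nodes)          # the last root sits on top of the stack
    while stack:
        node = stack[-1]
        if node in emitted:
            # Duplicate stack entry for a node that was already yielded.
            stack.pop()
        elif node in expanded:
            # All children were handled, so the node itself can be yielded.
            stack.pop()
            emitted.add(node)
            yield node
        else:
            expanded.add(node)
            # Push in reverse so the first child is visited first.
            for child in reversed(node.children):
                if child not in expanded:
                    stack.append(child)


# Toy analogue of the three expressions in the test above.
a, b = Node("a"), Node("b")
e1 = Node("e1", (a, a))
e2 = Node("e2", (a, b))
e3 = Node("e3", (b, a))
order = list(post_traversal_sketch([e1, e2, e3]))
```

In this sketch `order` holds five nodes: the two leaves reached through `e3`, then `e3`, `e2`, and `e1`. Shared leaves are yielded only once even though every root references them.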
Member
Author
Thanks @TomAugspurger! Seems reasonable to me (added).
Matt711
approved these changes
Jul 2, 2025
wence-
reviewed
Jul 2, 2025
Contributor
Fixed the merge conflict from #19135.
wence-
approved these changes
Jul 2, 2025
mroeschke
approved these changes
Jul 2, 2025
Contributor
/merge
rapids-bot bot
pushed a commit
that referenced
this pull request
Jul 16, 2025
Probably supersedes #19130

The goal of this PR is to define the classes needed to store column statistics for an `IR` node. Some criteria:
- We need the statistics for a column to contain a reference to the underlying datasource information (e.g. unique-value statistics, row-count, and average storage/file size).
- We want caching for each datasource and column.
- We want the option to perform metadata/data sampling lazily on the datasource.
- We want our Parquet partitioning logic to use the same infrastructure (to avoid redundant sampling).
- We want to record when a specific statistic is "exact" (rather than estimated).

Also related:
- #19258
- #19273

Authors:
- Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
- Tom Augspurger (https://github.com/TomAugspurger)
- Matthew Murray (https://github.com/Matt711)

URL: #19276
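The criteria in that commit message can be illustrated with a small sketch. Everything here is hypothetical: `StatSketch`, `SourceInfoSketch`, and `ColumnStatsSketch` are made-up names and do not reflect the classes actually added in #19276. The sketch only shows one way a column could hold a reference to shared datasource information, with lazy sampling, per-column caching, and an "exact" flag on each statistic.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class StatSketch:
    """A statistic plus a flag saying whether it is exact or estimated."""

    value: Any = None
    exact: bool = False


class SourceInfoSketch:
    """Hypothetical datasource info that samples lazily and caches results."""

    def __init__(self, sampler: Callable[[], dict[str, list]]) -> None:
        self._sampler = sampler
        self._sample: dict[str, list] | None = None
        self._unique_cache: dict[str, StatSketch] = {}
        self.sample_calls = 0  # counts how often the datasource was sampled

    def _get_sample(self) -> dict[str, list]:
        # Sample only on first use; later calls reuse the cached sample.
        if self._sample is None:
            self.sample_calls += 1
            self._sample = self._sampler()
        return self._sample

    def unique_count(self, column: str) -> StatSketch:
        # Cached per column; derived from a sample, so marked as not exact.
        if column not in self._unique_cache:
            values = self._get_sample()[column]
            self._unique_cache[column] = StatSketch(len(set(values)), exact=False)
        return self._unique_cache[column]


@dataclass
class ColumnStatsSketch:
    """Column-level statistics that refer back to the shared datasource."""

    name: str
    source: SourceInfoSketch

    @property
    def unique_count(self) -> StatSketch:
        return self.source.unique_count(self.name)


# Two columns share one datasource; nothing is sampled until a stat is read.
source = SourceInfoSketch(lambda: {"a": [1, 1, 2], "b": [3, 4, 5]})
col_a = ColumnStatsSketch("a", source)
col_b = ColumnStatsSketch("b", source)
```

Reading `col_a.unique_count` and then `col_b.unique_count` triggers the sampler once in this sketch, which is the kind of redundancy the criteria aim to avoid.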
Description
post_traversal API.
Checklist