Skip to content

Used TypeDict for CachingVisitor.state#19135

Merged
rapids-bot[bot] merged 29 commits intorapidsai:branch-25.08from
TomAugspurger:tom/typed-state
Jul 2, 2025
Merged

Used TypeDict for CachingVisitor.state#19135
rapids-bot[bot] merged 29 commits intorapidsai:branch-25.08from
TomAugspurger:tom/typed-state

Conversation

@TomAugspurger
Copy link
Copy Markdown
Contributor

@TomAugspurger TomAugspurger commented Jun 11, 2025

This changes CachingVisitor to used a TypedDict for state. Previously, we used a Mapping[str, Any], which had two problems:

  1. Typos in the key names can cause unexpected KeyErrors
  2. The Any type for the values of state means that, at least by default, you won't get any type checking on things using values looked up from state.

Unit tests should catch all these, but this can provide some earlier feedback on any issues.

Now, we'll declare a State TypedDict with some specific fields for each transformer that uses CachingVisitor.

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot bot commented Jun 11, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added Python Affects Python cuDF API. cudf-polars Issues specific to cudf-polars labels Jun 11, 2025
@TomAugspurger TomAugspurger added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change and removed Python Affects Python cuDF API. cudf-polars Issues specific to cudf-polars labels Jun 11, 2025
@GPUtester GPUtester moved this to In Progress in cuDF Python Jun 11, 2025
@github-actions github-actions bot added Python Affects Python cuDF API. cudf-polars Issues specific to cudf-polars labels Jun 11, 2025
This changes `CachingVisitor.state` to used a `TypedDict`. Previously,
we used a `Mapping[str, Any]`, which had two problems:

1. Risks typos in the key names causing unexpected KeyErrors
2. The `Any` type for the values of `state` mean that, at least by
   default, you won't get any type checking on things using values
   looked up from `state`.

Unit tests should catch all these, but this can provide some earlier
feedback on any issues.

The `total=False` keyword is used in the typeddict to indicate that
these keys are all optional. `mypy` *doesn't* error if you do a lookup
on an optional field of a TypedDict, so we do still need to handle
missing keys like we have been.
@TomAugspurger
Copy link
Copy Markdown
Contributor Author

FYI @rjzamora this will have some overlap with #19130. Any new keys we add to state will need a field in CachingVisitorState.

@TomAugspurger TomAugspurger marked this pull request as ready for review June 11, 2025 19:29
@TomAugspurger TomAugspurger requested a review from a team as a code owner June 11, 2025 19:29
@TomAugspurger
Copy link
Copy Markdown
Contributor Author

Hmm, failing on 3.10: https://github.com/rapidsai/cudf/actions/runs/15594634303/job/43923137155?pr=19135#step:10:7551

 │ │ TypeError: cannot inherit from both a TypedDict type and a non-TypedDict base class

This might need to wait until we drop Python 3.10.

@TomAugspurger
Copy link
Copy Markdown
Contributor Author

Just seeing if CI passes with TypedDict from typing_extensions. The docs say this should work. If it does, we can discuss adding that as a dependency for python=3.10. Maybe not worth it.

@TomAugspurger TomAugspurger requested a review from a team as a code owner June 12, 2025 16:16
@TomAugspurger TomAugspurger requested a review from gforsyth June 12, 2025 16:16
@TomAugspurger TomAugspurger marked this pull request as draft June 12, 2025 16:17
@TomAugspurger
Copy link
Copy Markdown
Contributor Author

I'd like to add a dependency on typing-extensions for just python 3.10. For 3.11 and newer, we don't need typing-extensions. Here's what I think is the desired output:

diff --git a/conda/recipes/cudf-polars/recipe.yaml b/conda/recipes/cudf-polars/recipe.yaml
index 85740e5930..8133c7e8a5 100644
--- a/conda/recipes/cudf-polars/recipe.yaml
+++ b/conda/recipes/cudf-polars/recipe.yaml
@@ -50,6 +50,7 @@ requirements:
     - python
     - pylibcudf =${{ version }}
     - polars >=1.24,<1.31
+    - typing-extensions  # py==310
     - ${{ pin_compatible("cuda-version", upper_bound="x", lower_bound="x") }}
   ignore_run_exports:
     by_name:
diff --git a/python/cudf_polars/pyproject.toml b/python/cudf_polars/pyproject.toml
index b3d28e027d..9589a8a3f9 100644
--- a/python/cudf_polars/pyproject.toml
+++ b/python/cudf_polars/pyproject.toml
@@ -21,6 +21,7 @@ requires-python = ">=3.10"
 dependencies = [
     "polars>=1.25,<1.31",
     "pylibcudf==25.8.*,>=0.0.0a0",
+    "typing-extensions; python_version < '3.11'",
 ] # This list was generated by `rapids-dependency-file-generator`. To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.
 classifiers = [
     "Intended Audience :: Developers",

I'd hoped that 4ed5168 would do the trick, but that gives the same output as what's on 25.08 (no change).

@gforsyth
Copy link
Copy Markdown
Contributor

gforsyth commented Jun 12, 2025

Hey @TomAugspurger -- I'm not sure if the dependency file generator knows how to handle version-specific dependencies in a pyproject.toml.

I think it's reasonable to do something like:

@@ -834,13 +834,11 @@ dependencies:
         packages:
           - polars>=1.25,<1.31
     specific:
-      - output_types: [conda, requirements, pyproject]
+      - output_types: pyproject
         matrices:
-          - matrix: {py: "3.10"}
-            packages:
-              - typing-extensions
           - matrix: null
-            packages: null
+            packages:
+              - "typing-extensions; python_version < '3.11'"
   run_cudf_polars_experimental:
     common:
       - output_types: [conda, requirements, pyproject]

The other spot where typing_extensions would need to show up is in the conda environment files, but it seems to already be there.

As for the rattler-build recipes, those aren't covered (yet) by dependencies.yaml, so you can go ahead and make the necessary changes directly.

@rjzamora
Copy link
Copy Markdown
Member

If folks are OK with this overall I'll clean up a couple things (docs, where we define these Transformer subtypes) and ping for another round of reviews.

I'll defer to you on whether the extra effort is worth it, and I'll be happy to review.

@TomAugspurger TomAugspurger marked this pull request as draft June 18, 2025 14:24
- Move State dicts to where they're used
- Move Transformer subclasses to where they're used
@TomAugspurger TomAugspurger marked this pull request as ready for review June 18, 2025 16:30
@TomAugspurger TomAugspurger requested a review from wence- June 24, 2025 13:44
Copy link
Copy Markdown
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks close to ready right now (hence approving).

I think we should be able to get away without explicitly constructing the TypedDict instances (see example).

@TomAugspurger
Copy link
Copy Markdown
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit d0dc7d0 into rapidsai:branch-25.08 Jul 2, 2025
94 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in cuDF Python Jul 2, 2025
@TomAugspurger TomAugspurger deleted the tom/typed-state branch July 2, 2025 22:05
fn: Callable[[U_contra, GenericTransformer[U_contra, V_co, StateT_co]], V_co],
*,
state: Mapping[str, Any] | None = None,
state: Mapping[str, Any],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have been StateT_co I think

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That errored with

python/cudf_polars/cudf_polars/dsl/traversal.py:128: error: Cannot use a covariant type variable as a parameter  [misc]

Some discussion of this at python/mypy#7049, but I wasn't able to figure out a way around that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because mypy doesn't understand how we're using this state object and doesn't realise that this make_recursive is actually a type constructor rather than a regular function.

We could have fixed this by:

type: ignore (describing why)

defining a copy of CachingVisitor called make_recursive that is identically except for the lack of cache.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll fix get that fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cudf-polars Issues specific to cudf-polars improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

6 participants