Allow setting StreamingExecutor.target_partition_size with an environment variable #19316

Merged
rapids-bot[bot] merged 4 commits into rapidsai:branch-25.08 from TomAugspurger:tom/target-partition-size-from-env
Jul 9, 2025

Conversation

@TomAugspurger
Contributor

Description

This provides a way to set the default `target_partition_size` for the cudf-polars streaming executor with an environment variable. The default behavior is unchanged: use a fraction of the device memory, as reported by pynvml, with a warning if pynvml can't be found. Setting it via the environment is useful when cudf-polars is used through libraries like Narwhals, where AFAICT there isn't a way to pass arguments through to the engine.
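As a rough illustration of the precedence described above (environment variable first, then a fraction of device memory from pynvml, then a warning plus a fallback), here is a minimal sketch. It is not the cudf-polars implementation: the variable name `EXAMPLE_TARGET_PARTITION_SIZE`, the divisor, the fallback size, the use of GPU index 0, and the assumption that the value is an integer byte count are all hypothetical and chosen only for illustration.

```python
import os
import warnings
from typing import Callable, Mapping, Optional

# Hypothetical name: this thread does not state the real variable name.
ENV_VAR = "EXAMPLE_TARGET_PARTITION_SIZE"
# Hypothetical values, not taken from cudf-polars.
MEMORY_DIVISOR = 16
FALLBACK_PARTITION_BYTES = 1_000_000_000


def query_device_memory() -> Optional[int]:
    """Return the total memory of GPU 0 in bytes, or None if pynvml is missing."""
    try:
        import pynvml
    except ImportError:
        return None
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        return int(pynvml.nvmlDeviceGetMemoryInfo(handle).total)
    finally:
        pynvml.nvmlShutdown()


def resolve_target_partition_size(
    environ: Optional[Mapping[str, str]] = None,
    query: Callable[[], Optional[int]] = query_device_memory,
) -> int:
    """Pick a partition size in bytes: environment first, then device memory."""
    environ = os.environ if environ is None else environ
    value = environ.get(ENV_VAR)
    if value is not None:
        # An explicit setting wins over anything derived from the device.
        return int(value)
    total = query()
    if total is None:
        warnings.warn(
            "pynvml not found; using a fixed fallback partition size.",
            stacklevel=2,
        )
        return FALLBACK_PARTITION_BYTES
    return total // MEMORY_DIVISOR
```

Passing `environ` and `query` explicitly keeps the precedence logic testable on machines without a GPU.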

@copy-pr-bot

copy-pr-bot bot commented Jul 8, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@github-actions bot added the Python and cudf-polars labels Jul 8, 2025
@GPUtester moved this to In Progress in cuDF Python Jul 8, 2025
@TomAugspurger force-pushed the tom/target-partition-size-from-env branch from 8b5921c to 5102809 July 8, 2025 17:08
@TomAugspurger added the improvement and non-breaking labels Jul 8, 2025
@TomAugspurger marked this pull request as ready for review July 8, 2025 20:03
@TomAugspurger requested a review from a team as a code owner July 8, 2025 20:03
@TomAugspurger requested review from Matt711 and vyasr July 8, 2025 20:03
@TomAugspurger changed the title from "Allow setting target partition size from the environment" to "Allow setting StreamingExecutor.target_partition_size with an environment variable" Jul 8, 2025
of the device memory, where the fraction depends on the scheduler:

- distributed: 1/40th of the device memory
- synchronous: 1/16th of the device memory
Contributor

Was this fraction experimentally derived? E.g., by running pdsh with different partition sizes?

Contributor Author

I'm not sure (cc @rjzamora).

Member

Yes, they were experimentally derived (though the "experiments" were rough/limited). These sizes avoided OOMs and provided reasonable performance on both V100 and H100 machines for both 1- and 8-GPU execution.
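
To make the fractions under discussion concrete, here is a small worked example of the arithmetic. The 1/40 and 1/16 fractions come from the snippet under review; the 80 GiB device size and the helper name `default_partition_size` are assumptions for illustration only.

```python
GIB = 2**30

# Divisors from the snippet under review: 1/40 (distributed), 1/16 (synchronous).
DEVICE_MEMORY_DIVISOR = {"distributed": 40, "synchronous": 16}


def default_partition_size(scheduler: str, device_memory_bytes: int) -> int:
    """Hypothetical helper: the scheduler-dependent share of device memory, in bytes."""
    return device_memory_bytes // DEVICE_MEMORY_DIVISOR[scheduler]


# Assumed 80 GiB device, chosen only to keep the numbers round.
for scheduler in DEVICE_MEMORY_DIVISOR:
    size = default_partition_size(scheduler, 80 * GIB)
    print(f"{scheduler}: {size / GIB:.1f} GiB")
# prints:
# distributed: 2.0 GiB
# synchronous: 5.0 GiB
```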

@TomAugspurger
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 622af40 into rapidsai:branch-25.08 Jul 9, 2025
91 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in cuDF Python Jul 9, 2025
TomAugspurger added a commit to TomAugspurger/pygdf that referenced this pull request Jul 14, 2025
This updates our configuration handling to enable setting the default
value through environment variables for ~all of our configuration
options.

Follow-up to rapidsai#19316.

Closes rapidsai#19330
rapids-bot bot pushed a commit that referenced this pull request Jul 16, 2025
This updates our configuration handling to enable setting the default value through environment variables for ~all of our configuration options, rather than just specific ones like `target_partition_size_default`.

Follow-up to #19316.

Closes #19330

Authors:
  - Tom Augspurger (https://github.com/TomAugspurger)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - Richard (Rick) Zamora (https://github.com/rjzamora)

URL: #19369
Labels

cudf-polars: Issues specific to cudf-polars
improvement: Improvement / enhancement to an existing function
non-breaking: Non-breaking change
Python: Affects Python cuDF API.

4 participants