Allow setting StreamingExecutor.target_partition_size with an environment variable #19316
Merged
rapids-bot[bot] merged 4 commits into rapidsai:branch-25.08 on Jul 9, 2025
rjzamora reviewed Jul 9, 2025
Matt711 approved these changes Jul 9, 2025
of the device memory, where the fraction depends on the scheduler:

- distributed: 1/40th of the device memory
- synchronous: 1/16th of the device memory
Contributor
Was this fraction experimentally derived? E.g., by running pdsh with different partition sizes?
Member
Yes, they were experimentally derived (though the "experiments" were rough/limited). These sizes avoided OOMs and provided reasonable performance on both V100 and H100 machines for both 1- and 8-GPU execution.
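To make the fractions above concrete, here is a minimal sketch of the arithmetic. The function name, the dictionary, and the device memory sizes (16 GiB and 80 GiB) are illustrative assumptions, not taken from the cudf-polars source or from the benchmark machines mentioned in this thread.

```python
GIB = 2**30

# Denominators quoted in the docstring under review:
# distributed -> 1/40th, synchronous -> 1/16th of device memory.
SCHEDULER_DENOMINATORS = {"distributed": 40, "synchronous": 16}


def fraction_based_partition_size(device_memory_bytes: int, scheduler: str) -> int:
    """Return the scheduler-dependent fraction of device memory, in bytes.

    Hypothetical helper for illustration only; not the library's API.
    """
    return device_memory_bytes // SCHEDULER_DENOMINATORS[scheduler]


if __name__ == "__main__":
    # Illustrative device sizes; real values would come from the GPU.
    for total_gib in (16, 80):
        for scheduler in SCHEDULER_DENOMINATORS:
            size = fraction_based_partition_size(total_gib * GIB, scheduler)
            print(f"{total_gib} GiB, {scheduler}: {size / GIB:.2f} GiB per partition")
```

The distributed scheduler gets the smaller fraction, so for the same device it produces more, smaller partitions than the synchronous one.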
Contributor Author
/merge
TomAugspurger added a commit to TomAugspurger/pygdf that referenced this pull request Jul 14, 2025
This updates our configuration handling to enable setting the default value through environment variables for ~all of our configuration options. Follow-up to rapidsai#19316. Closes rapidsai#19330
rapids-bot bot pushed a commit that referenced this pull request Jul 16, 2025
This updates our configuration handling to enable setting the default value through environment variables for ~all of our configuration options, rather than just specific ones like `target_partition_size_default`. Follow-up to #19316. Closes #19330

Authors:
- Tom Augspurger (https://github.com/TomAugspurger)

Approvers:
- James Lamb (https://github.com/jameslamb)
- Richard (Rick) Zamora (https://github.com/rjzamora)

URL: #19369
Description
This provides a way to set the default `target_partition_size` for the cudf-polars streaming executor with an environment variable. The default behavior is unchanged: use a fraction of the device memory, as reported by pynvml, with a warning if pynvml can't be found. Setting it via the environment is useful when cudf-polars is used through libraries like Narwhals, where AFAICT there isn't a way to pass arguments through to the engine.
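The precedence described above (environment variable first, then the device-based default, with a warning when the device size is unavailable) could be sketched roughly as follows. This is not the PR's implementation: the environment variable name, the fallback constant, and the function name are all placeholders, since this thread does not state the real ones.

```python
import os
import warnings
from typing import Mapping, Optional

# Placeholder name: the actual variable read by cudf-polars is not given here.
ENV_VAR = "EXAMPLE_TARGET_PARTITION_SIZE"

# Assumed fallback (in bytes) for when the device size cannot be queried;
# the real fallback value is not stated in this description.
FALLBACK_PARTITION_SIZE = 1_000_000_000


def resolve_target_partition_size(
    device_based_default: Optional[int],
    environ: Optional[Mapping[str, str]] = None,
) -> int:
    """Pick the default partition size in bytes.

    device_based_default is the fraction-of-device-memory value, or None
    to model the case where pynvml could not be found.
    """
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR)
    if raw is not None:
        # An explicit environment setting wins over any computed default.
        return int(raw)
    if device_based_default is None:
        warnings.warn(
            "Device memory size unavailable; using a fixed fallback partition size.",
            stacklevel=2,
        )
        return FALLBACK_PARTITION_SIZE
    return device_based_default
```

Because the value is read from the process environment, a wrapper library that never exposes engine arguments can still be configured by exporting the variable before the process starts.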