Change References to state.actors() to ray.util.state.list_actors or ray.util.state.get_actor#56733
Change References to state.actors() to ray.util.state.list_actors or ray.util.state.get_actor#56733edoakes merged 17 commits intoray-project:masterfrom
state.actors() to ray.util.state.list_actors or ray.util.state.get_actor#56733Conversation
There was a problem hiding this comment.
Code Review
This pull request successfully moves the actors function from ray._private.state to ray._common.state to make it a more general utility. The changes are mostly straightforward replacements of the import path across the codebase, and a new test file has been added for the moved function.
My review has identified a few areas for improvement:
- A new dependency from
_commonto_privateis introduced, which goes against the stated goal of the PR. - A type hint in the new
actorsfunction can be made more specific usingtyping.Literal. - One call site in
test_global_state.pywas missed during the refactoring. - A test case in the new test file could be made clearer.
Overall, this is a good refactoring, and addressing these points will improve the design and correctness.
ebc22b7 to
987cca2
Compare
@edoakes The state API would require the dashboard to be running. Even though the dashboard is highly recommended, shouldn't we be agnostic of this functionality? Another option - that still keeps things agnostic of the dashboard running - is I can refactor state API to use the Which of the two would you prefer? |
All of the usage changed in this PR is in test files, so I see no strong reason to avoid requiring the dashboard for these. It's a common pattern to use the state API for other libraries' tests as well. |
There is a small amount of usage in the dataset (for split when hints are given) and runtime_context files as well. Will have the changes you requested posted soon though, just fixing some tests. |
Signed-off-by: Jason <jcarlson212@gmail.com>
Signed-off-by: Jason <jcarlson212@gmail.com>
Signed-off-by: Jason <jcarlson212@gmail.com>
Signed-off-by: Jason <jcarlson212@gmail.com>
Fixes the runtime context state submodule loading bug by importing the state API module as STATE at the top of the file. Uses similar safety precaution in the dataset.py file since these were the only two public API spots recently changed (not isolated to just tests). Signed-off-by: Jason <jcarlson212@gmail.com>
Signed-off-by: Jason <jcarlson212@gmail.com>
The dashboard wasn't being initialized before unit tests for shutdown_only fixtures since the params get ignored, causing some unit tests to fail. Signed-off-by: Jason <jcarlson212@gmail.com>
Signed-off-by: Jason <jcarlson212@gmail.com>
Signed-off-by: Jason <jcarlson212@gmail.com>
Signed-off-by: Jason <jcarlson212@gmail.com>
edoakes
left a comment
There was a problem hiding this comment.
Looks good. Only real concern is the change in data/dataset.py.
Given it is a change to the scheduler of Ray Data and the rest of the changes here are test-only, it would probably be wise to leave that change out of this PR and update it separately (with the Ray Data maintainers' blessing).
python/ray/data/dataset.py
Outdated
| actors_state = { | ||
| actor.actor_id: actor.node_id | ||
| for actor in list_actors( | ||
| detail=True, | ||
| limit=summarize_actors().get("cluster", {}).get("total_actors", 0) | ||
| + 100, # fetch current actors. Some staleness is assumed fine. | ||
| ) | ||
| } | ||
| return {actor: actors_state.get(actor._actor_id.hex()) for actor in actors} |
There was a problem hiding this comment.
@bveeramani can you verify this is OK to change? Looks like a pretty core change to data scheduling logic.
There was a problem hiding this comment.
Small comment for @bveeramani - even before these changes in theory the number of actors could change after the map was built, so this shouldn't change any consistency guarantees from what we had before per my understanding (except one property where we might return a number of actors that does not correspond to any discrete point in time in CPU cycles, which I don't think would be relied on... we'd instead be interpolating between CPU cycles).
There was a problem hiding this comment.
Looks like a pretty core change to data scheduling logic.
I think this logic is only used by the Dataset.split() API, which isn't a particularly important API because people use Dataset.streaming_split() instead.
can you verify this is OK to change?
This looks reasonable.
Is list_actors comparably fast to ray._private.state.actors()? Users expect dataset composition to be fast. If list_actors can be slow (many seconds), that might cause confusion/friction for users.
Other than that, some minor suggestions:
- It wasn't obvious to me what the keys and values for
actors_staterepresent based on a name. Something likeactor_hex_to_node_idmight be clearer - What happens if we remove the
+ 100? I think it's a good practice to avoid magic numbers/arbitrary constants if possible.
Signed-off-by: Jason <jcarlson212@gmail.com>
Nice. I'll make a separate PR for the dataset.py#split change after the maintainer's blessing is given 🙏 . All of the tests seem to be passing except two potential flakes: |
|
Kicked off full premerge CI run: https://buildkite.com/ray-project/premerge/builds/49795 |
|
@edoakes - this test is failing: I didn't make any changes to the runtime_context.py file (still uses old |
Likely right that it's just flaky. I kicked off CI again to see if it passes this time. If so, 🚢 |
… or `ray.util.state.get_actor` (#56733) Changes references to `_private.state.actors()` to references to `ray.util.state.list_actors` or `ray.util.state.get_actor` depending on the plurality of the fetch. This helps the `_private` submodule better serve its purpose of containing abstractions only used for core ray internals. Note that `ray.runtime_context` still uses `_private.actors` as it's a core ray internal that needs to function regardless of whether or not the dashboard server is running. ## Related issue number Fixes #53478 --------- Signed-off-by: Jason <jcarlson212@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
… or `ray.util.state.get_actor` (#56733) Changes references to `_private.state.actors()` to references to `ray.util.state.list_actors` or `ray.util.state.get_actor` depending on the plurality of the fetch. This helps the `_private` submodule better serve its purpose of containing abstractions only used for core ray internals. Note that `ray.runtime_context` still uses `_private.actors` as it's a core ray internal that needs to function regardless of whether or not the dashboard server is running. ## Related issue number Fixes #53478 --------- Signed-off-by: Jason <jcarlson212@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
… or `ray.util.state.get_actor` (ray-project#56733) Changes references to `_private.state.actors()` to references to `ray.util.state.list_actors` or `ray.util.state.get_actor` depending on the plurality of the fetch. This helps the `_private` submodule better serve its purpose of containing abstractions only used for core ray internals. Note that `ray.runtime_context` still uses `_private.actors` as it's a core ray internal that needs to function regardless of whether or not the dashboard server is running. ## Related issue number Fixes ray-project#53478 --------- Signed-off-by: Jason <jcarlson212@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
… or `ray.util.state.get_actor` (ray-project#56733) Changes references to `_private.state.actors()` to references to `ray.util.state.list_actors` or `ray.util.state.get_actor` depending on the plurality of the fetch. This helps the `_private` submodule better serve its purpose of containing abstractions only used for core ray internals. Note that `ray.runtime_context` still uses `_private.actors` as it's a core ray internal that needs to function regardless of whether or not the dashboard server is running. ## Related issue number Fixes ray-project#53478 --------- Signed-off-by: Jason <jcarlson212@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
… or `ray.util.state.get_actor` (ray-project#56733) Changes references to `_private.state.actors()` to references to `ray.util.state.list_actors` or `ray.util.state.get_actor` depending on the plurality of the fetch. This helps the `_private` submodule better serve its purpose of containing abstractions only used for core ray internals. Note that `ray.runtime_context` still uses `_private.actors` as it's a core ray internal that needs to function regardless of whether or not the dashboard server is running. ## Related issue number Fixes ray-project#53478 --------- Signed-off-by: Jason <jcarlson212@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: Future-Outlier <eric901201@gmail.com>
Why are these changes needed?
Changes references to
_private.state.actors()to references toray.util.state.list_actorsorray.util.state.get_actordepending on the plurality of the fetch. This helps the_privatesubmodule better serve its purpose of containing abstractions only used for core ray internals. Note thatray.runtime_contextstill uses_private.actorsas it's a core ray internal that needs to function regardless of whether or not the dashboard server is running.Related issue number
Fixes #53478
Checks
git commit -s) in this PR.scripts/format.shto lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/under thecorresponding
.rstfile.