Change References to `state.actors()` to `ray.util.state.list_actors` or `ray.util.state.get_actor` by jcarlson212 · Pull Request #56733 · ray-project/ray

jcarlson212 · 2025-09-19T06:38:53Z

Why are these changes needed?

Changes references to _private.state.actors() to references to ray.util.state.list_actors or ray.util.state.get_actor depending on the plurality of the fetch. This helps the _private submodule better serve its purpose of containing abstractions only used for core ray internals. Note that ray.runtime_context still uses _private.actors as it's a core ray internal that needs to function regardless of whether or not the dashboard server is running.

Related issue number

Fixes #53478

Checks

I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

gemini-code-assist

Code Review

This pull request successfully moves the actors function from ray._private.state to ray._common.state to make it a more general utility. The changes are mostly straightforward replacements of the import path across the codebase, and a new test file has been added for the moved function.

My review has identified a few areas for improvement:

A new dependency from _common to _private is introduced, which goes against the stated goal of the PR.
A type hint in the new actors function can be made more specific using typing.Literal.
One call site in test_global_state.py was missed during the refactoring.
A test case in the new test file could be made clearer.

Overall, this is a good refactoring, and addressing these points will improve the design and correctness.

python/ray/_common/state.py

python/ray/tests/test_global_state.py

python/ray/_common/state.py

python/ray/tests/test_state_internal.py

edoakes

I don't think we should move this one actually -- instead we should replace usage of it with the state API

jcarlson212 · 2025-09-19T18:44:07Z

state API

@edoakes The state API would require the dashboard to be running. Even though the dashboard is highly recommended, shouldn't we be agnostic of this functionality? ray._private.state.actors() already is agnostic of it despite not being intended for external usage, making it a decent option for being migrated to _common and referenced instead by other modules.

Another option - that still keeps things agnostic of the dashboard running - is I can refactor state API to use the ray._private.state.actors method as a fallback and then make other modules reference the state API directly. This will still make modules outside of core have some coupling to the _private.state.actors functionality but it's a fairly centralized coupling as it's mostly the state API.

Which of the two would you prefer?

edoakes · 2025-09-19T20:51:22Z

@edoakes The state API would require the dashboard to be running. Even though the dashboard is highly recommended, shouldn't we be agnostic of this functionality? ray._private.state.actors() already is agnostic of it despite not being intended for external usage, making it a decent option for being migrated to _common and referenced instead by other modules.

All of the usage changed in this PR is in test files, so I see no strong reason to avoid requiring the dashboard for these. It's a common pattern to use the state API for other libraries' tests as well.

jcarlson212 · 2025-09-22T04:12:37Z

@edoakes The state API would require the dashboard to be running. Even though the dashboard is highly recommended, shouldn't we be agnostic of this functionality? ray._private.state.actors() already is agnostic of it despite not being intended for external usage, making it a decent option for being migrated to _common and referenced instead by other modules.

All of the usage changed in this PR is in test files, so I see no strong reason to avoid requiring the dashboard for these. It's a common pattern to use the state API for other libraries' tests as well.

There is a small amount of usage in the dataset (for split when hints are given) and runtime_context files as well. Will have the changes you requested posted soon though, just fixing some tests.

python/ray/_private/test_utils.py

python/ray/runtime_context.py

python/ray/tests/conftest.py

python/ray/tests/test_asyncio.py

python/ray/runtime_context.py

python/ray/data/dataset.py

Signed-off-by: Jason <jcarlson212@gmail.com>

Fixes the runtime context state submodule loading bug by importing the state API module as STATE at the top of the file. Uses similar safety precaution in the dataset.py file since these were the only two public API spots recently changed (not isolated to just tests). Signed-off-by: Jason <jcarlson212@gmail.com>

Signed-off-by: Jason <jcarlson212@gmail.com>

The dashboard wasn't being initialized before unit tests for shutdown_only fixtures since the params get ignored, causing some unit tests to fail. Signed-off-by: Jason <jcarlson212@gmail.com>

Signed-off-by: Jason <jcarlson212@gmail.com>

edoakes

Looks good. Only real concern is the change in data/dataset.py.

Given it is a change to the scheduler of Ray Data and the rest of the changes here are test-only, it would probably be wise to leave that change out of this PR and update it separately (with the Ray Data maintainers' blessing).

python/ray/_private/state.py

edoakes · 2025-09-24T13:42:21Z

python/ray/data/dataset.py

+            actors_state = {
+                actor.actor_id: actor.node_id
+                for actor in list_actors(
+                    detail=True,
+                    limit=summarize_actors().get("cluster", {}).get("total_actors", 0)
+                    + 100,  # fetch current actors. Some staleness is assumed fine.
+                )
            }
+            return {actor: actors_state.get(actor._actor_id.hex()) for actor in actors}


@bveeramani can you verify this is OK to change? Looks like a pretty core change to data scheduling logic.

Small comment for @bveeramani - even before these changes in theory the number of actors could change after the map was built, so this shouldn't change any consistency guarantees from what we had before per my understanding (except one property where we might return a number of actors that does not correspond to any discrete point in time in CPU cycles, which I don't think would be relied on... we'd instead be interpolating between CPU cycles).

Looks like a pretty core change to data scheduling logic.

I think this logic is only used by the Dataset.split() API, which isn't a particularly important API because people use Dataset.streaming_split() instead.

can you verify this is OK to change?

This looks reasonable.

Is list_actors comparably fast to ray._private.state.actors()? Users expect dataset composition to be fast. If list_actors can be slow (many seconds), that might cause confusion/friction for users.

Other than that, some minor suggestions:

It wasn't obvious to me what the keys and values for actors_state represent based on a name. Something like actor_hex_to_node_id might be clearer

What happens if we remove the + 100? I think it's a good practice to avoid magic numbers/arbitrary constants if possible.

python/ray/tests/test_actor.py

python/ray/tests/conftest.py

Signed-off-by: Jason <jcarlson212@gmail.com>

jcarlson212 · 2025-09-24T19:23:33Z

Looks good. Only real concern is the change in data/dataset.py.

Given it is a change to the scheduler of Ray Data and the rest of the changes here are test-only, it would probably be wise to leave that change out of this PR and update it separately (with the Ray Data maintainers' blessing).

Nice. I'll make a separate PR for the dataset.py#split change after the maintainer's blessing is given 🙏 . All of the tests seem to be passing except two potential flakes: python/ray/tests/test_failure.py::test_update_object_location_batch_failure[request] and python/ray/tests/test_global_gc.py::test_local_gc_called_once_per_interval

edoakes · 2025-09-24T19:46:03Z

Kicked off full premerge CI run: https://buildkite.com/ray-project/premerge/builds/49795

jcarlson212 · 2025-09-25T16:11:13Z

@edoakes - this test is failing: python/ray/train/v2/tests/test_jax_trainer.py::test_minimal_multihost. Looks like the temporary file being used by the runtime env is being cleaned up by accident:

[2025-09-24T20:56:15Z] E                       FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ray/session_2025-09-24_20-56-05_216815_27031/runtime_resources/pip/d4fac6bfd8715b366a57915afa0179ca0e4906be/exec_cwd

I didn't make any changes to the runtime_context.py file (still uses old _private.state.actors function), so it seems unrelated to these changes. The CI dashboard shows it failing in 90% of builds with 10% achieving a hard failure: https://flaky-tests.ray.io/#showAll=true.

edoakes · 2025-09-25T17:14:27Z

I didn't make any changes to the runtime_context.py file (still uses old _private.state.actors function), so it seems unrelated to these changes. The CI dashboard shows it failing in 90% of builds with 10% achieving a hard failure: https://flaky-tests.ray.io/#showAll=true.

Likely right that it's just flaky. I kicked off CI again to see if it passes this time. If so, 🚢

python/ray/serve/tests/test_logging.py

… or `ray.util.state.get_actor` (#56733) Changes references to `_private.state.actors()` to references to `ray.util.state.list_actors` or `ray.util.state.get_actor` depending on the plurality of the fetch. This helps the `_private` submodule better serve its purpose of containing abstractions only used for core ray internals. Note that `ray.runtime_context` still uses `_private.actors` as it's a core ray internal that needs to function regardless of whether or not the dashboard server is running. ## Related issue number Fixes #53478 --------- Signed-off-by: Jason <jcarlson212@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>

… or `ray.util.state.get_actor` (#56733) Changes references to `_private.state.actors()` to references to `ray.util.state.list_actors` or `ray.util.state.get_actor` depending on the plurality of the fetch. This helps the `_private` submodule better serve its purpose of containing abstractions only used for core ray internals. Note that `ray.runtime_context` still uses `_private.actors` as it's a core ray internal that needs to function regardless of whether or not the dashboard server is running. ## Related issue number Fixes #53478 --------- Signed-off-by: Jason <jcarlson212@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: Douglas Strodtman <douglas@anyscale.com>

… or `ray.util.state.get_actor` (ray-project#56733) Changes references to `_private.state.actors()` to references to `ray.util.state.list_actors` or `ray.util.state.get_actor` depending on the plurality of the fetch. This helps the `_private` submodule better serve its purpose of containing abstractions only used for core ray internals. Note that `ray.runtime_context` still uses `_private.actors` as it's a core ray internal that needs to function regardless of whether or not the dashboard server is running. ## Related issue number Fixes ray-project#53478 --------- Signed-off-by: Jason <jcarlson212@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>

… or `ray.util.state.get_actor` (ray-project#56733) Changes references to `_private.state.actors()` to references to `ray.util.state.list_actors` or `ray.util.state.get_actor` depending on the plurality of the fetch. This helps the `_private` submodule better serve its purpose of containing abstractions only used for core ray internals. Note that `ray.runtime_context` still uses `_private.actors` as it's a core ray internal that needs to function regardless of whether or not the dashboard server is running. ## Related issue number Fixes ray-project#53478 --------- Signed-off-by: Jason <jcarlson212@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: Future-Outlier <eric901201@gmail.com>

jcarlson212 requested review from a team, edoakes and jjyao as code owners September 19, 2025 06:38

gemini-code-assist bot reviewed Sep 19, 2025

View reviewed changes

python/ray/_common/state.py Outdated Show resolved Hide resolved

python/ray/tests/test_global_state.py Outdated Show resolved Hide resolved

python/ray/_common/state.py Outdated Show resolved Hide resolved

python/ray/tests/test_state_internal.py Outdated Show resolved Hide resolved

jcarlson212 force-pushed the master branch 3 times, most recently from ebc22b7 to 987cca2 Compare September 19, 2025 06:54

ray-gardener bot added docs An issue or change related to documentation core Issues that should be addressed in Ray Core community-contribution Contributed by the community labels Sep 19, 2025

edoakes requested changes Sep 19, 2025

View reviewed changes

This comment was marked as outdated.

Sign in to view

jcarlson212 force-pushed the master branch from 4df9499 to 9c5b93b Compare September 22, 2025 03:42

This comment was marked as outdated.

Sign in to view

jcarlson212 force-pushed the master branch from d0614df to 67a0fc2 Compare September 22, 2025 05:04

This comment was marked as outdated.

Sign in to view

jcarlson212 force-pushed the master branch from 9c71c36 to babba0f Compare September 22, 2025 16:18

This comment was marked as outdated.

Sign in to view

edoakes reviewed Sep 22, 2025

View reviewed changes

python/ray/_private/test_utils.py Outdated Show resolved Hide resolved

python/ray/runtime_context.py Show resolved Hide resolved

python/ray/tests/conftest.py Outdated Show resolved Hide resolved

python/ray/tests/test_asyncio.py Outdated Show resolved Hide resolved

jcarlson212 force-pushed the master branch from 8880c09 to ad6d235 Compare September 23, 2025 04:38

edoakes reviewed Sep 23, 2025

View reviewed changes

python/ray/runtime_context.py Outdated Show resolved Hide resolved

edoakes reviewed Sep 23, 2025

View reviewed changes

python/ray/data/dataset.py Outdated Show resolved Hide resolved

jcarlson212 force-pushed the master branch from 6da024d to ad35d67 Compare September 23, 2025 14:17

jcarlson212 requested a review from a team as a code owner September 23, 2025 14:17

jcarlson212 added 10 commits September 23, 2025 22:23

Add detail=True for NodeId and NumRestarts

9b8867a

Signed-off-by: Jason <jcarlson212@gmail.com>

Override Ruff for Module Import

30ac5de

Signed-off-by: Jason <jcarlson212@gmail.com>

Resolve Remaining Import State Module Errors on Remote Workers

3829392

Signed-off-by: Jason <jcarlson212@gmail.com>

Improve Import Hygiene and Cut RuntimeContext DepOn Dashboard

67427df

Signed-off-by: Jason <jcarlson212@gmail.com>

Remove State Import Aliases

5b6e94b

Signed-off-by: Jason <jcarlson212@gmail.com>

Fix Shutdown Only Dashboard Initialization Bug

1d7ba7e

The dashboard wasn't being initialized before unit tests for shutdown_only fixtures since the params get ignored, causing some unit tests to fail. Signed-off-by: Jason <jcarlson212@gmail.com>

Add Back _private Actors Timestamps Unit Test

63445c9

Signed-off-by: Jason <jcarlson212@gmail.com>

Add Ray Dashboard Init To Failing Actor Tests

2fa9774

Signed-off-by: Jason <jcarlson212@gmail.com>

Add Dashboard to Fix More Tests

d4211fa

Signed-off-by: Jason <jcarlson212@gmail.com>

jcarlson212 force-pushed the master branch from e95b764 to d4211fa Compare September 24, 2025 02:23

This comment was marked as outdated.

Sign in to view

edoakes reviewed Sep 24, 2025

View reviewed changes

Remove Redundant Imports, Dataset.py Ref Revert, Improve Hygeine

8ed77e9

Signed-off-by: Jason <jcarlson212@gmail.com>

edoakes added the go add ONLY when ready to merge, run all tests label Sep 24, 2025

Merge branch 'master' into master

e8a10c8

edoakes enabled auto-merge (squash) September 25, 2025 17:14

cursor bot reviewed Sep 25, 2025

View reviewed changes

python/ray/serve/tests/test_logging.py Show resolved Hide resolved

edoakes approved these changes Sep 25, 2025

View reviewed changes

edoakes disabled auto-merge September 25, 2025 22:11

edoakes merged commit da9aaf0 into ray-project:master Sep 25, 2025
7 checks passed

Conversation

jcarlson212 commented Sep 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why are these changes needed?

Related issue number

Checks

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

edoakes left a comment

Choose a reason for hiding this comment

Uh oh!

jcarlson212 commented Sep 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

edoakes commented Sep 19, 2025

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

jcarlson212 commented Sep 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment was marked as outdated.

Uh oh!

edoakes left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

edoakes Sep 24, 2025

Choose a reason for hiding this comment

Uh oh!

jcarlson212 Sep 24, 2025

Choose a reason for hiding this comment

Uh oh!

bveeramani Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

jcarlson212 commented Sep 24, 2025

Uh oh!

edoakes commented Sep 24, 2025

Uh oh!

jcarlson212 commented Sep 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

edoakes commented Sep 25, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jcarlson212 commented Sep 19, 2025 •

edited

Loading

jcarlson212 commented Sep 19, 2025 •

edited

Loading

jcarlson212 commented Sep 22, 2025 •

edited

Loading

jcarlson212 commented Sep 25, 2025 •

edited

Loading