Skip to content

Change References to state.actors() to ray.util.state.list_actors or ray.util.state.get_actor#56733

Merged
edoakes merged 17 commits intoray-project:masterfrom
jcarlson212:master
Sep 25, 2025
Merged

Change References to state.actors() to ray.util.state.list_actors or ray.util.state.get_actor#56733
edoakes merged 17 commits intoray-project:masterfrom
jcarlson212:master

Conversation

@jcarlson212
Copy link
Contributor

@jcarlson212 jcarlson212 commented Sep 19, 2025

Why are these changes needed?

Changes references to _private.state.actors() to references to ray.util.state.list_actors or ray.util.state.get_actor depending on the plurality of the fetch. This helps the _private submodule better serve its purpose of containing abstractions only used for core ray internals. Note that ray.runtime_context still uses _private.actors as it's a core ray internal that needs to function regardless of whether or not the dashboard server is running.

Related issue number

Fixes #53478

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@jcarlson212 jcarlson212 requested review from a team, edoakes and jjyao as code owners September 19, 2025 06:38
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request successfully moves the actors function from ray._private.state to ray._common.state to make it a more general utility. The changes are mostly straightforward replacements of the import path across the codebase, and a new test file has been added for the moved function.

My review has identified a few areas for improvement:

  • A new dependency from _common to _private is introduced, which goes against the stated goal of the PR.
  • A type hint in the new actors function can be made more specific using typing.Literal.
  • One call site in test_global_state.py was missed during the refactoring.
  • A test case in the new test file could be made clearer.

Overall, this is a good refactoring, and addressing these points will improve the design and correctness.

@jcarlson212 jcarlson212 force-pushed the master branch 3 times, most recently from ebc22b7 to 987cca2 Compare September 19, 2025 06:54
@ray-gardener ray-gardener bot added docs An issue or change related to documentation core Issues that should be addressed in Ray Core community-contribution Contributed by the community labels Sep 19, 2025
Copy link
Collaborator

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should move this one actually -- instead we should replace usage of it with the state API

@jcarlson212
Copy link
Contributor Author

jcarlson212 commented Sep 19, 2025

state API

@edoakes The state API would require the dashboard to be running. Even though the dashboard is highly recommended, shouldn't we be agnostic of this functionality? ray._private.state.actors() already is agnostic of it despite not being intended for external usage, making it a decent option for being migrated to _common and referenced instead by other modules.

Another option - that still keeps things agnostic of the dashboard running - is I can refactor state API to use the ray._private.state.actors method as a fallback and then make other modules reference the state API directly. This will still make modules outside of core have some coupling to the _private.state.actors functionality but it's a fairly centralized coupling as it's mostly the state API.

Which of the two would you prefer?

@edoakes
Copy link
Collaborator

edoakes commented Sep 19, 2025

@edoakes The state API would require the dashboard to be running. Even though the dashboard is highly recommended, shouldn't we be agnostic of this functionality? ray._private.state.actors() already is agnostic of it despite not being intended for external usage, making it a decent option for being migrated to _common and referenced instead by other modules.

All of the usage changed in this PR is in test files, so I see no strong reason to avoid requiring the dashboard for these. It's a common pattern to use the state API for other libraries' tests as well.

cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

@jcarlson212
Copy link
Contributor Author

jcarlson212 commented Sep 22, 2025

@edoakes The state API would require the dashboard to be running. Even though the dashboard is highly recommended, shouldn't we be agnostic of this functionality? ray._private.state.actors() already is agnostic of it despite not being intended for external usage, making it a decent option for being migrated to _common and referenced instead by other modules.

All of the usage changed in this PR is in test files, so I see no strong reason to avoid requiring the dashboard for these. It's a common pattern to use the state API for other libraries' tests as well.

There is a small amount of usage in the dataset (for split when hints are given) and runtime_context files as well. Will have the changes you requested posted soon though, just fixing some tests.

cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

Signed-off-by: Jason <jcarlson212@gmail.com>
Signed-off-by: Jason <jcarlson212@gmail.com>
Signed-off-by: Jason <jcarlson212@gmail.com>
Signed-off-by: Jason <jcarlson212@gmail.com>
Fixes the runtime context state submodule loading bug by importing
the state API module as STATE at the top of the file. Uses similar safety precaution in
the dataset.py file since these were the only two public API spots recently changed (not isolated to
just tests).

Signed-off-by: Jason <jcarlson212@gmail.com>
Signed-off-by: Jason <jcarlson212@gmail.com>
The dashboard wasn't being initialized before unit tests for shutdown_only fixtures
since the params get ignored, causing some unit tests to fail.

Signed-off-by: Jason <jcarlson212@gmail.com>
Signed-off-by: Jason <jcarlson212@gmail.com>
Signed-off-by: Jason <jcarlson212@gmail.com>
Signed-off-by: Jason <jcarlson212@gmail.com>
cursor[bot]

This comment was marked as outdated.

Copy link
Collaborator

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Only real concern is the change in data/dataset.py.

Given it is a change to the scheduler of Ray Data and the rest of the changes here are test-only, it would probably be wise to leave that change out of this PR and update it separately (with the Ray Data maintainers' blessing).

Comment on lines +2090 to +2098
actors_state = {
actor.actor_id: actor.node_id
for actor in list_actors(
detail=True,
limit=summarize_actors().get("cluster", {}).get("total_actors", 0)
+ 100, # fetch current actors. Some staleness is assumed fine.
)
}
return {actor: actors_state.get(actor._actor_id.hex()) for actor in actors}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bveeramani can you verify this is OK to change? Looks like a pretty core change to data scheduling logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment for @bveeramani - even before these changes in theory the number of actors could change after the map was built, so this shouldn't change any consistency guarantees from what we had before per my understanding (except one property where we might return a number of actors that does not correspond to any discrete point in time in CPU cycles, which I don't think would be relied on... we'd instead be interpolating between CPU cycles).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a pretty core change to data scheduling logic.

I think this logic is only used by the Dataset.split() API, which isn't a particularly important API because people use Dataset.streaming_split() instead.

can you verify this is OK to change?

This looks reasonable.

Is list_actors comparably fast to ray._private.state.actors()? Users expect dataset composition to be fast. If list_actors can be slow (many seconds), that might cause confusion/friction for users.

Other than that, some minor suggestions:

  • It wasn't obvious to me what the keys and values for actors_state represent based on a name. Something like actor_hex_to_node_id might be clearer
  • What happens if we remove the + 100? I think it's a good practice to avoid magic numbers/arbitrary constants if possible.

Signed-off-by: Jason <jcarlson212@gmail.com>
@jcarlson212
Copy link
Contributor Author

Looks good. Only real concern is the change in data/dataset.py.

Given it is a change to the scheduler of Ray Data and the rest of the changes here are test-only, it would probably be wise to leave that change out of this PR and update it separately (with the Ray Data maintainers' blessing).

Nice. I'll make a separate PR for the dataset.py#split change after the maintainer's blessing is given 🙏 . All of the tests seem to be passing except two potential flakes: python/ray/tests/test_failure.py::test_update_object_location_batch_failure[request] and python/ray/tests/test_global_gc.py::test_local_gc_called_once_per_interval

@edoakes edoakes added the go add ONLY when ready to merge, run all tests label Sep 24, 2025
@edoakes
Copy link
Collaborator

edoakes commented Sep 24, 2025

Kicked off full premerge CI run: https://buildkite.com/ray-project/premerge/builds/49795

@jcarlson212
Copy link
Contributor Author

jcarlson212 commented Sep 25, 2025

@edoakes - this test is failing: python/ray/train/v2/tests/test_jax_trainer.py::test_minimal_multihost. Looks like the temporary file being used by the runtime env is being cleaned up by accident:

[2025-09-24T20:56:15Z] E                       FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ray/session_2025-09-24_20-56-05_216815_27031/runtime_resources/pip/d4fac6bfd8715b366a57915afa0179ca0e4906be/exec_cwd

I didn't make any changes to the runtime_context.py file (still uses old _private.state.actors function), so it seems unrelated to these changes. The CI dashboard shows it failing in 90% of builds with 10% achieving a hard failure: https://flaky-tests.ray.io/#showAll=true.

@edoakes
Copy link
Collaborator

edoakes commented Sep 25, 2025

I didn't make any changes to the runtime_context.py file (still uses old _private.state.actors function), so it seems unrelated to these changes. The CI dashboard shows it failing in 90% of builds with 10% achieving a hard failure: https://flaky-tests.ray.io/#showAll=true.

Likely right that it's just flaky. I kicked off CI again to see if it passes this time. If so, 🚢

@edoakes edoakes enabled auto-merge (squash) September 25, 2025 17:14
@edoakes edoakes disabled auto-merge September 25, 2025 22:11
@edoakes edoakes merged commit da9aaf0 into ray-project:master Sep 25, 2025
7 checks passed
elliot-barn pushed a commit that referenced this pull request Sep 27, 2025
… or `ray.util.state.get_actor` (#56733)

Changes references to `_private.state.actors()` to references to
`ray.util.state.list_actors` or `ray.util.state.get_actor` depending on
the plurality of the fetch. This helps the `_private` submodule better
serve its purpose of containing abstractions only used for core ray
internals. Note that `ray.runtime_context` still uses `_private.actors`
as it's a core ray internal that needs to function regardless of whether
or not the dashboard server is running.

## Related issue number

Fixes #53478 

---------

Signed-off-by: Jason <jcarlson212@gmail.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
… or `ray.util.state.get_actor` (#56733)

Changes references to `_private.state.actors()` to references to
`ray.util.state.list_actors` or `ray.util.state.get_actor` depending on
the plurality of the fetch. This helps the `_private` submodule better
serve its purpose of containing abstractions only used for core ray
internals. Note that `ray.runtime_context` still uses `_private.actors`
as it's a core ray internal that needs to function regardless of whether
or not the dashboard server is running.

## Related issue number

Fixes #53478

---------

Signed-off-by: Jason <jcarlson212@gmail.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
… or `ray.util.state.get_actor` (ray-project#56733)

Changes references to `_private.state.actors()` to references to
`ray.util.state.list_actors` or `ray.util.state.get_actor` depending on
the plurality of the fetch. This helps the `_private` submodule better
serve its purpose of containing abstractions only used for core ray
internals. Note that `ray.runtime_context` still uses `_private.actors`
as it's a core ray internal that needs to function regardless of whether
or not the dashboard server is running.

## Related issue number

Fixes ray-project#53478 

---------

Signed-off-by: Jason <jcarlson212@gmail.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
… or `ray.util.state.get_actor` (ray-project#56733)

Changes references to `_private.state.actors()` to references to
`ray.util.state.list_actors` or `ray.util.state.get_actor` depending on
the plurality of the fetch. This helps the `_private` submodule better
serve its purpose of containing abstractions only used for core ray
internals. Note that `ray.runtime_context` still uses `_private.actors`
as it's a core ray internal that needs to function regardless of whether
or not the dashboard server is running.

## Related issue number

Fixes ray-project#53478 

---------

Signed-off-by: Jason <jcarlson212@gmail.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
… or `ray.util.state.get_actor` (ray-project#56733)

Changes references to `_private.state.actors()` to references to
`ray.util.state.list_actors` or `ray.util.state.get_actor` depending on
the plurality of the fetch. This helps the `_private` submodule better
serve its purpose of containing abstractions only used for core ray
internals. Note that `ray.runtime_context` still uses `_private.actors`
as it's a core ray internal that needs to function regardless of whether
or not the dashboard server is running.

## Related issue number

Fixes ray-project#53478

---------

Signed-off-by: Jason <jcarlson212@gmail.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution Contributed by the community core Issues that should be addressed in Ray Core docs An issue or change related to documentation go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[core|serve] Migrate shared utilities from ray._private to ray._common

3 participants