[data] Default return 0s for object locality instead of -1s #58754
Conversation
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Code Review
This pull request correctly addresses an issue where negative metrics were being reported when object locality tracking was disabled. The fix involves checking for the sentinel value (-1, -1, -1) and skipping the metrics aggregation, which is the right approach. Additionally, the new log message that appears once is a helpful diagnostic for users, and the updated docstring clarifies the purpose of the configuration flag. I have one minor suggestion to simplify the check for the sentinel value.
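The sentinel-handling approach described here can be sketched as follows (hypothetical helper names; the actual Ray Data code differs):

```python
# Sketch of the sentinel check described in the review, not the real
# Ray Data implementation; names here are hypothetical.
SENTINEL = (-1, -1, -1)  # returned when locality tracking is disabled

def aggregate_locality_metrics(hits, misses, unknowns, sample):
    """Add a (hits, misses, unknowns) sample to running totals,
    skipping the disabled-tracking sentinel."""
    if sample == SENTINEL:
        # Locality tracking is off; leave the totals untouched.
        return hits, misses, unknowns
    h, m, u = sample
    return hits + h, misses + m, unknowns + u
```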
Force-pushed from f6b259e to c34bb21
    "Set `DataContext.get_current().enable_get_object_locations_for_metrics=True` to enable this."
    )
    ...
    return -1, -1, -1
The `(-1, -1, -1)` sentinel wasn't really being used, so we default to 0s instead.
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
python/ray/data/tests/test_stats.py
Outdated
    assert final_stats == ds_stats
    assert f"dataset_{ds._uuid}_0" == update_fn.call_args_list[-1].args[1]
    assert f"dataset_{ds._uuid}_1" == update_fn.call_args_list[-1].args[1]
Bug: Incorrect test expectation after run index change
The test expects the dataset ID to be "dataset_{uuid}_1" after calling take_all(), but with the new logic _run_index starts at 0 and is incremented only after the executor is created. Since the executor is created with get_dataset_id(), which returns the current _run_index value before the increment, the first execution uses "dataset_{uuid}_0", not "dataset_{uuid}_1". The assertion should check for "dataset_{uuid}_0".
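The ordering this comment describes can be modeled with a small sketch (hypothetical class; the real tracking lives in Ray Data's iterator internals):

```python
class RunTracker:
    """Toy model of the run-index ordering described above."""
    def __init__(self, uuid):
        self._uuid = uuid
        self._run_index = 0  # starts at 0 under the new logic

    def get_dataset_id(self):
        # Returns the id for the current run index, before any increment.
        return f"dataset_{self._uuid}_{self._run_index}"

    def create_executor(self):
        # The executor is tagged with the current id...
        dataset_id = self.get_dataset_id()
        # ...and the run index is incremented only afterwards.
        self._run_index += 1
        return dataset_id
```

Under this model the first execution is tagged `_0` and only the second is tagged `_1`, which is why the assertion should check `_0`.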
bveeramani
left a comment
@iamjustinhsu Does this prevent the issue where you get -1000 blocks, like in the "Iteration Blocks Local" chart? If so, how?
    elif log_once("enable_get_object_locations_for_metrics"):
        logger.info(
            "Tracking the number of object cache hits and misses is currently disabled. "
            "Set `DataContext.get_current().enable_get_object_locations_for_metrics=True` to enable this."
        )
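`log_once` gates the message so it fires only the first time its key is seen; a minimal stand-in for that behavior (not Ray's actual implementation):

```python
# Hypothetical stand-in for the log_once helper used in the diff above.
_logged_keys = set()

def log_once(key):
    """Return True only the first time `key` is seen, so the guarded
    logger.info(...) call runs at most once per process."""
    if key in _logged_keys:
        return False
    _logged_keys.add(key)
    return True
```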
There was a problem hiding this comment.
Since enable_get_object_locations_for_metrics is disabled by default, we'll also emit this warning by default, right?
If the metric is niche, it might be better to surface it only when the feature is explicitly relevant. Otherwise, the warning might clutter logs or confuse users.
This feature is only used when they call iter_batches(), iter_rows(), etc., which I don't think is exclusively Train-specific (these metrics are on the Data dashboard). I can remove this info statement altogether, or change it to a debug statement and specify that iteration metrics (Data dashboard) are not being tracked. Thoughts?
> Does this prevent the issue where you get -1000 blocks, like in the "Iteration Blocks Local" chart? If so, how?
It does. Before, we returned -1 when this feature was disabled, and then blindly added the number of cache hits, cache misses, and unknowns to the running totals even when those values were -1. Here is a local run I did:

In the first two runs I set it to False; in the third (in blue) I set it to True.
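The before/after difference can be sketched like this (hypothetical function, assuming the behavior described in the thread):

```python
def locality_stats(enabled, hits=4, misses=1, unknowns=0):
    """Return (hits, misses, unknowns) for one iteration.
    Previously the disabled path returned (-1, -1, -1)."""
    if not enabled:
        # Returning 0s keeps blind accumulation from going negative
        # (e.g. the -1000 blocks seen in "Iteration Blocks Local").
        return 0, 0, 0
    return hits, misses, unknowns

# With tracking disabled, 1000 iterations now accumulate to 0, not -1000:
total_hits = sum(locality_stats(enabled=False)[0] for _ in range(1000))
```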
IMO this might confuse users. From the log message, I don't think it's clear whether you should enable this, and what the tradeoffs are (I think there are performance implications?)
In the majority of cases, do you want to enable this? If so, I think we should by default. If not, I don't think we need to emit a log.
Also, I think we log this once per train worker, and I think that could become noisy.
…/handle-disabled-object-locality-stats
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
bveeramani
left a comment
One comment about the log, but otherwise LGTM.
…ect#58754)

## Description

Previously, if `DataContext.get_current().enable_get_object_locations_for_metrics=False` (which it is by default), then we would default to returning `(-1, -1, -1)`. This wasn't properly handled, so we would get negative metrics. This PR addresses that.

This PR also fixes run_index=-1 for **streaming split**. For all iterators but streaming split:

1. create executor with `dataset_tag` from 2)
2. increment `dataset_tag`
3. Get `dataset_tag` (dataset_-1)

However, streaming_split skips step 2. This PR addresses that.

## Related issues

## Additional information

---------

Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Description

Previously, if `DataContext.get_current().enable_get_object_locations_for_metrics=False` (which it is by default), then we would default to returning `(-1, -1, -1)`. This wasn't properly handled, so we would get negative metrics. This PR addresses that.

This PR also fixes run_index=-1 for streaming split. For all iterators but streaming split:

1. create executor with `dataset_tag` from 2)
2. increment `dataset_tag`
3. Get `dataset_tag` (dataset_-1)

However, streaming_split skips step 2. This PR addresses that.

Related issues

Additional information