[data] Remove stats update thread #57971
Conversation
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
…calls + create an interval for updating stats per dataset Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
…/remove-stats-thread
```python
def before_epoch_start(self):
    self._yielded_first_batch = False

@contextmanager
def _epoch_context(self):
    """Context manager for epoch lifecycle: setup and cleanup."""
    ...

def after_epoch_end(self):
    StatsManager.clear_iteration_metrics(self._dataset_tag)
```
Actually, I find these callback methods (before/after) a bit easier to understand; let's keep them.
For handling stateful stuff: if an exception occurs, the state doesn't get cleaned up. If we go for a 1:1 StreamingExecutor -> StatsManager mapping, then I can remove clear_iteration_metrics and safely go back to the original version.
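The exception-safety concern above can be addressed with a `try/finally` inside the context manager, so the cleanup callback runs even when iteration raises. A minimal sketch (the `EpochLifecycle` class and its boolean flag are hypothetical stand-ins, not the actual Ray Data code):

```python
from contextlib import contextmanager


class EpochLifecycle:
    """Hypothetical sketch of the before/after callback pattern with
    exception-safe cleanup via try/finally."""

    def __init__(self):
        self.cleaned_up = False

    def before_epoch_start(self):
        self._yielded_first_batch = False

    def after_epoch_end(self):
        # Stand-in for StatsManager.clear_iteration_metrics(...)
        self.cleaned_up = True

    @contextmanager
    def epoch_context(self):
        self.before_epoch_start()
        try:
            yield
        finally:
            # Runs even if the epoch body raises, so per-epoch
            # state is never leaked.
            self.after_epoch_end()
```

With this shape, the before/after callbacks stay readable while the `finally` guarantees cleanup on both the happy path and the error path.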
python/ray/data/_internal/stats.py
```python
# NOTE: This must be thread-safe because multiple datasets can
# be running at the same time. Decreasing the size of the dictionary
# is not thread-safe.
with self._update_last_updated_lock:
```
Let's get rid of that:
- Instantiate StatsManager in StreamingExecutor (so that these are 1:1)
- Remove all locks
I decided to remove the lock by making StatsManager stateless, and push that information to the batch iterator and executor.
Force-pushed from 5366997 to 90ce0d6
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
bveeramani
left a comment
Overall LGTM.
ty. This is a useful refactor
```python
# Creating/getting an actor from multiple threads is not safe.
# https://github.com/ray-project/ray/issues/41324
_stats_actor_lock: threading.RLock = threading.RLock()
```
Why don't we need this anymore?
Two reasons:
- [Core] Getting/creating an actor from multiple threads errors (#41324) is resolved.
- This follows the pattern of `get_or_create_actor_location_tracker`, which also creates an actor on the driver process. As far as I know, we haven't had issues with that, so I think it's safe to remove the lock.
```python
self._metrics_last_updated: float = 0.0
```
Nit: Here and in iterator -- wasn't totally obvious to me what this variable represents based on the name alone. Would recommend either choosing a more descriptive name or adding a comment
python/ray/data/_internal/stats.py
```python
if force_update:
    ray.wait([ref], timeout=1)
```
What's the motivation for waiting here? I don't think we waited in the previous implementation, and blocking calls in the scheduling loop (even with 1s timeout) makes me feel nervous.
It was to keep the previous force_update behavior. Honestly, it's not required, and I have removed it.
```python
_StatsManager.update_iteration_metrics(
    self._stats, self._dataset_tag, force_update=True
)
```
My understanding is that, in the previous implementation, we don't update any metrics in after_epoch_end.
Don't think it's an issue, but what's the motivation for performing a force update here?
Hmm, I think this was a subtle bug in the previous implementation.
Essentially, the streaming executor force-updates the metrics on shutdown to zero them out. A similar story applies here: if we don't update the metrics in after_epoch_end, then the last iteration's metrics aren't updated. I don't think "clearing" the metrics was the right move, because then we wouldn't finalize the last iteration's metrics.
lmk if that makes sense
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
```python
now = time.time()
if (now - self._metrics_last_updated) > self.UPDATE_METRICS_INTERVAL_S:
    _StatsManager.update_iteration_metrics(self._stats, self._dataset_tag)
    self._metrics_last_updated = now
```
Bug: Incorrect timing skews metrics update intervals.
The metrics update interval check in yield_batch_context measures time incorrectly by calling time.time() after user code execution completes. This includes user processing time in the interval calculation, causing metrics to update at unpredictable intervals rather than the intended 5-second cadence. The timestamp should be captured before the yield statement or at the start of the context manager.
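A minimal sketch of the suggested fix, capturing the timestamp before the `yield` so user processing time is excluded from the interval check (`BatchTimer` and its `updates` counter are hypothetical stand-ins for the iterator and `_StatsManager.update_iteration_metrics`):

```python
import time
from contextlib import contextmanager


class BatchTimer:
    """Hypothetical sketch: measure the update interval *before*
    yielding the batch, so user code run inside the `with` block
    cannot skew the cadence."""

    def __init__(self, interval_s=5.0):
        self._interval_s = interval_s
        self._metrics_last_updated = 0.0
        self.updates = 0  # stand-in for calls to update_iteration_metrics

    @contextmanager
    def yield_batch_context(self):
        # Timestamp captured before user code executes.
        now = time.time()
        if (now - self._metrics_last_updated) > self._interval_s:
            self.updates += 1
            self._metrics_last_updated = now
        yield  # user processing happens here and no longer affects the check
```

Because the check happens on entry, a batch that takes 30 seconds to process still counts against the next interval from the moment it was yielded, not from when processing finished.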
…/remove-stats-thread
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
```python
def yield_batch_context(self, batch: Batch):
    with self._stats.iter_user_s.timer() if self._stats else nullcontext():
        if self._stats is None:
            return
```
Bug: Context Manager Yielding Error Breaks Context
The yield_batch_context context manager returns early without yielding when self._stats is None. Context managers decorated with @contextmanager must yield exactly once, otherwise a RuntimeError is raised when entering the context. The function should yield in both branches, similar to how get_next_batch_context handles the None case by yielding in the else branch, or use nullcontext() as the old implementation did.
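A minimal sketch of a fix that yields exactly once in both branches (`BatchIterator` here is a hypothetical stand-in; the real iterator's stats object exposes `iter_user_s.timer()`):

```python
from contextlib import contextmanager


class BatchIterator:
    """Hypothetical sketch: a @contextmanager body must yield exactly
    once, so the no-stats branch yields too instead of returning early."""

    def __init__(self, stats=None):
        self._stats = stats

    @contextmanager
    def yield_batch_context(self):
        if self._stats is None:
            # No stats to record, but we still yield so the
            # `with` block runs without a RuntimeError.
            yield
        else:
            # Time the user's processing of the batch.
            with self._stats.iter_user_s.timer():
                yield
```

Returning before the `yield` makes `@contextmanager` raise `RuntimeError: generator didn't yield` on `__enter__`; yielding in both branches avoids that while keeping the timer scoped to the stats-enabled path.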
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
## Description

Before this PR, the metrics would follow this path:

1. `StreamingExecutor` collects metrics per operator.
2. `_StatsManager` creates a thread to export metrics.
3. `StreamingExecutor` sends metrics to `_StatsManager`, which performs a copy and holds a `_stats_lock`.
4. The stats thread reads the metrics sent from 2).
5. The stats thread sleeps every 5-10 seconds before exporting metrics to `_StatsActor`. These metrics come in 2 forms: iteration and execution metrics.

I believe the purpose of the stats thread created in 2) was 2-fold:

- Don't export stats very frequently.
- Don't export iteration and execution stats separately (have them sent in the same RPC call).

However, this creates a lot of complexity (handling idle threads, etc.) and also makes it harder to support histogram metrics, which need to copy an entire list of values. See ray-project#57851 for more details.

By removing the stats thread in 2), we can reduce the complexity of management and also avoid wasteful copying of metrics. The downside is that iteration and execution metrics are now sent separately, increasing the number of RPC calls. I don't think this is a concern, because the updates to the `_StatsActor` were already asynchronous, and we can also tweak the update interval.

~~It's important to note that `_stats_lock` still lives on to update the last timestamps of each dataset. See * below for more details.~~

Now the new flow is:

1. `StreamingExecutor` collects metrics per operator.
2. `StreamingExecutor` checks the last time `_StatsActor` was updated. If more than a default 5 seconds have passed since the last update, we send metrics to `_StatsActor` through the `_StatsManager`, then update the last-updated timestamp. See * below for a caveat.

~~\*[important] Ray Data supports running multiple datasets concurrently. Therefore, I must keep track of each dataset's last-updated timestamp. `_stats_lock` is used to update that dictionary[dataset, last_updated] safely on `register_dataset` and on `shutdown`. On update, we don't require the lock because it does not change the dictionary's size. If we want to remove the lock entirely, I can think of 2 workarounds.~~

1. ~~Create a per-dataset `StatsManager`. Pros: no thread lock. Cons: many more code changes. The iteration metrics go through a separate code path that is independent of the streaming executor, which would make this more challenging.~~
2. ~~Update on every unix_epoch_timestamp % interval == 0, so that at 12:00, 12:05, etc., the updates fall on that interval. Pros: easy to implement and stateless. Cons: breaks down for slower streaming executors.~~
3. I removed the lock by keeping the state in 2 areas: `BatchIterator` and `StreamingExecutor`.

I also verified that ray-project#55163 still solves the original issue.

## Related issues

## Additional information

---------

Signed-off-by: iamjustinhsu <jhsu@anyscale.com>