[data] documentation for ray data metrics #58610
richardliaw merged 3 commits into ray-project:master
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Code Review
This pull request adds valuable documentation for Ray Data's Prometheus metrics, which will greatly help users in monitoring their data workloads. The structure is clear and the information is comprehensive. I've provided a few suggestions to enhance the accuracy and clarity of some metric descriptions and to remove a redundant note, ensuring the documentation is as precise as possible.
* - `data_freed_bytes`
  - Bytes freed by dataset operators
* - `data_current_bytes`
  - Bytes in the memory store used by dataset operators
* - `average_bytes_inputs_per_task`
  - Average size in bytes of ref bundles passed to tasks, or `None` if no tasks submitted
* - `average_rows_inputs_per_task`
  - Average number of rows passed in to the task, or `None` if no task submitted
To be more precise and consistent with the source code documentation, it's better to specify that this is the average number of rows in the input blocks.
Suggested change
-   - Average number of rows passed in to the task, or `None` if no task submitted
+   - Average number of rows in input blocks per task, or `None` if no task submitted
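To make the `None` case in this row concrete, here is a minimal sketch of how such an average could be computed. The function and parameter names are hypothetical and chosen for illustration; this is not Ray's implementation.

```python
from typing import Optional


def average_rows_inputs_per_task(
    total_input_rows: int, num_tasks_submitted: int
) -> Optional[float]:
    """Hypothetical helper: average rows in input blocks per submitted task.

    Returns None when no tasks have been submitted, mirroring the
    "or `None` if no task submitted" wording in the table row.
    """
    if num_tasks_submitted == 0:
        # No tasks yet, so the average is undefined rather than zero.
        return None
    return total_input_rows / num_tasks_submitted
```

Returning `None` rather than `0` keeps "no data yet" distinguishable from "tasks ran on empty inputs".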
* - `rows_task_outputs_generated`
  - Number of output rows generated by tasks
* - `row_outputs_taken`
  - Number of rows taken by downstream operators
* - `block_outputs_taken`
  - Number of blocks taken by downstream operators
* - `num_outputs_taken`
  - Number of output blocks taken by downstream operators
* - `bytes_outputs_taken`
  - Byte size of output blocks taken by downstream operators
* - `num_outputs_of_finished_tasks`
  - Number of generated output blocks from finished tasks
* - `task_completion_time`
  - Histogram of time spent running tasks to completion
* - `block_completion_time`
  - Histogram of time spent running a single block to completion
The source code mentions an important detail about how this metric is approximated when multiple blocks are generated per task. It would be beneficial to include this in the documentation for accuracy.
Suggested change
-   - Histogram of time spent running a single block to completion
+   - Histogram of time spent running a single block to completion. If multiple blocks are generated per task, this is approximated by assuming each block took an equal amount of time to process.
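The equal-time approximation described in this suggestion can be sketched as follows. The helper name and its arguments are assumptions made for illustration, not taken from the Ray source.

```python
from typing import List


def approximate_block_completion_times(
    task_duration_s: float, num_blocks: int
) -> List[float]:
    """Hypothetical helper: split a task's duration evenly across its blocks.

    When a task generates several blocks, the per-block time is
    approximated by assuming each block took the same amount of time.
    """
    if num_blocks <= 0:
        # A task that produced no blocks contributes no observations.
        return []
    per_block_s = task_duration_s / num_blocks
    return [per_block_s] * num_blocks
```

Under this assumption, a task that ran for 6 seconds and generated 3 blocks would contribute three 2-second observations to the histogram, even if the real per-block times were uneven.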
.. note::
   Most metrics are only available for physical operators that use the map operation, such as operators created by :meth:`~ray.data.Dataset.map_batches`, :meth:`~ray.data.Dataset.map`, and :meth:`~ray.data.Dataset.flat_map`.
…/ray-data-metrics-documentation
richardliaw left a comment:
I wonder if you can automatically generate this in the future, like what we do for all of our API docs.
Description
Adds Ray Data metrics documentation for visibility. This should be updated periodically to reflect the latest metrics.
Related issues
None
Additional information
None