[data] explain optimized by iamjustinhsu · Pull Request #58074 · ray-project/ray

iamjustinhsu · 2025-10-24T02:13:21Z

Description

This PR adds more information to the `explain` API. Previously, `explain` showed only the unoptimized logical plan and the optimized physical plan. To make `explain` clearer, I introduce four plan views:

- Logical Plan
- Logical Plan (Optimized)
- Physical Plan
- Physical Plan (Optimized)

Example Output

>>> import ray
>>> ray.data.range(1000).select_columns("id").explain()
-------- Logical Plan --------
Project[Project]
+- Read[ReadRange]

-------- Logical Plan (Optimized) --------
Project[Project]
+- Read[ReadRange]

-------- Physical Plan --------
TaskPoolMapOperator[Project]
+- TaskPoolMapOperator[ReadRange]
   +- InputDataBuffer[Input]

-------- Physical Plan (Optimized) --------
TaskPoolMapOperator[ReadRange->Project]
+- InputDataBuffer[Input]
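The section headers and `+- ` tree markers in the example output follow a regular layout: the sink is printed first, and each input sits one level deeper with three extra spaces of indentation. As an illustration only, here is a minimal Python sketch that reproduces that layout for a linear chain of operators. `render_chain` and `render_section` are hypothetical helpers, not Ray Data's actual implementation, and plans whose operators have several inputs (branches) are not handled.

```python
from typing import List


def render_chain(ops_sink_first: List[str]) -> str:
    """Render a linear operator chain in the explain() tree style.

    ops_sink_first lists operators from the sink (printed first) down to
    the source. Each input goes on its own line, prefixed with "+- " and
    indented three spaces per level below the first child.
    """
    lines = []
    for depth, op in enumerate(ops_sink_first):
        if depth == 0:
            lines.append(op)  # the root (sink) has no marker
        else:
            lines.append("   " * (depth - 1) + "+- " + op)
    return "\n".join(lines)


def render_section(title: str, ops_sink_first: List[str]) -> str:
    # Header style copied from the example output above.
    return f"-------- {title} --------\n{render_chain(ops_sink_first)}\n"


if __name__ == "__main__":
    print(render_section(
        "Physical Plan",
        ["TaskPoolMapOperator[Project]",
         "TaskPoolMapOperator[ReadRange]",
         "InputDataBuffer[Input]"],
    ))
```

Running it prints the same "Physical Plan" block as the example output, including the three-space indent before the `InputDataBuffer[Input]` line.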

Related issues

None

Additional information

None

…54857)

Signed-off-by: EkinKarabulut <ekarabulut@nvidia.com>
Signed-off-by: EkinKarabulut <82878945+EkinKarabulut@users.noreply.github.com>
Signed-off-by: Rueian <rueiancsie@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Co-authored-by: fscnick <6858627+fscnick@users.noreply.github.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Rueian <rueiancsie@gmail.com>
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>

…/explain-optimized

Signed-off-by: iamjustinhsu <jhsu@anyscale.com>

my-vegetable-has-exploded · 2025-10-24T03:55:11Z

> This PR introduces more information into the explain API. Before, explain showed Unoptimized Logical Plan, and Optimized Physical Plan. To make the explain API clearer, I introduce 4 types of plans
> * Logical Plan
> * Logical Plan (Optimized)
> * Physical Plan
> * Physical Plan (Optimized)

Makes sense to me. But is the unoptimized plan needed? 😂

Signed-off-by: iamjustinhsu <jhsu@anyscale.com>

iamjustinhsu · 2025-10-24T15:55:56Z

@my-vegetable-has-exploded I think it's nice to have, to see how the plan is being transformed.

iamjustinhsu · 2025-10-24T17:31:54Z

python/ray/data/tests/test_consumption.py

-        "+- Map(<lambda>)\n"
-        "   +- ReadRange\n"
-        "-------- Physical Plan --------\n"
+        "Filter[Filter(<lambda>)]\n"


@richardliaw should this be verbose mode?

Hi @iamjustinhsu, maybe we can pick up #57798 here?

Do you mean to combine the PRs? I think we should keep them separate because they serve different purposes, although the merge conflicts will be a bit messy.

alexeykudinkin

LGTM, minor comments

alexeykudinkin · 2025-10-24T18:13:16Z

python/ray/data/_internal/plan.py

+        convert_fns: List[Callable[[Plan], Plan]] = [
+            lambda x: x,
+            LogicalOptimizer().optimize,
+            create_planner().plan,
+            PhysicalOptimizer().optimize,
+        ]


Instead, abstract a base method out of get_optimized_plan that returns all 4 (so that the functions we use here are exactly the same ones we use when executing).
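One way to read this suggestion, sketched under clearly stated assumptions: keep the list of plan conversions in a single helper that both execution and `explain` iterate over, so the last plan `explain` prints is, by construction, the plan that runs. The helper name `get_plan_conversion_fns` appears in the later revision of this PR, but its body here is hypothetical, as are `optimize_logical`, `plan_physical`, `optimize_physical`, `get_execution_plan`, and `explain_plans`. Plans are modeled as plain lists of operator-name strings, and the "fusion" rule is a toy that merges every map operator into one.

```python
from typing import Callable, List

# Toy model (assumption): a plan is a list of operator names, source first.
Plan = List[str]
MAP = "TaskPoolMapOperator"


def optimize_logical(plan: Plan) -> Plan:
    # Placeholder logical optimizer: no rule applies to this toy pipeline.
    return list(plan)


def plan_physical(plan: Plan) -> Plan:
    # One map operator per logical op, fed by an input buffer.
    return ["InputDataBuffer[Input]"] + [f"{MAP}[{op}]" for op in plan]


def optimize_physical(plan: Plan) -> Plan:
    # Toy fusion rule: merge all map operators into one named "A->B".
    names = [op[len(MAP) + 1 : -1] for op in plan if op.startswith(MAP + "[")]
    if len(names) < 2:
        return list(plan)
    rest = [op for op in plan if not op.startswith(MAP + "[")]
    return rest + [f"{MAP}[{'->'.join(names)}]"]


def get_plan_conversion_fns() -> List[Callable[[Plan], Plan]]:
    # Single source of truth: execution and explain both read this list.
    return [optimize_logical, plan_physical, optimize_physical]


def get_execution_plan(logical: Plan) -> Plan:
    # Apply every conversion in order; this is what would be executed.
    plan = logical
    for fn in get_plan_conversion_fns():
        plan = fn(plan)
    return plan


def explain_plans(logical: Plan) -> List[Plan]:
    # Start with the untouched logical plan, then record each stage's output.
    plans = [logical]
    for fn in get_plan_conversion_fns():
        plans.append(fn(plans[-1]))
    return plans
```

With `["ReadRange", "Project"]` as input, `explain_plans` returns four plans, and the last one equals `get_execution_plan`'s result, `["InputDataBuffer[Input]", "TaskPoolMapOperator[ReadRange->Project]"]`.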

Signed-off-by: iamjustinhsu <jhsu@anyscale.com>

…/explain-optimized

alexeykudinkin · 2025-10-30T23:37:45Z

python/ray/data/_internal/plan.py

+        convert_fns = [lambda x: x] + get_plan_conversion_fns()
+        titles: List[str] = [
+            "Logical Plan",
+            "Logical Plan (Optimized)",
+            "Physical Plan",
+            "Physical Plan (Optimized)",
+        ]


Suggested change

-        convert_fns = [lambda x: x] + get_plan_conversion_fns()
-        titles: List[str] = [
-            "Logical Plan",
-            "Logical Plan (Optimized)",
-            "Physical Plan",
-            "Physical Plan (Optimized)",
-        ]
+        titles, plan_transform_fn = zip(*[
+            ("Logical Plan", None),
+            ("Logical Plan (Optimized)", optimize_logical),
+            ("Physical Plan", plan),
+            ("Physical Plan (Optimized)", optimize_physical),
+        ])
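The suggested `zip(*[...])` form relies on a standard Python idiom: unpacking a list of `(title, fn)` pairs into `zip` transposes it into two parallel tuples, so each title is written next to the transform that produces its plan. The sketch below assumes `None` means "print the input plan unchanged" (which is how the first entry reads) and uses hypothetical string-based stand-ins for `optimize_logical`, `plan`, and `optimize_physical`; it is not the code that was merged.

```python
from typing import List, Tuple


# Hypothetical stand-ins: each "transform" just tags the plan string.
def optimize_logical(p: str) -> str:
    return p + " [logical-opt]"


def plan(p: str) -> str:
    return p + " [physical]"


def optimize_physical(p: str) -> str:
    return p + " [physical-opt]"


# zip(*pairs) transposes the (title, fn) pairs into two parallel tuples.
titles, plan_transform_fn = zip(*[
    ("Logical Plan", None),
    ("Logical Plan (Optimized)", optimize_logical),
    ("Physical Plan", plan),
    ("Physical Plan (Optimized)", optimize_physical),
])


def explain(p: str) -> List[Tuple[str, str]]:
    out = []
    for title, fn in zip(titles, plan_transform_fn):
        if fn is not None:  # None: show the incoming plan unchanged
            p = fn(p)
        out.append((title, p))
    return out
```

Compared with two separately maintained lists, the paired form makes it harder for titles and transforms to drift out of sync when a stage is added or reordered.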

## Description

This PR introduces more information into the `explain` API. Before, `explain` showed Unoptimized Logical Plan, and Optimized Physical Plan. To make the `explain` API clearer, I introduce 4 types of plans

- Logical Plan
- Logical Plan (Optimized)
- Physical Plan
- Physical Plan (Optimized)

Example Output

```python
>>> import ray
>>> ray.data.range(1000).select_columns("id").explain()
-------- Logical Plan --------
Project[Project]
+- Read[ReadRange]

-------- Logical Plan (Optimized) --------
Project[Project]
+- Read[ReadRange]

-------- Physical Plan --------
TaskPoolMapOperator[Project]
+- TaskPoolMapOperator[ReadRange]
   +- InputDataBuffer[Input]

-------- Physical Plan (Optimized) --------
TaskPoolMapOperator[ReadRange->Project]
+- InputDataBuffer[Input]
```

## Related issues

None

## Additional information

None

---------

Signed-off-by: EkinKarabulut <ekarabulut@nvidia.com>
Signed-off-by: EkinKarabulut <82878945+EkinKarabulut@users.noreply.github.com>
Signed-off-by: Rueian <rueiancsie@gmail.com>
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Co-authored-by: EkinKarabulut <82878945+EkinKarabulut@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Co-authored-by: fscnick <6858627+fscnick@users.noreply.github.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Rueian <rueiancsie@gmail.com>


EkinKarabulut and others added 5 commits October 23, 2025 19:09

Merge branch 'master' of https://github.com/ray-project/ray into jhsu…

3ff8de3

…/explain-optimized

add comments

31cab59

Signed-off-by: iamjustinhsu <jhsu@anyscale.com>

rename

37a2f42

Signed-off-by: iamjustinhsu <jhsu@anyscale.com>

move

b3668e8

Signed-off-by: iamjustinhsu <jhsu@anyscale.com>

iamjustinhsu changed the title ~~Jhsu/explain optimized~~ [data] explain optimized Oct 24, 2025

fix tests

2246101

Signed-off-by: iamjustinhsu <jhsu@anyscale.com>

iamjustinhsu commented Oct 24, 2025


alexeykudinkin reviewed Oct 24, 2025


iamjustinhsu added 5 commits October 24, 2025 13:24

abstract fn

4910b37

Signed-off-by: iamjustinhsu <jhsu@anyscale.com>

fix test

632c9d7

Signed-off-by: iamjustinhsu <jhsu@anyscale.com>

rewire logical to optimized in get execution plan

75b8df0

Signed-off-by: iamjustinhsu <jhsu@anyscale.com>

fix test

af99c34

Signed-off-by: iamjustinhsu <jhsu@anyscale.com>

Merge branch 'master' of https://github.com/ray-project/ray into jhsu…

a5c229f

…/explain-optimized

iamjustinhsu marked this pull request as ready for review October 28, 2025 22:00

iamjustinhsu requested a review from a team as a code owner October 28, 2025 22:00

iamjustinhsu added the `go` label ("add ONLY when ready to merge, run all tests") Oct 28, 2025

ray-gardener bot added the `data` label ("Ray Data-related issues") Oct 29, 2025

alexeykudinkin approved these changes Oct 30, 2025


alexeykudinkin merged commit 62d23ff into ray-project:master Oct 30, 2025
7 checks passed

iamjustinhsu deleted the jhsu/explain-optimized branch October 30, 2025 23:50

alexeykudinkin merged 11 commits into ray-project:master from iamjustinhsu:jhsu/explain-optimized


4 participants
