[data] explain optimized#58074

Merged
alexeykudinkin merged 11 commits intoray-project:masterfrom
iamjustinhsu:jhsu/explain-optimized
Oct 30, 2025

Conversation

@iamjustinhsu
Contributor

@iamjustinhsu iamjustinhsu commented Oct 24, 2025

Description

This PR adds more information to the explain API. Previously, explain showed only the unoptimized logical plan and the optimized physical plan. To make the explain output clearer, I introduce 4 types of plans:

  • Logical Plan
  • Logical Plan (Optimized)
  • Physical Plan
  • Physical Plan (Optimized)

Example Output

>>> import ray
>>> ray.data.range(1000).select_columns("id").explain()
-------- Logical Plan --------
Project[Project]
+- Read[ReadRange]

-------- Logical Plan (Optimized) --------
Project[Project]
+- Read[ReadRange]

-------- Physical Plan --------
TaskPoolMapOperator[Project]
+- TaskPoolMapOperator[ReadRange]
   +- InputDataBuffer[Input]

-------- Physical Plan (Optimized) --------
TaskPoolMapOperator[ReadRange->Project]
+- InputDataBuffer[Input]
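
The indented layout in the output above can be reproduced with a small recursive renderer. This is a minimal sketch, not Ray Data's implementation: the `Node` class and the `render_plan` function are hypothetical names introduced for illustration, and the three-space indent per level is inferred from the example output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical plan node: a label plus its input (child) nodes."""
    label: str
    inputs: List["Node"] = field(default_factory=list)


def render_plan(node: Node, depth: int = 0) -> str:
    """Render a plan tree, one operator per line, root first."""
    # The root has no prefix; each deeper level is indented three more
    # spaces and marked with "+- ", matching the example output.
    prefix = "" if depth == 0 else "   " * (depth - 1) + "+- "
    lines = [prefix + node.label]
    for child in node.inputs:
        lines.append(render_plan(child, depth + 1))
    return "\n".join(lines)


physical = Node(
    "TaskPoolMapOperator[Project]",
    [Node("TaskPoolMapOperator[ReadRange]", [Node("InputDataBuffer[Input]")])],
)
print("-------- Physical Plan --------")
print(render_plan(physical))
```

Printing each of the four plans with its own title line would give output in the same shape as the example.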

Related issues

None

Additional information

None

EkinKarabulut and others added 5 commits October 23, 2025 19:09
…54857)

Signed-off-by: EkinKarabulut <ekarabulut@nvidia.com>
Signed-off-by: EkinKarabulut <82878945+EkinKarabulut@users.noreply.github.com>
Signed-off-by: Rueian <rueiancsie@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Co-authored-by: fscnick <6858627+fscnick@users.noreply.github.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Rueian <rueiancsie@gmail.com>
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
@my-vegetable-has-exploded
Contributor

This PR introduces more information into the explain API. Before, explain showed Unoptimized Logical Plan, and Optimized Physical Plan. To make the explain API clearer, I introduce 4 types of plans

* Logical Plan
* Logical Plan (Optimized)
* Physical Plan
* Physical Plan (Optimized)

Makes sense to me. But is the unoptimized plan needed? 😂

@iamjustinhsu iamjustinhsu changed the title Jhsu/explain optimized [data] explain optimized Oct 24, 2025
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
@iamjustinhsu
Contributor Author

@my-vegetable-has-exploded I think it's nice to have, to see how the plan is being transformed.

"+- Map(<lambda>)\n"
" +- ReadRange\n"
"-------- Physical Plan --------\n"
"Filter[Filter(<lambda>)]\n"
Contributor Author

@iamjustinhsu iamjustinhsu Oct 24, 2025

@richardliaw should this be verbose mode?

Contributor

Hi @iamjustinhsu, maybe we can pick up #57798 here?

Contributor Author

Do you mean combining the PRs? I think we should keep them separate because they serve different purposes, although the merge conflicts will be a bit messy.

Contributor

@alexeykudinkin alexeykudinkin left a comment

LGTM, minor comments

Comment on lines +119 to +124
convert_fns: List[Callable[[Plan], Plan]] = [
    lambda x: x,
    LogicalOptimizer().optimize,
    create_planner().plan,
    PhysicalOptimizer().optimize,
]
Contributor

Instead, abstract a base method out of get_optimized_plan that returns all 4 plans (so that the function we use here is exactly the same one we use when executing).
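
The suggestion above can be sketched as one shared function that returns every intermediate stage, so that explain and execution go through the same code path. The sketch below is a toy model under stated assumptions: plans are plain lists of operator names, and `optimize_logical`, `plan_physical`, `optimize_physical`, and `get_all_plan_stages` are hypothetical stand-ins, not Ray Data functions. Only the name `get_optimized_plan` comes from the comment, and its body here is illustrative.

```python
from typing import Callable, List, Tuple

Plan = List[str]  # toy plan: operator names, from source to sink


def optimize_logical(plan: Plan) -> Plan:
    # Stand-in logical rule: no rewrite applies to this toy plan.
    return list(plan)


def plan_physical(plan: Plan) -> Plan:
    # Stand-in planner: wrap each logical operator in a physical one.
    return [f"TaskPoolMapOperator[{op}]" for op in plan]


def optimize_physical(plan: Plan) -> Plan:
    # Stand-in fusion rule: collapse all map operators into one.
    names = [op[len("TaskPoolMapOperator["):-1] for op in plan]
    return [f"TaskPoolMapOperator[{'->'.join(names)}]"]


STAGES: List[Tuple[str, Callable[[Plan], Plan]]] = [
    ("Logical Plan", lambda p: p),
    ("Logical Plan (Optimized)", optimize_logical),
    ("Physical Plan", plan_physical),
    ("Physical Plan (Optimized)", optimize_physical),
]


def get_all_plan_stages(plan: Plan) -> List[Tuple[str, Plan]]:
    """Apply each conversion to the previous result and keep every stage."""
    stages = []
    for title, fn in STAGES:
        plan = fn(plan)
        stages.append((title, plan))
    return stages


def get_optimized_plan(plan: Plan) -> Plan:
    # Execution takes the last stage of the same pipeline explain prints.
    return get_all_plan_stages(plan)[-1][1]


for title, stage in get_all_plan_stages(["ReadRange", "Project"]):
    print(f"-------- {title} --------")
    print(stage)
```

Because both entry points share `get_all_plan_stages`, the plan that explain prints last is by construction the plan that would be executed.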

Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
@iamjustinhsu iamjustinhsu marked this pull request as ready for review October 28, 2025 22:00
@iamjustinhsu iamjustinhsu requested a review from a team as a code owner October 28, 2025 22:00
@iamjustinhsu iamjustinhsu added the go add ONLY when ready to merge, run all tests label Oct 28, 2025
@ray-gardener ray-gardener bot added the data Ray Data-related issues label Oct 29, 2025
Comment on lines +118 to +124
convert_fns = [lambda x: x] + get_plan_conversion_fns()
titles: List[str] = [
    "Logical Plan",
    "Logical Plan (Optimized)",
    "Physical Plan",
    "Physical Plan (Optimized)",
]
Contributor

Suggested change
- convert_fns = [lambda x: x] + get_plan_conversion_fns()
- titles: List[str] = [
-     "Logical Plan",
-     "Logical Plan (Optimized)",
-     "Physical Plan",
-     "Physical Plan (Optimized)",
- ]
+ titles, plan_transform_fn = zip(*[
+     ("Logical Plan", None),
+     ("Logical Plan (Optimized)", optimize_logical),
+     ("Physical Plan", plan),
+     ("Physical Plan (Optimized)", optimize_physical),
+ ])
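
The suggested change relies on the `zip(*pairs)` idiom to split a list of (title, function) pairs into two parallel tuples. A minimal sketch of that idiom follows. The three transform functions are placeholders defined here for illustration, plans are modeled as strings, and treating `None` as "no transformation" is an assumption about how the suggestion would be consumed.

```python
# Placeholder transforms; real ones would operate on plan objects.
def optimize_logical(p: str) -> str:
    return p + " |logical-opt"


def plan(p: str) -> str:
    return p + " |planned"


def optimize_physical(p: str) -> str:
    return p + " |physical-opt"


# zip(*pairs) transposes a list of pairs into two parallel tuples.
titles, plan_transform_fns = zip(*[
    ("Logical Plan", None),
    ("Logical Plan (Optimized)", optimize_logical),
    ("Physical Plan", plan),
    ("Physical Plan (Optimized)", optimize_physical),
])

current = "input"
for title, fn in zip(titles, plan_transform_fns):
    # None marks the first stage, which is shown as-is.
    if fn is not None:
        current = fn(current)
    print(f"{title}: {current}")
```

Keeping each title next to its transform in one list makes it harder for the two sequences to drift out of sync than with two separately maintained lists.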

@alexeykudinkin alexeykudinkin merged commit 62d23ff into ray-project:master Oct 30, 2025
7 checks passed
@iamjustinhsu iamjustinhsu deleted the jhsu/explain-optimized branch October 30, 2025 23:50
YoussefEssDS pushed a commit to YoussefEssDS/ray that referenced this pull request Nov 8, 2025
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026