[Data] Fuse MapBatches even if they modify the row count #60756

Merged
bveeramani merged 2 commits into master from fix-operator-fusion
Feb 4, 2026

@bveeramani bveeramani commented Feb 4, 2026

This PR updates the operator fusion rule to fuse MapBatches operators even if they modify row counts. The goal is to preserve the historical operator fusion behavior and avoid introducing regressions.

For more details, see the timeline below.

Timeline of Changes

| Date | Event | Description |
| :--- | :--- | :--- |
| June 8, 2023 | Limit pushdown added | Added limit pushdown and a property to MapBatches incorrectly stating it doesn't modify row counts. (#35950) |
| June 27, 2023 | Limit pushdown disabled | Rule disabled because it incorrectly pushed limits past UDFs that modified row counts. (#36831) |
| April 28, 2025 | Fusion restricted | Added logic to stop fusing operators that modify row counts when the downstream has a batch size. MapBatches stayed fused only because of its incorrect property. (#52570) |
| July 8, 2025 | Limit pushdown re-enabled with special case | Re-enabled with a special case to prevent pushing limits past MapBatches. (#39486) |
| Oct 24, 2025 | Special case removed | Special case removed, re-introducing the bug where limits are pushed past MapBatches. (#57880) |
| Feb 2, 2026 | Property fix | Updated MapBatches to correctly report it modifies rows by default. This fixed the pushdown bug but broke fusion logic. (#60448) |
| Feb 4, 2026 | (This PR) | Add a special case to preserve the historical MapBatches fusion behavior. |
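To make the pushdown bug in the timeline concrete, here is a minimal sketch in plain Python (no Ray involved). The helper names `drop_odd`, `map_then_limit`, and `limit_then_map` are hypothetical and only illustrate why a limit cannot be moved ahead of a batch UDF that changes the row count.

```python
from typing import Callable, List

Batch = List[int]


def drop_odd(batch: Batch) -> Batch:
    """A batch UDF that changes the row count by filtering out odd values."""
    return [x for x in batch if x % 2 == 0]


def map_then_limit(rows: Batch, udf: Callable[[Batch], Batch], limit: int) -> Batch:
    """The plan as the user wrote it: apply the UDF, then take `limit` rows."""
    return udf(rows)[:limit]


def limit_then_map(rows: Batch, udf: Callable[[Batch], Batch], limit: int) -> Batch:
    """The plan after an (incorrect) pushdown: take `limit` rows, then apply the UDF."""
    return udf(rows[:limit])


rows = list(range(10))
correct = map_then_limit(rows, drop_odd, 3)  # [0, 2, 4]
pushed = limit_then_map(rows, drop_odd, 3)   # [0, 2]
```

The two plans only agree when the UDF emits exactly one output row per input row, which is why the row-count property has to be reported accurately.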

Cursor Bugbot reviewed your changes and found no issues for commit d99e7b1

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
@bveeramani bveeramani requested a review from a team as a code owner February 4, 2026 21:20
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a special-case exception to allow MapBatches operators to be fused, even when they can modify row counts. This change preserves historical behavior and prevents performance regressions that were introduced when MapBatches was updated to correctly report its ability to modify row counts. The changes to the test files correctly reflect this new fusion logic by removing the now-unnecessary udf_modifying_row_count=False parameter. The overall change is well-justified and looks good. I have one minor suggestion to improve code maintainability.
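The special-case exception described in this review could look roughly like the sketch below. This is an illustration under assumptions, not the code merged in this PR: `LogicalOp`, its fields, and `can_fuse` are hypothetical stand-ins, and the exact condition in the fusion rule may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogicalOp:
    # Hypothetical stand-in for a logical operator; not Ray's actual class.
    kind: str                          # e.g. "MapBatches" or "FlatMap"
    can_modify_num_rows: bool          # whether the op may change the row count
    batch_size: Optional[int] = None   # batch size requested by the op, if any


def can_fuse(upstream: LogicalOp, downstream: LogicalOp) -> bool:
    """Decide whether two adjacent map-like operators may be fused."""
    # Assumed shape of the special case: an upstream MapBatches keeps fusing,
    # as it did historically, even though it now reports that it may change
    # the row count.
    if upstream.kind == "MapBatches":
        return True
    # General restriction from the timeline: do not fuse an operator that may
    # change the row count into a downstream operator that has a batch size.
    if upstream.can_modify_num_rows and downstream.batch_size is not None:
        return False
    return True


map_batches = LogicalOp("MapBatches", can_modify_num_rows=True)
flat_map = LogicalOp("FlatMap", can_modify_num_rows=True)
batched = LogicalOp("MapBatches", can_modify_num_rows=True, batch_size=32)

fuses_map_batches = can_fuse(map_batches, batched)  # special case applies
fuses_flat_map = can_fuse(flat_map, batched)        # general restriction applies
```

Keeping the exception explicit in the fusion rule, rather than in the operator's row-count property, lets limit pushdown rely on an accurate property while fusion behavior stays unchanged.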

@bveeramani bveeramani enabled auto-merge (squash) February 4, 2026 21:49
@github-actions github-actions bot added the "go" label (add ONLY when ready to merge, run all tests) Feb 4, 2026
@bveeramani bveeramani merged commit e3d15ae into master Feb 4, 2026
7 of 8 checks passed
@bveeramani bveeramani deleted the fix-operator-fusion branch February 4, 2026 22:30
tiennguyentony pushed a commit to tiennguyentony/ray that referenced this pull request Feb 7, 2026
…ct#60756)

elliot-barn pushed a commit that referenced this pull request Feb 9, 2026
bveeramani added a commit that referenced this pull request Feb 9, 2026
This PR updates `map_groups` to assume that the UDF might change the row
count. This change is necessary to fix a bug where `Limit` gets
incorrectly pushed past the `map_groups` (fixes
#60872).

For more context, see:
* #60448
* #60756

---------

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
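As a rough illustration of why the reported row-count property matters for `map_groups`, the sketch below pushes a trailing limit down a linear plan and stops at the first operator that may change the row count. `PlanOp` and `push_limit_down` are hypothetical names, and this is a simplified model rather than Ray Data's actual optimizer rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PlanOp:
    # Hypothetical plan node: a name plus the row-count property it reports.
    name: str
    can_modify_num_rows: bool


def push_limit_down(plan: List[PlanOp]) -> List[str]:
    """Place a trailing Limit as early as is safe.

    `plan` is in upstream-to-downstream order, with the limit applied after
    the last operator. The limit moves upstream only past operators that
    report they cannot change the row count.
    """
    pos = len(plan)
    while pos > 0 and not plan[pos - 1].can_modify_num_rows:
        pos -= 1
    names = [op.name for op in plan]
    names.insert(pos, "Limit")
    return names


# With an accurate property, the limit stops after MapGroups.
accurate = push_limit_down([PlanOp("MapGroups", True), PlanOp("Map", False)])
# With an inaccurate property, the limit is pushed past MapGroups.
inaccurate = push_limit_down([PlanOp("MapGroups", False), PlanOp("Map", False)])
```

In this model the only difference between the correct and the buggy plan is the flag the operator reports, which mirrors the fix described in the commit message above.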
preneond pushed a commit to preneond/ray that referenced this pull request Feb 15, 2026
…ject#60881)

limarkdcunha pushed a commit to limarkdcunha/ray that referenced this pull request Feb 17, 2026
…ject#60881)

MuhammadSaif700 pushed a commit to MuhammadSaif700/ray that referenced this pull request Feb 17, 2026
…ject#60881)

Kunchd pushed a commit to Kunchd/ray that referenced this pull request Feb 17, 2026
…ct#60756)

Kunchd pushed a commit to Kunchd/ray that referenced this pull request Feb 17, 2026
…ject#60881)

ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
…ct#60756)

ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
…ject#60881)

Aydin-ab pushed a commit to kunling-anyscale/ray that referenced this pull request Feb 20, 2026
…ct#60756)

Aydin-ab pushed a commit to kunling-anyscale/ray that referenced this pull request Feb 20, 2026
…ject#60881)

peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
…ct#60756)

peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
…ject#60881)
