[Data] [optimizer] map/map_batches should output the same number of rows as the input #36295

@raulchen

Description

For a logical plan such as read->map->limit, the optimizer rewrites it to read->limit->map.

This rewrite assumes that map won't change the number of rows. However, that assumption is currently not enforced in practice.
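To illustrate why the assumption matters, here is a minimal sketch in plain Python. It does not use the Ray Data API: `map_batches`, `limit`, `plan_as_written`, and `plan_after_pushdown` are hypothetical stand-ins that model the two plan shapes on a list of integers.

```python
from typing import Callable, List

Batch = List[int]


def map_batches(rows: Batch, fn: Callable[[Batch], Batch], batch_size: int = 4) -> Batch:
    """Toy stand-in for a map op: apply fn to consecutive batches and concatenate."""
    out: Batch = []
    for start in range(0, len(rows), batch_size):
        out.extend(fn(rows[start:start + batch_size]))
    return out


def limit(rows: Batch, n: int) -> Batch:
    """Toy stand-in for a limit op: keep the first n rows."""
    return rows[:n]


def double(batch: Batch) -> Batch:
    # Row-preserving: one output row per input row.
    return [x * 2 for x in batch]


def keep_even(batch: Batch) -> Batch:
    # Not row-preserving: drops odd rows.
    return [x for x in batch if x % 2 == 0]


def plan_as_written(rows: Batch, fn: Callable[[Batch], Batch], n: int) -> Batch:
    # read -> map -> limit
    return limit(map_batches(rows, fn), n)


def plan_after_pushdown(rows: Batch, fn: Callable[[Batch], Batch], n: int) -> Batch:
    # read -> limit -> map
    return map_batches(limit(rows, n), fn)


rows = list(range(10))

# Row-preserving function: both plans agree.
print(plan_as_written(rows, double, 3), plan_after_pushdown(rows, double, 3))
# [0, 2, 4] [0, 2, 4]

# Row-dropping function: the pushed-down plan returns fewer rows.
print(plan_as_written(rows, keep_even, 3), plan_after_pushdown(rows, keep_even, 3))
# [0, 2, 4] [0, 2]
```

With a function that drops rows, the pushed-down plan applies the limit before filtering, so it returns two rows where the plan as written returns three.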

We should enforce that map ops preserve the exact number of rows. Otherwise, the limit pushdown rule will produce wrong results.
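One way such enforcement could look is a wrapper that checks the row count of each batch. This is only a sketch in plain Python; `enforce_same_row_count` is a hypothetical helper, not an existing Ray Data function, and the real fix might live elsewhere in the execution path.

```python
from typing import Callable, List

Batch = List[int]


def enforce_same_row_count(fn: Callable[[Batch], Batch]) -> Callable[[Batch], Batch]:
    """Wrap a batch function so that it fails if the output row count differs."""

    def checked(batch: Batch) -> Batch:
        out = fn(batch)
        if len(out) != len(batch):
            raise ValueError(
                f"map function returned {len(out)} rows for {len(batch)} input rows"
            )
        return out

    return checked


def double(batch: Batch) -> Batch:
    return [x * 2 for x in batch]


def keep_even(batch: Batch) -> Batch:
    return [x for x in batch if x % 2 == 0]


checked_double = enforce_same_row_count(double)
checked_keep_even = enforce_same_row_count(keep_even)

print(checked_double([1, 2, 3]))  # [2, 4, 6]

try:
    checked_keep_even([1, 2, 3])
except ValueError as err:
    print(f"rejected: {err}")
```

Failing loudly at the map op keeps the limit pushdown rule safe: a function that changes the row count is rejected instead of silently producing a different result after optimization.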

Metadata

Assignees

Labels

P2: Important issue, but not time-critical
data: Ray Data-related issues

Type

No type

Projects

Status

In progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
