refactor: rewrite expression-parsing for the 1,432th time

> - I've got big plans
> - please, no more refactors!
> - big plans

I think the expression parsing metadata needs rewriting, yet again, in light of:
- https://github.com/narwhals-dev/narwhals/issues/2444 
- https://github.com/narwhals-dev/narwhals/pull/2310#issuecomment-2763275571

### First, what we need to ensure:

All the following should be allowed
- `nw.col('a').mean().over('b')`
- `nw.col('a').rank()`
- `nw.col('a').rank().over('b')`

All the following should be disallowed:
- `nw.col('a').mean().over('b', order_by='i')`
- `nw.col('a').rank().over('b', order_by='i')`
- `nw.col('a').rank().abs().over('b')`

All the following should be marked as order-dependent so that they can be disallowed for `LazyFrame`:
- `nw.col('a').diff()`
- `nw.col('a').diff().abs().over(order_by='i')` (`diff` wasn't immediately followed by `over(order_by=...)`)

### Solution

First, `ExprKind` gets removed, we don't need it

`ExprMetadata` contain the following:
- `last_node_is_window`: whether the last node was a window (`diff`, `shift`, `rolling_*`, `cum_*`)
- `last_node_is_partitionable_window`: whether the last node was a partitionable window (`rank`, `is_unique`)
- `is_partitionable`: whether `over(partition_by='a')` can be applied to the current expression. `False` by default, as `nw.col('a').over('b')` doesn't make sense
- `is_partitioned`: whether the expression has already has `.over` applied to it (whether with `partition_by` set, or with the implicit global (`nw.lit(1)`) partition). `False` by default
- `is_orderable`: whether `over(order_by='i')` can be applied to the current expression. `False` by default, as `nw.col('a').over(order_by='i')` doesn't make sense
- `requires_ordering`: whether the expression depends on physical row order. `False` by default
- `preserves_length`: whether it preserves the original expression's length
- `is_scalar_like`: whether it produces a single value (aggregation or literal)
- `changes_length`: whether it changes length but without aggregating (e.g. `Expr.drop_nulls`)

Composition (anything non-noted stays unchangedD):
- if an elementwise operation (e.g. `.abs`) is applied all attributes except `last_node_is_*` are preserved
- if an aggregation is applied, then:
  - `is_partitionable` to `True`
  - `is_orderable` to `False` (`nw.col('a').mean().over(order_by='i')` doesn't make sense and is almost certainly a user error)
  - `changes_length` to `False`
  - `is_scalar_like` to `True`
  - `preserves_length` to `False`
- if a window function, like `rolling_*` / `diff` / `cum_*` / `shift`, is applied, then set:
  - `last_node_is_window` set to `True`
  - `is_partitionable` to `True`
  - `is_orderable` to `True`
  - `is_scalar_like` to `True`
  - `preserves_length` to `False`
- if a non-elementwise expression is applied (`rank`, `is_unique`), then set:
  - `last_node_is_partitionable_window` set to `True`
  - `is_partitionable` to `False`
  - `is_partitioned` to `True`
  - `is_orderable` to `False`
- if a filtration is applied (`drop_nulls`, `filter`), then set:
  - `is_partitionable` to `False`
  - `is_orderable` to `False`
  - `is_scalar_like` to `False`, raise if `is_scalar_like` is True (can't do `nw.col('a').mean().drop_nulls()`)
  - `preserves_length` to `False`
 - if `over` is applied, set:
   - `is_partitionable` to `False`, raise if it was `False` and `partition_by` was specified UNLESS `last_node_is_partitionable_window`, in which case the current `partition_by` is combined with the last expression's `partition_by`
   - `is_partitioned` to `True`, raise if it's already `True` (that would be a double-over, not allowed) UNLESS `last_node_is_partitionable_window`, in which case the current `partition_by` is combined with the last expression's `partition_by`
   - `is_orderable` to `False`, raise if it was `False` and `order_by` was specified
   - `changes_length` to `False`, raise if it was `True`
   - `requires_ordering` to `False` if `last_node_is_window` and `order_by` was specified and `is_orderable` is `True`
- for n-ary operations, set:
  - `is_partitionable` to `True` if any input is partitionable
  - `is_partitioned` to `True` if any input is partitioned
  - `is_orderable` to `True` if any input is orderable
  - `requires_ordering` to `True` if any input requires ordering
  - `preserves_length` to `True` if any input preserves length
  - `is_scalar_like` to `True` if all inputs are scalar-like
  - `changes_length` to `False`, and raise if any input changes length

How this achieves the desired disallowments:
- `nw.col('a').rank().abs().over('b')`: `rank` sets `is_partitioned` to `True`, then `abs` sets `last_node_is_partitioned_window` to `False`, and `over` finds that `is_partitioned` is `True` and `last_node_is_partitioned_window` is `False` and so disallows the operation

There's likely some mistakes / things I'm missing and overlooking here, but I just wanted to start by trying to collect my thoughts

**EDIT**

- `changes_length` is unnecessary, it's determined by `preserves_length` and `is_scalar_like`
- `partitionable_window` => `unorderable_window`
- also need to preserve that `.diff().shift().over(order_by=...)` requires physical row order

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: rewrite expression-parsing for the 1,432th time #2478

First, what we need to ensure:

Solution

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

refactor: rewrite expression-parsing for the 1,432th time #2478

Description

First, what we need to ensure:

Solution

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions