fix: Align division by zero behavior across all backends#2761
fix: Align division by zero behavior across all backends#2761MarcoGorelli merged 43 commits intomainfrom
Conversation
|
@MarcoGorelli what's the current status of pandas-like broadcasting? I am having a very bad time having to distinguish between python scalar vs As an example, we would enter frame.<with_columns|select>(
x1 = nw.lit(value) / scalar,
x2 = nw.lit(value) / nw.lit(another_value),
x3 = nw.lit(value) / nw.col(name),
x4 = nw.col(name) / scalar,
x5 = nw.col(name) / nw.lit(value),
x6 = nw.col(name) / nw.col(another_name),
)and they need special attentions 😂 On top of that, there is specific logic given the dtype backend 🙈 |
|
@MarcoGorelli I rolled backed the "not-fully functional" idea I had for pandas. I do see tests failing for "not so old" polars due to PanicExpection's and different results of Finally, for the missing coverage, the #2761 (comment) above is relevant, and I would like to make sure that I am not missing anything |
|
thanks - will take a closer look, i'd like to be quite careful with this one 😄 |
MarcoGorelli
left a comment
There was a problem hiding this comment.
thanks @FBruzzesi !
the pandas implementation looks a bit more complex than i was hoping for, will take a look
but we could start with aligning at least the other backends if that's simpler
| # The following lines are adapted from pandas' pyarrow implementation. | ||
| # Ref: https://github.com/pandas-dev/pandas/blob/262fcfbffcee5c3116e86a951d8b693f90411e68/pandas/core/arrays/arrow/array.py#L124-L154 | ||
|
|
||
| safe_mask = pc.not_equal(right, lit(0, type=right.type)) |
There was a problem hiding this comment.
shall we just add a comment noting what we've changed from pandas' pyarrow impl?
narwhals/_dask/expr.py
Outdated
| def _floordiv( | ||
| df: DaskLazyFrame, series: dx.Series, other: dx.Series | Any | ||
| ) -> dx.Series: | ||
| series, other = align_series_full_broadcast(df, series, other) | ||
| return (series.__floordiv__(other)).where(other != 0, None) | ||
|
|
||
| def func(df: DaskLazyFrame) -> list[dx.Series]: | ||
| if isinstance(other, type(self)): | ||
| if len(other_ := other(df)) > 1: # pragma: no cover | ||
| msg = "Expected expression with single output, found multiple" | ||
| raise ValueError(msg) | ||
| other_series = other_[0] | ||
| else: | ||
| other_series = other |
There was a problem hiding this comment.
🤔 this looks right, but imma have to think about it, not sure about repeating the broadcasting logic here..
narwhals/_sql/expr.py
Outdated
| def __floordiv__(self, other: Any) -> Self: ... | ||
| def __rfloordiv__(self, other: Any) -> Self: ... |
There was a problem hiding this comment.
the implementation looks quite similar between duckdb and pyspark, wonder if the implementation can be shared here?
There was a problem hiding this comment.
Disclaimer: I didn't invest more than 3 minutes looking at this
Yes that's true, but there is a limitation: the new abstracted self._when does not allow for chaining with .otherwise. Understandably so, for ibis it would be complex since it's inside the ibis.cases rather than a chained ops
There was a problem hiding this comment.
it should be possible by putting statemnts within each other, like here
narwhals/narwhals/_sql/expr.py
Lines 515 to 528 in c22b582
There was a problem hiding this comment.
With the latest changes in when, I am able to just write:
def __floordiv__(self, other: Self) -> Self:
def func(expr: NativeExprT, other: NativeExprT) -> NativeExprT:
return self._when(other != self._lit(0), expr // other, self._lit(None))
return self._with_binary(func, other=other)
def __rfloordiv__(self, other: Self) -> Self:
def func(expr: NativeExprT, other: NativeExprT) -> NativeExprT:
return self._when(expr != self._lit(0), other // expr, self._lit(None))
return self._with_binary(func, other=other).alias("literal")at the _sql/ level. This would work for ibis and duckdb, but not for pyspark since we have a different dedicated implementation.
However, this results in all sort of typing issues. At sql level says that NativeExprT does not implement floordiv and rfloordiv. If I go add them these methods than it's even worse.
Honestly it's a 10 lines difference, but we should probably start doing some workaround for the typing.. Personally I don't think it's worth it
There was a problem hiding this comment.
You can use op.floordiv / op.truediv in _sql/expr, that'll resolve the typing issues
Personally I'd have preferred to implement arithmetic ops for NativeExpr and ignore Ibis' half-baked typing, but @dangotbanned seemed quite keen on typing Ibis, which seems fair enough
|
thanks for updating! generally on-board, though still hoping for a simpler dask way, will try something out |
MarcoGorelli
left a comment
There was a problem hiding this comment.
one comment
for Dask, can we just use
other_series = maybe_evaluate_expr(df, other)
instead of lines 238-244?
| # The following lines are adapted from pandas' pyarrow implementation. | ||
| # Ref: https://github.com/pandas-dev/pandas/blob/262fcfbffcee5c3116e86a951d8b693f90411e68/pandas/core/arrays/arrow/array.py#L124-L154 | ||
|
|
||
| safe_mask = pc.not_equal(right, lit(0, type=right.type)) |
MarcoGorelli
left a comment
There was a problem hiding this comment.
this is already a step forwards, we can think about pandas-like later. but i hope nobody is actually floor-dividing by zero 😄
thanks @FBruzzesi !

What type of PR is this? (check all applicable)
Related issues
Checklist