Skip to content

chore: refactor EVERYTHING to do with broadcasting#2005

Merged
MarcoGorelli merged 140 commits intonarwhals-dev:mainfrom
MarcoGorelli:broadcast-in-select
Feb 19, 2025
Merged

chore: refactor EVERYTHING to do with broadcasting#2005
MarcoGorelli merged 140 commits intonarwhals-dev:mainfrom
MarcoGorelli:broadcast-in-select

Conversation

@MarcoGorelli
Copy link
Copy Markdown
Member

@MarcoGorelli MarcoGorelli commented Feb 13, 2025

This...took some effort

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

  • Related issue #<issue number>
  • Closes #<issue number>

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

@MarcoGorelli
Copy link
Copy Markdown
Member Author

but strings_are_column_names is the inverse of str_as_lit right?

yup! that's a fair point, thanks, might as well align with Polars' internal name

@dangotbanned
Copy link
Copy Markdown
Member

dangotbanned commented Feb 18, 2025

but strings_are_column_names is the inverse of str_as_lit right?

yup! that's a fair point, thanks, might as well align with Polars' internal name

No worries
You can always make a TypeAlias with a doc if you wanna add more detail that follows the parameter around.
E.g.:

StrAsLit: TypeAlias = bool
"""<Explain me>"""

def func(str_as_lit: StrAsLit):...

Personal preference I suppose, but I like being able to get more info - but not having it always there if I'm familiar. These things seem small but do add up over time

@dangotbanned

This comment was marked as resolved.

@MarcoGorelli
Copy link
Copy Markdown
Member Author

great suggestions, thanks - fancy adding a commit with them?

@dangotbanned
Copy link
Copy Markdown
Member

great suggestions, thanks - fancy adding a commit with them?

Sure thing

@dangotbanned
Copy link
Copy Markdown
Member

dangotbanned commented Feb 18, 2025

Suggested edit:

The import was blocking me using the normal suggestion.
This should be a perf improvement as well, since broadcast = False skips the rest

diff --git a/narwhals/_expression_parsing.py b/narwhals/_expression_parsing.py
index 38300ad8..09419a5d 100644
--- a/narwhals/_expression_parsing.py
+++ b/narwhals/_expression_parsing.py
@@ -8,6 +8,7 @@ from enum import auto
 from typing import TYPE_CHECKING
 from typing import Any
 from typing import Callable
+from typing import Iterable
 from typing import Sequence
 from typing import TypedDict
 from typing import TypeVar
@@ -445,25 +446,27 @@ def infer_kind(obj: IntoExpr | _1DArray | object, *, str_as_lit: bool) -> ExprKi
     return ExprKind.LITERAL
 
 
+def _iter_n_ary_operation(
+    compliant_exprs: Iterable[CompliantExpr[Any] | object], kinds: Sequence[ExprKind]
+) -> Iterable[CompliantExpr[Any] | object]:
+    broadcast = any(kind is ExprKind.TRANSFORM for kind in kinds)
+    broadcast_kind = {ExprKind.AGGREGATION, ExprKind.LITERAL}
+    if not broadcast:
+        yield from compliant_exprs
+    else:
+        for compliant_expr, kind in zip(compliant_exprs, kinds):
+            if kind in broadcast_kind and is_compliant_expr(compliant_expr):
+                yield compliant_expr.broadcast(kind)
+            else:
+                yield compliant_expr
+
+
 def apply_n_ary_operation(
-    plx: CompliantNamespace,
-    function: Any,
-    *comparands: IntoExpr,
-    str_as_lit: bool,
+    plx: CompliantNamespace, function: Any, *comparands: IntoExpr, str_as_lit: bool
 ) -> CompliantExpr[Any]:
     compliant_exprs = (
         extract_compliant(plx, comparand, str_as_lit=str_as_lit)
         for comparand in comparands
     )
     kinds = [infer_kind(comparand, str_as_lit=str_as_lit) for comparand in comparands]
-
-    broadcast = any(kind is ExprKind.TRANSFORM for kind in kinds)
-    compliant_exprs = (
-        compliant_expr.broadcast(kind)
-        if broadcast
-        and kind in (ExprKind.AGGREGATION, ExprKind.LITERAL)
-        and is_compliant_expr(compliant_expr)
-        else compliant_expr
-        for compliant_expr, kind in zip(compliant_exprs, kinds)
-    )
-    return function(*compliant_exprs)
+    return function(*_iter_n_ary_operation(compliant_exprs, kinds))

@dangotbanned
Copy link
Copy Markdown
Member

dangotbanned commented Feb 18, 2025

#2005 (comment)

@MarcoGorelli ah I forgot about how singular aliases inherit their docs (unlike unions):

image

I'm not as sold on my own idea now 😅

Also I was thinking this did the same thing as attribute docs, but it isn't visible everywhere like:

image

image

for x in kwargs.values()
)
exprs = chain(args, kwargs.values())
agg_or_lit = {ExprKind.AGGREGATION, ExprKind.LITERAL}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you'd want some of these sets defined on ExprKind like you've got for Implementation?

narwhals/narwhals/utils.py

Lines 238 to 256 in 390d6e0

def is_pandas_like(self: Self) -> bool:
"""Return whether implementation is pandas, Modin, or cuDF.
Returns:
Boolean.
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"a": [1, 2, 3]})
>>> df = nw.from_native(df_native)
>>> df.implementation.is_pandas_like()
True
"""
return self in {
Implementation.PANDAS,
Implementation.MODIN,
Implementation.CUDF,
}

I think you'd be able to make a lot of little ergonomic changes - if there's more cases that are repeated a lot

Comment on lines +435 to +439
if (
is_narwhals_series(obj)
or is_numpy_array(obj)
or (isinstance(obj, str) and not str_as_lit)
):
Copy link
Copy Markdown
Member

@dangotbanned dangotbanned Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could split out into is_transform(obj: Any, str_as_lit: bool)

Comment on lines +173 to +178
compliant_exprs = [
compliant_expr.broadcast(kind)
if kind in (ExprKind.AGGREGATION, ExprKind.LITERAL)
else compliant_expr
for compliant_expr, kind in zip(compliant_exprs, kinds)
]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and x._metadata["kind"] in (ExprKind.AGGREGATION, ExprKind.LITERAL)
for x in kwargs.values()

def infer_kind(obj: IntoExpr | _1DArray | object, *, str_as_lit: bool) -> ExprKind:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you thought about making this a @classmethod for ExprKind?

Maybe called from_expr or I guess technically from_into_expr (but thats a mouthful)

def maybe_evaluate_expr(
df: CompliantDataFrame, expr: CompliantExpr[CompliantSeriesT_co]
) -> Sequence[CompliantSeriesT_co]: ...
) -> CompliantSeriesT_co: ...
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for a follow up:
This reminds me that we should fix the typing for CompliantSeriesT_co
for Spark we use Column, in duckdb Expression. They are not compliant series :/

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened as (#2044)

@MarcoGorelli
Copy link
Copy Markdown
Member Author

Thanks all for comments, much appreciated 🙏

If people are OK with the logic of the changes, OK if we get this in and then get further refactors in as follow-ups?

Copy link
Copy Markdown
Member

@dangotbanned dangotbanned left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @MarcoGorelli

@MarcoGorelli
Copy link
Copy Markdown
Member Author

Thanks for your review and for taking such a close look, legend mate!

Merging then so we can proceed with further improvements (e.g. other typing PRs, order-dependent lazy ops, and more) before too many merge conflicts crop up

@MarcoGorelli MarcoGorelli merged commit 228e478 into narwhals-dev:main Feb 19, 2025
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants