Conversation
|
- Mentioned in (#2391 (comment)) - Needed again for #2572
at the moment it looks like this adds a self-standing |
* chore(typing): Add `_typing_compat.py` - Mentioned in (#2391 (comment)) - Needed again for #2572 * refactor: Reuse `TypeVar` import * refactor: Reuse `@deprecated` import * refactor: Reuse `Protocol38` import * docs: Add module-level docstring
Still need: - reprs - fix the hierarchy issue (#2572 (comment)) - Flag summing (#2572 (comment))
- 1 step closer to the understanding for (#2572 (comment)) - There's still some magic going on when `polars` serializes - Need to track down where `'collect_groups': 'ElementWise'` and `'collect_groups': 'GroupWise'` first appear - Seems like the flags get reduced
|
Thanks for peeking @MarcoGorelli
That is definitely the eventual goal! 🤞 Despite how quickly things have progressed, I still feel I'm a few steps behind being ready for that just yet. General overviewI'm trying to focus on modeling these structures and how they interact:
My thought was that So like what I have in Current
|
This comment was marked as resolved.
This comment was marked as resolved.
Can't tell if this means `FirstT` will match the entry `firstt`, but preserve the `firstt` fix (https://github.com/codespell-project/codespell#ignoring-words) (#2572 (comment))
|
I should've expected this, but it was a nice suprise to find we get hashable selectors for free 😄 from narwhals._plan import selectors as ndcs
>>> ndcs.matches("[^z]a")._ir == ndcs.matches("[^z]a")._ir
True
>>> ndcs.matches("[^z]a")._ir == ndcs.matches("abc")._ir
False@MarcoGorelli regarding (#2291) from narwhals._plan import selectors as ndcs
>>> ndcs.all()._ir == ndcs.all()._ir
True
lhs = ndcs.all()
rhs = ndcs.all().mean()
>>> lhs._ir == rhs._ir
False
>>> lhs._ir == rhs._ir.expr
TrueAnd the same holds for the non-selectors from narwhals._plan import demo as nwd
lhs = nwd.all()
rhs = nwd.all().mean()
>>> lhs._ir == rhs._ir
False
>>> lhs._ir == rhs._ir.expr
True
>>> type(rhs._ir)
narwhals._plan.aggregation.Mean |
An experiment towards (#2572 (comment))
tests/plan/expr_parsing_test.py
Outdated
| def test_valid_windows() -> None: | ||
| """Was planning to test this matched, but we seem to allow elementwise horizontal? | ||
|
|
||
| https://github.com/narwhals-dev/narwhals/blob/63c8e4771a1df4e0bfeea5559c303a4a447d5cc2/tests/expression_parsing_test.py#L10-L45 | ||
| """ | ||
| ELEMENTWISE_ERR = re.compile(r"cannot use.+over.+elementwise", re.IGNORECASE) # noqa: N806 | ||
| a = nwd.col("a") | ||
| assert a.cum_sum() | ||
| assert a.cum_sum().over(order_by="id") | ||
| with pytest.raises(InvalidOperationError, match=ELEMENTWISE_ERR): | ||
| assert a.cum_sum().abs().over(order_by="id") | ||
|
|
||
| assert (a.cum_sum() + 1).over(order_by="id") | ||
| assert a.cum_sum().cum_sum().over(order_by="id") | ||
| assert a.cum_sum().cum_sum() | ||
| assert nwd.sum_horizontal(a, a.cum_sum()) | ||
| with pytest.raises(InvalidOperationError, match=ELEMENTWISE_ERR): | ||
| assert nwd.sum_horizontal(a, a.cum_sum()).over(order_by="a") | ||
|
|
||
| assert nwd.sum_horizontal(a, a.cum_sum().over(order_by="i")) | ||
| assert nwd.sum_horizontal(a.diff(), a.cum_sum().over(order_by="i")) | ||
| with pytest.raises(InvalidOperationError, match=ELEMENTWISE_ERR): | ||
| assert nwd.sum_horizontal(a.diff(), a.cum_sum()).over(order_by="i") | ||
|
|
||
| with pytest.raises(InvalidOperationError, match=ELEMENTWISE_ERR): | ||
| assert nwd.sum_horizontal(a.diff().abs(), a.cum_sum()).over(order_by="i") |
There was a problem hiding this comment.
@MarcoGorelli quick question
This is adapted from an existing test:
tests.expression_parsing_test.test_window_kind
narwhals/tests/expression_parsing_test.py
Lines 10 to 45 in 63c8e47
AFAICT, all of the expressions I've needed a InvalidOperationError for shouldn't be valid.
But they aren't raising in current narwhals 🤔
1
import narwhals as nw
a = nw.col("a")
a.cum_sum().abs().over(order_by="id")This error explicitly mentions abs
narwhals/narwhals/_expression_parsing.py
Lines 357 to 362 in 9bd10ad
2, 3, 4
These are all raising the same as (1), but the issue seems to be that horizontal functions aren't being treated as elementwise
import narwhals as nw
a = nw.col("a")
nw.sum_horizontal(a, a.cum_sum()).over(order_by="a")
nw.sum_horizontal(a.diff(), a.cum_sum()).over(order_by="i")
nw.sum_horizontal(a.diff().abs(), a.cum_sum()).over(order_by="i")In polars, they all seem to be elementwise but with an additional flag
I've done the same in this PR, but I don't think that flag would factor into this?
narwhals/narwhals/_plan/functions.py
Lines 291 to 299 in 9bd10ad
In #3384, I started tweaking the impl but realized I had only covered it indirectly Merging ahead of that so I can be sure I don't break it
See #3435 (comment) Also - tweaked some typing - added docs - included `coalesce` in the same path
Haven't got any usage of `*exprs` anymore - just generators passed as a single arg
Will close #2571
What type of PR is this? (check all applicable)
Related issues
Exprinternal representation #2571(sort:updated-desc "(expr-ir)/" in:title)Checklist
If you have comments or can explain your changes, please do so below
Important
See (#2571) for detail!!!!!!!
Very open to feedback
Tasks
Show 2025 May-July
pl.Expr.metapl.Expr.meta)ExprIRmetamethods_typing_compatmodule #2578Merge another PR with (perf: Avoid module-levelimportlib.util.find_spec#2391 (comment)) firstTypeVardefaults moreTypeVar("T", bound=Thing, default=Thing), instead of an opaqueExprIRSelector(s)narwhals/narwhals/_plan/expr.py
Lines 336 to 337 in 0bada48
BinaryExprthat describes the restricted set of operators that are allowedpolars, since they wrappl.colinternallyIntoExprin more places (including and beyond whatnarwhalsallows now)demo.py*_horizontalconcat_str)dummy.pyover,sort_by)FunctionOptions+ friends see commentWhere does the{flags: ...}->{collect_groups: ..., flags: ...}expansion happen?polars>=1.3.0fixed the issue (see comment)Ternarywhen-then-otherwise🥳)Metais_*,has_*,output_namemeta methodsroot_namesundo_aliases,popNamename.py(Expr::KeepName,Expr::RenameAlias)polarswill help with themetamethodsCat,Struct,List(a3e29d1)String(72c33ce)DateTime(aee0a7e)_expression_parsing.pyrulesrustversion worksExpansionFlagsexpand_function_inputsrewrite_projectionsreplace_selectorexpand_selectorreplace_selector_innerreplace_and_add_to_resultsreplace_nthprepare_excludedexpand_columnsexpand_dtypesreplace_dtype_or_index_with_columndtypes_match(probably can solve w/ existingnarwhals)expand_indicesreplace_index_with_columnreplace_wildcardrewrite_special_aliasesreplace_wildcard_with_columnreplace_regexexpand_regexExprIR.map_irExprIR #2572 (comment))ExprIR.map_irfor most nodesWindowExpr.map_irFunctionExpr.map_irRollingExpr,AnonymousExprinheritselectorsExprIR(main) #3066ExprIR(main) #3066 (comment))_planpackage #3122group_by, utilizepyarrow.acero#3143{Expr,Series}.{first,last}#2528)protocols.py#3166order_by,hashjoin,DataFrame.{filter,join},Expr.is_{first,last}_distinct#3173__dict__appearing onImmutablesubclasses (thread)__slots__, and not__dict__too #3201ExpansionFlags.from_ir#3206ColumnNameOrSelectorcan be addedExpr.meta.serialize,Expr.deserializepc.Expressionover(*partition_by)#3224rank,with_row_index_by,over(*partition_by, order_by=...)#3295Selectoroverhaul #3233date_range, support{date,int}_range(eager=...)#3294ArrowExpr#3325dt.*,str.to_date(time)dt.*str.to_date(time)ExprIRwith in sync withnarwhalslist.*aggregate methods #3353list.sort#3359{Expr,Series}.any_value#3315overwith currentpolars#3352ArrowExprboilerplatearrow.functionsa package #3384ArrowListNamespace-> (ArrowExprListNamespace,ArrowScalarListNamespace)len,getListScalarEager{DataFrame,Namespace}when thenarwhals-level depends on themconcatcommentSelectors that do more than select #3376LogicalPlan🚀over(*partition_by)windows between expressions in the same contextmaincan do this for multi-output (by not expanding)LogicalPlanIR+ what it looks like:sqlglot(andibis)datafusioncudf_polarsDataTypepropagation and supertyping rulesconcat(..., how={"vertical_relaxed", "diagonal_relaxed"})#3386LogicalPlansnarwhalsbehavior, but likely more of an issue whenSelectors can be used in more placesoh-nodestopolarsinternalsExpr) features forLogicalPlanconcat#3371read_{csv,parquet}#3370DataFrame.write_{csv,parquet}#3369BaseFrameDataFrame.join_asof#3378BaseFrame.unpivot#3368BaseFrame.unnest#3414DataFrameDataFrame.pivot#3373DataFrame.unique#3364