feat: Add order_by option in with_row_index method#2683
Conversation
|
I let 87ebad0 sneak in - it avoids using |
narwhals/dataframe.py
Outdated
|
|
||
| Arguments: | ||
| name: The name of the column as a string. The default is "index". | ||
| order_by: Column(s) to order by when computing the row index. Must be not None. |
There was a problem hiding this comment.
I think it's good to specify that order_by must be not None although it's actually allowed for backward compat for polars and dask - all other lazy backends will raise an exception
There was a problem hiding this comment.
random idea: what if we allow None for dask and polars in v1 only?
at the moment the docstring contradicts the type of the args. I can see it may be confusing
There was a problem hiding this comment.
random idea: what if we allow None for dask and polars in v1 only?
Yes that's probably the best way to go
at the moment the docstring contradicts the type of the args. I can see it may be confusing
One tiny issue is that we have another argument name which has a default already, so the next one also requires a default
There was a problem hiding this comment.
you can make it required by making it keyword-only,
def with_row_index(
self, name: str = "index", *, order_by: str | Sequence[str]
) -> Self:There was a problem hiding this comment.
That's still got a syntax error
Kinda related to (#2683 (comment))
You can do this and remove the mention of None in the docs (#2683 (comment))
def with_row_index(
self, name: str = "index", order_by: str | Sequence[str] = ()
) -> Self:
if not order_by: ...If you want something more explicit, you could use a sentinel like in https://github.com/pola-rs/polars/blob/2494fcc9f9d2f34db2875e485bb4ce10bf8d97a5/py-polars/polars/expr/expr.py#L10419-L10426
There was a problem hiding this comment.
My understanding for a sentinel like that is that it's useful when None can be a valid parameter. Which is not the case here 🙈
There was a problem hiding this comment.
My understanding for a sentinel like that is that it's useful when None can be a valid parameter. Which is not the case here 🙈
Yeah good point, forget I brought sentinels into the mix 😄
() would be my preference - but ultimately we're just balancing trade-offs of backcompat vs a new default that raises
EdAbati
left a comment
There was a problem hiding this comment.
It is a great addition!
(hope you don't mind my comments, I'm strarting to catchup by reviewing some PRs :D)
narwhals/dataframe.py
Outdated
|
|
||
| Arguments: | ||
| name: The name of the column as a string. The default is "index". | ||
| order_by: Column(s) to order by when computing the row index. Must be not None. |
There was a problem hiding this comment.
random idea: what if we allow None for dask and polars in v1 only?
at the moment the docstring contradicts the type of the args. I can see it may be confusing
Co-authored-by: Edoardo Abati <29585319+EdAbati@users.noreply.github.com>
Another part of #2683 (comment) `select` ends with a `_with_native`, so the second one is redundant
narwhals/dataframe.py
Outdated
| if order_by is None: | ||
| msg = ( | ||
| "`LazyFrame.with_row_index` requires `order_by` to be specified as it is an " | ||
| "order-dependent operation." | ||
| ) | ||
| raise ValueError(msg) |
There was a problem hiding this comment.
If it's not allowed to be None and we raise immediately, shall we just remove it from the signature altogether?
Something like
diff --git a/narwhals/dataframe.py b/narwhals/dataframe.py
index 43d864b1e..3576f84b8 100644
--- a/narwhals/dataframe.py
+++ b/narwhals/dataframe.py
@@ -138,12 +138,6 @@ class BaseFrame(Generic[_FrameT]):
) -> R:
return function(self, *args, **kwargs)
- def with_row_index(self, name: str, order_by: str | Sequence[str] | None) -> Self:
- order_by_ = [order_by] if isinstance(order_by, str) else order_by
- return self._with_compliant(
- self._compliant_frame.with_row_index(name, order_by=order_by_)
- )
-
def drop_nulls(self, subset: str | list[str] | None) -> Self:
subset = [subset] if isinstance(subset, str) else subset
return self._with_compliant(self._compliant_frame.drop_nulls(subset=subset))
@@ -1096,7 +1090,10 @@ class DataFrame(BaseFrame[DataFrameT]):
a: [[1,2]]
b: [[4,5]]
"""
- return super().with_row_index(name=name, order_by=order_by)
+ order_by_ = [order_by] if isinstance(order_by, str) else order_by
+ return self._with_compliant(
+ self._compliant_frame.with_row_index(name, order_by=order_by_)
+ )
@property
def schema(self) -> Schema:
@@ -2480,7 +2477,10 @@ class LazyFrame(BaseFrame[FrameT]):
| b: [[5,4]] |
└──────────────────┘
"""
- return super().with_row_index(name=name, order_by=order_by)
+ order_by_ = [order_by] if isinstance(order_by, str) else order_by
+ return self._with_compliant(
+ self._compliant_frame.with_row_index(name, order_by=order_by_)
+ )
@property
def schema(self) -> Schema:?
(splitting up the implementations for nw.DataFrame and nw.LazyFrame is to allow for the signatures to be slightly different, which is fine IMO)
narwhals/dataframe.py
Outdated
| def with_row_index( | ||
| self, name: str = "index", order_by: str | Sequence[str] | None = None | ||
| self, name: str = "index", *, order_by: str | Sequence[str] |
There was a problem hiding this comment.
Well I guess I learned something new today 😄
Never seen a default before a non-default - quite surprised that the * avoids the syntax error!
|
nice one, thanks for sticking to it! |


What type of PR is this? (check all applicable)
Related issues
LazyFrame.with_row_index(..., order_by=...)#2307Checklist