Skip to content

feat: Add order_by option in with_row_index method#2683

Merged
MarcoGorelli merged 50 commits intomainfrom
feat/with-row-index-by
Jun 25, 2025
Merged

feat: Add order_by option in with_row_index method#2683
MarcoGorelli merged 50 commits intomainfrom
feat/with-row-index-by

Conversation

@FBruzzesi
Copy link
Copy Markdown
Member

@FBruzzesi FBruzzesi commented Jun 15, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

@FBruzzesi FBruzzesi requested a review from MarcoGorelli June 15, 2025 20:53
@FBruzzesi FBruzzesi added the enhancement New feature or request label Jun 15, 2025
@FBruzzesi FBruzzesi marked this pull request as ready for review June 16, 2025 13:06
@FBruzzesi FBruzzesi added the road to v2 What takes us closer to `stable.v2` (high prio) label Jun 16, 2025
@FBruzzesi
Copy link
Copy Markdown
Member Author

I let 87ebad0 sneak in - it avoids using range(size) whenever possible. For any large sized dataset this might matter


Arguments:
name: The name of the column as a string. The default is "index".
order_by: Column(s) to order by when computing the row index. Must be not None.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good to specify that order_by must be not None although it's actually allowed for backward compat for polars and dask - all other lazy backends will raise an exception

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

random idea: what if we allow None for dask and polars in v1 only?

at the moment the docstring contradicts the type of the args. I can see it may be confusing

Copy link
Copy Markdown
Member Author

@FBruzzesi FBruzzesi Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

random idea: what if we allow None for dask and polars in v1 only?

Yes that's probably the best way to go

at the moment the docstring contradicts the type of the args. I can see it may be confusing

One tiny issue is that we have another argument name which has a default already, so the next one also requires a default

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can make it required by making it keyword-only,

    def with_row_index(
        self, name: str = "index", *, order_by: str | Sequence[str]
    ) -> Self:

Copy link
Copy Markdown
Member

@dangotbanned dangotbanned Jun 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(#2683 (comment))

That's still got a syntax error

image

Kinda related to (#2683 (comment))

You can do this and remove the mention of None in the docs (#2683 (comment))

    def with_row_index(
        self, name: str = "index", order_by: str | Sequence[str] = ()
    ) -> Self:
        if not order_by: ...

If you want something more explicit, you could use a sentinel like in https://github.com/pola-rs/polars/blob/2494fcc9f9d2f34db2875e485bb4ce10bf8d97a5/py-polars/polars/expr/expr.py#L10419-L10426

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding for a sentinel like that is that it's useful when None can be a valid parameter. Which is not the case here 🙈

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding for a sentinel like that is that it's useful when None can be a valid parameter. Which is not the case here 🙈

Yeah good point, forget I brought sentinels into the mix 😄

() would be my preference - but ultimately we're just balancing trade-offs of backcompat vs a new default that raises

Copy link
Copy Markdown
Collaborator

@EdAbati EdAbati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a great addition!

(hope you don't mind my comments, I'm strarting to catchup by reviewing some PRs :D)


Arguments:
name: The name of the column as a string. The default is "index".
order_by: Column(s) to order by when computing the row index. Must be not None.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

random idea: what if we allow None for dask and polars in v1 only?

at the moment the docstring contradicts the type of the args. I can see it may be confusing

FBruzzesi and others added 2 commits June 17, 2025 09:25
Co-authored-by: Edoardo Abati <29585319+EdAbati@users.noreply.github.com>
Co-authored-by: Edoardo Abati <29585319+EdAbati@users.noreply.github.com>
Comment on lines +2483 to +2488
if order_by is None:
msg = (
"`LazyFrame.with_row_index` requires `order_by` to be specified as it is an "
"order-dependent operation."
)
raise ValueError(msg)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not allowed to be None and we raise immediately, shall we just remove it from the signature altogether?

Something like

diff --git a/narwhals/dataframe.py b/narwhals/dataframe.py
index 43d864b1e..3576f84b8 100644
--- a/narwhals/dataframe.py
+++ b/narwhals/dataframe.py
@@ -138,12 +138,6 @@ class BaseFrame(Generic[_FrameT]):
     ) -> R:
         return function(self, *args, **kwargs)
 
-    def with_row_index(self, name: str, order_by: str | Sequence[str] | None) -> Self:
-        order_by_ = [order_by] if isinstance(order_by, str) else order_by
-        return self._with_compliant(
-            self._compliant_frame.with_row_index(name, order_by=order_by_)
-        )
-
     def drop_nulls(self, subset: str | list[str] | None) -> Self:
         subset = [subset] if isinstance(subset, str) else subset
         return self._with_compliant(self._compliant_frame.drop_nulls(subset=subset))
@@ -1096,7 +1090,10 @@ class DataFrame(BaseFrame[DataFrameT]):
             a: [[1,2]]
             b: [[4,5]]
         """
-        return super().with_row_index(name=name, order_by=order_by)
+        order_by_ = [order_by] if isinstance(order_by, str) else order_by
+        return self._with_compliant(
+            self._compliant_frame.with_row_index(name, order_by=order_by_)
+        )
 
     @property
     def schema(self) -> Schema:
@@ -2480,7 +2477,10 @@ class LazyFrame(BaseFrame[FrameT]):
             |  b: [[5,4]]      |
             └──────────────────┘
         """
-        return super().with_row_index(name=name, order_by=order_by)
+        order_by_ = [order_by] if isinstance(order_by, str) else order_by
+        return self._with_compliant(
+            self._compliant_frame.with_row_index(name, order_by=order_by_)
+        )
 
     @property
     def schema(self) -> Schema:

?

(splitting up the implementations for nw.DataFrame and nw.LazyFrame is to allow for the signatures to be slightly different, which is fine IMO)

Comment on lines +2437 to +2438
def with_row_index(
self, name: str = "index", order_by: str | Sequence[str] | None = None
self, name: str = "index", *, order_by: str | Sequence[str]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I guess I learned something new today 😄

Never seen a default before a non-default - quite surprised that the * avoids the syntax error!

Copy link
Copy Markdown
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome, nice one! 🙌

@MarcoGorelli
Copy link
Copy Markdown
Member

nice one, thanks for sticking to it!

@MarcoGorelli MarcoGorelli merged commit e2f8548 into main Jun 25, 2025
32 checks passed
@MarcoGorelli MarcoGorelli deleted the feat/with-row-index-by branch June 25, 2025 14:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request road to v2 What takes us closer to `stable.v2` (high prio)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Enh]: Allow LazyFrame.with_row_index(..., order_by=...)

4 participants