feat: add multivalue replacement in `.str.replace[_all]` by camriddell · Pull Request #2886 · narwhals-dev/narwhals

camriddell · 2025-07-25T18:51:26Z

What type of PR is this? (check all applicable)

Related issues

Closes [Enh]: Allow .str.replace to receive Expr as value #2745

Checklist

Code follows style guide (ruff)
Tests added
Documented the changes

If you have comments or can explain your changes, please do so below

@MarcoGorelli any thoughts on the docs for this feature? Is it worth adding examples or do you think just the update to the typing here suffices?

camriddell · 2025-07-25T18:54:58Z

Also want a review for the error propagation and the tests. There are a decent number of backends that do not support this feature so I raised an error before calling into it. In the tests I just marked as xfail, but perhaps it makes more sense to check that the underlying TypeError is being raised?

dangotbanned · 2025-07-25T19:00:01Z

any thoughts on the docs for this feature?

I'd say we should be pretty explicit on which backends will support it

Is it just polars and duckdb?

Apologies if I'm misreading the tests 🫣

camriddell · 2025-07-25T19:27:27Z

any thoughts on the docs for this feature?

I'd say we should be pretty explicit on which backends will support it

Is it just polars and duckdb?

Apologies if I'm misreading the tests 🫣

Nope I think you've nailed it. Polars and duckdb support it natively. I added pandas via manual iteration (since pandas drops down to slow iteration for a number of ops anyways).

- propagate from backend errors - allow for backend patches to automatically override behavior

camriddell · 2025-07-28T21:02:51Z

Added some more explicit error checks, and delayed triggering errors unless the backend fails (this should force our tests to begin failing if the backend does support this feature in the future). After aggregating this behavior, here is a table showing the support for each feature by backend.

Backend	`str.replace[scalar]`	`str.replace[multivalue]`	`str.replace_all[scalar]`	`str.replace_all[multivalue]`
pandas	✅	✅ (pure Python)	✅	✅ (pure Python)
polars[eager]	✅	✅	✅	✅
polars[lazy]	✅	✅	✅	✅
duckdb	❌	❌	✅	✅
spark_like	❌	❌	✅	✅
ibis	❌	❌	✅	✅
pyarrow	✅	❌	✅	❌
dask	✅	❌	✅	❌

edit: @dangotbanned added Ibis support for replace_all[multivalue]

dangotbanned · 2025-07-28T21:35:44Z

narwhals/_pandas_like/series_str.py

+            _, other_native = align_and_extract_native(self.compliant, value)
+            values: list[str]
+            if literal:
+                values = [
+                    string.replace(pattern, repl, n)
+                    for string, repl in zip(self.native, value)
+                ]
+            else:
+                regex = re.compile(pattern)
+                values = [
+                    regex.sub(repl, string, count=max(n, 0))
+                    for string, repl in zip(self.native, value)
+                ]
+            constructor = self.compliant.__native_namespace__().Series
+            series = constructor(values, dtype=self.native.dtype, name=self.native.name)
+        return self.with_native(series)


I feel like this is going to be a footgun for performance, but could be convinced with a benchmark or two

How do the pandas, polars + duckdb versions scale for these lengths?

1_000

10_000

100_000

1_000_000

This will definitely be a slower implementation since it's pure Python, however the decision point would be between this and no support at all since the built-in .str.replace doesn't support the multivalue feature. So I was leaning towards "let's support this at the pandas level even if it is slow (akin to the pandas object dtype, which strings still default to)"

If you're still interested I'll code up a benchmark to show you the requested values.

If you're still interested I'll code up a benchmark to show you the requested values.

Indeed I am! 🙂

If we're happy with the trade-off, then I think we can do the same for pyarrow as well?

Two other options:

Emit a warning like we do in PandasLikeGroupBy when using apply

Try to find some way to do this using more of the pandas API.

My main concern is the pandas docs are full of warnings against iterating over Series/DataFrame - and currently this pays the cost twice by using zip 🤔

thanks for looking into this!

between this and no support at all

I'm very strongly on the "no support at all" side

My general philosophy is that we should be translating to a library's own API - if they then follow slow paths internally, then whatever, that's up to them

I'm in line with the opinions here, and I'm more than happy to declare the multivalue replacement in pandas as not supported and push up another commit removing that conditional branch to land this.

Just wanted to share my reasoning for incl. the pandas code above

pandas defaults to storing strings as "object" dtype backed, which under the hood is operating at the Python level.

we're adding a feature that is supported by 2 backends and not supported by 6+ other backends. Is this a problem, fine, as expected? Just thinking that as an end-user it would stink to see that a feature has such limited support.

Curious to everyone's thoughts on the above especially around point 2.?

sure, and that's pandas decision. if/when they switch to something better, then so long as we stick to the pandas api, then we'll also get those improvements. if we implement anything ourselves in python, then we won't

i'm quite interested in enabling narwhals to be a nice front-end to duckdb, so tbh i'm ok with it so long as these conditions are met (which, as far as i can tell, they are):

it's supported by at least polars lazy and duckdb

it's very simple to implement

@camriddell

Curious to everyone's thoughts on the above especially around point 2.?

@MarcoGorelli

i'm quite interested in enabling narwhals to be a nice front-end to duckdb,
it's supported by at least polars lazy and duckdb

My preference would be to not support it, since it can only serve a narrow section of users.
I'm fine with most of this - but the narrower we go - the more likely we are to push compatibility paths back to users:

👍 eager-only
👍 lazy-only
👍 eager + some lazy
👍 lazy excluding dask
👍 most of lazy
👎 only duckdb + polars

The second line in the docs is https://narwhals-dev.github.io/narwhals/

Full API support: cuDF, Modin, pandas, Polars, PyArrow.

There should probably be a * in there, but generally I'd like to push for features that can meet that claim 🙂

One way that I'm trying to think of Narwhals is "what is a good dataframe API, which doesn't make too many assumptions about its backends, meant to look like?"

To me, such a good API generally supports expressified arguments

If in this case it's just a matter of passing things down as expressified arguments, then I think it should be fine

To me, such a good API generally supports expressified arguments

@MarcoGorelli it seems like you've been spared my thoughts of what defines a good API for now 😏

We now have:

feat: Add ~~at least sqlframe~~ spark_like support

feat: ibis support for replace_all

I'm more than happy to move forward on a feature supported by everything except: pandas_like, pyarrow, dask

i'm confused by what the first sentence means sorry

narwhals-dev#2886 (comment)

dangotbanned · 2025-07-31T14:57:06Z

narwhals/_spark_like/expr_str.py

+        if isinstance(value, str):

-        return self.compliant._with_elementwise(func)
+            def func(expr: Column) -> Column:
+                return replace_all(expr, pattern_, F.lit(value))
+
+            return self.compliant._with_elementwise(func)
+        return self.compliant._with_elementwise(
+            lambda expr, value: replace_all(expr, pattern_, value), value=value
+        )


@camriddell I had a read through the pyspark docs and was curious how you reached the conclusions in (#2886 (comment))?

This one was failing because pyspark.sql.functions.lit doesn't accept narwhals._spark_like.expr.SparkLikeExpr

All I've done here is rewritten it in the style of what you had for duckdb and it seems to work 🙂

I was going to look into this today, I managed to break my local pyspark so had to first resolve that.

Ahh I had pyspark working at some point, but haven't had much luck for a while

sqlframe has been fine for me though and installing with this should get you everything except pyspark and cudf 🙏

uv pip install -e ".[dask,modin]" --group "local-dev" --reinstall

dangotbanned · 2025-07-31T15:13:06Z

narwhals/_polars/series.py

+    def replace(
+        self,
+        pattern: str,
+        value: str | PolarsSeries,
+        *,
+        literal: bool = False,
+        n: int = 1,
+    ) -> PolarsSeries:
+        if isinstance(value, Series):
+            value = value.to_native()
+
+        return self._compliant_series._with_native(
+            self._compliant_series.native.str.replace(
+                pattern,
+                value,  # type: ignore[arg-type]
+                literal=literal,
+                n=n,
+            )
+        )
+
+    def replace_all(
+        self, pattern: str, value: str | PolarsSeries, *, literal: bool = False
+    ) -> PolarsSeries:
+        if isinstance(value, Series):
+            value = value.to_native()
+
+        return self._compliant_series._with_native(
+            self._compliant_series.native.str.replace_all(
+                pattern,
+                value,  # type: ignore[arg-type]
+                literal=literal,
+            )
+        )


So there's a few things going on here:

The annotation is str | PolarsSeries (Compliant), but the runtime check is on Series (Narwhals) (https://github.com/camriddell/narwhals/blob/e422e99e3dac9e73df8fcf980b4f42ece76c4058/narwhals/_polars/series.py#L17)
i. I can't think of a case where we would have a nw.Series here

According to polars.Series.str.replace_all, this wouldn't be supported?
i. It could be a documentation issue for polars, since it looks like this uses @expr_dispatch
ii. Otherwise, we should do something like PolarsExpr.from_native(pl.lit(value.native)), and re-use the PolarExpr* version like in zfill

narwhals/narwhals/_polars/series.py

Lines 702 to 706 in 277881a

def zfill(self, width: int) -> PolarsSeries:

series = self._compliant_series

name = series.name

ns = series.__narwhals_namespace__()

return series.to_frame().select(ns.col(name).str.zfill(width)).get_column(name)

I believe both 1. and 2. are related to the below, let me know if this does not address your concerns.

Polars supports passing a Series to value in polars.Series.str.replace_all so I wanted to capture that behavior here as well. My intuition was that a user should be able to do the following

nw.Series(…).str.replace(…, nw.Series(…))

However because the Polars layer is thin, we never coerce passed arguments to their native types to be handled within the _polars.series layer (at least when I used the function signature above, I would get a nw.Series passed into the function).

I was actually planning to open an issue on Polars because the Series objects also support more than just a replacement str (disagreeing with the type hint of value in polars.Series.str.replace_all).

>>> import polars as pl # 1.30.0 >>> import numpy as np >>> pl.Series(['abc', 'ab def']).str.replace('ab', pl.Series(['def', 'GHI'])) # replace_all works the same shape: (2,) Series: '' [str] [ "defc" "GHI def" ] >>> pl.Series(['abc', 'ab def']).str.replace('ab', np.array(['def', 'GHI'])) shape: (2,) Series: '' [str] [ "defc" "GHI def" ]

i. I can't think of a case where we would have a nw.Series here

Okay I had a closer look and what you've written is correct from a typing perspective.

The issue is we really should've passed the CompliantSeries down instead, because then we can just use the __getattr__ instead of adding new methods

I'm gonna push a fix for that now 🥳

Updated

fix: Pass down CompliantSeries not nw.Series

Weirdly, the same thing has the wrong name on nw.Series 🤔

narwhals/narwhals/series.py

Lines 376 to 381 in 0e80656

def _extract_native(self, arg: Any) -> Any:

from narwhals.series import Series

if isinstance(arg, Series):

return arg._compliant_series

return arg

Related https://github.com/narwhals-dev/narwhals/pull/2886/files#r2245792714

…ddell/narwhals into pr/camriddell/2886

Pretty much the same as narwhals-dev#2886 (comment) I've added annotations for all the other ones that would accept the same

camriddell · 2025-08-05T16:11:02Z

@MarcoGorelli and @dangotbanned I believe this is ready for a more formal review.

Thanks for all of your help getting this to the finish line!

MarcoGorelli · 2025-08-08T15:58:41Z

thanks @camriddell for bearing with this one, could you resolve the conflicts please and we can try to get it in?

camriddell · 2025-08-09T20:38:42Z

@MarcoGorelli resolved the conflicts aka incorporated the new generic _sql backend.

dangotbanned · 2025-08-13T18:19:15Z

Nit: would using elementwise instead of multivalue be a better fit for the title?

camriddell · 2025-08-13T19:28:43Z

Nit: would using elementwise instead of multivalue be a better fit for the title?

I'm neutral on this, I think elementwise alone is a little ambiguous since replacement already occurs elementwise (on the input).
That said, I'm not tied to multivalue either because I also think its not entirely clear what is being talked about. Definitely open to refining the terms used here!

MarcoGorelli · 2025-08-14T16:35:46Z

Added some more explicit error checks, and delayed triggering errors unless the backend fails (this should force our tests to begin failing if the backend does support this feature in the future). After aggregating this behavior, here is a table showing the support for each feature by backend.

Backend str.replace[scalar] str.replace[multivalue] str.replace_all[scalar] str.replace_all[multivalue]
pandas ✅ ✅ (pure Python) ✅ ✅ (pure Python)
polars[eager] ✅ ✅ ✅ ✅
polars[lazy] ✅ ✅ ✅ ✅
duckdb ❌ ❌ ✅ ✅
spark_like ❌ ❌ ✅ ✅
ibis ❌ ❌ ✅ ❌
pyarrow ✅ ❌ ✅ ❌
dask ✅ ❌ ✅ ❌

Not sure if I'm misreading this or the test (or if it's out of date), but it looks like you've got this supported for Ibis too now?

MarcoGorelli

Looks great, thank you so much @camriddell , and @dangotbanned for review

Before merging, please remember to include 'cudf' in the xfails (same places as pandas/pyarrow/etc)

camriddell · 2025-08-14T16:39:34Z

Added some more explicit error checks, and delayed triggering errors unless the backend fails (this should force our tests to begin failing if the backend does support this feature in the future). After aggregating this behavior, here is a table showing the support for each feature by backend.
Backend str.replace[scalar] str.replace[multivalue] str.replace_all[scalar] str.replace_all[multivalue]
pandas ✅ ✅ (pure Python) ✅ ✅ (pure Python)
polars[eager] ✅ ✅ ✅ ✅
polars[lazy] ✅ ✅ ✅ ✅
duckdb ❌ ❌ ✅ ✅
spark_like ❌ ❌ ✅ ✅
ibis ❌ ❌ ✅ ❌
pyarrow ✅ ❌ ✅ ❌
dask ✅ ❌ ✅ ❌

Not sure if I'm misreading this or the test (or if it's out of date), but it looks like you've got this supported for Ibis too now?

It's out of date, @dangotbanned took excellent care of the Ibis support!
I'll update the table

MarcoGorelli · 2025-08-25T07:19:50Z

gentle ping on this, i think it's really close to the finish line

camriddell · 2025-08-25T14:05:11Z

gentle ping on this, i think it's really close to the finish line

Thanks for the nudge, I had missed your earlier comment about cudf

camriddell · 2025-08-26T14:59:11Z

@MarcoGorelli I believe this is ready for a final look through!

MarcoGorelli · 2025-08-28T10:56:54Z

thanks, shipping it

got a feeling there's gonna be some cudf failures but we need to do some cudf fixups anyway

further expressifications welcome!

camriddell added 2 commits July 25, 2025 06:11

add multivalue .str.replace

bd51013

fix typing for str.replace

995ab38

camriddell added 6 commits July 28, 2025 12:02

clarify errors in str.replace/replace_all tests

d09bc89

improve .str.replace/replace_all error messaging

0e879f1

- propagate from backend errors - allow for backend patches to automatically override behavior

better align with Polars error message

49bb530

test for internal (and polars) str repl exceptions

7826d38

expose types of supported signatures

614de36

pass typing in .str.replace tests

8e401c0

dangotbanned reviewed Jul 28, 2025

View reviewed changes

camriddell and others added 3 commits July 29, 2025 07:03

revert reuser_series changes

ec3d9e6

feat: Add at least sqlframe support

50cd0fe

narwhals-dev#2886 (comment)

Merge remote-tracking branch 'upstream/main' into pr/camriddell/2886

e422e99

dangotbanned reviewed Jul 31, 2025

View reviewed changes

dangotbanned added 4 commits July 31, 2025 15:42

Merge branch 'main' into enh/str-replace_multivalue

c8be4d9

fix: Pass down CompliantSeries not nw.Series

e4e1a24

Related https://github.com/narwhals-dev/narwhals/pull/2886/files#r2245792714

Merge branch 'enh/str-replace_multivalue' of https://github.com/camri…

c88dcde

…ddell/narwhals into pr/camriddell/2886

feat: ibis support for replace_all

b69fee5

Pretty much the same as narwhals-dev#2886 (comment) I've added annotations for all the other ones that would accept the same

dangotbanned added the enhancement New feature or request label Aug 1, 2025

dangotbanned mentioned this pull request Aug 1, 2025

chore(typing): Fix (Compliant|Polars)Series namespaces #2918

Merged

10 tasks

remove pandas multivalue str.replace_all

44f5120

camriddell marked this pull request as ready for review August 5, 2025 16:09

ref replace_all multivalue to new generic SQL

303aa7d

MarcoGorelli approved these changes Aug 14, 2025

View reviewed changes

add cudf to str/replace multivalue xfail

2461552

MarcoGorelli merged commit bfb61bf into narwhals-dev:main Aug 28, 2025
31 of 33 checks passed

dangotbanned mentioned this pull request Feb 5, 2026

feat: Add support for indices and values Series in Series.scatter #3444

Merged

10 tasks

	def zfill(self, width: int) -> PolarsSeries:
	series = self._compliant_series
	name = series.name
	ns = series.__narwhals_namespace__()
	return series.to_frame().select(ns.col(name).str.zfill(width)).get_column(name)

	def _extract_native(self, arg: Any) -> Any:
	from narwhals.series import Series

	if isinstance(arg, Series):
	return arg._compliant_series
	return arg

Conversation

camriddell commented Jul 25, 2025

What type of PR is this? (check all applicable)

Related issues

Checklist

If you have comments or can explain your changes, please do so below

Uh oh!

camriddell commented Jul 25, 2025

Uh oh!

dangotbanned commented Jul 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

camriddell commented Jul 25, 2025

Uh oh!

camriddell commented Jul 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

camriddell Jul 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MarcoGorelli Aug 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dangotbanned Jul 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dangotbanned Jul 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Updated

Uh oh!

camriddell commented Aug 5, 2025

Uh oh!

MarcoGorelli commented Aug 8, 2025

Uh oh!

camriddell commented Aug 9, 2025

Uh oh!

dangotbanned commented Aug 13, 2025

Uh oh!

camriddell commented Aug 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MarcoGorelli commented Aug 14, 2025

Uh oh!

MarcoGorelli left a comment

Choose a reason for hiding this comment

Uh oh!

camriddell commented Aug 14, 2025

Uh oh!

MarcoGorelli commented Aug 25, 2025

Uh oh!

camriddell commented Aug 25, 2025

Uh oh!

dangotbanned commented Jul 25, 2025 •

edited

Loading

camriddell commented Jul 28, 2025 •

edited

Loading

camriddell Jul 29, 2025 •

edited

Loading

MarcoGorelli Aug 14, 2025 •

edited

Loading

dangotbanned Jul 31, 2025 •

edited

Loading

dangotbanned Jul 31, 2025 •

edited

Loading

camriddell commented Aug 13, 2025 •

edited

Loading