fix: don't downcast `large_string` to `string` unnecessarily in `concat_str` for PyArrow by MarcoGorelli · Pull Request #2176 · narwhals-dev/narwhals

MarcoGorelli · 2025-03-09T16:20:42Z

closes #2097


MarcoGorelli · 2025-03-09T21:09:35Z

this would break a plotly test

    FAILED tests/test_optional/test_px/test_px_hover.py::test_sunburst_hoverdict_color[pyarrow] - pyarrow.lib.ArrowInvalid: Schema at index 1 was different:
    labels: large_string
    parent: large_string
    id: large_string
    pop: double
    country: large_string
    continent: large_string
    lifeExp: double
    vs
    labels: large_string
    parent: string
    id: large_string
    pop: double
    country: large_string
    continent: large_string
    lifeExp: double

I don't want to rush this, so I'm leaving it out of tomorrow's release. We'll think about it for the next one.

MarcoGorelli · 2025-03-13T21:37:08Z

The plotly test could be fixed by making `nw.lit('a')` default to `large_string` in PyArrow. But...should we?

will continue on the issue

dangotbanned · 2025-03-14T11:35:20Z

    return series._from_native_series(concat), offset_left + offset_right


+def cast_to_comparable_string_types(


@MarcoGorelli I think I can shorten this and avoid the `# type: ignore[arg-type]`.

Mind if I add a commit?

sure, thanks

@MarcoGorelli I think I can shorten this and avoid the `# type: ignore[arg-type]`.

I managed to get there in fewer characters - but LOC is bound by the name `chunked_arrays` 😄

(4dfc6a3)

I'll conclude my round of code golf there for today

narwhals-dev#2176 (comment)

Just reducing the indent levels a lil bit

dangotbanned · 2025-03-14T12:38:47Z

+        schema: The DataFrame schema as Schema or dict of {name: type}. If not
+            specified, the schema will be inferred by the native library.


This is more of a question than a suggestion.

Is there a preference between these two?

If not provided

and

If not specified

I probably wouldn't have noticed this if the `lit(..., dtype=...)` doc hadn't shown up in the diff

dtype: The data type of the literal value. If not provided, the data type will
be inferred by the native library.

dunno, don't really mind

MarcoGorelli · 2025-03-14T13:16:20Z

thanks Dan!

MarcoGorelli force-pushed the large-string branch 2 times, most recently from 82c17c3 to 44ca7f6 Compare March 9, 2025 16:30

MarcoGorelli added 2 commits March 14, 2025 09:27

dont downcast large_string -> string in concat_str for pyarrow

06a4250

more extensive test

a7a0b27

MarcoGorelli force-pushed the large-string branch from 27dc13b to a7a0b27 Compare March 14, 2025 09:32

MarcoGorelli changed the title ~~fix: map nw.String to pa.large_string (instead of pa.string) and nw.List to pa.large_list instead of pa.list_~~ fix: don't downcast large_string to string unnecessarily in concat_str Mar 14, 2025

MarcoGorelli changed the title ~~fix: don't downcast large_string to string unnecessarily in concat_str~~ fix: don't downcast large_string to string unnecessarily in concat_str for PyArrow Mar 14, 2025

MarcoGorelli marked this pull request as ready for review March 14, 2025 09:39

dangotbanned reviewed Mar 14, 2025

Comment thread narwhals/_arrow/namespace.py Outdated

dangotbanned reviewed Mar 14, 2025

Comment thread narwhals/_arrow/utils.py Outdated

dangotbanned reviewed Mar 14, 2025

Comment thread narwhals/_arrow/utils.py Outdated

dangotbanned reviewed Mar 14, 2025

dangotbanned added 4 commits March 14, 2025 11:49

refactor: simplify, avoid # type: ignore[arg-type]

4dfc6a3

narwhals-dev#2176 (comment)

use ArrowSeries.native

1ae9396

refactor: flatten concat_str

e8dda13

Just reducing the indent levels a lil bit

Merge branch 'main' into large-string

4f7db8c

dangotbanned added the fix label Mar 14, 2025

dangotbanned reviewed Mar 14, 2025

dangotbanned approved these changes Mar 14, 2025

MarcoGorelli merged commit df38225 into narwhals-dev:main Mar 14, 2025

MarcoGorelli merged 6 commits into narwhals-dev:main from MarcoGorelli:large-string

2 participants
