Conversation
Series.from_iterable
I feel like we should be issuing a warning here at-a-minimum narwhals/narwhals/_pandas_like/utils.py Lines 504 to 511 in 1cfcafb But really I'd want an exception raised and an update to the UpdateMoved to #2964 |
| def _dataframe(self) -> type[DataFrame[Any]]: | ||
| return DataFrame | ||
|
|
||
| # TODO @dangotbanned: Fix `from_numpy` override missing in `v2` in another PR |
There was a problem hiding this comment.
Note
- Resolve missing
v2method(s) and tests in another PR
This comment was marked as resolved.
This comment was marked as resolved.
Going to parametrize more for `test_series_from_iterable` #2933 (comment)
`0.20.7` is close enough to our minimum and this conversion I can see being more common than the rest
|
Thanks for the review @MarcoGorelli 😍
Sure I could take a look! Would you be able to run They aren't showing in my top 100 slowest: Show durations
>>> pytest --durations=100
====== slowest 100 durations ===
2.10s call tests/tpch_q1_test.py::test_q1[dask]
1.35s call tests/frame/schema_test.py::test_nested_dtypes_dask
1.27s call tests/v1_test.py::test_gather_every_dask_v1[2-1]
1.12s call tests/no_imports_test.py::test_dask
1.11s call tests/v1_test.py::test_gather_every_dask_v1[1-2]
0.98s call tests/dependencies/imports_test.py::test_to_native_namespace_min_version[dask]
0.85s call tests/namespace_test.py::test_preserve_type_var[dask]
0.67s call tests/expr_and_series/over_test.py::test_over_unsupported_dask
0.61s call tests/new_series_test.py::test_new_series_dask
0.48s call tests/translate/from_native_test.py::test_eager_only_lazy_dask[False-context0]
0.44s call tests/frame/join_test.py::test_anti_join[sqlframe-join_key0-filter_expr0-expected0]
0.43s call tests/expr_and_series/rolling_mean_test.py::test_rolling_mean_expr_lazy_ungrouped[sqlframe-expected_a1-2-2-False]
0.42s call tests/frame/join_test.py::test_join_duplicate_column_names[sqlframe]
0.42s call tests/expr_and_series/rank_test.py::test_lazy_rank_expr_desc[sqlframe-data1-max]
0.41s call tests/frame/lazy_test.py::test_lazy_backend[pandas-dask1]
0.41s call tests/expr_and_series/quantile_test.py::test_quantile_expr[ibis-lower-expected0]
0.40s call tests/expr_and_series/diff_test.py::test_diff_lazy[sqlframe]
0.39s call tests/expr_and_series/rolling_sum_test.py::test_rolling_sum_expr_lazy_grouped[sqlframe-expected_a5-4-1-True]
0.37s call tests/expr_and_series/is_first_distinct_test.py::test_is_first_distinct_expr_lazy_grouped[sqlframe]
0.36s call tests/expr_and_series/operators_test.py::test_comparand_operators_expr[sqlframe-__gt__-expected5]
0.36s call tests/modern_polars/unpivot_test.py::test_unpivot[sqlframe]
0.35s call tests/expr_and_series/name/to_uppercase_test.py::test_to_uppercase_anonymous[sqlframe]
0.34s call tests/v1_test.py::test_toplevel
0.34s call tests/expr_and_series/rolling_mean_test.py::test_rolling_mean_expr_lazy_grouped[sqlframe-expected_a1-2-2-False]
0.33s call tests/expr_and_series/binary_test.py::test_expr_binary[sqlframe]
0.33s call tests/frame/join_test.py::test_anti_join[pandas[pyarrow]-join_key2-filter_expr2-expected2]
0.33s call tests/expr_and_series/name/prefix_test.py::test_suffix_after_alias[sqlframe]
0.33s call tests/expr_and_series/exp_test.py::test_exp_expr[sqlframe]
0.33s call tests/system_info_test.py::test_get_deps_info
0.32s call tests/frame/select_test.py::test_select_duplicates[ibis]
0.32s call tests/frame/join_test.py::test_inner_join_single_key[sqlframe]
0.32s call tests/expr_and_series/shift_test.py::test_shift_lazy[sqlframe]
0.32s call tests/dtypes_test.py::test_2d_array[sqlframe]
0.31s call tests/frame/join_test.py::test_joinasof_time[pandas[pyarrow]-forward-expected1]
0.31s call tests/v2_test.py::test_with_version[sqlframe]
0.31s call tests/expr_and_series/rank_test.py::test_lazy_rank_expr[sqlframe-data0-dense]
0.31s call tests/frame/join_test.py::test_inner_join_two_keys[sqlframe]
0.31s call tests/expr_and_series/null_count_test.py::test_null_count_expr[sqlframe]
0.30s call tests/expr_and_series/cum_sum_test.py::test_lazy_cum_sum_grouped[sqlframe-True-expected_a1]
0.30s call tests/selectors_test.py::test_datetime[polars[eager]]
0.30s call tests/expr_and_series/cum_sum_test.py::test_lazy_cum_sum_ordered_by_nulls[sqlframe-False-expected_a0]
0.30s call tests/translate/from_native_test.py::test_eager_only_pass_through_main[ibis-True-True-context2]
0.28s call tests/frame/to_native_test.py::test_to_native[sqlframe]
0.28s call tests/expr_and_series/rolling_sum_test.py::test_rolling_sum_expr_lazy_ungrouped[sqlframe-expected_a0-2-None-False]
0.27s call tests/expr_and_series/mean_test.py::test_expr_mean_expr[sqlframe-expr1]
0.27s call tests/frame/explode_test.py::test_explode_single_col[sqlframe-l3-expected_values1]
0.26s call tests/frame/join_test.py::test_left_join_overlapping_column[sqlframe]
0.26s call tests/expr_and_series/all_horizontal_test.py::test_allh[sqlframe]
0.25s call tests/expr_and_series/operators_test.py::test_comparand_operators_scalar_expr[ibis-__gt__-expected5]
0.25s call tests/expr_and_series/exclude_test.py::test_exclude[ibis-exclude_selector2-expected_cols2]
0.25s call tests/expr_and_series/str/zfill_test.py::test_str_zfill[sqlframe]
0.24s call tests/expr_and_series/clip_test.py::test_clip_expr[ibis-0-4-expected1]
0.22s call tests/frame/explode_test.py::test_explode_single_col[sqlframe-l2-expected_values0]
0.22s call tests/expr_and_series/name/to_uppercase_test.py::test_to_uppercase[ibis]
0.22s call tests/system_info_test.py::test_get_sys_info
0.22s call tests/expr_and_series/fill_null_test.py::test_fill_null_limits[sqlframe]
0.21s call tests/expr_and_series/clip_test.py::test_clip_expr[sqlframe-None-4-expected2]
0.21s call tests/expr_and_series/all_horizontal_test.py::test_allh_iterator[ibis]
0.21s call tests/expr_and_series/rolling_mean_test.py::test_rolling_mean_expr_lazy_ungrouped[ibis-expected_a5-4-1-True]
0.20s call tests/expr_and_series/null_count_test.py::test_null_count_expr[ibis]
0.19s call tests/expr_and_series/operators_test.py::test_comparand_operators_expr[sqlframe-__ne__-expected1]
0.19s call tests/frame/select_test.py::test_alias_invalid[sqlframe]
0.19s call tests/system_info_test.py::test_show_versions
0.18s call tests/expr_and_series/rank_test.py::test_lazy_rank_expr_desc[duckdb-data0-average]
0.18s call tests/expr_and_series/exclude_test.py::test_exclude[sqlframe-exclude_selector0-expected_cols0]
0.18s call tests/v2_test.py::test_with_version[ibis]
0.17s call tests/frame/join_test.py::test_left_join[sqlframe]
0.17s call tests/expr_and_series/rank_test.py::test_rank_expr_in_over_desc[ibis-max]
0.17s call tests/expr_and_series/cum_sum_test.py::test_lazy_cum_sum_grouped[ibis-True-expected_a1]
0.16s call tests/expr_and_series/operators_test.py::test_logic_operators_expr_scalar[duckdb-__ror__-expected3]
0.16s call tests/translate/from_native_test.py::test_eager_only_pass_through_main[sqlframe-False-False-context0]
0.16s call tests/modern_polars/unpivot_test.py::test_unpivot[ibis]
0.16s call tests/expr_and_series/binary_test.py::test_expr_binary[ibis]
0.16s call tests/expr_and_series/exp_test.py::test_exp_expr[ibis]
0.16s call tests/frame/group_by_test.py::test_fancy_functions[sqlframe]
0.16s call tests/expr_and_series/is_first_distinct_test.py::test_is_first_distinct_expr_lazy[ibis]
0.16s call tests/expr_and_series/operators_test.py::test_comparand_operators_scalar_expr[ibis-__le__-expected2]
0.15s call tests/expr_and_series/str/zfill_test.py::test_str_zfill[ibis]
0.15s call tests/expr_and_series/rolling_sum_test.py::test_rolling_sum_expr[polars[eager]]
0.15s call tests/expr_and_series/shift_test.py::test_shift_lazy_grouped[ibis]
0.15s call tests/expr_and_series/diff_test.py::test_diff_lazy_grouped[ibis]
0.15s call tests/expr_and_series/name/prefix_test.py::test_suffix_after_alias[ibis]
0.14s call tests/selectors_test.py::test_datetime[pandas]
0.14s call tests/dtypes_test.py::test_cast_decimal_to_native
0.13s call tests/expr_and_series/binary_test.py::test_expr_binary[duckdb]
0.13s call tests/expr_and_series/dt/truncate_test.py::test_truncate_tz_aware_duckdb
0.13s call tests/expr_and_series/mean_test.py::test_expr_mean_expr[ibis-expr0]
0.13s call tests/expr_and_series/rank_test.py::test_lazy_rank_expr_desc[sqlframe-data1-ordinal]
0.13s call tests/expr_and_series/dt/truncate_test.py::test_truncate[sqlframe-1d-expected6]
0.13s call tests/expr_and_series/exp_test.py::test_exp_expr[duckdb]
0.13s call tests/frame/to_native_test.py::test_to_native[ibis]
0.12s call tests/read_scan_test.py::test_scan_csv[sqlframe]
0.12s call tests/expr_and_series/cast_test.py::test_cast[sqlframe]
0.12s call tests/expr_and_series/cast_test.py::test_cast_datetime_tz_aware[pandas[pyarrow]]
0.12s call tests/v1_test.py::test_dask_order_dependent_ops
0.12s call tests/dtypes_test.py::test_2d_array[ibis]
0.12s call tests/expr_and_series/fill_null_test.py::test_fill_null_strategies_with_limit_as_none[sqlframe]
0.12s call tests/expr_and_series/operators_test.py::test_logic_operators_expr[duckdb-__or__-expected1]
0.11s call tests/read_scan_test.py::test_scan_parquet_kwargs
0.11s call tests/expr_and_series/shift_test.py::test_shift_lazy[duckdb]
======= 8402 passed, 359 skipped, 509 xfailed in 22.81s |
So I went ahead and did some experimenting locally: Show diff
diff --git a/tests/series_only/from_iterable_test.py b/tests/series_only/from_iterable_test.py
index 733c0728c..17f7fb105 100644
--- a/tests/series_only/from_iterable_test.py
+++ b/tests/series_only/from_iterable_test.py
@@ -21,6 +21,7 @@ if TYPE_CHECKING:
ValuesView,
)
+ from _pytest.mark import ParameterSet
from typing_extensions import TypeAlias
from narwhals._namespace import EagerAllowed
@@ -29,6 +30,10 @@ if TYPE_CHECKING:
IntoIterable: TypeAlias = Callable[..., Iterable[Any]]
+def param_slow(*args: object) -> ParameterSet:
+ return pytest.param(*args, marks=pytest.mark.slow)
+
+
class UserDefinedIterable:
def __init__(self, iterable: Iterable[Any]) -> None:
self.iterable: Iterable[Any] = iterable
@@ -53,7 +58,7 @@ def dict_values(iterable: Iterable[Any]) -> ValuesView[Any]:
return dict(enumerate(iterable)).values()
-_INTO_ITER_3RD_PARTY: list[IntoIterable] = []
+_INTO_ITER_3RD_PARTY: list[IntoIterable | ParameterSet] = []
if find_spec("numpy"): # pragma: no cover
import numpy as np
@@ -64,7 +69,7 @@ else: # pragma: no cover
if find_spec("pandas"): # pragma: no cover
import pandas as pd
- _INTO_ITER_3RD_PARTY.extend([pd.Index, pd.array, pd.Series])
+ _INTO_ITER_3RD_PARTY.extend([pd.Index, param_slow(pd.array), pd.Series])
else: # pragma: no cover
...
if find_spec("polars"): # pragma: no cover
@@ -83,16 +88,20 @@ if find_spec("pyarrow"): # pragma: no cover
else: # pragma: no cover
...
-_INTO_ITER_STDLIB: tuple[IntoIterable, ...] = (
+_INTO_ITER_STDLIB: tuple[IntoIterable | ParameterSet, ...] = (
list,
tuple,
iter,
- deque,
- generator_function,
+ param_slow(deque),
+ param_slow(generator_function),
generator_expression,
)
-_INTO_ITER_STDLIB_EXOTIC: tuple[IntoIterable, ...] = dict_keys, dict_values
-INTO_ITER: tuple[IntoIterable, ...] = (
+
+_INTO_ITER_STDLIB_EXOTIC: tuple[ParameterSet, ...] = (
+ param_slow(dict_keys),
+ param_slow(dict_values),
+)
+INTO_ITER: tuple[IntoIterable | ParameterSet, ...] = (
*_INTO_ITER_STDLIB,
*_INTO_ITER_STDLIB_EXOTIC,
UserDefinedIterable,
@@ -113,7 +122,7 @@ def _ids_into_iter(obj: Any) -> str:
@pytest.mark.parametrize(
("values", "dtype"),
[
- ((4, 1, 2), nw.Int32),
+ param_slow((4, 1, 2), nw.Int32),
((-1, 5, 100), None),
((2.1, 2.7, 2.0), nw.Float64),
(("one", "two"), nw.String),
@@ -167,10 +176,10 @@ def test_series_from_iterable(
@pytest.mark.parametrize(("values", "expected_dtype"), [((4, 1, 2), nw.Int64)])
def test_series_from_iterable_infer(
- eager_backend: EagerAllowed, values: Sequence[Any], expected_dtype: IntoDType
+ eager_implementation: EagerAllowed, values: Sequence[Any], expected_dtype: IntoDType
) -> None:
name = "b"
- result = nw.Series.from_iterable(name, values, backend=eager_backend)
+ result = nw.Series.from_iterable(name, values, backend=eager_implementation)
assert result.dtype == expected_dtype
assert_equal_series(result, values, name)
@@ -182,14 +191,14 @@ def test_series_from_iterable_not_eager() -> None:
nw.Series.from_iterable("", [1, 2, 3], backend=backend)
-def test_series_from_iterable_numpy_not_1d(eager_backend: EagerAllowed) -> None:
+def test_series_from_iterable_numpy_not_1d(eager_implementation: EagerAllowed) -> None:
pytest.importorskip("numpy")
import numpy as np
with pytest.raises(ValueError, match="only.+1D numpy arrays"):
- nw.Series.from_iterable("", np.array([[0], [2]]), backend=eager_backend)
+ nw.Series.from_iterable("", np.array([[0], [2]]), backend=eager_implementation)
-def test_series_from_iterable_not_iterable(eager_backend: EagerAllowed) -> None:
+def test_series_from_iterable_not_iterable(eager_implementation: EagerAllowed) -> None:
with pytest.raises(TypeError, match="iterable.+got.+int"):
- nw.Series.from_iterable("", 2000, backend=eager_backend) # type: ignore[arg-type]
+ nw.Series.from_iterable("", 2000, backend=eager_implementation) # type: ignore[arg-type]
So besides just outright removing cases or randomly generating them - the best option would be running with Something else we could try is adapting this into a It could be good to use this as a chance to get wider coverage - in addition to just making the testing feedback loop shorter 🙂 |
This comment was marked as resolved.
This comment was marked as resolved.
narwhals/_polars/series.py
Outdated
| iterable = cast("Sequence[Any]", data) | ||
| native = pl.Series(name=name, values=iterable) | ||
| if dtype: | ||
| native = native.cast(narwhals_to_native_dtype(dtype, version)) | ||
| return cls.from_native(native, context=context) |
There was a problem hiding this comment.
I'm thinking this change is what's causing #2933 (comment)
There was a problem hiding this comment.
Well great, (revert: try not downcasting polars) did indeed fix tubular ci
But there's some version branching needed for at least polars==0.20.21
There was a problem hiding this comment.
@MarcoGorelli I think this was fixed in this polars PR you reviewed
If so, should I re-add the cast fix I had for polars<0.20.26 only - or is an xfail preferred?
There was a problem hiding this comment.
In (fix: backport respect dtype, optimize branching) - I'm taking the view that Series.from_iterable/nw.new_series is a critical entrypoint.
So the aim is to both:
- Make sure it has wide version-compatibility
- Add as little overhead as possible, especially on the happy path
The fix in (48cf17b) may have been the source of (#2933 (comment))
Trying to keep this entry point as lightweight as possible, while being reliable across versions #2933 (comment)
FBruzzesi
left a comment
There was a problem hiding this comment.
Thanks @dangotbanned - I left only one comment regarding the tests run time.
One question I have (which might be written somewhere but I missed it): are we planning to replace the code in new_series with this eventually? Everything (except the trial to see if UNKNOWN implementation which I commented below) is the same as _new_series_impl
| else: # pragma: no cover | ||
| if (not SERIES_ACCEPTS_PD_INDEX) and is_pandas_index(values): | ||
| values = values.to_series() | ||
| native = pl.Series(name, values) | ||
| if dtype_pl: | ||
| native = native.cast(dtype_pl) |
There was a problem hiding this comment.
I tried something new here when thinking about (#2933 (comment)), which might not be visible from the diff alone
- Keeps the compat code fast (we evaulate this once)
- Doesn't clutter the impl with comments
- Can use markdown to explain the problem + link to issue/PR
There was a problem hiding this comment.
Trying to find a compromise here to Marco's comment on these tests running for too long. To run half of them, one idea is to use only eager_implementation. At the end of the day it does not really matter how we instantiate the Implementation, and using eager_backend just duplicates the number of tests.
If anything, we should test somewhere else that Implementation.from_backend(Implementation.PANDAS) and Implementation.from_backend("pandas") return the same implementation. Once that guaranteed (which I am pretty sure it is), we can run half of these tests
There was a problem hiding this comment.
Thanks @FBruzzesi
Did you know that I've been doing that in this PR since (efc5c4e)?
I added eager_implementation in this PR see diff, and I agree that we should use it more elsewhere 🙂
There was a problem hiding this comment.
My bad! There is still one use of eager_backend but it's for a small number of tests anyway
There was a problem hiding this comment.
No worries!
Btw, I am open to reducing the test runtime and investigated a little in (#2933 (comment))
My preference is to defer this to another issue - but if you see it as blocking - then I can work on it here?
Absolutely! (#2933 (comment)) should cover most of it I guess the plan in my head is:
So we'll then have this and the extension point will work everywhere 🥳 from typing import Any
from narwhals._utils import Version, Implementation
from narwhals.typing import IntoBackend
from narwhals.series import Series
some_version = Version.MAIN
def new_series(
name: str,
values: Any,
dtype: IntoDType | None = None,
*,
backend: IntoBackend, # <-- or whatever we end up annotating
) -> Series[Any]:
return some_version.series.from_iterable(name, values, dtype, backend=backend) |
|
@dangotbanned @MarcoGorelli what's left in this PR? Is the concern only on test speed? Marco maybe run it once again while not live streaming, I can expect the results to be much different 😂 |


Closes #2925
What type of PR is this? (check all applicable)
Related issues
nw.new_seriesin favour ofSeries.from_iterable#2642Checklist
Tasks
v1v2pyarrowversionSeries.from_iterable#2933 (comment))