feat: Add {nw,DataFrame}.from_dicts#3148
Conversation
86400b3 to
0e49901
Compare
{nw, DataFrame}.from_dicts
There was a problem hiding this comment.
Thanks @felixgwilliams this looks promising and already close to the finish line.
I left a few comments, arguably nitpicks.
A couple of things that are missing:
- Exposing the functionalities in
stable.v1 - A test to check for both empty data and no schema resulting in a
shape=(0,0)dataframe.
The same as:
def test_from_dict_empty(eager_backend: EagerAllowed) -> None:
result = nw.DataFrame.from_dict({}, backend=eager_backend)
assert result.shape == (0, 0)- expose from_dicts in v1 change docstrings for consistency - add test for empty data without a schema and get it running for polars - remove native namespace argument so that tests pass - Not done: replace Sequence[dict[str, Any]] with Sequence[dict[str, PythonLiteral]]
|
Thanks for all the adjustments @felixgwilliams - I would stall for a moment waiting for an opinion on #3148 (comment), aside that the feature seems ready 👌🏼 |
… tests for a non-dict mapping
…mappings with `from_dicts`
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
Closes pola-rs#24583 Downstream in `narwhals`, we discovered the typing wasn't updated alongside the runtime support added in `1.30.0` ### Related - pola-rs#22638 - pola-rs#19322 - narwhals-dev/narwhals#3148 (comment) - narwhals-dev/narwhals#3148 (comment)
This comment was marked as resolved.
This comment was marked as resolved.
- `pl.DataFrame(schema)` defaults to `None` - Having a single branch for empty `data`, means we can safely index into it
Co-authored-by: Dan Redding <125183946+dangotbanned@users.noreply.github.com>
I should really update the others soon
Seems a few of these were missed in narwhals-dev#2981
Will add an IDE preview
The ids look like `"polars0-dict"`, `"pyarrow0-mappingproxy"`
|
I noticed that when the schemas of the dicts is not consistent and schema is not specified, the behavior is different when using the PyArrow backend, because PyArrow only looks at the first row to establish the (see here), whereas pandas uses all the rows and polars seems to use the first 100 by default. Do you think it's worth highlighting the differences in the docstring or do you think "If not specified, the schema will be inferred by the native library." covers it? I think it's surprising enough to be worth mentioning, but I'm conscious of not wanting to be too verbose. |
Co-authored-by: Dan Redding <125183946+dangotbanned@users.noreply.github.com>
Well spotted!
Agreed, this does seem like it would be helpful to document, considering:
Someone will get burned by this eventually 😳 What to do?DocsAlthough we could specify these differences in the docstring of
If that's the case, then we could benefit from some narrative docs in the user guide. Note Might be best to split that out into a follow-up issue? |
Tests
Add tests for this then 😄 Each should have
I'd recommend using this fixture instead of Lines 320 to 323 in 63c5022 |
I'll write the tests this evening. Thanks for the pointers 👍. As for the docstrings, are we happy with leaving them as is and covering the issue separately? Otherwise I wrote some bullet points explaining the differences that I haven't committed yet. I don't really have a good feeling for what is too long for a narwhals docstring. |
I'm not 100% sure if you need more permissions for it, but I'd normally use a suggestion (step 7) for that kinda thing I'm happy to take a look |
…ctionary keys in from_dicts
|
@felixgwilliams sorry for the delay, I'm hoping to review this later today 🙏 |
dangotbanned
left a comment
There was a problem hiding this comment.
Thanks for the PR @felixgwilliams
I've only got a few minor suggestions - I think we're pretty much good to go
Co-authored-by: Dan Redding <125183946+dangotbanned@users.noreply.github.com>
|
Thanks @dangotbanned for your helpful suggestions. I'll be sure to remember the tip about comments being part of the code. |
dangotbanned
left a comment
There was a problem hiding this comment.
Thanks again @felixgwilliams, welcome aboard 😉
{nw, DataFrame}.from_dicts{nw,DataFrame}.from_dicts

What type of PR is this? (check all applicable)
Related issues
nw.from_dictsto convert a sequence of dictionaries representing rows to a data frame #3145nw.from_dictsto convert a sequence of dictionaries representing rows to a data frame #3145Checklist
If you have comments or can explain your changes, please do so below
This PR adds a
from_dictsfunction and methods, that can be used to create a data frame from a sequence of dicts that represent rows. It is available for eager backends.Tracking
from_dictstoIterable[Mapping[str, Any]]pola-rs/polars#24584