refactor: Add CompliantDataFrame.from_arrow#2306
Conversation
- Defined `ArrowStreamExportable` at runtime here - Currently, just a typing-only def in `nw.functions`
Marked as draft since I'll do another refactor pass
Couldn't understand the `TypeVar` that `_PandasConvertible` has resolved when partially binding the method
Makes more sense for the logic to stay there
- Still need to refine the error handling for each backend - Most of it can live in `ArrowDataFrame.from_arrow` - Also want to widen this to accept `pa.Table` - since we support `pyarrow>=11`, I think this function still provide some value for those earlier versions
Helper that `polars` and `pandas` can share, without needing to worry about missing imports
- First is for non-`pa.Table` before `14` - Second is just for exhaustiveness
I only needed it when using a callable, but removed that route
I only wanted a one line description 😞 https://results.pre-commit.ci/run/github/760058710/1743170690.NGBs6mxrTqGQAiGuhrhWqQ
|
@MarcoGorelli I'm close to finishing, just wanted to check in and see if you had any thoughts on supporting I left some notes in the description on why it makes sense to me 🙂 Would be very easy to support across all versions |
|
thanks @dangotbanned ! sure, should be good to accept pa.table pre-pycapsule |
Perfect, thanks @MarcoGorelli Still need to test how UpdateYeah, all good!
|
- Following the new changes, this path should work (adb6bf1) - Unclear on whether the `pandas` import will occur though
tests/from_pycapsule_test.py
Outdated
| monkeypatch.delitem(sys.modules, "pandas") | ||
| df = nw.from_native(tbl, eager_only=True) | ||
| result = nw.from_arrow(df, backend=pl) | ||
| if PYARROW_VERSION < (14,): |
There was a problem hiding this comment.
https://github.com/narwhals-dev/narwhals/actions/runs/14134927029/job/39604239875?pr=2306
Feel like this would be good to have coverage for, but not sure how feasible that is with the current CI.
Adding a # pragma: no cover seems wrong
There was a problem hiding this comment.
Also, it means the previous xfail(s) were never triggered?
There was a problem hiding this comment.
@MarcoGorelli I was planning to adapt the rest of the tests in a similar way following
Do you have a preference on what to do here?
CompliantDataFrame.from_arrowCompliantDataFrame.from_arrow
Pending #2306 (comment)
| backend_version = context._backend_version | ||
| if isinstance(data, pa.Table): | ||
| native = data | ||
| elif backend_version >= (14,) or isinstance(data, Collection): |
There was a problem hiding this comment.
Supporting Collection here might seem strange.
My thinking is that were a list | tuple | dict of pyarrow objects passed - pa.table would still be capable of turning that into a pa.Table:
https://github.com/apache/arrow/blob/7c1b96e26910dd553ee0ccf0792e1854af6b0021/python/pyarrow/table.pxi#L4993-L5000
MarcoGorelli
left a comment
There was a problem hiding this comment.
nice, thanks @dangotbanned !
| msg = "congratulations, you entered unreachable code - please report a bug" | ||
| raise AssertionError(msg) |
There was a problem hiding this comment.
lol, can't believe i wrote this, we may want to make a utility with a consistent message for these at some point
There was a problem hiding this comment.
Certainly made me smile when I copied it over 😄

What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below
ArrowDataFrame.from_arrowpa.Tablepyarrow>=11, I think this function can still provide some value for those earlier versionsArrow*impl has similar logic topa.tableinpyarrow==13.0.0