Commit ab045a4
[Data] (De)serialization of PyArrow Extension Arrays (#51972)
## Why are these changes needed?
This feature adds the ability to (de)serialize arbitrary PyArrow
extension arrays. This is needed to use Ray in code bases that use
extension arrays.
~The serialization already seemed sufficiently general, but as far as I
can tell, the deserialization cannot be done in full generality. Hence, this
setup allows registration of custom deserializers for extension types.~
~For serialization, the selector has been changed from `ExtensionType`
to `BaseExtensionType` to accommodate non-Python extension arrays,
like `pyarrow.FixedShapeTensorArray`.~
~This is at the moment a proof-of-concept. If you like the idea, I
suppose the registration function may need to move to a better place,
and docs need adding.~
The implementation now works without registration on any extension type.
## Related issue number
Closes #51959
## Checks
- [X] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [X] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [X] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [X] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> Generalizes Arrow array (de)serialization to any `pyarrow.BaseExtensionType`, removing tensor-specific handling and adding tests for fixed/variable-shape tensors.
>
> - **Arrow (De)serialization**:
>   - Switch from tensor-specific checks to generic `pyarrow.BaseExtensionType` handling.
>   - Reconstruct extension arrays via `type.wrap_array(storage)`; serialize via storage payload wrapped with extension metadata.
>   - Remove `ray.air.util.tensor_extensions.arrow` dependencies and special-casing.
> - **Tests**:
>   - Add roundtrip tests for `pa.FixedShapeTensorArray` and a custom variable-shape `ExtensionType`.
>   - Import `PicklableArrayPayload` in tests for constructing payloads.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
4bbcdbe. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Signed-off-by: Pim de Haan <pim@cusp.ai>

1 parent d09174e commit ab045a4
File tree: 2 files changed (+64 -36 lines)
- python/ray
  - _private
  - data/tests