Apple M1 support by palday · Pull Request #350 · apache/arrow-julia

palday · 2022-10-28T15:21:55Z

closes #345

So far, this only changes the default alignment. But that isn't sufficient because apparently that alignment isn't respected everywhere.

have tests pass locally on Apple silicon
add tests that use alignment=16 to simulate testing on M1

bkamins · 2022-10-28T16:31:03Z

@palday - can you please rebase this PR against main (to re-run the tests and make sure all passes)?

palday · 2022-10-28T16:39:41Z

@bkamins They don't pass on M1 yet -- I was mostly just pushing what I had so far in order to reduce duplication of effort. In other words, it's definitely not ready for merge.

quinnj · 2022-10-28T23:06:05Z

Thanks for starting this PR @palday; it looks like the file case is failing specifically. I'll try to look into it a little later tonight.

quinnj · 2022-11-02T13:42:56Z

Sorry for the slow response here; I've been digging into this, but it's turning into a rabbit hole where I suspect there's actually a Julia core issue happening? I'll keep digging today.

Fixes #345. Alternative to #350. Whew, a bit of a rabbit hole here. It turns out this issue doesn't have anything to do w/ the _arrow_ alignment when writing, but actually goes back to a core Julia issue, specifically the inconsistency reported in JuliaLang/julia#42326. Essentially, the issue is that different platforms (intel vs. arm64) are requiring different alignments for types like `UInt128` (8 for intel, 16 for arm64). And it turns out this isn't even a Julia issue, but all the way back to LLVM. So why does this crop up in Arrow.jl? Because when you try to serialize/deserialize a `Arrow.Primitive{Int128}` array, we're going to write it out in the proper arrow format, but when you read it back in, we've been using the zero-copy technique of: ```julia unsafe_wrap(Array, Ptr{Int128}(arrow_ptr), len) ``` in order to give the user a Julia `Vector{Int128}` to the underlying arrow data. _BUT_, in Julia when you allocate any `Vector` with an eltype size of less than 4, then only 8 bytes of alignment are specified. So the fact that most users will pass arrow data as a file (which is mmapped as `Vector{UInt8}`), a byte vector directly (`Vector{UInt8}`), or an IO (which we read as `Vector{UInt8}`), means these vectors are only 8-byte aligned. This then throws the fatal error reported in the original #350 issue about the pointer not being 16-byte aligned. So one option to consider is allowing users to pass in a 16-byte aligned arrow "source" and then that would just work, right? Well, except Julia doesn't expose any way of "upcasting" an array's alignment, so it's purely based off the array eltype. Which in turn is a struggle because the current Arrow.jl architecture assumes the source will be exactly a `Vector{UInt8}` everywhere. So essentially it requires some heruclean effort to try and pass your own aligned array; I think the only option I can think of is if you did something like: * Mmap an arrow file into Vector{UInt8} * Allocate your own Vector{UInt32} and copy arrow data into it * Use unsafe_wrap to make a Vector{UInt8} of the Vector{UInt32} data * But you _MUST_ keep a reference to the original Vector{UInt32} array otherwise, your new Vector{UInt8} gets corrupted * Pass the new Vector{UInt8} into Arrow.Table to read Anyway, not easy to say the least. The proposed solution here is as follows: * We check `sizeof(T)` to see if it's > 16 bytes, which we're using as a proxy for the alignment, regardless of platform (core devs are leaning towards the 8-byte alignment on intel actually being a bug for 16-byte primitives and I agree) * We use sizeof since the `jl_datatype_t` alignment field isn't part of the public Julia API and thus subject to change (and in fact I think it did just change location in Julia#master) * It's a good enough proxy for our purposes anyway * If the original arrow pointer _isn't_ 16-byte aligned, then we'll allocate a new `Vector{T}`, which _will_ be aligned, then copy the arrow data directly into it via pointers. Simple, easy, just one extra allocation/copy. If Julia _does_ get the ability in the future to specify a custom larger-than-eltype-required alignemnt for arrays, then we could potentially do that ourselves when reading, but it's a little tricky because we really only need to do that if there are 16-byte primitives we'll be deserializing and we don't know that until we read a schema message. So *shrug*.

quinnj · 2022-11-03T03:11:20Z

Whew, rabbit hole plunged: #357

Fixes #345. Alternative to #350. Whew, a bit of a rabbit hole here. It turns out this issue doesn't have anything to do w/ the _arrow_ alignment when writing, but actually goes back to a core Julia issue, specifically the inconsistency reported in JuliaLang/julia#42326. Essentially, the issue is that different platforms (intel vs. arm64) are requiring different alignments for types like `UInt128` (8 for intel, 16 for arm64). And it turns out this isn't even a Julia issue, but all the way back to LLVM. So why does this crop up in Arrow.jl? Because when you try to serialize/deserialize a `Arrow.Primitive{Int128}` array, we're going to write it out in the proper arrow format, but when you read it back in, we've been using the zero-copy technique of: ```julia unsafe_wrap(Array, Ptr{Int128}(arrow_ptr), len) ``` in order to give the user a Julia `Vector{Int128}` to the underlying arrow data. _BUT_, in Julia when you allocate any `Vector` with an eltype size of less than 4, then only 8 bytes of alignment are specified. So the fact that most users will pass arrow data as a file (which is mmapped as `Vector{UInt8}`), a byte vector directly (`Vector{UInt8}`), or an IO (which we read as `Vector{UInt8}`), means these vectors are only 8-byte aligned. This then throws the fatal error reported in the original #350 issue about the pointer not being 16-byte aligned. So one option to consider is allowing users to pass in a 16-byte aligned arrow "source" and then that would just work, right? Well, except Julia doesn't expose any way of "upcasting" an array's alignment, so it's purely based off the array eltype. Which in turn is a struggle because the current Arrow.jl architecture assumes the source will be exactly a `Vector{UInt8}` everywhere. So essentially it requires some heruclean effort to try and pass your own aligned array; I think the only option I can think of is if you did something like: * Mmap an arrow file into Vector{UInt8} * Allocate your own Vector{UInt32} and copy arrow data into it * Use unsafe_wrap to make a Vector{UInt8} of the Vector{UInt32} data * But you _MUST_ keep a reference to the original Vector{UInt32} array otherwise, your new Vector{UInt8} gets corrupted * Pass the new Vector{UInt8} into Arrow.Table to read Anyway, not easy to say the least. The proposed solution here is as follows: * We check `sizeof(T)` to see if it's > 16 bytes, which we're using as a proxy for the alignment, regardless of platform (core devs are leaning towards the 8-byte alignment on intel actually being a bug for 16-byte primitives and I agree) * We use sizeof since the `jl_datatype_t` alignment field isn't part of the public Julia API and thus subject to change (and in fact I think it did just change location in Julia#master) * It's a good enough proxy for our purposes anyway * If the original arrow pointer _isn't_ 16-byte aligned, then we'll allocate a new `Vector{T}`, which _will_ be aligned, then copy the arrow data directly into it via pointers. Simple, easy, just one extra allocation/copy. If Julia _does_ get the ability in the future to specify a custom larger-than-eltype-required alignemnt for arrays, then we could potentially do that ourselves when reading, but it's a little tricky because we really only need to do that if there are 16-byte primitives we'll be deserializing and we don't know that until we read a schema message. So *shrug*.

update default alignment

927fcd8

bkamins mentioned this pull request Oct 28, 2022

fix Julia 1.6 #349

Merged

Merge branch 'main' of github.com:apache/arrow-julia into pa/apple

6d4361a

palday marked this pull request as draft October 28, 2022 16:53

quinnj mentioned this pull request Nov 3, 2022

Ensure Julia types have alignment respected #357

Merged

quinnj closed this Nov 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Apple M1 support#350

Apple M1 support#350
palday wants to merge 2 commits intoapache:mainfrom
palday:pa/apple

palday commented Oct 28, 2022 •

edited

Loading

Uh oh!

bkamins commented Oct 28, 2022 •

edited

Loading

Uh oh!

palday commented Oct 28, 2022

Uh oh!

quinnj commented Oct 28, 2022

Uh oh!

quinnj commented Nov 2, 2022

Uh oh!

quinnj commented Nov 3, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

palday commented Oct 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bkamins commented Oct 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

palday commented Oct 28, 2022

Uh oh!

quinnj commented Oct 28, 2022

Uh oh!

quinnj commented Nov 2, 2022

Uh oh!

quinnj commented Nov 3, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

palday commented Oct 28, 2022 •

edited

Loading

bkamins commented Oct 28, 2022 •

edited

Loading