llvm: Fix alloca alignment and type selection in AllocOpt#60699
llvm: Fix alloca alignment and type selection in AllocOpt#60699oscardssmith merged 3 commits intomasterfrom
Conversation
Inherit alignment from the original GC allocation with JL_SMALL_BYTE_ALIGNMENT as the minimum. Use alignment-sized integer chunks for the alloca type (matching emit_static_alloca) so SROA splits allocations into aligned pieces for better performance and vectorization. Also adds the missing setAlignment call in splitOnStack. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| if (sz > 1) | ||
| align = MinAlign(JL_SMALL_BYTE_ALIGNMENT, NextPowerOf2(sz)); | ||
| // Inherit alignment from the original allocation, with GC alignment as minimum. | ||
| Align align(std::max((unsigned)orig_inst->getRetAlign().valueOrOne().value(), (unsigned)JL_SMALL_BYTE_ALIGNMENT)); |
There was a problem hiding this comment.
This feels loosely unsound, since this is the minimum known alignment, and not the required alignment. The JL_SMALL_BYTE_ALIGNMENT value is the largest value that julia.gc_alloc_obj is permitted to return, so it is sometimes reasonable that we can use this as a hint, but we should be sure to clarify that this overalignment is merely a hint to the layout (although being more than 16 will penalize performance since it requires a more expensive stack adjustment on entry)
(unsigned)orig_inst->getRetAlign().valueOrOne().value()
There was a problem hiding this comment.
But if the allocation required a larger alignment wouldn't we inherit that, that's why we prefer to inherit and just baseline to gc align
There was a problem hiding this comment.
getRetAlign is not the required alignment, it is the minimum, so if it requires it, this would introduce a bug here
but that said, the function isn't capable of giving more than JL_SMALL_BYTE_ALIGNMENT (16) so having getRetAlign return more than 16 here would be a miscompile, so this always gives the correct answer anyways (and increasing from there is only a runtime performance penalty, not a correctness issue)
There was a problem hiding this comment.
It’s the alignment we emitted no? If it required more it would already be a bug no? Unless you mean a gc alloc aligned that wouldn’t tell LLVM
test/llvmpasses/alloc-opt-gcframe.ll
Outdated
|
|
||
| ; CHECK-LABEL: @ccall_ptr | ||
| ; CHECK: alloca i64 | ||
| ; CHECK: alloca i128, align 16 |
There was a problem hiding this comment.
cc: @maleadt
This might cause issues for the intel backend? If I recall correctly, they don't like i128?
There was a problem hiding this comment.
Pretty sure Our Int128 emits i128.
There was a problem hiding this comment.
So this i128 should be storage only most of the time, I just followed whatever we do for base Julia
Some backends don't support integer types larger than 64 bits, so cap the element size used in emit_static_alloca and AllocOpt at i64. For allocations larger than 8 bytes, use arrays of i64 instead of i128/i256. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Inherit alignment from the original GC allocation with JL_SMALL_BYTE_ALIGNMENT as the minimum. Use alignment-sized integer chunks for the alloca type (matching emit_static_alloca) so SROA splits allocations into aligned pieces for better performance and vectorization. Also adds the missing setAlignment call in splitOnStack. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> (cherry picked from commit 54fde7e)
Inherit alignment from the original GC allocation with JL_SMALL_BYTE_ALIGNMENT as the minimum. Use alignment-sized integer chunks for the alloca type (matching emit_static_alloca) so SROA splits allocations into aligned pieces for better performance and vectorization. Also adds the missing setAlignment call in splitOnStack. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> (cherry picked from commit 54fde7e)
Inherit alignment from the original GC allocation with JL_SMALL_BYTE_ALIGNMENT
as the minimum. Use alignment-sized integer chunks for the alloca type
(matching emit_static_alloca) so SROA splits allocations into aligned pieces
for better performance and vectorization.
Also adds the missing setAlignment call in splitOnStack.
Co-Authored-By: Claude Opus 4.5 noreply@anthropic.com