Conversation
Fixes #328 ARM64 NEON/SVE assembly (mulNeon_10xN, mulSve_10xN) unconditionally reads 10 input pointers and 10 inputs' worth of matrix coefficients, regardless of actual input count. When len(in) < 10, both the in slice and the matrix are read out-of-bounds. Three-part approach with zero allocation overhead: 1. Matrix padding at source (galois.go): genCodeGenMatrix now sizes the matrix for max(inputs, codeGenPadInputs) inputs. Extra coefficients are zeroed — 0 × anything = 0 in GF(2^8), so they don't affect output. On amd64 (codeGenPadInputs=1), the compiler eliminates the dead branch. 2. Stack-array input padding (galois_gen_switch_arm64.go): Each dispatch function pads in using a [10][]byte on the stack (240 bytes, no escape via //go:noescape asm). Padding entries reuse in[0] — valid memory, irrelevant data since matrix coefficients are zero. 3. Pool buffer sizing (reedsolomon.go): Pool rounds dataShards up to the next multiple of codeGenPadInputs. Needed because codeSomeShardsAVXP carves sequential chunk matrices from one pool buffer — e.g. 17 inputs chunked [10, 7] needs ceil(17/10)×10 = 20 input slots
|
@fwessels Please review proposed fix. You are welcome to send an alternative to fix the assembly. |
1. Output-dependent step sizes — Changed from hardcoded size, stepsize = 64, 64 to 32, 32 for outputs >= 4 (matching testGenGaloisUpto10x10). This fixes the SIGSEGV because testGenGaloisEarlyAbort with size-1 = 31 is below the 32-byte assembly minimum, so it returns without touching the nil matrix. 2. Inner loop over all input counts 1..10 — Instead of only testing input = codeGenMaxInputs (10), the function now iterates input from 1 to codeGenMaxInputs, exercising the matrix-padding and in-slice-padding paths for inputs < 10.
|
@coderabbitai Why aren't you reviewing this? |
|
✅ Actions performedReview triggered.
|
📝 WalkthroughWalkthroughThis PR addresses an ARM64 SIGSEGV crash in Changes
Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
galois_gen_switch_arm64.go (1)
102-109: Duplicated padding logic across functions.The padding blocks in
galMulSlicesSveXor,galMulSlicesNeon, andgalMulSlicesNeonXorare identical togalMulSlicesSve. Consider extracting a helper to reduce repetition, though not blocking given the generated-code context.Also applies to: 153-160, 204-211
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@galois_gen_switch_arm64.go` around lines 102 - 109, The padding logic used in galMulSlicesSve, galMulSlicesSveXor, galMulSlicesNeon, and galMulSlicesNeonXor is duplicated; extract it into a small helper (e.g., padToCodeGenMaxInputs or padInputs) that takes the incoming slice [][]byte and returns a []byte-slice of length codeGenMaxInputs with elements copied and padded by repeating in[0]; replace the inline blocks in each of those functions with a call to that helper to centralize behavior and eliminate repetition while preserving the current padding semantics.galois_test.go (1)
404-434: Consider extracting common logic to reduce duplication.
testGenGalois10xNis nearly identical totestGenGaloisUpto10x10. The only apparent difference is the function name and doc comment. Both iterate the same loops, create the same matrix, and run the same tests.If there's no behavioral difference, consider consolidating into a single parameterized function or calling one from the other.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@galois_test.go` around lines 404 - 434, The two nearly identical functions testGenGalois10xN and testGenGaloisUpto10x10 should be consolidated to remove duplication: create a single parameterized helper (e.g., runGenGaloisTests) that accepts the function-name/variant identifier (or simply the f and fXor callbacks and vectorLength) and contains the shared loop/matrix creation and test calls, then replace both testGenGalois10xN and testGenGaloisUpto10x10 with thin wrappers that call this helper (or remove one and call the other directly) so all shared logic lives in one place and both behaviors remain identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@galois_gen_switch_arm64.go`:
- Around line 102-109: The padding logic used in galMulSlicesSve,
galMulSlicesSveXor, galMulSlicesNeon, and galMulSlicesNeonXor is duplicated;
extract it into a small helper (e.g., padToCodeGenMaxInputs or padInputs) that
takes the incoming slice [][]byte and returns a []byte-slice of length
codeGenMaxInputs with elements copied and padded by repeating in[0]; replace the
inline blocks in each of those functions with a call to that helper to
centralize behavior and eliminate repetition while preserving the current
padding semantics.
In `@galois_test.go`:
- Around line 404-434: The two nearly identical functions testGenGalois10xN and
testGenGaloisUpto10x10 should be consolidated to remove duplication: create a
single parameterized helper (e.g., runGenGaloisTests) that accepts the
function-name/variant identifier (or simply the f and fXor callbacks and
vectorLength) and contains the shared loop/matrix creation and test calls, then
replace both testGenGalois10xN and testGenGaloisUpto10x10 with thin wrappers
that call this helper (or remove one and call the other directly) so all shared
logic lives in one place and both behaviors remain identical.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c0522111-7099-435d-bdec-151dc4de22d7
⛔ Files ignored due to path filters (1)
_gen/gen.gois excluded by!**/_gen/**
📒 Files selected for processing (9)
galois.gogalois_arm64_test.gogalois_gen_none.gogalois_gen_switch_amd64.gogalois_gen_switch_arm64.gogalois_gen_switch_nopshufb_amd64.gogalois_gen_switch_nopshufb_arm64.gogalois_test.goreedsolomon.go
|
Good catch -- and the fix properly addresses it, pls merge. |
Fixes #328
ARM64 NEON/SVE assembly (mulNeon_10xN, mulSve_10xN) unconditionally reads 10 input pointers and 10 inputs' worth of matrix coefficients, regardless of actual input count. When len(in) < 10, both the in slice and the matrix are read out-of-bounds.
Three-part approach with zero allocation overhead:
Matrix padding at source (galois.go): genCodeGenMatrix now sizes the matrix for max(inputs, codeGenPadInputs) inputs. Extra coefficients are zeroed — 0 × anything = 0 in GF(2^8), so they don't affect output. On amd64 (codeGenPadInputs=1), the compiler eliminates the dead branch.
Stack-array input padding (galois_gen_switch_arm64.go): Each dispatch function pads in using a [10][]byte on the stack (240 bytes, no escape via //go:noescape asm). Padding entries reuse in[0] — valid memory, irrelevant data since matrix coefficients are zero.
Pool buffer sizing (reedsolomon.go): Pool rounds dataShards up to the next multiple of codeGenPadInputs. Needed because codeSomeShardsAVXP carves sequential chunk matrices from one pool buffer — e.g. 17 inputs chunked [10, 7] needs ceil(17/10)×10 = 20 input slots
Summary by CodeRabbit