Skip to content

Fix arm64 OOB read#329

Merged
klauspost merged 2 commits intomasterfrom
arm64-oob-read
Mar 6, 2026
Merged

Fix arm64 OOB read#329
klauspost merged 2 commits intomasterfrom
arm64-oob-read

Conversation

@klauspost
Copy link
Owner

@klauspost klauspost commented Mar 6, 2026

Fixes #328

ARM64 NEON/SVE assembly (mulNeon_10xN, mulSve_10xN) unconditionally reads 10 input pointers and 10 inputs' worth of matrix coefficients, regardless of actual input count. When len(in) < 10, both the in slice and the matrix are read out-of-bounds.

Three-part approach with zero allocation overhead:

  1. Matrix padding at source (galois.go): genCodeGenMatrix now sizes the matrix for max(inputs, codeGenPadInputs) inputs. Extra coefficients are zeroed — 0 × anything = 0 in GF(2^8), so they don't affect output. On amd64 (codeGenPadInputs=1), the compiler eliminates the dead branch.

  2. Stack-array input padding (galois_gen_switch_arm64.go): Each dispatch function pads in using a [10][]byte on the stack (240 bytes, no escape via //go:noescape asm). Padding entries reuse in[0] — valid memory, irrelevant data since matrix coefficients are zero.

  3. Pool buffer sizing (reedsolomon.go): Pool rounds dataShards up to the next multiple of codeGenPadInputs. Needed because codeSomeShardsAVXP carves sequential chunk matrices from one pool buffer — e.g. 17 inputs chunked [10, 7] needs ceil(17/10)×10 = 20 input slots

Summary by CodeRabbit

  • Performance Improvements
    • Optimized Galois field multiplication with enhanced lookup tables for improved speed
    • More efficient memory allocation through intelligent padding optimization
    • Enhanced ARM64 assembly code paths with better input handling

Fixes #328

ARM64 NEON/SVE assembly (mulNeon_10xN, mulSve_10xN) unconditionally reads 10 input pointers and 10 inputs' worth of matrix coefficients, regardless of actual input count. When len(in) < 10, both the in slice and the matrix are read out-of-bounds.

Three-part approach with zero allocation overhead:

  1. Matrix padding at source (galois.go): genCodeGenMatrix now sizes the matrix for max(inputs, codeGenPadInputs) inputs. Extra coefficients are zeroed — 0 × anything = 0 in GF(2^8), so they don't affect output. On amd64
  (codeGenPadInputs=1), the compiler eliminates the dead branch.

  2. Stack-array input padding (galois_gen_switch_arm64.go): Each dispatch function pads in using a [10][]byte on the stack (240 bytes, no escape via //go:noescape asm). Padding entries reuse in[0] — valid memory, irrelevant data since matrix coefficients are zero.

  3. Pool buffer sizing (reedsolomon.go): Pool rounds dataShards up to the next multiple of codeGenPadInputs. Needed because codeSomeShardsAVXP carves sequential chunk matrices from one pool buffer — e.g. 17 inputs chunked [10, 7] needs ceil(17/10)×10 = 20 input slots
@klauspost
Copy link
Owner Author

@fwessels Please review proposed fix. You are welcome to send an alternative to fix the assembly.

  1. Output-dependent step sizes — Changed from hardcoded size, stepsize = 64, 64 to 32, 32 for outputs >= 4 (matching
  testGenGaloisUpto10x10). This fixes the SIGSEGV because testGenGaloisEarlyAbort with size-1 = 31 is below the 32-byte
  assembly minimum, so it returns without touching the nil matrix.
  2. Inner loop over all input counts 1..10 — Instead of only testing input = codeGenMaxInputs (10), the function now
  iterates input from 1 to codeGenMaxInputs, exercising the matrix-padding and in-slice-padding paths for inputs < 10.
@klauspost
Copy link
Owner Author

@coderabbitai Why aren't you reviewing this?

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 6, 2026

@klauspost Apologies for the delay. Triggering a review now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 6, 2026

📝 Walkthrough

Walkthrough

This PR addresses an ARM64 SIGSEGV crash in mulNeon_10x10 by introducing input padding logic across code generation paths. It adds codeGenPadInputs constants to define platform-specific padding requirements and modifies ARM64 functions to pad input slices to match assembly expectations, alongside updating test infrastructure.

Changes

Cohort / File(s) Summary
Galois Field Optimization
galois.go
Adds mulTable16, mulTable16Init, and getMulTable16 for lookup-table-based Galois field multiplication. Modifies genCodeGenMatrix to apply input padding logic using paddedInputs variable and clears stale coefficients in the buffer when padding is applied.
Platform Padding Constants
galois_gen_none.go, galois_gen_switch_amd64.go, galois_gen_switch_nopshufb_amd64.go, galois_gen_switch_nopshufb_arm64.go
Adds codeGenPadInputs constant set to 1 across multiple platform-specific configuration files to define baseline padding requirements.
ARM64 Runtime Padding Implementation
galois_gen_switch_arm64.go
Sets codeGenPadInputs = codeGenMaxInputs and adds runtime padding logic to galMulSlicesSve, galMulSlicesSveXor, galMulSlicesNeon, and galMulSlicesNeonXor to pad input slices to fixed length for consistent assembly code-generation paths.
Test Infrastructure Updates
galois_test.go, galois_arm64_test.go
Modifies testGenGaloisUpto10x10 matrix dimension construction and heuristic logic; adds new testGenGalois10xN function to exercise ARM64 assembly variants limited to 10xN shapes; updates TestGenGalois to call the new test function.
Memory Pool Sizing
reedsolomon.go
Adjusts memory pool allocation in the codeGen path to account for input padding by computing paddedDS based on codeGenPadInputs and using it for buffer size calculations.

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Fix arm64 OOB read' directly addresses the main objective stated in the PR: preventing out-of-bounds reads in ARM64 assembly routines.
Linked Issues check ✅ Passed The PR implements all three coding requirements from issue #328: matrix padding to prevent OOB access, stack-array input padding in dispatch functions, and pool buffer sizing adjustments.
Out of Scope Changes check ✅ Passed All changes directly support the OOB fix objective. Code modifications in galois.go, galois_gen_*.go, reedsolomon.go address padding logic; test changes support validation of the fix for inputs < 10.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch arm64-oob-read

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
galois_gen_switch_arm64.go (1)

102-109: Duplicated padding logic across functions.

The padding blocks in galMulSlicesSveXor, galMulSlicesNeon, and galMulSlicesNeonXor are identical to galMulSlicesSve. Consider extracting a helper to reduce repetition, though not blocking given the generated-code context.

Also applies to: 153-160, 204-211

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@galois_gen_switch_arm64.go` around lines 102 - 109, The padding logic used in
galMulSlicesSve, galMulSlicesSveXor, galMulSlicesNeon, and galMulSlicesNeonXor
is duplicated; extract it into a small helper (e.g., padToCodeGenMaxInputs or
padInputs) that takes the incoming slice [][]byte and returns a []byte-slice of
length codeGenMaxInputs with elements copied and padded by repeating in[0];
replace the inline blocks in each of those functions with a call to that helper
to centralize behavior and eliminate repetition while preserving the current
padding semantics.
galois_test.go (1)

404-434: Consider extracting common logic to reduce duplication.

testGenGalois10xN is nearly identical to testGenGaloisUpto10x10. The only apparent difference is the function name and doc comment. Both iterate the same loops, create the same matrix, and run the same tests.

If there's no behavioral difference, consider consolidating into a single parameterized function or calling one from the other.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@galois_test.go` around lines 404 - 434, The two nearly identical functions
testGenGalois10xN and testGenGaloisUpto10x10 should be consolidated to remove
duplication: create a single parameterized helper (e.g., runGenGaloisTests) that
accepts the function-name/variant identifier (or simply the f and fXor callbacks
and vectorLength) and contains the shared loop/matrix creation and test calls,
then replace both testGenGalois10xN and testGenGaloisUpto10x10 with thin
wrappers that call this helper (or remove one and call the other directly) so
all shared logic lives in one place and both behaviors remain identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@galois_gen_switch_arm64.go`:
- Around line 102-109: The padding logic used in galMulSlicesSve,
galMulSlicesSveXor, galMulSlicesNeon, and galMulSlicesNeonXor is duplicated;
extract it into a small helper (e.g., padToCodeGenMaxInputs or padInputs) that
takes the incoming slice [][]byte and returns a []byte-slice of length
codeGenMaxInputs with elements copied and padded by repeating in[0]; replace the
inline blocks in each of those functions with a call to that helper to
centralize behavior and eliminate repetition while preserving the current
padding semantics.

In `@galois_test.go`:
- Around line 404-434: The two nearly identical functions testGenGalois10xN and
testGenGaloisUpto10x10 should be consolidated to remove duplication: create a
single parameterized helper (e.g., runGenGaloisTests) that accepts the
function-name/variant identifier (or simply the f and fXor callbacks and
vectorLength) and contains the shared loop/matrix creation and test calls, then
replace both testGenGalois10xN and testGenGaloisUpto10x10 with thin wrappers
that call this helper (or remove one and call the other directly) so all shared
logic lives in one place and both behaviors remain identical.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c0522111-7099-435d-bdec-151dc4de22d7

📥 Commits

Reviewing files that changed from the base of the PR and between 182993e and 0a5e903.

⛔ Files ignored due to path filters (1)
  • _gen/gen.go is excluded by !**/_gen/**
📒 Files selected for processing (9)
  • galois.go
  • galois_arm64_test.go
  • galois_gen_none.go
  • galois_gen_switch_amd64.go
  • galois_gen_switch_arm64.go
  • galois_gen_switch_nopshufb_amd64.go
  • galois_gen_switch_nopshufb_arm64.go
  • galois_test.go
  • reedsolomon.go

@fwessels
Copy link
Contributor

fwessels commented Mar 6, 2026

Good catch -- and the fix properly addresses it, pls merge.

@klauspost klauspost merged commit a7ce274 into master Mar 6, 2026
16 checks passed
@klauspost klauspost deleted the arm64-oob-read branch March 6, 2026 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

arm64 signal SIGSEGV in mulNeon_10x10

2 participants