Fix arm64 OOB read by klauspost · Pull Request #329 · klauspost/reedsolomon

klauspost · 2026-03-06T14:37:27Z

Fixes #328

ARM64 NEON/SVE assembly (mulNeon_10xN, mulSve_10xN) unconditionally reads 10 input pointers and 10 inputs' worth of matrix coefficients, regardless of actual input count. When len(in) < 10, both the in slice and the matrix are read out-of-bounds.

Three-part approach with zero allocation overhead:

Matrix padding at source (galois.go): genCodeGenMatrix now sizes the matrix for max(inputs, codeGenPadInputs) inputs. Extra coefficients are zeroed — 0 × anything = 0 in GF(2^8), so they don't affect output. On amd64 (codeGenPadInputs=1), the compiler eliminates the dead branch.
Stack-array input padding (galois_gen_switch_arm64.go): Each dispatch function pads in using a [10][]byte on the stack (240 bytes, no escape via //go:noescape asm). Padding entries reuse in[0] — valid memory, irrelevant data since matrix coefficients are zero.
Pool buffer sizing (reedsolomon.go): Pool rounds dataShards up to the next multiple of codeGenPadInputs. Needed because codeSomeShardsAVXP carves sequential chunk matrices from one pool buffer — e.g. 17 inputs chunked [10, 7] needs ceil(17/10)×10 = 20 input slots

Summary by CodeRabbit

Performance Improvements
- Optimized Galois field multiplication with enhanced lookup tables for improved speed
- More efficient memory allocation through intelligent padding optimization
- Enhanced ARM64 assembly code paths with better input handling

Fixes #328 ARM64 NEON/SVE assembly (mulNeon_10xN, mulSve_10xN) unconditionally reads 10 input pointers and 10 inputs' worth of matrix coefficients, regardless of actual input count. When len(in) < 10, both the in slice and the matrix are read out-of-bounds. Three-part approach with zero allocation overhead: 1. Matrix padding at source (galois.go): genCodeGenMatrix now sizes the matrix for max(inputs, codeGenPadInputs) inputs. Extra coefficients are zeroed — 0 × anything = 0 in GF(2^8), so they don't affect output. On amd64 (codeGenPadInputs=1), the compiler eliminates the dead branch. 2. Stack-array input padding (galois_gen_switch_arm64.go): Each dispatch function pads in using a [10][]byte on the stack (240 bytes, no escape via //go:noescape asm). Padding entries reuse in[0] — valid memory, irrelevant data since matrix coefficients are zero. 3. Pool buffer sizing (reedsolomon.go): Pool rounds dataShards up to the next multiple of codeGenPadInputs. Needed because codeSomeShardsAVXP carves sequential chunk matrices from one pool buffer — e.g. 17 inputs chunked [10, 7] needs ceil(17/10)×10 = 20 input slots

klauspost · 2026-03-06T14:38:32Z

@fwessels Please review proposed fix. You are welcome to send an alternative to fix the assembly.

1. Output-dependent step sizes — Changed from hardcoded size, stepsize = 64, 64 to 32, 32 for outputs >= 4 (matching testGenGaloisUpto10x10). This fixes the SIGSEGV because testGenGaloisEarlyAbort with size-1 = 31 is below the 32-byte assembly minimum, so it returns without touching the nil matrix. 2. Inner loop over all input counts 1..10 — Instead of only testing input = codeGenMaxInputs (10), the function now iterates input from 1 to codeGenMaxInputs, exercising the matrix-padding and in-slice-padding paths for inputs < 10.

klauspost · 2026-03-06T15:04:26Z

@coderabbitai Why aren't you reviewing this?

coderabbitai · 2026-03-06T15:04:39Z

@klauspost Apologies for the delay. Triggering a review now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-03-06T15:04:49Z

📝 Walkthrough

Walkthrough

This PR addresses an ARM64 SIGSEGV crash in mulNeon_10x10 by introducing input padding logic across code generation paths. It adds codeGenPadInputs constants to define platform-specific padding requirements and modifies ARM64 functions to pad input slices to match assembly expectations, alongside updating test infrastructure.

Changes

Cohort / File(s)	Summary
Galois Field Optimization `galois.go`	Adds `mulTable16`, `mulTable16Init`, and `getMulTable16` for lookup-table-based Galois field multiplication. Modifies `genCodeGenMatrix` to apply input padding logic using `paddedInputs` variable and clears stale coefficients in the buffer when padding is applied.
Platform Padding Constants `galois_gen_none.go`, `galois_gen_switch_amd64.go`, `galois_gen_switch_nopshufb_amd64.go`, `galois_gen_switch_nopshufb_arm64.go`	Adds `codeGenPadInputs` constant set to 1 across multiple platform-specific configuration files to define baseline padding requirements.
ARM64 Runtime Padding Implementation `galois_gen_switch_arm64.go`	Sets `codeGenPadInputs = codeGenMaxInputs` and adds runtime padding logic to `galMulSlicesSve`, `galMulSlicesSveXor`, `galMulSlicesNeon`, and `galMulSlicesNeonXor` to pad input slices to fixed length for consistent assembly code-generation paths.
Test Infrastructure Updates `galois_test.go`, `galois_arm64_test.go`	Modifies `testGenGaloisUpto10x10` matrix dimension construction and heuristic logic; adds new `testGenGalois10xN` function to exercise ARM64 assembly variants limited to 10xN shapes; updates `TestGenGalois` to call the new test function.
Memory Pool Sizing `reedsolomon.go`	Adjusts memory pool allocation in the codeGen path to account for input padding by computing `paddedDS` based on `codeGenPadInputs` and using it for buffer size calculations.

Possibly related PRs

Add Leopard16 GFNI on amd64 #317: Adds GF16 multiply support and precomputed multiplication tables, complementing the mulTable16 optimization introduced here.
Bump go versions #326: Refactors the getMulTable16 implementation in galois.go for different loop iteration patterns.
Improve pure Go speed in low shard count #306: Introduces low-level 16-bit multiplication lookup table (mulTable16) and related pure-Go paths used across galMulSlice implementations.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Fix arm64 OOB read' directly addresses the main objective stated in the PR: preventing out-of-bounds reads in ARM64 assembly routines.
Linked Issues check	✅ Passed	The PR implements all three coding requirements from issue `#328`: matrix padding to prevent OOB access, stack-array input padding in dispatch functions, and pool buffer sizing adjustments.
Out of Scope Changes check	✅ Passed	All changes directly support the OOB fix objective. Code modifications in galois.go, galois_gen_*.go, reedsolomon.go address padding logic; test changes support validation of the fix for inputs < 10.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch arm64-oob-read

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (2)

galois_gen_switch_arm64.go (1)
102-109: Duplicated padding logic across functions.

The padding blocks in galMulSlicesSveXor, galMulSlicesNeon, and galMulSlicesNeonXor are identical to galMulSlicesSve. Consider extracting a helper to reduce repetition, though not blocking given the generated-code context.

Also applies to: 153-160, 204-211
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@galois_gen_switch_arm64.go` around lines 102 - 109, The padding logic used in
galMulSlicesSve, galMulSlicesSveXor, galMulSlicesNeon, and galMulSlicesNeonXor
is duplicated; extract it into a small helper (e.g., padToCodeGenMaxInputs or
padInputs) that takes the incoming slice [][]byte and returns a []byte-slice of
length codeGenMaxInputs with elements copied and padded by repeating in[0];
replace the inline blocks in each of those functions with a call to that helper
to centralize behavior and eliminate repetition while preserving the current
padding semantics.
galois_test.go (1)
404-434: Consider extracting common logic to reduce duplication.

testGenGalois10xN is nearly identical to testGenGaloisUpto10x10. The only apparent difference is the function name and doc comment. Both iterate the same loops, create the same matrix, and run the same tests.

If there's no behavioral difference, consider consolidating into a single parameterized function or calling one from the other.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@galois_test.go` around lines 404 - 434, The two nearly identical functions
testGenGalois10xN and testGenGaloisUpto10x10 should be consolidated to remove
duplication: create a single parameterized helper (e.g., runGenGaloisTests) that
accepts the function-name/variant identifier (or simply the f and fXor callbacks
and vectorLength) and contains the shared loop/matrix creation and test calls,
then replace both testGenGalois10xN and testGenGaloisUpto10x10 with thin
wrappers that call this helper (or remove one and call the other directly) so
all shared logic lives in one place and both behaviors remain identical.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@galois_gen_switch_arm64.go`:
- Around line 102-109: The padding logic used in galMulSlicesSve,
galMulSlicesSveXor, galMulSlicesNeon, and galMulSlicesNeonXor is duplicated;
extract it into a small helper (e.g., padToCodeGenMaxInputs or padInputs) that
takes the incoming slice [][]byte and returns a []byte-slice of length
codeGenMaxInputs with elements copied and padded by repeating in[0]; replace the
inline blocks in each of those functions with a call to that helper to
centralize behavior and eliminate repetition while preserving the current
padding semantics.

In `@galois_test.go`:
- Around line 404-434: The two nearly identical functions testGenGalois10xN and
testGenGaloisUpto10x10 should be consolidated to remove duplication: create a
single parameterized helper (e.g., runGenGaloisTests) that accepts the
function-name/variant identifier (or simply the f and fXor callbacks and
vectorLength) and contains the shared loop/matrix creation and test calls, then
replace both testGenGalois10xN and testGenGaloisUpto10x10 with thin wrappers
that call this helper (or remove one and call the other directly) so all shared
logic lives in one place and both behaviors remain identical.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c0522111-7099-435d-bdec-151dc4de22d7

📥 Commits

Reviewing files that changed from the base of the PR and between 182993e and 0a5e903.

⛔ Files ignored due to path filters (1)

_gen/gen.go is excluded by !**/_gen/**

📒 Files selected for processing (9)

galois.go
galois_arm64_test.go
galois_gen_none.go
galois_gen_switch_amd64.go
galois_gen_switch_arm64.go
galois_gen_switch_nopshufb_amd64.go
galois_gen_switch_nopshufb_arm64.go
galois_test.go
reedsolomon.go

fwessels · 2026-03-06T18:30:56Z

Good catch -- and the fix properly addresses it, pls merge.

coderabbitai bot reviewed Mar 6, 2026

View reviewed changes

klauspost merged commit a7ce274 into master Mar 6, 2026
16 checks passed

klauspost deleted the arm64-oob-read branch March 6, 2026 18:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix arm64 OOB read#329

Fix arm64 OOB read#329
klauspost merged 2 commits intomasterfrom
arm64-oob-read

klauspost commented Mar 6, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

klauspost commented Mar 6, 2026

Uh oh!

klauspost commented Mar 6, 2026

Uh oh!

coderabbitai bot commented Mar 6, 2026

Uh oh!

coderabbitai bot commented Mar 6, 2026 •

edited

Loading

Walkthrough

Changes

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Uh oh!

fwessels commented Mar 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

klauspost commented Mar 6, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

klauspost commented Mar 6, 2026

Uh oh!

klauspost commented Mar 6, 2026

Uh oh!

coderabbitai bot commented Mar 6, 2026

Uh oh!

coderabbitai bot commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

fwessels commented Mar 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

klauspost commented Mar 6, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 6, 2026 •

edited

Loading