
Experiments: adding SIMD helpers #801

Merged
samber merged 42 commits into master from feat/exp-simd
Feb 21, 2026

Owner

@samber samber commented Feb 16, 2026

Introducing github.com/samber/lo/exp/simd package.

Go 1.26 brings SIMD instructions to Go.

The feature is still experimental, so I'm not adding this to the core package of lo.

# usage
import losimd "github.com/samber/lo/exp/simd"

result := losimd.SumInt8x64([]int8{1, 2, 3, ...})

Requirements:

  • amd64 architecture
  • AVX2 or AVX512 CPU
  • build with GOEXPERIMENT=simd

I've added SumInt8 variants that automatically route to the right implementation for the current architecture. I'd be interested in your opinion on this.
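To make the routing idea concrete, here is a minimal sketch in plain Go. Everything in it is hypothetical: the `cpuFeatures` struct stands in for real CPU detection, `pickSumInt8` is an invented name, and a scalar loop stands in for the width-specific kernels. It only illustrates the "widest supported path wins" selection, not how the package actually implements it.

```go
package main

import "fmt"

// cpuFeatures is a hypothetical stand-in for real CPU feature detection.
type cpuFeatures struct {
	AVX2   bool
	AVX512 bool
}

// sumInt8Scalar is a plain loop used here in place of every SIMD kernel.
// Like any int8 arithmetic in Go, the sum wraps around on overflow.
func sumInt8Scalar(xs []int8) int8 {
	var s int8
	for _, x := range xs {
		s += x
	}
	return s
}

// pickSumInt8 returns a label for the chosen path and the function to call.
// The widest supported vector width is preferred.
func pickSumInt8(f cpuFeatures) (string, func([]int8) int8) {
	switch {
	case f.AVX512:
		return "x64", sumInt8Scalar // stand-in for a 64-lane kernel
	case f.AVX2:
		return "x32", sumInt8Scalar // stand-in for a 32-lane kernel
	default:
		return "fallback", sumInt8Scalar
	}
}

func main() {
	name, sum := pickSumInt8(cpuFeatures{AVX2: true})
	fmt.Println(name, sum([]int8{1, 2, 3})) // prints "x32 6"
}
```

A real dispatcher would likely resolve the choice once at package init rather than on every call.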

Should we support varargs? If so, should losimd.SumInt8x64(arg1, arg2, arg3) return a slice of 3 results or a single result?
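The two candidate semantics for a variadic signature can be sketched as follows. This is plain scalar Go with hypothetical names (`sumEach`, `sumAll`), not part of the package; it only shows what each option would return.

```go
package main

import "fmt"

// sumInt8 is a scalar helper; the result wraps around on int8 overflow.
func sumInt8(xs []int8) int8 {
	var s int8
	for _, x := range xs {
		s += x
	}
	return s
}

// sumEach models option 1: one result per argument.
func sumEach(collections ...[]int8) []int8 {
	out := make([]int8, len(collections))
	for i, c := range collections {
		out[i] = sumInt8(c)
	}
	return out
}

// sumAll models option 2: a single result across all arguments.
func sumAll(collections ...[]int8) int8 {
	var total int8
	for _, c := range collections {
		total += sumInt8(c)
	}
	return total
}

func main() {
	a, b, c := []int8{1, 2}, []int8{3}, []int8{4, 5}
	fmt.Println(sumEach(a, b, c)) // prints "[3 3 9]"
	fmt.Println(sumAll(a, b, c))  // prints "15"
}
```

Option 1 allocates a result slice on every call, while option 2 keeps the same return type as the single-slice form.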

@Jorropo @d-enk

Warning: The experimental nature of this sub-package implies that we might have breaking changes in the future.

TODO:

  • Benchmarks
  • Documentation: it is an experimental feature, and we might break the API in the future

Copilot AI review requested due to automatic review settings February 16, 2026 14:32
@samber added the discussion, performance, and go labels Feb 16, 2026

codecov bot commented Feb 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.05%. Comparing base (3e269f8) to head (72c24d8).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #801   +/-   ##
=======================================
  Coverage   92.05%   92.05%           
=======================================
  Files          32       32           
  Lines        4205     4205           
=======================================
  Hits         3871     3871           
  Misses        255      255           
  Partials       79       79           
Flag Coverage Δ
unittests 92.05% <ø> (ø)

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces experimental SIMD (Single Instruction Multiple Data) support for Go 1.26, adding a new github.com/samber/lo/exp/simd package with optimized mathematical and search operations using SSE, AVX2, and AVX-512 instruction sets.

Changes:

  • SIMD-accelerated math operations (Sum, Mean, Clamp, Min, Max) for all numeric types
  • SIMD-accelerated Contains operations for searching
  • Automatic CPU feature detection and dispatching to optimal implementation
  • Comprehensive test coverage for all SIMD variants

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| exp/simd/math.go | Dispatcher functions that route to optimal SIMD implementation |
| exp/simd/math_sse.go | SSE (128-bit) SIMD implementations |
| exp/simd/math_avx2.go | AVX2 (256-bit) SIMD implementations |
| exp/simd/math_avx512.go | AVX-512 (512-bit) SIMD implementations |
| exp/simd/intersect_*.go | SIMD Contains implementations for SSE/AVX2/AVX-512 |
| exp/simd/*_test.go | Comprehensive test files for all SIMD variants |
| exp/simd/cpu_amd64_test.go | CPU feature detection helpers for tests |
| exp/simd/README.md | Documentation on CPU compatibility and usage |
| exp/simd/go.mod | Module definition with Go 1.26 requirement |


)

func TestContainsInt8x16(t *testing.T) {
requireAVX512(t)
Contributor

requireAVX512 is the wrong check here: this is an SSE test.

Owner Author

It uses mask.ToBits(), which requires AVX-512.

We could imagine 2 conditional branches: AVX512 and non-AVX512?

Contributor

I didn't really look into it

The test just worked successfully for me.

Contributor

Only TestContainsUint16x8, TestContainsUint16x8 fail, with an illegal instruction error.

Owner Author

@samber commented Feb 17, 2026

Can you copy your /proc/cpuinfo please?

Since I code on macOS, I need to start a VM to test all architectures. 🥵

Having AVX512 instructions but not AVX2 seems weird.

Contributor

archsimd.X86Features

HasAES: true
HasADX: true
HasAVX: true
HasAVXVNNI: false
HasAVX2: true
HasAVX512: false
HasAVX512F: false
HasAVX512CD: false
HasAVX512BW: false
HasAVX512DQ: false
HasAVX512VL: false
HasAVX512GFNI: false
HasAVX512VAES: false
HasAVX512VNNI: false
HasAVX512VBMI: false
HasAVX512VBMI2: false
HasAVX512BITALG: false
HasAVX512VPOPCNTDQ: false
HasAVX512VPCLMULQDQ: false
HasBMI1: true
HasBMI2: true
HasERMS: true
HasFSRM: false
HasFMA: true
HasGFNI: false
HasOSXSAVE: true
HasPCLMULQDQ: true
HasPOPCNT: true
HasRDTSCP: true
HasSHA: false
HasSSE3: true
HasSSSE3: true
HasSSE41: true
HasSSE42: true
HasVAES: false

Owner Author

I'm pretty sure I have fixed all the architecture checks.

Can you run tests again please?

For CI, I moved to ubicloud, which supports AVX512. GitHub seems to have multiple hardware configs, and none of them supported AVX512 during the weekend :-(

@samber samber force-pushed the feat/exp-simd branch 4 times, most recently from f900aab to d6d55e0 Compare February 20, 2026 18:40
Contributor

d-enk commented Feb 21, 2026

Two more things you can use for optimization:

  • a const lanes value
  • a uint index

For example, with both applied, this function became ~30% faster:

func SumInt16x8[T ~int16](collection []T) T {
	length := uint(len(collection))
	if length == 0 {
		return 0
	}

	// Compile-time lane count for a 128-bit vector of int16.
	const lanes = 8

	base := unsafeSliceInt16(collection, int(length))
	var acc archsimd.Int16x8

	// Main loop: add one full 8-lane chunk per iteration.
	i := uint(0)
	for ; i+lanes <= length; i += lanes {
		v := archsimd.LoadInt16x8Slice(base[i : i+lanes])
		acc = acc.Add(v)
	}

	// Horizontal reduction: spill the lanes and add them up.
	var buf [lanes]int16
	acc.Store(&buf)
	var sum T
	for k := range uint(lanes) {
		sum += T(buf[k])
	}

	// Scalar tail for the remaining (fewer than lanes) elements.
	for ; i < length; i++ {
		sum += collection[i]
	}

	return sum
}
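
For readers without a GOEXPERIMENT=simd toolchain, the same loop structure can be sketched in plain scalar Go. This is only an analogue: `sumInt16Chunked` is a hypothetical name, a fixed-size array plays the role of the vector accumulator, and no speedup is claimed for it. It shows the const lanes value, the uint index, the full-chunk main loop, and the scalar tail.

```go
package main

import "fmt"

// sumInt16Chunked mirrors the chunk-plus-tail shape of the SIMD version
// using only scalar operations. The result wraps on int16 overflow.
func sumInt16Chunked(collection []int16) int16 {
	const lanes = 8 // compile-time chunk width
	length := uint(len(collection))

	var acc [lanes]int16 // stands in for the vector accumulator

	// Main loop: consume one full chunk per iteration.
	i := uint(0)
	for ; i+lanes <= length; i += lanes {
		chunk := collection[i : i+lanes]
		for k := range acc {
			acc[k] += chunk[k]
		}
	}

	// Reduce the per-lane partial sums to a single value.
	var sum int16
	for _, v := range acc {
		sum += v
	}

	// Tail: fewer than lanes elements remain.
	for ; i < length; i++ {
		sum += collection[i]
	}
	return sum
}

func main() {
	xs := make([]int16, 20)
	for i := range xs {
		xs[i] = int16(i + 1) // 1..20
	}
	fmt.Println(sumInt16Chunked(xs)) // prints "210"
}
```

With 20 elements, the main loop handles two chunks of 8 and the tail loop handles the last 4.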

@samber samber merged commit 035f1b3 into master Feb 21, 2026
14 checks passed
@samber samber deleted the feat/exp-simd branch February 21, 2026 18:19

Labels

  • discussion: General discussion, proposal, or open-ended topic not tied to a specific bug.
  • go: Pull requests that update go code
  • performance: Issues or pull requests related to measuring or improving performance.


3 participants