Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #801 +/- ##
=======================================
Coverage 92.05% 92.05%
=======================================
Files 32 32
Lines 4205 4205
=======================================
Hits 3871 3871
Misses 255 255
Partials 79 79
Pull request overview
This PR introduces experimental SIMD (Single Instruction Multiple Data) support for Go 1.26, adding a new github.com/samber/lo/exp/simd package with optimized mathematical and search operations using SSE, AVX2, and AVX-512 instruction sets.
Changes:
- SIMD-accelerated math operations (Sum, Mean, Clamp, Min, Max) for all numeric types
- SIMD-accelerated Contains operations for searching
- Automatic CPU feature detection and dispatching to optimal implementation
- Comprehensive test coverage for all SIMD variants
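The dispatching idea in the list above can be sketched roughly as follows. This is only an illustration, not the package's actual code: `cpuFeatures`, `pickSum`, and `sumScalar` are hypothetical names, and the "SIMD" branches reuse the scalar loop as a placeholder so the example runs on any machine without `GOEXPERIMENT=simd`.

```go
package main

import "fmt"

// cpuFeatures is a hypothetical stand-in for real CPU feature detection.
type cpuFeatures struct {
	HasAVX512 bool
	HasAVX2   bool
	HasSSE    bool // hypothetical flag for the 128-bit baseline
}

// sumFunc is the signature shared by every implementation of the operation.
type sumFunc func([]int32) int32

// sumScalar is the portable fallback. The wider variants below point at it
// only as placeholders for real 128/256/512-bit kernels.
func sumScalar(xs []int32) int32 {
	var s int32
	for _, x := range xs {
		s += x
	}
	return s
}

// pickSum returns the name and implementation of the widest supported path.
func pickSum(f cpuFeatures) (string, sumFunc) {
	switch {
	case f.HasAVX512:
		return "avx512", sumScalar // placeholder for a 512-bit kernel
	case f.HasAVX2:
		return "avx2", sumScalar // placeholder for a 256-bit kernel
	case f.HasSSE:
		return "sse", sumScalar // placeholder for a 128-bit kernel
	default:
		return "scalar", sumScalar
	}
}

func main() {
	name, sum := pickSum(cpuFeatures{HasAVX2: true, HasSSE: true})
	fmt.Println(name, sum([]int32{1, 2, 3, 4}))
}
```

Checking the widest instruction set first means a machine that supports several sets always takes the widest path available to it.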
Reviewed changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| exp/simd/math.go | Dispatcher functions that route to optimal SIMD implementation |
| exp/simd/math_sse.go | SSE (128-bit) SIMD implementations |
| exp/simd/math_avx2.go | AVX2 (256-bit) SIMD implementations |
| exp/simd/math_avx512.go | AVX-512 (512-bit) SIMD implementations |
| exp/simd/intersect_*.go | SIMD Contains implementations for SSE/AVX2/AVX-512 |
| exp/simd/*_test.go | Comprehensive test files for all SIMD variants |
| exp/simd/cpu_amd64_test.go | CPU feature detection helpers for tests |
| exp/simd/README.md | Documentation on CPU compatibility and usage |
| exp/simd/go.mod | Module definition with Go 1.26 requirement |
exp/simd/intersect_sse_test.go (outdated)
    )

    func TestContainsInt8x16(t *testing.T) {
        requireAVX512(t)
It uses mask.ToBits() which requires avx512.
We could imagine 2 conditional branches: AVX512 and non-AVX512?
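A minimal sketch of what two such branches might look like, using plain arrays instead of real vector types so it runs anywhere. All names here (`lanes16`, `compareEqual`, `anyViaBits`, `anyViaLanes`, `containsChunk`) are hypothetical, and the loops only model the behaviour of the corresponding instructions; this is not the code under review.

```go
package main

import "fmt"

// lanes16 models the result of a 16-lane int8 equality compare:
// a lane is -1 (all bits set) on a match and 0 otherwise.
type lanes16 [16]int8

// compareEqual is a scalar stand-in for a vector "equal" comparison.
func compareEqual(v lanes16, target int8) lanes16 {
	var m lanes16
	for i, x := range v {
		if x == target {
			m[i] = -1
		}
	}
	return m
}

// anyViaBits models the mask-to-bitmask branch: pack one bit per lane,
// then test the packed integer against zero.
func anyViaBits(m lanes16) bool {
	var bits uint16
	for i, x := range m {
		if x != 0 {
			bits |= 1 << uint(i)
		}
	}
	return bits != 0
}

// anyViaLanes models the fallback branch: OR all lanes together
// without ever building a bitmask.
func anyViaLanes(m lanes16) bool {
	var acc int8
	for _, x := range m {
		acc |= x
	}
	return acc != 0
}

// containsChunk picks a branch based on a (hypothetical) feature flag.
func containsChunk(v lanes16, target int8, hasAVX512 bool) bool {
	m := compareEqual(v, target)
	if hasAVX512 {
		return anyViaBits(m)
	}
	return anyViaLanes(m)
}

func main() {
	v := lanes16{3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3}
	fmt.Println(containsChunk(v, 7, true), containsChunk(v, 7, false))
	fmt.Println(containsChunk(v, 0, true), containsChunk(v, 0, false))
}
```

Both branches must return the same answer for every input, which is what a test for the non-AVX-512 path would check.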
I didn't really look into it
The test just worked successfully for me.
Only TestContainsUint16x8 fails, with an illegal instruction error.
Can you copy your /proc/cpuinfo please?
Since I code on macOS, I need to start a VM to test all architectures. 🥵
Having AVX512 instructions but not AVX2 seems weird.
archsimd.X86Features
HasAES: true
HasADX: true
HasAVX: true
HasAVXVNNI: false
HasAVX2: true
HasAVX512: false
HasAVX512F: false
HasAVX512CD: false
HasAVX512BW: false
HasAVX512DQ: false
HasAVX512VL: false
HasAVX512GFNI: false
HasAVX512VAES: false
HasAVX512VNNI: false
HasAVX512VBMI: false
HasAVX512VBMI2: false
HasAVX512BITALG: false
HasAVX512VPOPCNTDQ: false
HasAVX512VPCLMULQDQ: false
HasBMI1: true
HasBMI2: true
HasERMS: true
HasFSRM: false
HasFMA: true
HasGFNI: false
HasOSXSAVE: true
HasPCLMULQDQ: true
HasPOPCNT: true
HasRDTSCP: true
HasSHA: false
HasSSE3: true
HasSSSE3: true
HasSSE41: true
HasSSE42: true
HasVAES: false
I'm pretty sure I have fixed all the architecture checks.
Can you run the tests again please?
For CI, I moved to ubicloud, which supports AVX512. GitHub seems to have multiple hardware configs and none supports AVX512 during the weekend :-(
You can also use for optimization:
For example, it became ~30% faster.

    func SumInt16x8[T ~int16](collection []T) T {
        length := uint(len(collection))
        if length == 0 {
            return 0
        }
        const lanes = 8
        base := unsafeSliceInt16(collection, int(length))
        var acc archsimd.Int16x8
        i := uint(0)
        for ; i+lanes <= length; i += lanes {
            v := archsimd.LoadInt16x8Slice(base[i : i+lanes])
            acc = acc.Add(v)
        }
        var buf [lanes]int16
        acc.Store(&buf)
        var sum T
        for k := range uint(lanes) {
            sum += T(buf[k])
        }
        for ; i < length; i++ {
            sum += collection[i]
        }
        return sum
    }
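The structure of the snippet above (accumulate per lane, reduce the lanes once at the end, then handle the scalar tail) can be modelled without any SIMD types. The following is a scalar sketch under that assumption. `sumLanes` and `sumScalar` are hypothetical names, and the inner loop stands in for a single vector add.

```go
package main

import "fmt"

const lanes = 8

// sumLanes mimics the vector loop: each of the 8 accumulators only ever
// sees elements at the same offset within a chunk of 8.
func sumLanes(xs []int16) int16 {
	var acc [lanes]int16
	i := 0
	for ; i+lanes <= len(xs); i += lanes {
		for k := 0; k < lanes; k++ { // stands in for one vector Add
			acc[k] += xs[i+k]
		}
	}
	// Horizontal reduction: fold the lane accumulators into one value.
	var sum int16
	for _, a := range acc {
		sum += a
	}
	// Scalar tail for the len(xs) % 8 leftover elements.
	for ; i < len(xs); i++ {
		sum += xs[i]
	}
	return sum
}

// sumScalar is the reference: a plain left-to-right sum.
func sumScalar(xs []int16) int16 {
	var sum int16
	for _, x := range xs {
		sum += x
	}
	return sum
}

func main() {
	xs := make([]int16, 19)
	for i := range xs {
		xs[i] = int16(i + 1)
	}
	fmt.Println(sumLanes(xs), sumScalar(xs))
}
```

Because Go integer addition wraps on overflow and wrapping addition is associative and commutative, reordering the additions across lanes gives the same int16 result as the sequential loop, even when the sum overflows.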
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Introducing the github.com/samber/lo/exp/simd package.
Go 1.26 brings SIMD instructions to Go.
The feature is still experimental, so I'm not adding this to the core package of lo.
Requirements:
GOEXPERIMENT=simd
I've added SumInt8 variants that route the implementation to the right architecture automatically. I would be interested in your opinion on this.
Should we support varargs? If yes, should losimd.SumInt8x64(arg1, arg2, arg3) return a slice of 3 results or a single result?
@Jorropo @d-enk
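To make the varargs question concrete, here is a sketch of the two possible shapes. The names `sumOne`, `sumEach`, and `sumAll` are hypothetical and the summation is a plain scalar loop. Nothing here is part of the proposed API.

```go
package main

import "fmt"

// sumOne is a scalar placeholder for a single-slice SIMD sum.
func sumOne[T ~int8](collection []T) T {
	var s T
	for _, x := range collection {
		s += x
	}
	return s
}

// sumEach is option A: one result per argument.
func sumEach[T ~int8](collections ...[]T) []T {
	out := make([]T, len(collections))
	for i, c := range collections {
		out[i] = sumOne(c)
	}
	return out
}

// sumAll is option B: a single result across all arguments.
func sumAll[T ~int8](collections ...[]T) T {
	var total T
	for _, c := range collections {
		total += sumOne(c)
	}
	return total
}

func main() {
	a, b, c := []int8{1, 2, 3}, []int8{10, 20}, []int8{-5}
	fmt.Println(sumEach(a, b, c))
	fmt.Println(sumAll(a, b, c))
}
```

Option B can always be derived from option A by summing the returned slice, while the reverse is not possible, which may be one argument for returning per-argument results if variadic support is added.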
Warning: The experimental nature of this sub-package implies that we might have breaking changes in the future.
TODO: