[OpenBLAS] update multithreading cutoff #7189
Merged — ViralBShah merged 3 commits into JuliaPackaging:master on Aug 8, 2023
Conversation
400 is a much better cutoff than 50 for most modern machines. Note that 100 is way too small even for modern 4-core machines (I think the 50 limit was found pre-AVX2, and possibly pre-FMA). 400 is probably a bit larger than optimal on small machines, but it only gives up ~13% performance single-core compared to 8-core (and on laptops it will probably be better because a single core can turbo higher). It also mitigates the horrible performance cliff of using 16 or more threads on medium-sized matrices (between roughly 400 and 1600). Of course, the better answer would be to integrate BLAS's threading with Julia's (and use an appropriate number of threads based on the matrix size), but for now this is a pretty noticeable improvement.

```julia
julia> BLAS.set_num_threads(32)

julia> peakflops(400)
1.1644410982935661e10

julia> BLAS.set_num_threads(16)

julia> peakflops(400)
1.5580026746524042e10

julia> BLAS.set_num_threads(8)

julia> peakflops(400)
2.210268354206555e10

julia> BLAS.set_num_threads(4)

julia> peakflops(400)
1.937951340161483e10

julia> BLAS.set_num_threads(1)

julia> peakflops(400)
1.740427478902416e10

julia> BLAS.set_num_threads(32)

julia> peakflops(100)
1.9949726688744364e9

julia> BLAS.set_num_threads(16)

julia> peakflops(100)
2.9579541605843735e9

julia> BLAS.set_num_threads(8)

julia> peakflops(100)
4.373630506947512e9

julia> BLAS.set_num_threads(4)

julia> peakflops(100)
3.924300248211991e9

julia> BLAS.set_num_threads(1)

julia> peakflops(100)
1.0693014253788e10
```
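The size-dependent threading the description asks for could be approximated today from user code. Below is a minimal sketch of that idea, not the actual OpenBLAS patch: `suggest_threads` is a hypothetical helper name, and the 400/1600 breakpoints and the 8-thread cap for medium sizes are assumptions taken from the benchmark numbers above.

```julia
using LinearAlgebra

# Hypothetical heuristic: choose a BLAS thread count from the matrix
# dimension n, mirroring the benchmarks above. The breakpoints (400, 1600)
# and the medium-size cap of 8 threads are illustrative assumptions.
function suggest_threads(n::Integer; max_threads::Integer = Sys.CPU_THREADS)
    n <= 400  && return 1                    # small: single-threaded wins
    n <= 1600 && return min(8, max_threads)  # medium: avoid the 16+-thread cliff
    return max_threads                       # large: use all available threads
end

# Apply before a matmul-heavy section:
BLAS.set_num_threads(suggest_threads(500))
```

This only sets a global thread count before a hot section; real integration with Julia's scheduler (as the description suggests) would need support inside OpenBLAS itself.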
ViralBShah approved these changes on Aug 8, 2023
ViralBShah added a commit to JuliaLang/julia that referenced this pull request on Aug 8, 2023
ViralBShah added a commit to JuliaLang/julia that referenced this pull request on Aug 9, 2023:
…50844) Detailed discussion and benchmarks by @oscardssmith in JuliaPackaging/Yggdrasil#7189
KristofferC pushed a commit to JuliaLang/julia that referenced this pull request on Aug 10, 2023:
…50844) Detailed discussion and benchmarks by @oscardssmith in JuliaPackaging/Yggdrasil#7189 (cherry picked from commit 626f687)