Skip to content

[OpenBLAS] update multithreading cutoff#7189

Merged
ViralBShah merged 3 commits intoJuliaPackaging:masterfrom
oscardssmith:oscardssmith-change-threading-cuttoff
Aug 8, 2023
Merged

[OpenBLAS] update multithreading cutoff#7189
ViralBShah merged 3 commits intoJuliaPackaging:masterfrom
oscardssmith:oscardssmith-change-threading-cuttoff

Conversation

@oscardssmith
Copy link
Copy Markdown
Contributor

400 is a much better cutoff than 50 more most modern machines. Note that 100 is way too small even for modern 4 core machines (I think the 50 limit was found pre AVX2 (and possibly pre fma)). 400 is probably a bit larger than optimal on small machines but only gives up ~13% performance single core compared to 8 core (and on laptops it will probably be better because the single core can turbo higher). It also mitigates the horrible performance cliff of using 16 or more threads on medium sized matrices (between roughly 400 and 1600). Of course the better answer would be to make it so BLAS's threading is integrated with julia's (and we use an appropriate number of threads based on the matrix size), but for now this is a pretty noticeable improvement.

julia> BLAS.set_num_threads(32)

julia> peakflops(400)
1.1644410982935661e10

julia> BLAS.set_num_threads(16)

julia> peakflops(400)
1.5580026746524042e10

julia> BLAS.set_num_threads(8)

julia> peakflops(400)
2.210268354206555e10

julia> BLAS.set_num_threads(4)

julia> peakflops(400)
1.937951340161483e10

julia> BLAS.set_num_threads(1)

julia> peakflops(400)
1.740427478902416e10

julia> BLAS.set_num_threads(32)

julia> peakflops(100)
1.9949726688744364e9

julia> BLAS.set_num_threads(16)

julia> peakflops(100)
2.9579541605843735e9

julia> BLAS.set_num_threads(8)

julia> peakflops(100)
4.373630506947512e9

julia> BLAS.set_num_threads(4)

julia> peakflops(100)
3.924300248211991e9

julia> BLAS.set_num_threads(1)

julia> peakflops(100)
1.0693014253788e10

400 is a much better cutoff than 50 more most modern machines. Note that 100 is way too small even for modern 4 core machines (I think the 50 limit was found pre AVX2 (and possibly pre fma)). 400 is probably a bit larger than optimal on small machines but only gives up ~13% performance single core compared to 8 core (and on laptops it will probably be better because the single core can turbo higher). It also mitigates the horrible performance cliff of using 16 or more threads on medium sized matrices (between roughly 400 and 1600). Of course the better answer would be to make it so BLAS's threading is integrated with julia's (and we use an appropriate number of threads based on the matrix size), but for now this is a pretty noticeable improvement.
```
julia> BLAS.set_num_threads(32)

julia> peakflops(400)
1.1644410982935661e10

julia> BLAS.set_num_threads(16)

julia> peakflops(400)
1.5580026746524042e10

julia> BLAS.set_num_threads(8)

julia> peakflops(400)
2.210268354206555e10

julia> BLAS.set_num_threads(4)

julia> peakflops(400)
1.937951340161483e10

julia> BLAS.set_num_threads(1)

julia> peakflops(400)
1.740427478902416e10

julia> BLAS.set_num_threads(32)

julia> peakflops(100)
1.9949726688744364e9

julia> BLAS.set_num_threads(16)

julia> peakflops(100)
2.9579541605843735e9

julia> BLAS.set_num_threads(8)

julia> peakflops(100)
4.373630506947512e9

julia> BLAS.set_num_threads(4)

julia> peakflops(100)
3.924300248211991e9

julia> BLAS.set_num_threads(1)

julia> peakflops(100)
1.0693014253788e10
@ViralBShah ViralBShah changed the title update multithreading cutoff [OpenBLAS] update multithreading cutoff Aug 8, 2023
@ViralBShah ViralBShah merged commit b02a6e7 into JuliaPackaging:master Aug 8, 2023
@oscardssmith oscardssmith deleted the oscardssmith-change-threading-cuttoff branch August 8, 2023 19:54
ViralBShah added a commit to JuliaLang/julia that referenced this pull request Aug 8, 2023
ViralBShah added a commit to JuliaLang/julia that referenced this pull request Aug 9, 2023
KristofferC pushed a commit to JuliaLang/julia that referenced this pull request Aug 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants