Add preference to use CUDA_Runtime_Discovery for the compiler binaries #3080
apozharski wants to merge 11 commits into JuliaGPU:master
Codecov Report
✅ All modified and coverable lines are covered by tests.

| Coverage Diff | master | #3080 | +/- |
|---|---|---|---|
| Coverage | 90.42% | 10.16% | -80.27% |
| Files | 141 | 114 | -27 |
| Lines | 11993 | 9123 | -2870 |
| Hits | 10845 | 927 | -9918 |
| Misses | 1148 | 8196 | +7048 |
The above concerns about mixing the local compiler and local runtime preferences have been addressed, and everything should work correctly now.
maleadt left a comment:
Needs a bump in Project.toml for CUDA_Runtime_Discovery
Compat needs to be bumped in subpackages as well.
Force-pushed from 11c1c6e to 9629948.
Hmmm it seems that a […]
CUDA.jl Benchmarks
| Benchmark suite | Current: 9a7c2c6 | Previous: e0e295f | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 100746 ns | 100878 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 77158 ns | 75855 ns | 1.02 |
| array/accumulate/Float32/dims=1L | 1586087.5 ns | 1585504 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 144338 ns | 143115.5 ns | 1.01 |
| array/accumulate/Float32/dims=2L | 658356 ns | 657101 ns | 1.00 |
| array/accumulate/Int64/1d | 118165.5 ns | 118250 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 80273 ns | 79820.5 ns | 1.01 |
| array/accumulate/Int64/dims=1L | 1695183 ns | 1694871 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 156182 ns | 155746 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 961534 ns | 961802 ns | 1.00 |
| array/broadcast | 20370 ns | 20486 ns | 0.99 |
| array/construct | 1258.7 ns | 1263.9 ns | 1.00 |
| array/copy | 18040 ns | 17962 ns | 1.00 |
| array/copyto!/cpu_to_gpu | 213158 ns | 214197 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 282748 ns | 281343 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 10585.666666666666 ns | 10794 ns | 0.98 |
| array/iteration/findall/bool | 134671 ns | 134478 ns | 1.00 |
| array/iteration/findall/int | 149809 ns | 149314.5 ns | 1.00 |
| array/iteration/findfirst/bool | 81561 ns | 81113 ns | 1.01 |
| array/iteration/findfirst/int | 83914 ns | 83293 ns | 1.01 |
| array/iteration/findmin/1d | 87680 ns | 84555 ns | 1.04 |
| array/iteration/findmin/2d | 117130.5 ns | 116516 ns | 1.01 |
| array/iteration/logical | 199715.5 ns | 197262.5 ns | 1.01 |
| array/iteration/scalar | 67619 ns | 67092 ns | 1.01 |
| array/permutedims/2d | 51960 ns | 52211 ns | 1.00 |
| array/permutedims/3d | 52466.5 ns | 52764 ns | 0.99 |
| array/permutedims/4d | 51059.5 ns | 51452 ns | 0.99 |
| array/random/rand/Float32 | 12481 ns | 12943 ns | 0.96 |
| array/random/rand/Int64 | 24755 ns | 24996 ns | 0.99 |
| array/random/rand!/Float32 | 8327.666666666666 ns | 8402.333333333334 ns | 0.99 |
| array/random/rand!/Int64 | 21598 ns | 21937 ns | 0.98 |
| array/random/randn/Float32 | 36533 ns | 36954 ns | 0.99 |
| array/random/randn!/Float32 | 30546 ns | 30982 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 34038.5 ns | 34678 ns | 0.98 |
| array/reductions/mapreduce/Float32/dims=1 | 39027.5 ns | 39206 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 51121 ns | 51259.5 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 56152 ns | 56274 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 68951.5 ns | 69346 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 42066 ns | 42412 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 42253 ns | 42188 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 87281 ns | 87287 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 59211 ns | 59630 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2L | 84425 ns | 84743 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 34425 ns | 34235 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1 | 40149.5 ns | 39618.5 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1L | 51438 ns | 51305 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 56323 ns | 56667 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2L | 69313 ns | 69784 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 41907 ns | 42369 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1 | 42514 ns | 42478 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 87055 ns | 87248 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 59346 ns | 59729 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 84476 ns | 84769 ns | 1.00 |
| array/reverse/1d | 17766 ns | 18015.5 ns | 0.99 |
| array/reverse/1dL | 68348 ns | 68638 ns | 1.00 |
| array/reverse/1dL_inplace | 65744 ns | 65779 ns | 1.00 |
| array/reverse/1d_inplace | 10264.166666666668 ns | 8649.666666666666 ns | 1.19 |
| array/reverse/2d | 21025 ns | 20711 ns | 1.02 |
| array/reverse/2dL | 73112 ns | 72634 ns | 1.01 |
| array/reverse/2dL_inplace | 65864 ns | 65985 ns | 1.00 |
| array/reverse/2d_inplace | 10000 ns | 10088 ns | 0.99 |
| array/sorting/1d | 2735645 ns | 2734295 ns | 1.00 |
| array/sorting/2d | 1069524 ns | 1068343 ns | 1.00 |
| array/sorting/by | 3304721 ns | 3304353 ns | 1.00 |
| cuda/synchronization/context/auto | 1140.3 ns | 1159.9 ns | 0.98 |
| cuda/synchronization/context/blocking | 924.5 ns | 896.4878048780488 ns | 1.03 |
| cuda/synchronization/context/nonblocking | 7086.4 ns | 7409.1 ns | 0.96 |
| cuda/synchronization/stream/auto | 995.3333333333334 ns | 1027.578947368421 ns | 0.97 |
| cuda/synchronization/stream/blocking | 844.040404040404 ns | 841.2941176470588 ns | 1.00 |
| cuda/synchronization/stream/nonblocking | 7321 ns | 7567.799999999999 ns | 0.97 |
| integration/byval/reference | 143733 ns | 143876 ns | 1.00 |
| integration/byval/slices=1 | 145590.5 ns | 145738.5 ns | 1.00 |
| integration/byval/slices=2 | 284469 ns | 284423 ns | 1.00 |
| integration/byval/slices=3 | 423004 ns | 423173 ns | 1.00 |
| integration/cudadevrt | 102276 ns | 102437 ns | 1.00 |
| integration/volumerhs | 23425811 ns | 23470585 ns | 1.00 |
| kernel/indexing | 13108 ns | 13311 ns | 0.98 |
| kernel/indexing_checked | 13927 ns | 14095 ns | 0.99 |
| kernel/launch | 2114 ns | 2235.1111111111113 ns | 0.95 |
| kernel/occupancy | 669.24375 ns | 693.6190476190476 ns | 0.96 |
| kernel/rand | 17017 ns | 18172.5 ns | 0.94 |
| latency/import | 3817165248.5 ns | 3820990542 ns | 1.00 |
| latency/precompile | 4582580994.5 ns | 4593009584 ns | 1.00 |
| latency/ttfp | 4399286810.5 ns | 4397252952 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
Could you report use of a local compiler in the […]? And the failure is probably because you need to rebase on the latest master in order to include #3088.
Force-pushed from 9b06022 to 3139554.
Added additional info for compiler in […]
Force-pushed from f00d80d to 9a7c2c6.
This works for my use case of ahead-of-time compilation, but now that I think about it, it may break if you use a local runtime with the artifact compiler, which I need to check. It also has no checks that the compiler version and runtime version are compatible, and it might need some docs. As such, keeping this as [wip] for now.