
Add preference to use CUDA_Runtime_Discovery for the compiler binaries #3080

Open
apozharski wants to merge 11 commits into JuliaGPU:master from apozharski:ap/local-compiler


Conversation

@apozharski
Contributor

This works for my use case of ahead-of-time compilation, but now that I think about it, it may break if you use a local runtime with the artifact compiler, which I need to check. It also lacks any checks for compatibility between the compiler version and the runtime version, and it might need some docs. As such, I'm keeping this as [wip] for now.

@codecov

codecov Bot commented Apr 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 10.16%. Comparing base (a79b516) to head (58f4c8e).

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #3080       +/-   ##
===========================================
- Coverage   90.42%   10.16%   -80.27%     
===========================================
  Files         141      114       -27     
  Lines       11993     9123     -2870     
===========================================
- Hits        10845      927     -9918     
- Misses       1148     8196     +7048     


@apozharski apozharski changed the title [WIP] Add preference to use CUDA_Runtime_Discovery for the compiler binaries Add preference to use CUDA_Runtime_Discovery for the compiler binaries Apr 3, 2026
@apozharski apozharski marked this pull request as ready for review April 3, 2026 07:25
@apozharski
Contributor Author

The above concerns about mixing the local compiler and local runtime preferences have been addressed, and everything should work correctly now.

maleadt
maleadt previously requested changes Apr 9, 2026
Member

@maleadt maleadt left a comment

Needs a bump in Project.toml for CUDA_Runtime_Discovery

Comment thread CUDACore/src/CUDACore.jl Outdated
@maleadt
Member

maleadt commented Apr 14, 2026

Compat needs to be bumped in subpackages as well.

@apozharski
Contributor Author

apozharski commented Apr 20, 2026

Hmm, it seems that a GPUArrays test fails consistently in CI, and I can't reproduce it locally 🤔

Contributor

@github-actions github-actions Bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 9a7c2c6 | Previous: e0e295f | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 100746 ns | 100878 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 77158 ns | 75855 ns | 1.02 |
| array/accumulate/Float32/dims=1L | 1586087.5 ns | 1585504 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 144338 ns | 143115.5 ns | 1.01 |
| array/accumulate/Float32/dims=2L | 658356 ns | 657101 ns | 1.00 |
| array/accumulate/Int64/1d | 118165.5 ns | 118250 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 80273 ns | 79820.5 ns | 1.01 |
| array/accumulate/Int64/dims=1L | 1695183 ns | 1694871 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 156182 ns | 155746 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 961534 ns | 961802 ns | 1.00 |
| array/broadcast | 20370 ns | 20486 ns | 0.99 |
| array/construct | 1258.7 ns | 1263.9 ns | 1.00 |
| array/copy | 18040 ns | 17962 ns | 1.00 |
| array/copyto!/cpu_to_gpu | 213158 ns | 214197 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 282748 ns | 281343 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 10585.666666666666 ns | 10794 ns | 0.98 |
| array/iteration/findall/bool | 134671 ns | 134478 ns | 1.00 |
| array/iteration/findall/int | 149809 ns | 149314.5 ns | 1.00 |
| array/iteration/findfirst/bool | 81561 ns | 81113 ns | 1.01 |
| array/iteration/findfirst/int | 83914 ns | 83293 ns | 1.01 |
| array/iteration/findmin/1d | 87680 ns | 84555 ns | 1.04 |
| array/iteration/findmin/2d | 117130.5 ns | 116516 ns | 1.01 |
| array/iteration/logical | 199715.5 ns | 197262.5 ns | 1.01 |
| array/iteration/scalar | 67619 ns | 67092 ns | 1.01 |
| array/permutedims/2d | 51960 ns | 52211 ns | 1.00 |
| array/permutedims/3d | 52466.5 ns | 52764 ns | 0.99 |
| array/permutedims/4d | 51059.5 ns | 51452 ns | 0.99 |
| array/random/rand/Float32 | 12481 ns | 12943 ns | 0.96 |
| array/random/rand/Int64 | 24755 ns | 24996 ns | 0.99 |
| array/random/rand!/Float32 | 8327.666666666666 ns | 8402.333333333334 ns | 0.99 |
| array/random/rand!/Int64 | 21598 ns | 21937 ns | 0.98 |
| array/random/randn/Float32 | 36533 ns | 36954 ns | 0.99 |
| array/random/randn!/Float32 | 30546 ns | 30982 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 34038.5 ns | 34678 ns | 0.98 |
| array/reductions/mapreduce/Float32/dims=1 | 39027.5 ns | 39206 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 51121 ns | 51259.5 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 56152 ns | 56274 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 68951.5 ns | 69346 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 42066 ns | 42412 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 42253 ns | 42188 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 87281 ns | 87287 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 59211 ns | 59630 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2L | 84425 ns | 84743 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 34425 ns | 34235 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1 | 40149.5 ns | 39618.5 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1L | 51438 ns | 51305 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 56323 ns | 56667 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2L | 69313 ns | 69784 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 41907 ns | 42369 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1 | 42514 ns | 42478 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 87055 ns | 87248 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 59346 ns | 59729 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 84476 ns | 84769 ns | 1.00 |
| array/reverse/1d | 17766 ns | 18015.5 ns | 0.99 |
| array/reverse/1dL | 68348 ns | 68638 ns | 1.00 |
| array/reverse/1dL_inplace | 65744 ns | 65779 ns | 1.00 |
| array/reverse/1d_inplace | 10264.166666666668 ns | 8649.666666666666 ns | 1.19 |
| array/reverse/2d | 21025 ns | 20711 ns | 1.02 |
| array/reverse/2dL | 73112 ns | 72634 ns | 1.01 |
| array/reverse/2dL_inplace | 65864 ns | 65985 ns | 1.00 |
| array/reverse/2d_inplace | 10000 ns | 10088 ns | 0.99 |
| array/sorting/1d | 2735645 ns | 2734295 ns | 1.00 |
| array/sorting/2d | 1069524 ns | 1068343 ns | 1.00 |
| array/sorting/by | 3304721 ns | 3304353 ns | 1.00 |
| cuda/synchronization/context/auto | 1140.3 ns | 1159.9 ns | 0.98 |
| cuda/synchronization/context/blocking | 924.5 ns | 896.4878048780488 ns | 1.03 |
| cuda/synchronization/context/nonblocking | 7086.4 ns | 7409.1 ns | 0.96 |
| cuda/synchronization/stream/auto | 995.3333333333334 ns | 1027.578947368421 ns | 0.97 |
| cuda/synchronization/stream/blocking | 844.040404040404 ns | 841.2941176470588 ns | 1.00 |
| cuda/synchronization/stream/nonblocking | 7321 ns | 7567.799999999999 ns | 0.97 |
| integration/byval/reference | 143733 ns | 143876 ns | 1.00 |
| integration/byval/slices=1 | 145590.5 ns | 145738.5 ns | 1.00 |
| integration/byval/slices=2 | 284469 ns | 284423 ns | 1.00 |
| integration/byval/slices=3 | 423004 ns | 423173 ns | 1.00 |
| integration/cudadevrt | 102276 ns | 102437 ns | 1.00 |
| integration/volumerhs | 23425811 ns | 23470585 ns | 1.00 |
| kernel/indexing | 13108 ns | 13311 ns | 0.98 |
| kernel/indexing_checked | 13927 ns | 14095 ns | 0.99 |
| kernel/launch | 2114 ns | 2235.1111111111113 ns | 0.95 |
| kernel/occupancy | 669.24375 ns | 693.6190476190476 ns | 0.96 |
| kernel/rand | 17017 ns | 18172.5 ns | 0.94 |
| latency/import | 3817165248.5 ns | 3820990542 ns | 1.00 |
| latency/precompile | 4582580994.5 ns | 4593009584 ns | 1.00 |
| latency/ttfp | 4399286810.5 ns | 4397252952 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@maleadt
Member

maleadt commented Apr 20, 2026

Could you report use of a local compiler in the versioninfo() output like we do for local toolkits?

And the failure is probably because you need to rebase on the latest master in order to include #3088.

@apozharski
Contributor Author

Added compiler info to versioninfo() in exactly the same way as for the runtime, as an additional line. Let me know if you want something different.

Comment thread CUDATools/src/utilities.jl
@maleadt maleadt dismissed their stale review April 20, 2026 13:34

Changes implemented.
