feat: Expose `opt_level` compiler config option #769
Open
haakon-e wants to merge 1 commit into JuliaGPU:main
Pass the `opt_level` keyword argument through the compiler pipeline, allowing users to control the optimization level for Metal shader compilation via `_compiler_config` and the `@metal` macro.
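For illustration only, a launch using the proposed keyword might look like the sketch below. It assumes this PR is applied, that `opt_level` is accepted next to the usual launch keywords of `@metal`, and that `1` is a valid level; the PR text does not confirm the accepted values. The kernel name `vadd!` and the array sizes are hypothetical.

```julia
using Metal

# Hypothetical element-wise addition kernel, used only to have something to launch.
function vadd!(c, a, b)
    i = thread_position_in_grid_1d()
    c[i] = a[i] + b[i]
    return
end

a = MtlArray(rand(Float32, 1024))
b = MtlArray(rand(Float32, 1024))
c = similar(a)

# `threads` and `groups` are existing launch keywords (256 * 4 = 1024 threads).
# `opt_level` is the keyword this PR proposes to forward to the compiler config;
# the value 1 is an assumption, not something the PR states.
@metal threads=256 groups=4 opt_level=1 vadd!(c, a, b)
```

Running this requires an Apple GPU and a Metal.jl build that includes the change; without the PR the `opt_level` keyword would presumably be rejected as unsupported.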
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #769      +/-   ##
==========================================
+ Coverage   80.45%   80.67%   +0.21%
==========================================
  Files          61       61
  Lines        2855     2846       -9
==========================================
- Hits         2297     2296       -1
+ Misses        558      550       -8
Metal Benchmarks
| Benchmark suite | Current: 01439d5 | Previous: acbbd34 | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 1122834 ns | 1132709 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 1538875 ns | 1560896 ns | 0.99 |
| array/accumulate/Float32/dims=1L | 9839917 ns | 9836000 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 1885583 ns | 1881562.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 7231041.5 ns | 7213791.5 ns | 1.00 |
| array/accumulate/Int64/1d | 1255833 ns | 1251709 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 1842146 ns | 1843416.5 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 11667812.5 ns | 11749875 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 2160791 ns | 2128375 ns | 1.02 |
| array/accumulate/Int64/dims=2L | 9731500 ns | 9792208 ns | 0.99 |
| array/broadcast | 609250 ns | 600250 ns | 1.01 |
| array/construct | 6000 ns | 6375 ns | 0.94 |
| array/permutedims/2d | 1170208 ns | 1169958 ns | 1.00 |
| array/permutedims/3d | 1670500 ns | 1647292 ns | 1.01 |
| array/permutedims/4d | 2387667 ns | 2390042 ns | 1.00 |
| array/private/copy | 568792 ns | 547895.5 ns | 1.04 |
| array/private/copyto!/cpu_to_gpu | 803958 ns | 789667 ns | 1.02 |
| array/private/copyto!/gpu_to_cpu | 807584 ns | 802833.5 ns | 1.01 |
| array/private/copyto!/gpu_to_gpu | 648479.5 ns | 645417 ns | 1.00 |
| array/private/iteration/findall/bool | 1421937 ns | 1401271 ns | 1.01 |
| array/private/iteration/findall/int | 1564396 ns | 1552853.5 ns | 1.01 |
| array/private/iteration/findfirst/bool | 2033625 ns | 2046729.5 ns | 0.99 |
| array/private/iteration/findfirst/int | 2069500 ns | 2093500 ns | 0.99 |
| array/private/iteration/findmin/1d | 2510000 ns | 2506709 ns | 1.00 |
| array/private/iteration/findmin/2d | 1792875 ns | 1792000 ns | 1.00 |
| array/private/iteration/logical | 2596709 ns | 2638750 ns | 0.98 |
| array/private/iteration/scalar | 5593562.5 ns | 4889812.5 ns | 1.14 |
| array/random/rand/Float32 | 1170625 ns | 1157334 ns | 1.01 |
| array/random/rand/Int64 | 1333667 ns | 1281416 ns | 1.04 |
| array/random/rand!/Float32 | 909667 ns | 933708 ns | 0.97 |
| array/random/rand!/Int64 | 868750 ns | 877500 ns | 0.99 |
| array/random/randn/Float32 | 1072333.5 ns | 1075625 ns | 1.00 |
| array/random/randn!/Float32 | 815041 ns | 821083.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 1041812.5 ns | 1026395.5 ns | 1.02 |
| array/reductions/mapreduce/Float32/dims=1 | 831041.5 ns | 847750 ns | 0.98 |
| array/reductions/mapreduce/Float32/dims=1L | 1333709 ns | 1320958 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2 | 851291 ns | 856062.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2L | 1819833 ns | 1802542 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 1523666.5 ns | 1356854.5 ns | 1.12 |
| array/reductions/mapreduce/Int64/dims=1 | 1097000 ns | 1113917 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=1L | 2010833 ns | 2048021 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=2 | 1140104 ns | 1157312.5 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2L | 3618167 ns | 3584708 ns | 1.01 |
| array/reductions/reduce/Float32/1d | 1045417 ns | 1030312.5 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1 | 823437.5 ns | 849459 ns | 0.97 |
| array/reductions/reduce/Float32/dims=1L | 1317459 ns | 1282250 ns | 1.03 |
| array/reductions/reduce/Float32/dims=2 | 852459 ns | 779458.5 ns | 1.09 |
| array/reductions/reduce/Float32/dims=2L | 1807771 ns | 1811125 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 1534833 ns | 1352583.5 ns | 1.13 |
| array/reductions/reduce/Int64/dims=1 | 1093833 ns | 1105625.5 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1L | 2018458 ns | 2038209 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2 | 1149583.5 ns | 1165084 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 4224500.5 ns | 4246833.5 ns | 0.99 |
| array/shared/copy | 251875 ns | 230667 ns | 1.09 |
| array/shared/copyto!/cpu_to_gpu | 80791 ns | 83667 ns | 0.97 |
| array/shared/copyto!/gpu_to_cpu | 81250 ns | 82334 ns | 0.99 |
| array/shared/copyto!/gpu_to_gpu | 81583 ns | 83209 ns | 0.98 |
| array/shared/iteration/findall/bool | 1428500 ns | 1414771 ns | 1.01 |
| array/shared/iteration/findall/int | 1559895.5 ns | 1566833 ns | 1.00 |
| array/shared/iteration/findfirst/bool | 1631874.5 ns | 1634542 ns | 1.00 |
| array/shared/iteration/findfirst/int | 1650417 ns | 1650729 ns | 1.00 |
| array/shared/iteration/findmin/1d | 2092875 ns | 2110166.5 ns | 0.99 |
| array/shared/iteration/findmin/2d | 1795021 ns | 1792229 ns | 1.00 |
| array/shared/iteration/logical | 2452125 ns | 2237270.5 ns | 1.10 |
| array/shared/iteration/scalar | 199917 ns | 207375 ns | 0.96 |
| integration/byval/reference | 1563375 ns | 1550250 ns | 1.01 |
| integration/byval/slices=1 | 1579687 ns | 1587604.5 ns | 1.00 |
| integration/byval/slices=2 | 2612854.5 ns | 2607167 ns | 1.00 |
| integration/byval/slices=3 | 7805520.5 ns | 7747541 ns | 1.01 |
| integration/metaldevrt | 864395.5 ns | 880084 ns | 0.98 |
| kernel/indexing | 621750 ns | 597895.5 ns | 1.04 |
| kernel/indexing_checked | 629124.5 ns | 586250 ns | 1.07 |
| kernel/launch | 11083 ns | 11917 ns | 0.93 |
| kernel/rand | 571167 ns | 576375 ns | 0.99 |
| latency/import | 1422002520.5 ns | 1422194646 ns | 1.00 |
| latency/precompile | 25434593458 ns | 25428193000 ns | 1.00 |
| latency/ttfp | 2340542917 ns | 2340052813 ns | 1.00 |
| metal/synchronization/context | 19667 ns | 20166.5 ns | 0.98 |
| metal/synchronization/stream | 18584 ns | 19250 ns | 0.97 |
This comment was automatically generated by workflow using github-action-benchmark.
If this feature is of interest, I'm happy to add tests, docs, etc., as needed.