feat: Expose opt_level compiler config option #769

Open
haakon-e wants to merge 1 commit into JuliaGPU:main from haakon-e:he/feat-expose-opt-level-compiler

Conversation

@haakon-e

Pass the `opt_level` keyword argument through the compiler pipeline, allowing users to control the optimization level for Metal shader compilation via `_compiler_config` and the `@metal` macro.


If this feature is of interest, I'm happy to add tests, docs, etc., as needed.

@codecov

codecov Bot commented Apr 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.67%. Comparing base (65fac52) to head (01439d5).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #769      +/-   ##
==========================================
+ Coverage   80.45%   80.67%   +0.21%     
==========================================
  Files          61       61              
  Lines        2855     2846       -9     
==========================================
- Hits         2297     2296       -1     
+ Misses        558      550       -8     

Contributor

@github-actions github-actions Bot left a comment

Metal Benchmarks

| Benchmark suite | Current: 01439d5 | Previous: acbbd34 | Ratio |
|-|-|-|-|
| `array/accumulate/Float32/1d` | `1122834` ns | `1132709` ns | `0.99` |
| `array/accumulate/Float32/dims=1` | `1538875` ns | `1560896` ns | `0.99` |
| `array/accumulate/Float32/dims=1L` | `9839917` ns | `9836000` ns | `1.00` |
| `array/accumulate/Float32/dims=2` | `1885583` ns | `1881562.5` ns | `1.00` |
| `array/accumulate/Float32/dims=2L` | `7231041.5` ns | `7213791.5` ns | `1.00` |
| `array/accumulate/Int64/1d` | `1255833` ns | `1251709` ns | `1.00` |
| `array/accumulate/Int64/dims=1` | `1842146` ns | `1843416.5` ns | `1.00` |
| `array/accumulate/Int64/dims=1L` | `11667812.5` ns | `11749875` ns | `0.99` |
| `array/accumulate/Int64/dims=2` | `2160791` ns | `2128375` ns | `1.02` |
| `array/accumulate/Int64/dims=2L` | `9731500` ns | `9792208` ns | `0.99` |
| `array/broadcast` | `609250` ns | `600250` ns | `1.01` |
| `array/construct` | `6000` ns | `6375` ns | `0.94` |
| `array/permutedims/2d` | `1170208` ns | `1169958` ns | `1.00` |
| `array/permutedims/3d` | `1670500` ns | `1647292` ns | `1.01` |
| `array/permutedims/4d` | `2387667` ns | `2390042` ns | `1.00` |
| `array/private/copy` | `568792` ns | `547895.5` ns | `1.04` |
| `array/private/copyto!/cpu_to_gpu` | `803958` ns | `789667` ns | `1.02` |
| `array/private/copyto!/gpu_to_cpu` | `807584` ns | `802833.5` ns | `1.01` |
| `array/private/copyto!/gpu_to_gpu` | `648479.5` ns | `645417` ns | `1.00` |
| `array/private/iteration/findall/bool` | `1421937` ns | `1401271` ns | `1.01` |
| `array/private/iteration/findall/int` | `1564396` ns | `1552853.5` ns | `1.01` |
| `array/private/iteration/findfirst/bool` | `2033625` ns | `2046729.5` ns | `0.99` |
| `array/private/iteration/findfirst/int` | `2069500` ns | `2093500` ns | `0.99` |
| `array/private/iteration/findmin/1d` | `2510000` ns | `2506709` ns | `1.00` |
| `array/private/iteration/findmin/2d` | `1792875` ns | `1792000` ns | `1.00` |
| `array/private/iteration/logical` | `2596709` ns | `2638750` ns | `0.98` |
| `array/private/iteration/scalar` | `5593562.5` ns | `4889812.5` ns | `1.14` |
| `array/random/rand/Float32` | `1170625` ns | `1157334` ns | `1.01` |
| `array/random/rand/Int64` | `1333667` ns | `1281416` ns | `1.04` |
| `array/random/rand!/Float32` | `909667` ns | `933708` ns | `0.97` |
| `array/random/rand!/Int64` | `868750` ns | `877500` ns | `0.99` |
| `array/random/randn/Float32` | `1072333.5` ns | `1075625` ns | `1.00` |
| `array/random/randn!/Float32` | `815041` ns | `821083.5` ns | `0.99` |
| `array/reductions/mapreduce/Float32/1d` | `1041812.5` ns | `1026395.5` ns | `1.02` |
| `array/reductions/mapreduce/Float32/dims=1` | `831041.5` ns | `847750` ns | `0.98` |
| `array/reductions/mapreduce/Float32/dims=1L` | `1333709` ns | `1320958` ns | `1.01` |
| `array/reductions/mapreduce/Float32/dims=2` | `851291` ns | `856062.5` ns | `0.99` |
| `array/reductions/mapreduce/Float32/dims=2L` | `1819833` ns | `1802542` ns | `1.01` |
| `array/reductions/mapreduce/Int64/1d` | `1523666.5` ns | `1356854.5` ns | `1.12` |
| `array/reductions/mapreduce/Int64/dims=1` | `1097000` ns | `1113917` ns | `0.98` |
| `array/reductions/mapreduce/Int64/dims=1L` | `2010833` ns | `2048021` ns | `0.98` |
| `array/reductions/mapreduce/Int64/dims=2` | `1140104` ns | `1157312.5` ns | `0.99` |
| `array/reductions/mapreduce/Int64/dims=2L` | `3618167` ns | `3584708` ns | `1.01` |
| `array/reductions/reduce/Float32/1d` | `1045417` ns | `1030312.5` ns | `1.01` |
| `array/reductions/reduce/Float32/dims=1` | `823437.5` ns | `849459` ns | `0.97` |
| `array/reductions/reduce/Float32/dims=1L` | `1317459` ns | `1282250` ns | `1.03` |
| `array/reductions/reduce/Float32/dims=2` | `852459` ns | `779458.5` ns | `1.09` |
| `array/reductions/reduce/Float32/dims=2L` | `1807771` ns | `1811125` ns | `1.00` |
| `array/reductions/reduce/Int64/1d` | `1534833` ns | `1352583.5` ns | `1.13` |
| `array/reductions/reduce/Int64/dims=1` | `1093833` ns | `1105625.5` ns | `0.99` |
| `array/reductions/reduce/Int64/dims=1L` | `2018458` ns | `2038209` ns | `0.99` |
| `array/reductions/reduce/Int64/dims=2` | `1149583.5` ns | `1165084` ns | `0.99` |
| `array/reductions/reduce/Int64/dims=2L` | `4224500.5` ns | `4246833.5` ns | `0.99` |
| `array/shared/copy` | `251875` ns | `230667` ns | `1.09` |
| `array/shared/copyto!/cpu_to_gpu` | `80791` ns | `83667` ns | `0.97` |
| `array/shared/copyto!/gpu_to_cpu` | `81250` ns | `82334` ns | `0.99` |
| `array/shared/copyto!/gpu_to_gpu` | `81583` ns | `83209` ns | `0.98` |
| `array/shared/iteration/findall/bool` | `1428500` ns | `1414771` ns | `1.01` |
| `array/shared/iteration/findall/int` | `1559895.5` ns | `1566833` ns | `1.00` |
| `array/shared/iteration/findfirst/bool` | `1631874.5` ns | `1634542` ns | `1.00` |
| `array/shared/iteration/findfirst/int` | `1650417` ns | `1650729` ns | `1.00` |
| `array/shared/iteration/findmin/1d` | `2092875` ns | `2110166.5` ns | `0.99` |
| `array/shared/iteration/findmin/2d` | `1795021` ns | `1792229` ns | `1.00` |
| `array/shared/iteration/logical` | `2452125` ns | `2237270.5` ns | `1.10` |
| `array/shared/iteration/scalar` | `199917` ns | `207375` ns | `0.96` |
| `integration/byval/reference` | `1563375` ns | `1550250` ns | `1.01` |
| `integration/byval/slices=1` | `1579687` ns | `1587604.5` ns | `1.00` |
| `integration/byval/slices=2` | `2612854.5` ns | `2607167` ns | `1.00` |
| `integration/byval/slices=3` | `7805520.5` ns | `7747541` ns | `1.01` |
| `integration/metaldevrt` | `864395.5` ns | `880084` ns | `0.98` |
| `kernel/indexing` | `621750` ns | `597895.5` ns | `1.04` |
| `kernel/indexing_checked` | `629124.5` ns | `586250` ns | `1.07` |
| `kernel/launch` | `11083` ns | `11917` ns | `0.93` |
| `kernel/rand` | `571167` ns | `576375` ns | `0.99` |
| `latency/import` | `1422002520.5` ns | `1422194646` ns | `1.00` |
| `latency/precompile` | `25434593458` ns | `25428193000` ns | `1.00` |
| `latency/ttfp` | `2340542917` ns | `2340052813` ns | `1.00` |
| `metal/synchronization/context` | `19667` ns | `20166.5` ns | `0.98` |
| `metal/synchronization/stream` | `18584` ns | `19250` ns | `0.97` |

This comment was automatically generated by workflow using github-action-benchmark.
