Various pointer-related improvements #765

Merged: christiangnrd merged 10 commits into main from memrefactor, Apr 11, 2026

@christiangnrd
Member

Each commit should be a very quick review. Best viewed with whitespace hidden.

Some code coverage improvements, bug fixes, and simplification.

@codecov

codecov Bot commented Apr 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.02%. Comparing base (1af3aef) to head (a5acbf5).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #765      +/-   ##
==========================================
+ Coverage   82.66%   83.02%   +0.36%     
==========================================
  Files          62       62              
  Lines        2872     2852      -20     
==========================================
- Hits         2374     2368       -6     
+ Misses        498      484      -14     
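The coverage percentages in the diff above follow from the Hits and Lines rows. A minimal sketch of that arithmetic, assuming coverage is simply hits divided by lines (the helper name `coverage_percent` is hypothetical, and Codecov's own rounding or handling of partial lines may shift the last digit relative to the report):

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Return line coverage as a percentage of covered lines."""
    return 100.0 * hits / lines


# Figures copied from the coverage diff: (hits, lines) for base and head.
base_hits, base_lines = 2374, 2872
head_hits, head_lines = 2368, 2852

base = coverage_percent(base_hits, base_lines)
head = coverage_percent(head_hits, head_lines)

# Misses are the lines not hit; these match the Misses row of the diff.
base_misses = base_lines - base_hits
head_misses = head_lines - head_hits

print(f"base: {base:.2f}%  head: {head:.2f}%  change: {head - base:+.2f}")
print(f"misses: {base_misses} -> {head_misses}")
```

Coverage rises even though Hits drops by 6, because Lines drops by 20: the removed lines were mostly uncovered ones.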

@github-actions github-actions Bot left a comment

Metal Benchmarks

| Benchmark suite | Current: a5acbf5 | Previous: 1af3aef | Ratio |
|---|---|---|---|
| `array/accumulate/Float32/1d` | 1157708 ns | 1113770.5 ns | 1.04 |
| `array/accumulate/Float32/dims=1` | 1574625.5 ns | 1551084 ns | 1.02 |
| `array/accumulate/Float32/dims=1L` | 9826688 ns | 9823458.5 ns | 1.00 |
| `array/accumulate/Float32/dims=2` | 1918875 ns | 1869125 ns | 1.03 |
| `array/accumulate/Float32/dims=2L` | 7253791.5 ns | 7234500 ns | 1.00 |
| `array/accumulate/Int64/1d` | 1270791 ns | 1259292 ns | 1.01 |
| `array/accumulate/Int64/dims=1` | 1819479.5 ns | 1833292 ns | 0.99 |
| `array/accumulate/Int64/dims=1L` | 11682958 ns | 11548646 ns | 1.01 |
| `array/accumulate/Int64/dims=2` | 2200208.5 ns | 2156000 ns | 1.02 |
| `array/accumulate/Int64/dims=2L` | 9796271 ns | 9765500 ns | 1.00 |
| `array/broadcast` | 610125 ns | 608792 ns | 1.00 |
| `array/construct` | 6709 ns | 6000 ns | 1.12 |
| `array/permutedims/2d` | 1191167 ns | 1167020.5 ns | 1.02 |
| `array/permutedims/3d` | 1699625 ns | 1679667 ns | 1.01 |
| `array/permutedims/4d` | 2419334 ns | 2380542 ns | 1.02 |
| `array/private/copy` | 585583 ns | 596666 ns | 0.98 |
| `array/private/copyto!/cpu_to_gpu` | 806583 ns | 786083.5 ns | 1.03 |
| `array/private/copyto!/gpu_to_cpu` | 812000 ns | 796625 ns | 1.02 |
| `array/private/copyto!/gpu_to_gpu` | 647958 ns | 631500 ns | 1.03 |
| `array/private/iteration/findall/bool` | 1408584 ns | 1411938 ns | 1.00 |
| `array/private/iteration/findall/int` | 1578875 ns | 1563375 ns | 1.01 |
| `array/private/iteration/findfirst/bool` | 2068437.5 ns | 2023833 ns | 1.02 |
| `array/private/iteration/findfirst/int` | 2133458.5 ns | 2045020.5 ns | 1.04 |
| `array/private/iteration/findmin/1d` | 2530542 ns | 2470417 ns | 1.02 |
| `array/private/iteration/findmin/2d` | 1828667 ns | 1774854 ns | 1.03 |
| `array/private/iteration/logical` | 2701334 ns | 2618500 ns | 1.03 |
| `array/private/iteration/scalar` | 4795375 ns | 4934917 ns | 0.97 |
| `array/random/rand/Float32` | 584250 ns | 634500 ns | 0.92 |
| `array/random/rand/Int64` | 794375 ns | 771417 ns | 1.03 |
| `array/random/rand!/Float32` | 584833 ns | 578709 ns | 1.01 |
| `array/random/rand!/Int64` | 553916 ns | 547250 ns | 1.01 |
| `array/random/randn/Float32` | 976292 ns | 1008833.5 ns | 0.97 |
| `array/random/randn!/Float32` | 755750 ns | 714625 ns | 1.06 |
| `array/reductions/mapreduce/Float32/1d` | 1073333 ns | 1041062.5 ns | 1.03 |
| `array/reductions/mapreduce/Float32/dims=1` | 846583.5 ns | 838292 ns | 1.01 |
| `array/reductions/mapreduce/Float32/dims=1L` | 1346709 ns | 1563812.5 ns | 0.86 |
| `array/reductions/mapreduce/Float32/dims=2` | 861917 ns | 864854 ns | 1.00 |
| `array/reductions/mapreduce/Float32/dims=2L` | 1828000 ns | 1798229.5 ns | 1.02 |
| `array/reductions/mapreduce/Int64/1d` | 1546000 ns | 1534250 ns | 1.01 |
| `array/reductions/mapreduce/Int64/dims=1` | 1113250 ns | 1102709 ns | 1.01 |
| `array/reductions/mapreduce/Int64/dims=1L` | 2035687 ns | 1996770.5 ns | 1.02 |
| `array/reductions/mapreduce/Int64/dims=2` | 1175167 ns | 1183062.5 ns | 0.99 |
| `array/reductions/mapreduce/Int64/dims=2L` | 3639562.5 ns | 3617646 ns | 1.01 |
| `array/reductions/reduce/Float32/1d` | 1051062 ns | 1029479.5 ns | 1.02 |
| `array/reductions/reduce/Float32/dims=1` | 843166 ns | 830333 ns | 1.02 |
| `array/reductions/reduce/Float32/dims=1L` | 1333499.5 ns | 1317584 ns | 1.01 |
| `array/reductions/reduce/Float32/dims=2` | 867958.5 ns | 856145.5 ns | 1.01 |
| `array/reductions/reduce/Float32/dims=2L` | 1816875 ns | 1795250 ns | 1.01 |
| `array/reductions/reduce/Int64/1d` | 1399437.5 ns | 1500500 ns | 0.93 |
| `array/reductions/reduce/Int64/dims=1` | 1112792 ns | 1106458 ns | 1.01 |
| `array/reductions/reduce/Int64/dims=1L` | 2017437.5 ns | 2016083 ns | 1.00 |
| `array/reductions/reduce/Int64/dims=2` | 1173833 ns | 1154354.5 ns | 1.02 |
| `array/reductions/reduce/Int64/dims=2L` | 4246374.5 ns | 4213709 ns | 1.01 |
| `array/shared/copy` | 243000 ns | 276625 ns | 0.88 |
| `array/shared/copyto!/cpu_to_gpu` | 82042 ns | 97833 ns | 0.84 |
| `array/shared/copyto!/gpu_to_cpu` | 83958 ns | 98208 ns | 0.85 |
| `array/shared/copyto!/gpu_to_gpu` | 83666 ns | 122000 ns | 0.69 |
| `array/shared/iteration/findall/bool` | 1432541 ns | 1433249.5 ns | 1.00 |
| `array/shared/iteration/findall/int` | 1577875 ns | 1571812.5 ns | 1.00 |
| `array/shared/iteration/findfirst/bool` | 1653354.5 ns | 1630083 ns | 1.01 |
| `array/shared/iteration/findfirst/int` | 1676834 ns | 1635084 ns | 1.03 |
| `array/shared/iteration/findmin/1d` | 2134292 ns | 2098979 ns | 1.02 |
| `array/shared/iteration/findmin/2d` | 1819958.5 ns | 1779458 ns | 1.02 |
| `array/shared/iteration/logical` | 2265937.5 ns | 2292083 ns | 0.99 |
| `array/shared/iteration/scalar` | 213375 ns | 202833 ns | 1.05 |
| `integration/byval/reference` | 1589042 ns | 1549208 ns | 1.03 |
| `integration/byval/slices=1` | 1604833 ns | 1564292 ns | 1.03 |
| `integration/byval/slices=2` | 2669875 ns | 2597625 ns | 1.03 |
| `integration/byval/slices=3` | 10157500 ns | 8449250 ns | 1.20 |
| `integration/metaldevrt` | 881833 ns | 863917 ns | 1.02 |
| `kernel/indexing` | 608500 ns | 594792 ns | 1.02 |
| `kernel/indexing_checked` | 621708 ns | 606541.5 ns | 1.03 |
| `kernel/launch` | 12667 ns | 11833 ns | 1.07 |
| `kernel/rand` | 570459 ns | 569167 ns | 1.00 |
| `latency/import` | 1424736708.5 ns | 1420952187.5 ns | 1.00 |
| `latency/precompile` | 25499036000 ns | 25612710875 ns | 1.00 |
| `latency/ttfp` | 2349867458.5 ns | 2339829875 ns | 1.00 |
| `metal/synchronization/context` | 20541 ns | 20000 ns | 1.03 |
| `metal/synchronization/stream` | 19667 ns | 19000 ns | 1.04 |

This comment was automatically generated by workflow using github-action-benchmark.

@christiangnrd
Member Author

Copy-related performance improvements and regressions seem to be noise.
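For readers checking the copy rows against that remark, the Ratio column is the current timing divided by the previous one, so values below 1 mean the current commit ran faster. The sketch below recomputes a few copy-related ratios from the benchmark comment and flags those outside a tolerance band. The 10% tolerance and the names `ratio` and `classify` are illustrative assumptions, not settings of github-action-benchmark, and a single-run ratio cannot by itself tell noise from a real change.

```python
def ratio(current_ns: float, previous_ns: float) -> float:
    """Current time divided by previous time; below 1.0 means faster."""
    return current_ns / previous_ns


def classify(r: float, tolerance: float = 0.10) -> str:
    """Label a ratio relative to a hypothetical tolerance band around 1.0."""
    if r > 1.0 + tolerance:
        return "slower"
    if r < 1.0 - tolerance:
        return "faster"
    return "within tolerance"


# (current ns, previous ns) pairs copied from the benchmark comment.
samples = {
    "array/shared/copy": (243000, 276625),
    "array/shared/copyto!/gpu_to_gpu": (83666, 122000),
    "array/private/copyto!/gpu_to_gpu": (647958, 631500),
}

for name, (current, previous) in samples.items():
    r = ratio(current, previous)
    print(f"{name}: {r:.2f} ({classify(r)})")
```

Repeating the run several times and comparing the spread of timings would be a firmer basis for calling a change noise than any fixed threshold.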

@christiangnrd christiangnrd merged commit b94fd4b into main Apr 11, 2026
16 checks passed
@christiangnrd christiangnrd deleted the memrefactor branch April 11, 2026 19:35