[arm64] JIT: Recognize sbfiz/ubfiz idioms #61045
Tagging subscribers to this area: @JulieLeeMSFT
This PR recognizes '(ulong)x << cns' idioms in order to emit sbfiz/ubfiz. This pattern shows up in array accesses, and while I am planning to fix array accesses differently in #61026, it still makes sense to have this peephole for explicit patterns.
// explicit pattern:
static ulong Test1(uint x) => ((ulong)x) << 2;
// implicit pattern:
static int Test2(int[] array, int i) => array[i];
Codegen diff:
; Method Prog:Test1(int):long
G_M16463_IG01:
stp fp, lr, [sp,#-16]!
mov fp, sp
G_M16463_IG02:
- mov w0, w0
- lsl x0, x0, #2
+ ubfiz x0, x0, #2, #32
G_M16463_IG03:
ldp fp, lr, [sp],#16
ret lr
-; Total bytes of code: 24
+; Total bytes of code: 20
; Method Prog:Test2(System.Int32[],int):int
G_M15622_IG01:
stp fp, lr, [sp,#-16]!
mov fp, sp
G_M15622_IG02:
ldr w2, [x0,#8]
cmp w1, w2
bhs G_M15622_IG04
- mov w1, w1
- lsl x1, x1, #2
+ ubfiz x1, x1, #2, #32
add x1, x1, #16
ldr w0, [x0, x1]
G_M15622_IG03:
ldp fp, lr, [sp],#16
ret lr
G_M15622_IG04:
bl CORINFO_HELP_RNGCHKFAIL
bkpt
-; Total bytes of code: 52
+; Total bytes of code: 48
Diffs are impressive and give us a hint that we should implement proper "addressing modes" for arm64 asap 🙂
SuperPMI diffs:
coreclr_tests.pmi.Linux.arm64.checked.mch
libraries.crossgen2.Linux.arm64.checked.mch
libraries.pmi.Linux.arm64.checked.mch
libraries_tests.pmi.Linux.arm64.checked.mch
has most regressions. Is there anything special about this pattern that disagrees with the bfiz optimization?
@kasperk81 These are loop-alignment artifacts, e.g. https://www.diffchecker.com/La4x8Sgz
@SingleAccretion could you please take another look? I added smallint support (diffs are updated) and removed redundant checks.
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Program
{
    static void Main(string[] args) =>
        BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

    public IEnumerable<object[]> TestData()
    {
        yield return new object[] { new int[100], new int[100] };
        yield return new object[] { new int[1000], new int[1000] };
        yield return new object[] { new int[100000], new int[100000] };
    }

    [Benchmark]
    [ArgumentsSource(nameof(TestData))]
    public void CopyArray(int[] src, int[] dst)
    {
        for (int i = 0; i < src.Length; i++)
            dst[i] = src[i];
    }
}
Results on Apple M1 arm64 (tested with and without loop alignment):
Codegen diff: https://www.diffchecker.com/pLkuGMn6 (yes, the address calculation is still not perfect and is not hoisted, but I'm working on it)
SingleAccretion left a comment:
Looks great to me (modulo one note)!
@dotnet/jit-contrib PTAL, should be ready to review/merge
arm64 improvements: dotnet/perf-autofiling-issues#2247 and dotnet/perf-autofiling-issues#2248