JIT ARM64-SVE: Add simple bitwise ops by a74nh · Pull Request #101762 · dotnet/runtime

a74nh · 2024-05-01T12:46:14Z

And, AndAcross, Or, OrAcross, Xor, XorAcross
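
For readers mapping the names to behaviour, here is a minimal scalar sketch of the difference between the lane-wise ops (And, Or, Xor) and the reductions (AndAcross, OrAcross, XorAcross). It is not the JIT or the managed API: the fixed 4-lane `Vec`, the `model_*` function names, and the scalar return value of the reductions are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: a fixed 4-lane vector stands in for a scalable SVE register.
constexpr std::size_t kLanes = 4;
using Vec = std::array<std::uint32_t, kLanes>;

// Lane-wise op (And): each result lane depends only on the same lane of a and b.
Vec model_and(const Vec& a, const Vec& b)
{
    Vec r{};
    for (std::size_t i = 0; i < kLanes; ++i)
        r[i] = a[i] & b[i];
    return r;
}

// Reduction (AndAcross): fold every lane of one vector into a single value.
std::uint32_t model_and_across(const Vec& v)
{
    std::uint32_t acc = ~std::uint32_t{0}; // all ones is the identity for AND
    for (std::uint32_t lane : v)
        acc &= lane;
    return acc;
}

// Reduction (OrAcross): zero is the identity for OR.
std::uint32_t model_or_across(const Vec& v)
{
    std::uint32_t acc = 0;
    for (std::uint32_t lane : v)
        acc |= lane;
    return acc;
}

// Reduction (XorAcross): zero is the identity for XOR.
std::uint32_t model_xor_across(const Vec& v)
{
    std::uint32_t acc = 0;
    for (std::uint32_t lane : v)
        acc ^= lane;
    return acc;
}

// Small usage example with hand-checked values.
void demo()
{
    const Vec a = {0xFF, 0x3F, 0x3C, 0xFC};
    const Vec b = {0x0F, 0x0F, 0x0F, 0x0F};
    assert((model_and(a, b) == Vec{0x0F, 0x0F, 0x0C, 0x0C}));
    assert(model_and_across(a) == 0x3Cu);
    assert(model_or_across(a) == 0xFFu);
    assert(model_xor_across(a) == 0x00u);
}
```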

Test results:

❯ ~/stress_tester.py $CORE_ROOT/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_And
Starting test: /home/alahay01/dotnet/runtime_sve_api/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_And
===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_And_sbyte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_And_short() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_And_int() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_And_long() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_And_byte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_And_ushort() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_And_uint() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_And_ulong() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_sbyte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_short() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_int() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_long() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_byte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_ushort() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_uint() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_ulong() : 7
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

❯ ~/stress_tester.py $CORE_ROOT/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_Or
Starting test: /home/alahay01/dotnet/runtime_sve_api/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_Or
===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Or_sbyte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Or_short() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Or_int() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Or_long() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Or_byte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Or_ushort() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Or_uint() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Or_ulong() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_sbyte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_short() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_int() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_long() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_byte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_ushort() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_uint() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_ulong() : 7
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------


❯ ~/stress_tester.py $CORE_ROOT/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_Xor
Starting test: /home/alahay01/dotnet/runtime_sve_api/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_Xor
===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Xor_sbyte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Xor_short() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Xor_int() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Xor_long() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Xor_byte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Xor_ushort() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Xor_uint() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Xor_ulong() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_sbyte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_short() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_int() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_long() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_byte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_ushort() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_uint() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_ulong() : 7
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

ghost · 2024-05-01T12:46:21Z

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

dotnet-policy-service · 2024-05-01T12:46:44Z

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

a74nh · 2024-05-01T12:48:07Z

src/coreclr/jit/codegenarm64test.cpp

                              INS_OPTS_SCALABLE_D); // CLASTB  <Zdn>.<T>, <Pg>, <Zdn>.<T>, <Zm>.<T>

    // IF_SVE_CN_3A
-    theEmitter->emitIns_R_R_R(INS_sve_clasta, EA_2BYTE, REG_V12, REG_P1, REG_V15, INS_OPTS_SCALABLE_H,


Same changes as for AddAcross in the previous PR: the size arg is not used, as the sizes are dependent on opts.

a74nh · 2024-05-01T12:48:38Z

src/coreclr/jit/emitarm64sve.cpp

            if (sopt == INS_SCALABLE_OPTS_UNPREDICATED)
            {
-                assert(opt == INS_OPTS_SCALABLE_D);
+                // The instruction only has a .D variant. However, this doesn't matter as


Doing this avoids adding special cases in the hwintrinsic codegen.
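
A rough sketch of why relaxing the assert is harmless: a bitwise AND gives the same bytes whether the register is viewed as 8-bit lanes or as 64-bit lanes, so a `.D`-only encoding can serve every element type. The 16-byte vector length and the function names below are assumptions for illustration, not emitter code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumption: a 16-byte register; real SVE lengths are implementation defined.
constexpr std::size_t kVectorBytes = 16;
using Bytes = std::array<std::uint8_t, kVectorBytes>;

static_assert(kVectorBytes % sizeof(std::uint64_t) == 0,
              "the register must split into whole 64-bit lanes");

// AND with the register viewed as 8-bit lanes (the ".B" view).
Bytes and_as_b_lanes(const Bytes& a, const Bytes& b)
{
    Bytes r{};
    for (std::size_t i = 0; i < kVectorBytes; ++i)
        r[i] = static_cast<std::uint8_t>(a[i] & b[i]);
    return r;
}

// AND with the same bytes viewed as 64-bit lanes (the ".D" view).
Bytes and_as_d_lanes(const Bytes& a, const Bytes& b)
{
    Bytes r{};
    for (std::size_t off = 0; off < kVectorBytes; off += sizeof(std::uint64_t))
    {
        std::uint64_t x = 0;
        std::uint64_t y = 0;
        std::memcpy(&x, a.data() + off, sizeof x);
        std::memcpy(&y, b.data() + off, sizeof y);
        const std::uint64_t z = x & y;
        std::memcpy(r.data() + off, &z, sizeof z);
    }
    return r;
}

// Usage example: both views produce identical bytes for arbitrary inputs.
void demo()
{
    Bytes a{};
    Bytes b{};
    for (std::size_t i = 0; i < kVectorBytes; ++i)
    {
        a[i] = static_cast<std::uint8_t>(i * 37 + 1);
        b[i] = static_cast<std::uint8_t>(0xF0 ^ (i * 11));
    }
    assert(and_as_b_lanes(a, b) == and_as_d_lanes(a, b));
}
```

The same argument applies to OR and XOR, since none of the three carries information between bits.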

a74nh · 2024-05-01T12:49:12Z

@dotnet/arm64-contrib @kunalspathak

kunalspathak

LGTM. Some nit comments

src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs

kunalspathak · 2024-05-02T05:28:45Z

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/Sve.cs

+        ///   MOVPRFX Zresult, Zop1; AND Zresult.B, Pg/M, Zresult.B, Zop2.B
+        /// svuint8_t svand[_u8]_x(svbool_t pg, svuint8_t op1, svuint8_t op2)
+        ///   AND Ztied1.B, Pg/M, Ztied1.B, Zop2.B
+        ///   AND Ztied2.B, Pg/M, Ztied2.B, Zop1.B


Suggested change

/// AND Ztied2.B, Pg/M, Ztied2.B, Zop1.B

Why do we have 2 entries of the predicated version? Here and elsewhere.

Line 250 is saying a = AND(a, b), whereas line 251 is showing b = AND(b, a)

It's a little awkward.

I don't think we need to list every possible variant here nor the mov instructions required to handle RMW cases, we definitely don't do that for any other intrinsics across Arm64, x64, or WASM.

The main intent is really just to give a brief overview of the C/C++ intrinsic and the primary hardware instruction emitted so that users can map things more easily and know the primary location to lookup to understand the instruction (kind of like a see-also).

Ideally we'd be able to basically quote the Arm64 architecture manual and give a better description (with the notes we currently have as actual see-also), but said manuals come with an explicit copyright/proprietary notice and so cannot be reproduced without express written permission (which means getting legal of both companies involved and getting the relevant agreement put together). So this is the next best thing.

Which is to say, I think we can just do what we do for other ISAs and simplify it down to a few lines:

/// svuint8_t svand[_u8]_m(svbool_t pg, svuint8_t op1, svuint8_t op2)
/// svuint8_t svand[_u8]_x(svbool_t pg, svuint8_t op1, svuint8_t op2)
/// svuint8_t svand[_u8]_z(svbool_t pg, svuint8_t op1, svuint8_t op2)
///   AND <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>
///   AND <Zd>.D, <Zn>.D, <Zm>.D
///   AND <Zdn>.<T>, <Zdn>.<T>, #<const>
/// svbool_t svand[_b]_z(svbool_t pg, svbool_t op1, svbool_t op2)
///   AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

Which covers the 4x C/C++ intrinsics that map to this API and the 4x instruction entries that map, without getting into the implementation details of exactly how operands map to registers, how boilerplate instructions to handle RMW considerations are emitted (like mov, movprfx, etc), and without getting into how predication maps to the instructions (which is something to handle in a general conceptual doc that intrinsics can link to, not repeated per doc page).
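
As a hedged illustration of what the `_m` and `_z` suffixes in those C/C++ intrinsic names mean, the scalar model below shows merging and zeroing predication for AND. The 4-lane `Vec`, the `Mask` type, and the function names are assumptions for the sketch. The `_x` form is not modelled because its inactive lanes are "don't care" values.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: a fixed 4-lane byte vector stands in for a scalable register.
constexpr std::size_t kLanes = 4;
using Vec = std::array<std::uint8_t, kLanes>;
using Mask = std::array<bool, kLanes>;

// _m (merging): inactive lanes keep the value of the first operand.
Vec and_m(const Mask& pg, const Vec& op1, const Vec& op2)
{
    Vec r{};
    for (std::size_t i = 0; i < kLanes; ++i)
        r[i] = pg[i] ? static_cast<std::uint8_t>(op1[i] & op2[i]) : op1[i];
    return r;
}

// _z (zeroing): inactive lanes are set to zero.
Vec and_z(const Mask& pg, const Vec& op1, const Vec& op2)
{
    Vec r{};
    for (std::size_t i = 0; i < kLanes; ++i)
        r[i] = pg[i] ? static_cast<std::uint8_t>(op1[i] & op2[i]) : std::uint8_t{0};
    return r;
}

// Usage example with lanes 0 and 2 active.
void demo()
{
    const Vec op1 = {0xFF, 0x0F, 0xF0, 0xAA};
    const Vec op2 = {0x3C, 0x3C, 0x3C, 0x3C};
    const Mask pg = {true, false, true, false};
    assert((and_m(pg, op1, op2) == Vec{0x3C, 0x0F, 0x30, 0xAA}));
    assert((and_z(pg, op1, op2) == Vec{0x3C, 0x00, 0x30, 0x00}));
}
```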

I agree. I was allowing movprfx because that is something special in SVE land, but again I think it is an implementation RMW detail which we do not need in the summary docs.

Annoyingly I don't think that can be scripted. But, agreed with the approach, we'll have to simplify manually

Ok, fixed, but it's in the same style as existing:

/// <summary>
/// svuint8_t svand[_u8]_m(svbool_t pg, svuint8_t op1, svuint8_t op2)
/// svuint8_t svand[_u8]_x(svbool_t pg, svuint8_t op1, svuint8_t op2)
/// svuint8_t svand[_u8]_z(svbool_t pg, svuint8_t op1, svuint8_t op2)
///   AND Ztied1.B, Pg/M, Ztied1.B, Zop2.B
///   AND Zresult.D, Zop1.D, Zop2.D
/// svbool_t svand[_b]_z(svbool_t pg, svbool_t op1, svbool_t op2)
///   AND Presult.B, Pg/Z, Pop1.B, Pop2.B
/// </summary>

This is easier to update from the existing autogenerated comments, and the tied annotation is useful information.

For the autogenerated, you can skip outputting the ones that have movprfx. Can you refresh my memory of what is tied?

For the autogenerated, you can skip outputting the ones that have movprfx.

I'll do that

Can you refresh my memory of what is tied?

Both args marked as tied are the same register, so that register has read-write (RW) semantics.
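
To make "tied" concrete, here is a small sketch under stated assumptions: a toy register file of four 4-lane registers, with `and_merging` modelling the destructive form `AND Zdn.B, Pg/M, Zdn.B, Zm.B` and `movprfx` modelling the copy that precedes it when the first operand must stay live. The `RegFile` type and both function names are hypothetical; nothing here is taken from the JIT sources.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 4 lanes per register and 4 registers (z0..z3) in the toy file.
constexpr std::size_t kLanes = 4;
using Vec = std::array<std::uint8_t, kLanes>;
using Mask = std::array<bool, kLanes>;

struct RegFile
{
    std::array<Vec, 4> z{};
};

// Models "AND Zdn.B, Pg/M, Zdn.B, Zm.B": zdn is both a source and the
// destination (the tied register), so its previous value is overwritten
// in active lanes and preserved in inactive lanes.
void and_merging(RegFile& rf, std::size_t zdn, const Mask& pg, std::size_t zm)
{
    for (std::size_t i = 0; i < kLanes; ++i)
        if (pg[i])
            rf.z[zdn][i] = static_cast<std::uint8_t>(rf.z[zdn][i] & rf.z[zm][i]);
}

// Models "MOVPRFX Zd, Zn": copy the first operand into the result register
// so the destructive instruction that follows does not clobber it.
void movprfx(RegFile& rf, std::size_t zd, std::size_t zn)
{
    rf.z[zd] = rf.z[zn];
}

// Usage example: result in z0, op1 in z1 (kept live), op2 in z2.
void demo()
{
    RegFile rf;
    rf.z[1] = {0xFF, 0x0F, 0xF0, 0xAA};
    rf.z[2] = {0x3C, 0x3C, 0x3C, 0x3C};
    const Mask pg = {true, false, true, false};

    movprfx(rf, 0, 1);
    and_merging(rf, 0, pg, 2);

    assert((rf.z[0] == Vec{0x3C, 0x0F, 0x30, 0xAA})); // merged result
    assert((rf.z[1] == Vec{0xFF, 0x0F, 0xF0, 0xAA})); // op1 untouched
}
```

When the first operand is dead after the operation, the copy can be dropped and the operand's own register used as `zdn`, which is the `Ztied1` case in the comments above.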

src/coreclr/jit/emitarm64sve.cpp

kunalspathak

LGTM

* JIT ARM64-SVE: Add simple bitwise ops And,AndAcross,Or,OrAcross,Xor,XorAcross
* Fix fadda
* Fix unpkh/fexpa/frecpe
* Reorder System.Runtime.Intrinsics.cs
* Fix API head comments

JIT ARM64-SVE: Add simple bitwise ops

c442956

And,AndAcross,Or,OrAcross,Xor,XorAcross

ghost added area-System.Runtime.Intrinsics new-api-needs-documentation labels May 1, 2024

dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label May 1, 2024

a74nh commented May 1, 2024

a74nh marked this pull request as ready for review May 1, 2024 12:49

kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label May 2, 2024

kunalspathak reviewed May 2, 2024

a74nh added 3 commits May 2, 2024 09:58

Fix fadda

f312257

Fix unpkh/fexpa/frecpe

f26f7ce

Reorder System.Runtime.Intrinsics.cs

7a23e95

a74nh mentioned this pull request May 2, 2024

JIT: ARM64 Assertion failed 'isScalableVectorSize(size)' #101786

Closed

build-analysis bot mentioned this pull request May 2, 2024

System.IO.Net5Compat.Tests and System.IO.Tests suddenly exiting with error 137 #100558

Closed

kunalspathak reviewed May 2, 2024

Fix API head comments

8a92556

This was referenced May 2, 2024

Tracking issue for "WORKLOAD TIMED OUT" #90309

Closed

[8.0][ios][arm64][mono] Build error: SecKeychainUnlock signing-certs.keychain-db: user name or passphrase incorrect #101830

Closed

kunalspathak approved these changes May 3, 2024

kunalspathak merged commit f01a146 into dotnet:main May 3, 2024

kunalspathak mentioned this pull request May 3, 2024

Arm64: Implement SVE APIs #99957

Closed

a74nh deleted the simple_bitwise_github branch May 3, 2024 15:55

github-actions bot locked and limited conversation to collaborators Jun 4, 2024
