JIT ARM64-SVE: Add simple bitwise ops#101762
Conversation
And,AndAcross,Or,OrAcross,Xor,XorAcross
|
Note regarding the |
|
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics |
| INS_OPTS_SCALABLE_D); // CLASTB <Zdn>.<T>, <Pg>, <Zdn>.<T>, <Zm>.<T> | ||
|
|
||
| // IF_SVE_CN_3A | ||
| theEmitter->emitIns_R_R_R(INS_sve_clasta, EA_2BYTE, REG_V12, REG_P1, REG_V15, INS_OPTS_SCALABLE_H, |
There was a problem hiding this comment.
Same changes as for AddAcross in previous PR - the size arg is not used, as the sizes are dependant on opts.
| if (sopt == INS_SCALABLE_OPTS_UNPREDICATED) | ||
| { | ||
| assert(opt == INS_OPTS_SCALABLE_D); | ||
| // The instruction only has a .D variant. However, this doesn't matter as |
There was a problem hiding this comment.
Doing this prevents adding special cases in hwinstrinccodegen.
|
@dotnet/arm64-contrib @kunalspathak |
kunalspathak
left a comment
There was a problem hiding this comment.
LGTM. Some nit comments
src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs
Outdated
Show resolved
Hide resolved
| /// MOVPRFX Zresult, Zop1; AND Zresult.B, Pg/M, Zresult.B, Zop2.B | ||
| /// svuint8_t svand[_u8]_x(svbool_t pg, svuint8_t op1, svuint8_t op2) | ||
| /// AND Ztied1.B, Pg/M, Ztied1.B, Zop2.B | ||
| /// AND Ztied2.B, Pg/M, Ztied2.B, Zop1.B |
There was a problem hiding this comment.
| /// AND Ztied2.B, Pg/M, Ztied2.B, Zop1.B |
Why do we have 2 entries of the predicated version? Here and elsewhere.
There was a problem hiding this comment.
Line 250 is saying a = AND(a, b), whereas line 251 is showing b = AND(b, a)
It's a little awkward.
There was a problem hiding this comment.
I don't think we need to list every possible variant here nor the mov instructions required to handle RMW cases, we definitely don't do that for any other intrinsics across Arm64, x64, or WASM.
The main intent is really just to give a brief overview of the C/C++ intrinsic and the primary hardware instruction emitted so that users can map things more easily and know the primary location to lookup to understand the instruction (kind of like a see-also).
Ideally we'd be able to basically quote the Arm64 architecture manual and give a better description (with the notes we currently have as actual see-also), but said manuals come with an explicit copyright/proprietary notice and so cannot be reproduced without express written permission (which means getting legal of both companies involved and getting the relevant agreement put together). So this is the next best thing.
There was a problem hiding this comment.
Which is to say, I think we can just do what we do for other ISAs and simplify it down to a few lines:
/// svuint8_t svand[_u8]_m(svbool_t pg, svuint8_t op1, svuint8_t op2)
/// svuint8_t svand[_u8]_x(svbool_t pg, svuint8_t op1, svuint8_t op2)
/// svuint8_t svand[_u8]_z(svbool_t pg, svuint8_t op1, svuint8_t op2)
/// AND <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>
/// AND <Zd>.D, <Zn>.D, <Zm>.D
/// AND <Zdn>.<T>, <Zdn>.<T>, #<const>
/// svbool_t svand[_b]_z(svbool_t pg, svbool_t op1, svbool_t op2)
/// AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.BWhich covers the 4x C/C++ intrinsics that map to this API and the 4x instruction entries that map, without getting into the implementation details of exactly how operands map to registers, how boilerplate instructions to handle RMW considerations are emitted (like mov, movprfx, etc), and without getting into how predication maps to the instructions (which is something to handle in a general conceptual doc that intrinsics can link to, not repeated per doc page).
There was a problem hiding this comment.
I agree. I was allowing movprfx because that is something special in SVE land, but again I think it is an implementation RMW detail which we do not need in the summary docs.
There was a problem hiding this comment.
Annoyingly I don't think that can be scripted. But, agreed with the approach, we'll have to simplify manually
There was a problem hiding this comment.
Ok, fixed, but it's in the same style as existing:
/// <summary>
/// svuint8_t svand[_u8]_m(svbool_t pg, svuint8_t op1, svuint8_t op2)
/// svuint8_t svand[_u8]_x(svbool_t pg, svuint8_t op1, svuint8_t op2)
/// svuint8_t svand[_u8]_z(svbool_t pg, svuint8_t op1, svuint8_t op2)
/// AND Ztied1.B, Pg/M, Ztied1.B, Zop2.B
/// AND Zresult.D, Zop1.D, Zop2.D
/// svbool_t svand[_b]_z(svbool_t pg, svbool_t op1, svbool_t op2)
/// AND Presult.B, Pg/Z, Pop1.B, Pop2.B
/// </summary>
Easier to update from the existing autogenerated and the tied is useful information.
There was a problem hiding this comment.
for the autogenerated, you can skip outputting the ones that has movprfx. Can you refresh my memory of what is tied?
There was a problem hiding this comment.
for the autogenerated, you can skip outputting the ones that has
movprfx.
I'll do that
Can you refresh my memory of what is
tied?
Both args marked as tied are the same register. RW semantics.
* JIT ARM64-SVE: Add simple bitwise ops And,AndAcross,Or,OrAcross,Xor,XorAcross * Fix fadda * Fix unpkh/fexpa/frecpe * Reorder System.Runtime.Intrinsics.cs * Fix API head comments
* JIT ARM64-SVE: Add simple bitwise ops And,AndAcross,Or,OrAcross,Xor,XorAcross * Fix fadda * Fix unpkh/fexpa/frecpe * Reorder System.Runtime.Intrinsics.cs * Fix API head comments
And, AndAcross, Or, OrAcross, Xor, XorAcross
Test results: