Skip to content

JIT: Merge more stores/loads#95823

Closed
EgorBo wants to merge 6 commits intodotnet:mainfrom
EgorBo:merge-more-stores
Closed

JIT: Merge more stores/loads#95823
EgorBo wants to merge 6 commits intodotnet:mainfrom
EgorBo:merge-more-stores

Conversation

@EgorBo
Copy link
Member

@EgorBo EgorBo commented Dec 9, 2023

class MyClass
{
    public byte A;
    public byte B;
    public char C;
    public int D;
}

void Test(MyClass c1, MyClass c2)
{
    c1.A = c2.A;
    c1.B = c2.B;
    c1.C = c2.C;
    c1.D = c2.D;
}

Codegen diff for Test:

; Method Program:Test(MyClass,MyClass):this (FullOpts)
-      movzx    rax, byte  ptr [r8+0x0E]
-      mov      byte  ptr [rdx+0x0E], al
-      movzx    rax, byte  ptr [r8+0x0F]
-      mov      byte  ptr [rdx+0x0F], al
-      movzx    rax, word  ptr [r8+0x0C]
-      mov      word  ptr [rdx+0x0C], ax
-      mov      eax, dword ptr [r8+0x08]
-      mov      dword ptr [rdx+0x08], eax
+      mov      rax, qword ptr [r8+0x08]
+      mov      qword ptr [rdx+0x08], rax
       ret      
-; Total bytes of code: 33
+; Total bytes of code: 9

For more diffs it needs a sort of "forward sub" in lowering to handle cases like

a[0] = b[0];
a[1] = b[1]

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 9, 2023
@ghost ghost assigned EgorBo Dec 9, 2023
@ghost
Copy link

ghost commented Dec 9, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details
class MyClass
{
    public byte A;
    public byte B;
    public char C;
    public int D;
}

void Test(MyClass c1, MyClass c2)
{
    c1.A = c2.A;
    c1.B = c2.B;
    c1.C = c2.C;
    c1.D = c2.D;
}

Codegen diff for Test:

; Method Program:Test(MyClass,MyClass):this (FullOpts)
-      movzx    rax, byte  ptr [r8+0x0E]
-      mov      byte  ptr [rdx+0x0E], al
-      movzx    rax, byte  ptr [r8+0x0F]
-      mov      byte  ptr [rdx+0x0F], al
-      movzx    rax, word  ptr [r8+0x0C]
-      mov      word  ptr [rdx+0x0C], ax
-      mov      eax, dword ptr [r8+0x08]
-      mov      dword ptr [rdx+0x08], eax
+      mov      rax, qword ptr [r8+0x08]
+      mov      qword ptr [rdx+0x08], rax
       ret      
-; Total bytes of code: 33
+; Total bytes of code: 9

For more diffs it needs a sort of "forward sub" in lowering to handle cases like

a[0] = b[0];
a[1] = b[1]

etc

Author: EgorBo
Assignees: EgorBo
Labels:

area-CodeGen-coreclr

Milestone: -

@EgorBo EgorBo marked this pull request as ready for review December 9, 2023 20:42
//
// IND<byte> is always fine (and all IND<X> created here from such)
// IND<simd> is not required to be atomic per our Memory Model
const bool allowsNonAtomic = data1.allowsNonAtomic && data2.allowsNonAtomic;
Copy link
Member

@am11 am11 Dec 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make a difference (cover more cases) if it was a positive check: allowAtomic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I follow - can you clarify?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was just wondering if the condition below actually care for "if atomics are allowed, do the transformation" instead of "if non-atomics are allowed, skip the transformation", then maybe it can be based on allowAtomic? Not sure if it's feasible or would make any difference. 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea that we assume that we can't use wider loads by default, we need to find a proof that we can and we only have a few hints for that today so it's quite conservative

@EgorBo
Copy link
Member Author

EgorBo commented Dec 11, 2023

@jakobbotsch PTAL since you reviewed the initial version cc @dotnet/jit-contrib

Diffs are not too big due to conservative alias analysis, but I have some future improvements in mind which might increase the coverage (e.g. GT_STORE_LCL_FLD)

@EgorBo EgorBo requested a review from jakobbotsch December 11, 2023 16:08
Comment on lines +8058 to +8079
// * STOREIND int
// +--* LEA(b+8) byref
// | \--* LCL_VAR ref
// \--* IND int
// \--* LEA(b+8) byref
// \--* LCL_VAR ref
//
// * STOREIND int
// +--* LEA(b+12) byref
// | \--* LCL_VAR ref
// \--* IND int
// \--* LEA(b+12) byref
// \--* LCL_VAR ref
//
// is transformed into:
//
// * STOREIND long
// +--* LEA(b+8) byref
// | \--* LCL_VAR ref
// \--* IND long
// \--* LEA(b+8) byref
// \--* LCL_VAR ref
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation of these trees seems wrong.

Comment on lines +7887 to 7892
// Data is either a constant or GT_IND
// TODO-CoalescingStores: allow locals (to then broadcast them)
if (isStore && !ind->Data()->IsCnsIntOrI() && !ind->Data()->IsVectorConst() && !ind->Data()->OperIs(GT_IND))
{
return false;
}
Copy link
Member

@jakobbotsch jakobbotsch Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic seems like it needs some form of interference checking somewhere when the data is GT_IND. What if the definition occurs 200 nodes before, with interfering stores? (nevermind, looks like LowerStoreIndirCoalescing takes care of it by virtue of ensuring things are contiguous)

// Get coalescing data for the previous STOREIND
GenTreeStoreInd* prevInd = prevTree->AsStoreInd();
if (!GetStoreCoalescingData(comp, prevInd->AsStoreInd(), &prevData))
if (!GetLoadStoreCoalescingData(comp, prevInd->AsStoreInd(), &prevData) || !CanBeCoalesced(prevData, currData))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I guess the loop above this indirectly ensures that there won't be any interference between the two INDs.

Comment on lines +8342 to +8352
const int storeStart = min(currData.offset, prevData.offset);
const int loadStart = min(currValueData.offset, prevValueData.offset);

const int smallerOffset = min(storeStart, loadStart);
const int largerOffset = max(storeStart, loadStart);

if ((smallerOffset != largerOffset) && ((smallerOffset + (int)genTypeSize(newType)) > largerOffset))
{
// May alias
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I totally understand this logic. Can it use a normal interval intersection check? Like e.g.

//------------------------------------------------------------------------
// Intersects:
// Check if this segment intersects another segment.
//
// Parameters:
// other - The other segment.
//
// Returns:
// True if so.
//
bool StructSegments::Segment::Intersects(const Segment& other) const
{
if (End <= other.Start)
{
return false;
}
if (other.End <= Start)
{
return false;
}
return true;
}

@jakobbotsch
Copy link
Member

e.g. GT_STORE_LCL_FLD

When I checked LCL_FLD in relation to #92768 I found almost no cases where they could be optimized to ldp or stp, so it seems they are rare (or my check was wrong).

@EgorBo
Copy link
Member Author

EgorBo commented Apr 30, 2024

I'll re-think this one later

@EgorBo EgorBo closed this Apr 30, 2024
@github-actions github-actions bot locked and limited conversation to collaborators May 31, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants