
Remove serialization lock in checksum synchronization.#72928

Merged
CyrusNajmabadi merged 7 commits into dotnet:main from CyrusNajmabadi:removeLock
Apr 8, 2024

Conversation

@CyrusNajmabadi
Contributor

@CyrusNajmabadi CyrusNajmabadi commented Apr 8, 2024

I collected measurements, and this lock does not seem to help at all.

Some important pieces of information:

- With this lock, it takes 20s to synchronize all data. Without it, it takes 12s. That's a huge win.
- With this lock, we make 3750 sync calls. Without the lock, it drops to 3250. Another nice win.

This PR should be read a commit at a time.

One interesting fact about the sync info is just how many single-checksum syncs we perform. Prior to this change, we perform 1816 (so roughly 50% of the calls). With this change, it drops to 1269 (so roughly 40% of the calls).

However, in both cases the numbers seem quite high. During the initial sync, I would expect many more large syncs. I'd like to collect more data to see if it helps explain what's going on before/after.
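A quick sanity check of the single-checksum ratios quoted above (plain arithmetic over the numbers in this comment):

```python
# Sanity-check the "roughly 50%" / "roughly 40%" claims above.
before_total, before_single = 3750, 1816
after_total, after_single = 3250, 1269

print(f"before: {before_single / before_total:.0%} of sync calls are single-checksum")
print(f"after:  {after_single / after_total:.0%} of sync calls are single-checksum")
```

1816/3750 comes out to about 48%, and 1269/3250 to about 39%, matching the "roughly 50%" and "roughly 40%" figures.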

@CyrusNajmabadi CyrusNajmabadi requested a review from a team as a code owner April 8, 2024 01:55
@ghost ghost added Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead labels Apr 8, 2024
@CyrusNajmabadi CyrusNajmabadi requested a review from ToddGrun April 8, 2024 02:15
@CyrusNajmabadi
Contributor Author

Plugging the data into Google Sheets, I'm slicing and dicing some of the information:

[image: per-type counts of single-checksum sync calls]

This is the count of calls where we ask for only one checksum of that particular type. The type is the strong type passed into the sync call for what we want back. The 'object' line is for calls that are asking for heterogeneous data.
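To illustrate the kind of slicing being done here, a hypothetical sketch (the real instrumentation lives in Roslyn's C# serialization code; the type names and counts below are made up for the example):

```python
from collections import Counter

# Hypothetical log of sync calls: (requested result type, number of checksums asked for).
sync_calls = [
    ("DocumentStateChecksums", 1),
    ("DocumentStateChecksums", 1),
    ("ProjectStateChecksums", 12),
    ("object", 1),   # heterogeneous request: mixed checksum kinds in one call
    ("object", 3),
]

# Bucket only the calls that asked for exactly one checksum, keyed by the requested type.
single_checksum_counts = Counter(kind for kind, count in sync_calls if count == 1)
print(single_checksum_counts)
```

The chart above is this same tally over the real call log: one row per requested type, counting only the size-1 sync calls.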

@CyrusNajmabadi
Contributor Author

CyrusNajmabadi commented Apr 8, 2024

Ok. I can make a trivial change to get us to:

[image: updated per-type counts after the change]

Looks like DocumentStateChecksums are where we can also get some nice benefits.

@ToddGrun
Contributor

ToddGrun commented Apr 8, 2024

> I can make a trivial change to get us to:

Is there still an incoming change into this PR?

@CyrusNajmabadi
Contributor Author

@ToddGrun I'm doing it in follow-ups.

@sharwell
Contributor

sharwell commented Apr 8, 2024

Are the improvements due to actual concurrency happening now, or are they simply due to eliminating lock overhead? If the improvements are concurrency, what is the parallelized entry point?

Would a batching work queue make any difference here?

@ToddGrun
Contributor

ToddGrun commented Apr 8, 2024

I don't understand what this comment means w.r.t. calls for a single checksum:

> The 'object' line is for calls that are asking for heterogeneous data.

@CyrusNajmabadi
Contributor Author

> I don't understand what this comment means w.r.t. calls for a single checksum

Responding offline.

@CyrusNajmabadi
Contributor Author

> Are the improvements due to actual concurrency happening now,

Actual concurrency. And there is ample opportunity to improve that even more; we can be heavily concurrent here. I just have to measure and make sure it's worthwhile.

@sharwell
Contributor

sharwell commented Apr 8, 2024

FWIW, historically I've found the greatest gains in cross-process communication come from sequencing calls such that outgoing call 2 waits only for the outgoing message of call 1 to finish, not for the response to call 1 to be received. It's common for implementations to wait for an entire round trip to complete before sending the next outgoing message, which degrades total throughput by much more than expected.
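The throughput effect being described can be sketched as follows (hypothetical names and latencies; this approximates pipelining by overlapping the round trips rather than modeling an ordered send channel):

```python
import time
from concurrent.futures import ThreadPoolExecutor

ONE_WAY_LATENCY = 0.1  # seconds; hypothetical network delay

def round_trip(msg):
    # Simulate sending a message and waiting for its response.
    time.sleep(2 * ONE_WAY_LATENCY)
    return f"ack:{msg}"

def serialized(messages):
    # Wait for each full round trip before sending the next message.
    start = time.monotonic()
    results = [round_trip(m) for m in messages]
    return results, time.monotonic() - start

def overlapped(messages):
    # Send the next message without waiting for the previous response.
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(messages)) as pool:
        results = list(pool.map(round_trip, messages))
    return results, time.monotonic() - start

msgs = [f"sync-{i}" for i in range(5)]
_, serialized_time = serialized(msgs)   # ~5 full round trips
_, overlapped_time = overlapped(msgs)   # ~1 round trip total
print(f"serialized: {serialized_time:.2f}s, overlapped: {overlapped_time:.2f}s")
```

With five calls and a 0.2s round trip, waiting for each response in turn costs roughly 1s, while not gating each send on the previous response brings the total close to a single round trip.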

@CyrusNajmabadi
Contributor Author

Sure, we can continue looking at this. I don't see why we can't make this much more concurrent anyway. Regardless, this is a pure win, so I'd like to take this while making more changes in follow-ups.

@sharwell
Contributor

sharwell commented Apr 8, 2024

I have no objections to the conceptual change here, but didn't review the change in detail w.r.t. allowing concurrency in the operations.

@CyrusNajmabadi
Contributor Author

CyrusNajmabadi commented Apr 8, 2024

> but didn't review the change in detail w.r.t. allowing concurrency in the operations.

OOP syncing is already concurrent (most of the time); we just had a lock around some batch operations, and that slows down a lot of normal IDE work. For example, while we're doing the 'full bg solution sync', we're updating OOP one project at a time. During this time, if a call comes in to run some feature on the current project, and we decide to fully sync that project, it's gated on the same serialization lock that the full-bg-solution-sync is using.
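The contention described here can be sketched like so (hypothetical durations, in Python; with a shared serialization lock, the small feature-triggered sync queues behind the long background sync, and without it they proceed independently):

```python
import threading
import time

serialization_lock = threading.Lock()

def sync(duration, use_lock):
    # Simulate pushing a project's data over to the OOP process.
    if use_lock:
        with serialization_lock:
            time.sleep(duration)
    else:
        time.sleep(duration)

def feature_sync_latency(use_lock):
    start = time.monotonic()
    # Long-running full background solution sync grabs the lock first...
    bg = threading.Thread(target=sync, args=(0.5, use_lock))
    bg.start()
    time.sleep(0.01)  # let the bg sync acquire the lock
    # ...then a small feature-triggered sync of the current project arrives.
    feature = threading.Thread(target=sync, args=(0.05, use_lock))
    feature.start()
    feature.join()
    elapsed = time.monotonic() - start  # time until the feature sync completes
    bg.join()
    return elapsed

with_lock = feature_sync_latency(True)
without_lock = feature_sync_latency(False)
print(f"with lock:    feature sync done after {with_lock:.2f}s")
print(f"without lock: feature sync done after {without_lock:.2f}s")
```

Under the lock, the 50ms feature sync can't finish until the 500ms background sync releases the lock; without it, the feature sync completes in roughly its own duration.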

@ToddGrun
Contributor

ToddGrun commented Apr 8, 2024

This change LGTM, but I'm not sure if Sam wants to dig in deeper.

Contributor

@ToddGrun ToddGrun left a comment


:shipit:

@CyrusNajmabadi
Contributor Author

I'm going to move forward. Def happy to get more feedback (esp. as more changes come here).

@CyrusNajmabadi CyrusNajmabadi merged commit 86d28ec into dotnet:main Apr 8, 2024
@CyrusNajmabadi CyrusNajmabadi deleted the removeLock branch April 8, 2024 20:32
@dotnet-policy-service dotnet-policy-service bot added this to the Next milestone Apr 8, 2024
@CyrusNajmabadi
Contributor Author

@jasonmalinowski For review when you get back.

@dibarbet dibarbet modified the milestones: Next, 17.11 P1 Apr 29, 2024