Conversation
|
@Wraith2 This looks awesome. Looks like you also fixed the issue with increased allocations. I will give this a try against the repro VM ASAP.
|
/azp run
|
Azure Pipelines successfully started running 2 pipeline(s).
That was fixed in the original fix, so it's already merged to main (4b891d8). Since I've just reviewed the other PR, I'll cc a couple of people who were interested in it and may want to test this if it generates artifacts: @MichelZ @rhuijben
|
Excellent! I've built the PR locally and kicked off my stress tests, including the XML case. These will likely take a few days to fully run, but I'll post the results when available.

The performance on the XML case is shockingly bad. When I realised just how slow it was going on the current main branch, I thought I must have either written the test to loop 1000 times or somehow broken SqlCachedReader.
|
@backstromjoel @valentiniliescu @nilzzzzzz @CSharpFiasco @luca-domenichini @warappa @stevendarby @wjrogers @AnderssonPeter @p10tyr @Eli-Black-Work @BradBarnich @igbenic You have all expressed an interest in this performance issue - please test this PR build!
On our side, we tried to reproduce an issue that was happening with 6.1.0 (randomly receiving null values for a JSON-serialized string field).
From a performance perspective, I can't add valuable info, as our test was quite simple and not benchmarked.
|
@Wraith2 The
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff            @@
##             main   #3605     +/-  ##
========================================
- Coverage   65.48%       0  -65.49%
========================================
  Files         275       0     -275
  Lines       61518       0   -61518
========================================
- Hits        40288       0   -40288
+ Misses      21230       0   -21230
```
|
|
@Wraith2 How do I set an AppContext switch for an entire xUnit test assembly?
|
Probably in the ctor, but because of the caching in the library, once you've set it and the value has been read you can't change it. To do that you'll need to use private reflection, as we do in the tests here.
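To illustrate why the switch can't simply be flipped mid-run, here is a minimal self-contained sketch of the read-once caching pattern and the reflection-based reset. The `FeatureFlags`, `s_cached`, and `Demo.UseNewBehaviour` names are illustrative, not SqlClient's actual internals.

```csharp
using System;
using System.Reflection;

static class FeatureFlags
{
    // 0 = not initialized, 1 = true, 2 = false (mimics a tristate cache)
    private static int s_cached;

    public static bool UseNewBehaviour
    {
        get
        {
            if (s_cached == 0)
            {
                // First read wins: the switch value is captured once and cached.
                bool isSet = AppContext.TryGetSwitch("Demo.UseNewBehaviour", out bool value);
                s_cached = (isSet && value) ? 1 : 2;
            }
            return s_cached == 1;
        }
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(FeatureFlags.UseNewBehaviour); // False: switch never set

        // Setting the switch now has no visible effect: the value is cached.
        AppContext.SetSwitch("Demo.UseNewBehaviour", true);
        Console.WriteLine(FeatureFlags.UseNewBehaviour); // Still False

        // Reset the private cache field via reflection, as the SqlClient tests do.
        typeof(FeatureFlags)
            .GetField("s_cached", BindingFlags.NonPublic | BindingFlags.Static)
            .SetValue(null, 0);
        Console.WriteLine(FeatureFlags.UseNewBehaviour); // True after re-read
    }
}
```

This is why setting the switch in an assembly-level fixture constructor works: it runs before anything has read (and therefore cached) the value.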
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs
|
/azp run
|
Azure Pipelines successfully started running 2 pipeline(s).
|
|
@Wraith2 FYI, I added this class to the test project:

```csharp
using Xunit.Abstractions;
using Xunit.Sdk;

[assembly: TestFramework("Microsoft.EntityFrameworkCore.AssemblyFixture", "Microsoft.EntityFrameworkCore.SqlServer.FunctionalTests")]

namespace Microsoft.EntityFrameworkCore;

public sealed class AssemblyFixture : XunitTestFramework
{
    public AssemblyFixture(IMessageSink messageSink)
        : base(messageSink)
    {
        AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.UseCompatibilityAsyncBehaviour", false);
    }
}
```
|
The stress-testing passed as expected, but it's worth noting that we currently need two AppContext switches to enable this:

```csharp
AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.UseCompatibilityAsyncBehaviour", false);
AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.UseCompatibilityProcessSni", false);
```

If the latter switch is not explicitly set, we see the increased memory usage but not the performance improvements. Async and sync are very close to parity in normal circumstances; remote SQL Server instances saw nearly identical results in my benchmarks.

I noticed that in situations where the SQL Server is able to keep up with the supply of PLP data, the speed improvements here mean that SqlClient is able to call code paths in TryReadByteArray which copy packet buffers around when building the snapshot. This is the current speed bottleneck as far as I can tell. While nothing's needed for this PR, it looks like SQL Server will try to adjust the size of the PLP data blocks it sends to align with the number of bytes left in each TDS packet. With that in mind, we could avoid copying memory: if a read will take us precisely to the end of the current TDS packet, we could construct a correctly-sized

I also saw a minor (~5%) performance bump in XML data when I changed the
|
@edwardneal Thanks for the info about the switches, I wasn't aware of that. Let me try to add both to the EF Core test suite.
|
The relevant getter is:

```csharp
public static bool UseCompatibilityAsyncBehaviour
{
    get
    {
        if (UseCompatibilityProcessSni)
        {
            // If ProcessSni compatibility mode has been enabled then the packet
            // multiplexer has been disabled. The new async behaviour using continue
            // point capture is only stable if the multiplexer is enabled so we must
            // return true to enable compatibility async behaviour using only restarts.
            return true;
        }
        if (s_useCompatibilityAsyncBehaviour == Tristate.NotInitialized)
        {
            if (!AppContext.TryGetSwitch(UseCompatibilityAsyncBehaviourString, out bool returnedValue) || returnedValue)
            {
                s_useCompatibilityAsyncBehaviour = Tristate.True;
            }
            else
            {
                s_useCompatibilityAsyncBehaviour = Tristate.False;
            }
        }
        return s_useCompatibilityAsyncBehaviour == Tristate.True;
    }
}
```

which, unless I can't mentally process boolean logic (it's happened), means that it defaults to true and you have to turn it off. However, because

If you're setting both flags explicitly you'll have to manage your combinations, and that's likely what's happening in our tests, but for casual users it should only require one setting.
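The default-on behaviour can be checked in isolation. Below is a minimal sketch that extracts just the decision expression from the getter above and exercises it with plain `AppContext` calls; the `Demo.*` switch names are made up for the example.

```csharp
using System;

class Program
{
    // Mirrors the getter's decision logic: compatibility mode is on unless the
    // switch is explicitly present AND explicitly set to false.
    public static bool CompatibilityMode(string switchName)
    {
        return !AppContext.TryGetSwitch(switchName, out bool value) || value;
    }

    static void Main()
    {
        Console.WriteLine(CompatibilityMode("Demo.A")); // True: switch unset, default on

        AppContext.SetSwitch("Demo.B", true);
        Console.WriteLine(CompatibilityMode("Demo.B")); // True: explicitly on

        AppContext.SetSwitch("Demo.C", false);
        Console.WriteLine(CompatibilityMode("Demo.C")); // False: the only way to opt out
    }
}
```

So an unset switch and a switch set to `true` behave identically; only an explicit `false` opts in to the new async behaviour.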
src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParser.cs
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs
|
Is this waiting for anything other than a second review, @dotnet/sqlclientdevteam?
|
@Wraith2 I've been out for a couple of weeks. I'll start taking a look tomorrow!
Builds on the stable base provided by #3534 and adds the optional async-continue capability back in. It is optional and not on by default in this build. To use it you must set a switch:

```csharp
AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.UseCompatibilityAsyncBehaviour", false);
```

This will use the new multiplexer, which already defaults to enabled.

This changes the way that continue is used from the original implementation. Originally continue was always available, which proved to cause problems with routes through the code that did not expect to be able to fail, causing lost context. This new approach runs as if continue is disabled until a function is called which explicitly requests continue mode to be enabled. The only functions that can do this are the ones that read multi-packet string or binary data. The request is cleared when the read completes, so it cannot affect other reads from the same packet.
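The request/clear scoping described above can be sketched as a simple pattern: continue mode is off by default, is requested only for the duration of a single multi-packet read, and is always cleared when that read completes. The `ReaderState`/`WithContinueMode` names below are hypothetical illustrations, not SqlClient's actual API.

```csharp
using System;

// Illustrative sketch: continue mode is only enabled for the scope of one read.
class ReaderState
{
    public bool ContinueRequested { get; private set; }

    public T WithContinueMode<T>(Func<ReaderState, T> read)
    {
        ContinueRequested = true;
        try
        {
            return read(this);
        }
        finally
        {
            // Cleared when the read completes (even on failure), so the request
            // cannot leak into other reads from the same packet.
            ContinueRequested = false;
        }
    }
}

class Program
{
    static void Main()
    {
        var state = new ReaderState();
        Console.WriteLine(state.ContinueRequested); // False by default

        bool during = state.WithContinueMode(s => s.ContinueRequested);
        Console.WriteLine(during);                  // True inside the scoped read

        Console.WriteLine(state.ContinueRequested); // False again afterwards
    }
}
```

The try/finally shape is what gives the "cannot affect other reads" guarantee: the flag is reset on every exit path from the read.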
Benchmark results (all reads were 10 MiB of data):
- BinaryRead
- StringRead
- XmlRead
Here are the benchmark files that I used to get these numbers. You'll need connection strings, and to generate a random 10 MiB XML file from somewhere on the web. Benchmarks.zip
The async XML numbers are pretty unbelievable. If you replicate or improve the benchmarks, it would be good to get perf corroboration.
@ErikEJ this will be the build you'll want to test and then possibly make available to people, as discussed on the previous PR.
@edwardneal if you have the resources could you run your fantastic test suite against this version and see if you can identify any new bugs?
@dotnet/sqlclientdevteam can you run CI please.