Improve Encoding.UTF8.GetMaxByte/CharCount perf #69910
GrabYourPitchforks merged 4 commits into dotnet:main
Also fixes some potential integer overflows in callers
Tagging subscribers to this area: @dotnet/area-system-text-encoding
[Benchmark]
[Arguments("Hello!")]
public int GetTranscodedLenth(string input)
{
    const int MaxStackLength = 64;
    byte[] rentedArray = null;
    int maxByteCount = Encoding.UTF8.GetMaxByteCount(input.Length);
    Span<byte> scratch = ((uint)maxByteCount <= MaxStackLength)
        ? stackalloc byte[MaxStackLength]
        : (rentedArray = ArrayPool<byte>.Shared.Rent(maxByteCount));
    try
    {
        return Encoding.UTF8.GetBytes(input, scratch);
    }
    finally
    {
        if (rentedArray != null)
        {
            ArrayPool<byte>.Shared.Return(rentedArray);
        }
    }
}
src/libraries/System.Private.Xml/src/System/Xml/Xsl/XsltOld/SequentialOutput.cs
Outdated
}
}

return byteCount + 1;
Some callers expect GetMaxCharCount(byteCount) to be the maximum number of characters that might be converted from any call to Encoding.GetDecoder().GetChars(buffer_of_byteCount_length, some_output_buffer). StreamReader makes this assumption, for instance.
DecoderNLS can hold partial state between calls to GetChars if a non-ASCII byte is seen. There are two possible outcomes here:
- The internal state is never completed and represents the maximum invalid subsequence of a UTF-8 buffer. The Encoding instance will replace the entire captured state with a single '\uFFFD' character before processing the rest of the input buffer.
- The internal state is 3 bytes of a 4-byte sequence, and the first byte of the incoming buffer would complete the sequence. This means the output would contain 2 characters: the high & low surrogates.
In both scenarios, the worst-case expansion is that the internally captured state results in +1 additional character needed in the output.
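The two outcomes described above can be reproduced with a stateful decoder. This is a minimal sketch, not code from the PR: the class and method names (DecoderStateDemo, DecodeSecondChunk) are hypothetical, and it assumes the decoder behavior of .NET Core 3.0 or later, where a maximal invalid subsequence is replaced by a single '\uFFFD'. The bytes F0 9F 98 are the first three bytes of the 4-byte encoding of U+1F600.

```csharp
using System;
using System.Text;

public static class DecoderStateDemo
{
    // Feeds `first` and then `second` through one stateful UTF-8 decoder and
    // returns only the characters produced by the second GetChars call.
    public static string DecodeSecondChunk(byte[] first, byte[] second)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();

        // First call: an incomplete sequence is captured as internal state.
        char[] scratch = new char[Encoding.UTF8.GetMaxCharCount(first.Length)];
        decoder.GetChars(first, 0, first.Length, scratch, 0, flush: false);

        // Second call: the output buffer is sized by GetMaxCharCount alone,
        // which is the assumption StreamReader-style callers make.
        char[] output = new char[Encoding.UTF8.GetMaxCharCount(second.Length)];
        int written = decoder.GetChars(second, 0, second.Length, output, 0, flush: false);
        return new string(output, 0, written);
    }

    public static void Main()
    {
        byte[] partial = { 0xF0, 0x9F, 0x98 };

        // Outcome 2: the incoming byte completes the sequence (surrogate pair).
        string completed = DecodeSecondChunk(partial, new byte[] { 0x80 });

        // Outcome 1: the state is never completed and is replaced by U+FFFD,
        // then the incoming 'A' is decoded normally.
        string replaced = DecodeSecondChunk(partial, new byte[] { 0x41 });

        Console.WriteLine(completed.Length);
        Console.WriteLine(replaced.Length);
    }
}
```

In both calls a 1-byte input buffer is expected to yield 2 characters, which is why the buffer sized by GetMaxCharCount(1) needs room for byteCount + 1 characters.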
Do we need more comment here explaining that?
src/libraries/System.Private.CoreLib/src/System/Text/UTF8Encoding.Sealed.cs
[InlineData(int.MaxValue)]
public void GetMaxByteCount_NegativeTests(int charCount)
{
    Assert.Throws<ArgumentOutOfRangeException>(nameof(charCount), () => Encoding.UTF8.GetMaxByteCount(charCount));
As for any Assert.Throws for an ArgumentException, you may choose to prefer the AssertExtensions version that validates the param name as well.
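For readers without the test infrastructure, the behavior this test covers can be observed directly. The following is a small sketch with hypothetical names (MaxByteCountDemo, Describe), not part of the PR; it only relies on Encoding.UTF8.GetMaxByteCount and on the ParamName property of the thrown ArgumentOutOfRangeException.

```csharp
using System;
using System.Text;

public static class MaxByteCountDemo
{
    // Returns the worst-case byte count as text, or the name of the
    // rejected parameter when the input is out of range.
    public static string Describe(int charCount)
    {
        try
        {
            return Encoding.UTF8.GetMaxByteCount(charCount).ToString();
        }
        catch (ArgumentOutOfRangeException ex)
        {
            return "out of range: " + ex.ParamName;
        }
    }

    public static void Main()
    {
        // "Hello!" has 6 chars, the input used by the benchmark in this PR.
        Console.WriteLine(Describe("Hello!".Length));

        // Inputs whose worst-case size cannot fit in an int are rejected
        // rather than silently overflowing.
        Console.WriteLine(Describe(int.MaxValue));
        Console.WriteLine(Describe(-1));
    }
}
```

For the 6-character benchmark input the result is 21, which stays under the 64-byte stack threshold, so the benchmark exercises the stackalloc path.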
I updated the code comments in
Build & test failures are known, per runfo.
Mostly a very small perf improvement to GetMaxByteCount and GetMaxCharCount. It's now small enough to inline into callers when using the Encoding.UTF8 accelerator.
Also addresses: