Fix OOM in BinHexDecoder, Base64Decoder, and XmlSchemaValidator when throwing on large invalid input#125930
Fix OOM in BinHexDecoder, Base64Decoder, and XmlSchemaValidator when throwing on large invalid input#125930
Conversation
|
Tagging subscribers to this area: @dotnet/area-system-xml |
…essages - In the private Decode method: pass just the invalid character `ch` instead of the entire `chars` span when throwing XmlException for an invalid BinHex char. This is also more semantically correct (the message says 'char X is not valid'). - In the public static Decode method: truncate the chars string to 40 chars when building the OddCount error message to avoid creating huge strings on large inputs. Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com> Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/07551353-4ccc-440e-869f-1c38e684be2b
Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com> Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/07551353-4ccc-440e-869f-1c38e684be2b
…ation
- Update resource strings to say 'character' instead of 'text sequence'
- Drop {0} from Xml_InvalidBinHexValueOddCount (pass null instead of full input)
- Truncate stringValue to 40 chars in XmlSchemaValidator error messages
- Add error message size assertions to existing BinHex/Base64 tests
- Add ValidationErrorMessageTests for schema validation paths
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Note This comment was generated with Copilot assistance. I pushed an additional commit that audits and fixes all the places where these decoders' unbounded error messages can surface. The original two commits fix Showing the offending character rather than the full sequence is fine diagnostically: the caller already has the input, and the character identity (or "odd count") is what they need to locate the problem. The full sequence in an error message was never useful — just a copy of data the caller already holds. Product changes:
Tests:
All tests verified: fail before fix, pass after. |
There was a problem hiding this comment.
Pull request overview
Fixes sporadic OutOfMemoryException failures on 32-bit platforms by preventing very large allocations when formatting XmlException messages for invalid BinHex/Base64 input, and adds tests to ensure validation/reader error messages remain bounded.
Changes:
- Update
BinHexDecoder/Base64Decoderto report only the offending character (instead ofchars.ToString()on the full decode span) when throwingXmlException. - Bound schema-validation error details by truncating the value included in datatype validation messages.
- Add/extend tests to validate that error messages stay below a reasonable size for large invalid inputs.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/libraries/System.Private.Xml/src/System/Xml/BinHexDecoder.cs | Avoids allocating the entire decode buffer when throwing for invalid BinHex; adjusts odd-count error path. |
| src/libraries/System.Private.Xml/src/System/Xml/Base64Decoder.cs | Avoids allocating the entire decode buffer when throwing for invalid Base64 (including post-padding non-whitespace). |
| src/libraries/System.Private.Xml/src/System/Xml/Schema/XmlSchemaValidator.cs | Truncates large values included in schema validation event messages. |
| src/libraries/System.Private.Xml/src/Resources/Strings.resx | Updates resource strings to reflect “invalid character” phrasing and removes value insertion for odd-count BinHex. |
| src/libraries/System.Private.Xml/tests/XmlReaderLib/ReadBase64.cs | Adds a regression test ensuring Base64 invalid-char exceptions don’t include the entire input. |
| src/libraries/System.Private.Xml/tests/XmlReaderLib/TCReadElementContentAsBase64.cs | Registers new Base64 invalid-char bounded-message variations. |
| src/libraries/System.Private.Xml/tests/XmlReaderLib/ReadBinHex.cs | Tightens existing BinHex overflow test to fail if exception message is unexpectedly large. |
| src/libraries/System.Private.Xml/tests/XmlSchema/XmlSchemaValidatorApi/ValidationErrorMessageTests.cs | Adds schema-validation tests asserting error messages remain bounded for large invalid hexBinary values. |
| src/libraries/System.Private.Xml/tests/System.Private.Xml.Tests.csproj | Includes the new schema validation test file in the test project. |
src/libraries/System.Private.Xml/src/System/Xml/Schema/XmlSchemaValidator.cs
Outdated
Show resolved
Hide resolved
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@tarekgh I saw you touched some Xml*.cs recently :) |
|
CC @krwq who I think still owning it. I'll take a look too anyway. |
|
Separately we should look at gutting Base64Decoder and replacing its implementation with use of Base64, rather than having a separate worse implementation. |
src/libraries/System.Private.Xml/src/System/Xml/Schema/XmlSchemaValidator.cs
Show resolved
Hide resolved
src/libraries/System.Private.Xml/tests/XmlReaderLib/ReadBase64.cs
Outdated
Show resolved
Hide resolved
tarekgh
left a comment
There was a problem hiding this comment.
left minor suggestion. LGTM otherwise.
|
@copilot address feedback for limit let's use only 100. |
Per review feedback: the decoder exception messages (which contain only the invalid character) are short, so 100 is a tighter and more appropriate bound. Schema validator tests keep 500 since those messages include structural overhead (element/type names, inner exception text). Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com> Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/8b521214-f14a-4ccc-8f64-29b49c990a43
Head branch was pushed to by a user without write access
Done in adfcd66 — decoder test limits changed from 1000 to 100. Kept three separate dots for ellipsis. The schema validator tests keep 500 since those messages include structural overhead beyond just the value. Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
CharCheckingReaderTest.RunTests("ReadBinHex runs into an Overflow") fails sporadically withOutOfMemoryExceptionon 32-bit platforms (x86, WASM). When encountering an invalid character,BinHexDecoderandBase64Decoderboth calledchars.ToString()on the entire decode buffer — potentially millions of characters — to populate theXmlExceptionmessage. The string allocation itself OOMs before the actual exception reaches the caller. Furthermore,XmlSchemaValidatorindependently re-embeds the full element/attribute value into its own error messages, so even fixing the inner decoder exceptions alone would not bound the observable message size for callers going through schema validation.Changes
Decoders (
BinHexDecoder.cs,Base64Decoder.cs):BinHexDecoder— invalid-character path (privateDecode):chars.ToString()→ch.ToString(). The error message'{0}' is not a valid BinHex character.now names the single bad character, not the whole buffer. Resource string updated from "text sequence" to "character" accordingly.BinHexDecoder— odd-count path (public staticDecode): Passnullinstead of the full input span. The resource stringXml_InvalidBinHexValueOddCountwas updated to drop{0}— for an odd-count error the problem is the length, not the content, so embedding the value is not useful.Base64Decoder— two throw sites:chars.ToString()→ch.ToString()for both the invalid-character path and the non-whitespace-after-padding path.Schema validator (
XmlSchemaValidator.cs):TruncateValueForErrorMessageprivate helper (max 40 chars +"...") and applied it at all three call sites that embedstringValueintoSch_ElementValueDataTypeDetailed/Sch_AttributeValueDataTypeDetailederror messages.Before / after for the primary case:
Tests:
TestReadBinHex_105376to assertex.Message.Length ≤ 100.TestReadBase64_InvalidChar_ErrorMessageBounded(same pattern, limit 100).ValidationErrorMessageTestsfor element and attribute schema validation paths (limit 500, since those messages include structural overhead: element/type names and inner exception text).Original prompt
This section details on the original issue you should resolve
<issue_title>System.Xml.CharCheckingReaderTests fails with OutOfMemoryException</issue_title>
<issue_description>## Build Information
Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=1149008
Build error leg or test failing: System.Xml.CharCheckingReaderTests.CharCheckingReaderTest.RunTests
Pull request: #119599
Error Message
Fill the error message using step by step known issues guidance.
{ "ErrorMessage": ["System.Xml.CharCheckingReaderTests","OutOfMemoryException"], "ErrorPattern": "", "BuildRetry": false, "ExcludeConsoleLog": false }Known issue validation
Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=1149008
Error message validated:
[System.Xml.CharCheckingReaderTests OutOfMemoryException]Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 9/16/2025 6:43:18 AM UTC
Report
Summary
Comments on the Issue (you are @copilot in this section)
@jkotas ``` Starting: System.Private.Xml.Tests (parallel test collections = on [4 threads], stop on fail = off) System.Xml.CharCheckingReaderTests.CharCheckingReaderTest.RunTests(testCase: ReadBinHex runs into an Overflow) [FAIL] System.OutOfMemoryException : Exception of type 'System.OutOfMemoryException' was thrown. Stack Trace: /_/src/libraries/System.Private.CoreLib/src/System/String.cs(316,0): at System.String.Ctor(ReadOnlySpan`1 value) /_/src/libraries/System.Private.CoreLib/src/System/Span.cs(382,0): at System.Span`1.ToString() /_/src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs(544,0): at System.String.FormatHelper(IFormatProvider provider, String format, ReadOnlySpan`1 args) /_/src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs(492,0): at System.String.Format(String format, ReadOnlySpan`1 args) /_/src/libraries/System.Private.Xml/src/System/Xml/XmlException.cs(202,0): at System.Xml.XmlException.CreateMessage(String res, String[] args, Int32 lineNumber, Int32 linePosition) /_/src/libraries/System.Private.Xml/src/System/Xml/XmlException.cs(167,0): at System.Xml.XmlException..ctor(String res, String[] args, Exception innerException, Int32 lineNumber, Int32 linePosition, String sourceUri) /_/src/libraries/System.Private.Xml/src/System/Xml/XmlException.cs(122,0): at System.Xml.XmlException..ctor(String res, String arg) /_/src/libraries/System.Private.Xml/src/System/Xml/BinHexDecoder.cs(165,0): at System.Xml.BinHexDecoder.Decode(ReadOnlySpan`1 chars, Span`1 bytes, Boolean& hasHalfByteCached, Byte& cachedHalfByte, Int32& charsDecoded, Int32& bytesDecoded) /_/src/libraries/System.Private.Xml/src/System/Xml/BinHexDecoder.cs(74,0): at System.Xml.BinHexDecoder.Decode(String str, Int32 startPos, Int32 len) /_/src/libraries/System.Private.Xml/src/System/Xml/Core/ReadContentAsBinaryHelper.cs(367,0): at System.Xml.ReadContentAsBinaryHelper.ReadContentAsBinary(Byte[] buffer, Int32 index, In...🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.