Add span-based Deflate, ZLib and GZip encoder/decoder APIs#123145
Add span-based Deflate, ZLib and GZip encoder/decoder APIs#123145iremyux wants to merge 74 commits intodotnet:mainfrom
Conversation
src/libraries/System.IO.Compression/src/System/IO/Compression/ZlibEncoderOptions.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZlibEncoderOptions.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZlibEncoderOptions.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZlibEncoder.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/ref/System.IO.Compression.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZlibDecoder.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZlibEncoder.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZlibEncoder.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZlibEncoder.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZlibEncoder.cs
Outdated
Show resolved
Hide resolved
| CompressionLevel.Fastest => ZLibNative.CompressionLevel.BestSpeed, | ||
| CompressionLevel.NoCompression => ZLibNative.CompressionLevel.NoCompression, | ||
| CompressionLevel.SmallestSize => ZLibNative.CompressionLevel.BestCompression, | ||
| _ => throw new ArgumentOutOfRangeException(nameof(compressionLevel)), |
There was a problem hiding this comment.
This would fail on valid native compression levels not covered by the CompressionLevel enum. Instead I think it should check if the value is is < -1 or > 9 to throw out of range instead.
There was a problem hiding this comment.
Also to add on to the above, now those who want compression levels that just happen to == a value in the CompressionLevel enum will now not be able to use those compression levels either. Perhaps a solution to this is to expose a version of the ctor with CompressionLevel and a version with int that gets casted to ZLibNative.CompressionLevel after a range check.
src/libraries/System.IO.Compression/tests/DeflateZLibGZipEncoderDecoderTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/GZipEncoder.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZLibCompressionOptions.cs
Show resolved
Hide resolved
| if (value != -1) | ||
| { |
There was a problem hiding this comment.
Similar to CompressionLevel, the validation here uses ThrowIfLessThan(value, ZLibNative.MinWindowLog) even though -1 is allowed. This makes the exception message/range misleading for invalid negatives (it implies the minimum is 8, not “-1 or 8-15”). Consider an explicit check matching the documented contract (value < -1 || (value > -1 && value < 8) || value > 15).
| if (value != -1) | |
| { | |
| if (value < -1) | |
| { | |
| ArgumentOutOfRangeException.ThrowIfLessThan(value, -1, nameof(value)); | |
| } | |
| else if (value > -1) | |
| { |
src/libraries/System.IO.Compression/src/System/IO/Compression/DeflateEncoder.cs
Show resolved
Hide resolved
| <Compile Include="ZLibCompressionOptionsUnitTests.cs" /> | ||
| <Compile Include="$(CommonTestPath)System\IO\Compression\EncoderDecoderTestBase.cs" Link="Common\System\IO\Compression\EncoderDecoderTestBase.cs" /> | ||
| <Compile Include="DeflateEncoderDecoderTests.cs" /> | ||
| <Compile Include="ZlibEncoderDecoderTests.cs" /> |
There was a problem hiding this comment.
The file is included as ZlibEncoderDecoderTests.cs (lowercase l), but the type is ZLibEncoderDecoderTests and other zlib-related files use ZLib... casing. Consider renaming the file (and updating the <Compile Include> entry) for consistency and easier discoverability.
| <Compile Include="ZlibEncoderDecoderTests.cs" /> | |
| <Compile Include="ZLibEncoderDecoderTests.cs" /> |
| byte[] destination = new byte[maxLength]; | ||
|
|
||
| using EncoderAdapter encoder = CreateEncoder(); | ||
| OperationStatus status = encoder.Compress(input, destination, out _, out int written, isFinalBlock: true); | ||
|
|
||
| Assert.Equal(OperationStatus.Done, status); | ||
| Assert.True(written <= maxLength); |
There was a problem hiding this comment.
maxLength is a long, but array lengths require an int. new byte[maxLength] won’t compile. Cast with checked((int)maxLength) (and/or assert the bound fits in int.MaxValue) before allocating.
| byte[] destination = new byte[maxLength]; | |
| using EncoderAdapter encoder = CreateEncoder(); | |
| OperationStatus status = encoder.Compress(input, destination, out _, out int written, isFinalBlock: true); | |
| Assert.Equal(OperationStatus.Done, status); | |
| Assert.True(written <= maxLength); | |
| Assert.InRange(maxLength, 0, int.MaxValue); | |
| int maxLengthInt = checked((int)maxLength); | |
| byte[] destination = new byte[maxLengthInt]; | |
| using EncoderAdapter encoder = CreateEncoder(); | |
| OperationStatus status = encoder.Compress(input, destination, out _, out int written, isFinalBlock: true); | |
| Assert.Equal(OperationStatus.Done, status); | |
| Assert.True(written <= maxLengthInt); |
| encoder.Compress(input, compressed, out _, out int compressedSize, isFinalBlock: true); | ||
|
|
||
| byte[] decompressed = new byte[input.Length]; | ||
| using var decoder = CreateDecoder(); | ||
| decoder.Decompress(compressed.AsSpan(0, compressedSize), decompressed, out _, out _); |
There was a problem hiding this comment.
The new RoundTrip_AllCompressionLevels test ignores the OperationStatus / bytesConsumed / bytesWritten results from Compress and Decompress. This can make the test pass even if the codec returns NeedMoreData or doesn’t consume all input. Capture and assert the expected statuses (e.g., Done) and verify full consumption/writes.
| encoder.Compress(input, compressed, out _, out int compressedSize, isFinalBlock: true); | |
| byte[] decompressed = new byte[input.Length]; | |
| using var decoder = CreateDecoder(); | |
| decoder.Decompress(compressed.AsSpan(0, compressedSize), decompressed, out _, out _); | |
| OperationStatus compressStatus = encoder.Compress(input, compressed, out int inputBytesConsumed, out int compressedSize, isFinalBlock: true); | |
| Assert.Equal(OperationStatus.Done, compressStatus); | |
| Assert.Equal(input.Length, inputBytesConsumed); | |
| byte[] decompressed = new byte[input.Length]; | |
| using var decoder = CreateDecoder(); | |
| OperationStatus decompressStatus = decoder.Decompress(compressed.AsSpan(0, compressedSize), decompressed, out int decompressBytesConsumed, out int decompressBytesWritten); | |
| Assert.Equal(OperationStatus.Done, decompressStatus); | |
| Assert.Equal(compressedSize, decompressBytesConsumed); | |
| Assert.Equal(decompressed.Length, decompressBytesWritten); |
| // Compute windowBits based on the compression format: | ||
| // - Deflate: negative windowLog produces raw deflate (no header/trailer) | ||
| // - ZLib: positive windowLog produces zlib format | ||
| // - GZip: windowLog + 16 produces gzip format | ||
| int windowBits = format switch | ||
| { | ||
| CompressionFormat.Deflate => -resolvedWindowLog, | ||
| CompressionFormat.ZLib => resolvedWindowLog, | ||
| CompressionFormat.GZip => resolvedWindowLog + 16, |
There was a problem hiding this comment.
windowLog values of 8 are accepted by ZLibCompressionOptions (and the stream constructors clamp 8 -> 9 for raw deflate/gzip to match classic zlib vs zlib-ng behavior), but this encoder path passes resolvedWindowLog through unchanged for Deflate/GZip formats. That can make DeflateEncoder/GZipEncoder throw on zlib-ng when windowLog==8, and is inconsistent with DeflateStream/GZipStream. Consider applying the same clamp (8 -> 9) for CompressionFormat.Deflate and CompressionFormat.GZip when computing windowBits.
| // Compute windowBits based on the compression format: | |
| // - Deflate: negative windowLog produces raw deflate (no header/trailer) | |
| // - ZLib: positive windowLog produces zlib format | |
| // - GZip: windowLog + 16 produces gzip format | |
| int windowBits = format switch | |
| { | |
| CompressionFormat.Deflate => -resolvedWindowLog, | |
| CompressionFormat.ZLib => resolvedWindowLog, | |
| CompressionFormat.GZip => resolvedWindowLog + 16, | |
| // Compute windowBits based on the compression format. | |
| // For Deflate and GZip, clamp an effective windowLog of 8 up to 9 to match | |
| // DeflateStream/GZipStream behavior and avoid zlib-ng incompatibilities. | |
| int windowBitsLog = resolvedWindowLog; | |
| if ((format == CompressionFormat.Deflate || format == CompressionFormat.GZip) && windowBitsLog == 8) | |
| { | |
| windowBitsLog = 9; | |
| } | |
| // - Deflate: negative windowBits produces raw deflate (no header/trailer) | |
| // - ZLib: positive windowBits produces zlib format | |
| // - GZip: windowBits + 16 produces gzip format | |
| int windowBits = format switch | |
| { | |
| CompressionFormat.Deflate => -windowBitsLog, | |
| CompressionFormat.ZLib => windowBitsLog, | |
| CompressionFormat.GZip => windowBitsLog + 16, |
| // -1 means use the default window log | ||
| int windowLog = options.WindowLog == -1 ? ZLibNative.DefaultWindowLog : options.WindowLog; | ||
|
|
||
| // Compute windowBits based on the compression format: | ||
| int windowBits = format switch | ||
| { | ||
| CompressionFormat.Deflate => -windowLog, | ||
| CompressionFormat.ZLib => windowLog, | ||
| CompressionFormat.GZip => windowLog + 16, | ||
| _ => throw new ArgumentOutOfRangeException(nameof(format)) | ||
| }; |
There was a problem hiding this comment.
Same windowLog==8 concern for the options-based constructor: for CompressionFormat.Deflate and CompressionFormat.GZip, passing windowLog 8 directly into windowBits (-8 / 24) may be rejected by zlib-ng, while DeflateStream/GZipStream clamp 8 -> 9. Consider clamping in this path as well so options behave consistently across stream and span-based APIs.
|
|
||
| Returns the maximum number of bytes the compressed output could require. | ||
| */ | ||
| FUNCTIONEXPORT uint32_t FUNCTIONCALLINGCONVENTION CompressionNative_CompressBound(uint32_t sourceLen); |
There was a problem hiding this comment.
CompressionNative_CompressBound is declared/returned as uint32_t, but zlib-ng's compressBound uses z_uintmax_t and can return values > 4GB even when sourceLen fits in uint32. Truncating to uint32_t can make GetMaxCompressedLength under-estimate and break the "upper bound" contract. Consider changing the export to use a 64-bit/size_t-sized return type (and corresponding interop), or compute the bound in managed with 64-bit arithmetic and validate overflow.
| FUNCTIONEXPORT uint32_t FUNCTIONCALLINGCONVENTION CompressionNative_CompressBound(uint32_t sourceLen); | |
| FUNCTIONEXPORT uint64_t FUNCTIONCALLINGCONVENTION CompressionNative_CompressBound(uint32_t sourceLen); |
alinpahontu2912
left a comment
There was a problem hiding this comment.
Some nitpicks about styling and a few questions, but lgtm, good job 👍
src/libraries/System.IO.Compression/src/System/IO/Compression/ZLibCompressionOptions.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZLibStream.cs
Show resolved
Hide resolved
| [InlineData(10)] | ||
| [InlineData(15)] | ||
| [InlineData(-1)] | ||
| public void RoundTrip_WithWindowLog(int windowLog) |
There was a problem hiding this comment.
this test with the inline data is repeated 3 times, could we maybe avoid duplication ?
There was a problem hiding this comment.
moving it to CompressionStreamUnitTestBase so it runs for all three formats without duplication
| if (useDictionary && !SupportsDictionaries) | ||
| return; |
There was a problem hiding this comment.
This test does not use dictionaries, so we can change this to Fact only, also, the name uses "RandomData", but this PR changes the implementation to use 0xFF....
| if (useDictionary && !SupportsDictionaries) | ||
| return; | ||
|
|
||
| // 0xFF is an invalid first byte for all three formats: |
There was a problem hiding this comment.
The comment relates to GZip, Deflate and Zlib, but the test suite is shared for Brotli and Zstandard as well.
|
|
||
| [Fact] | ||
| public void GetMaxCompressedLength_FitsActualOutput() | ||
| { | ||
| byte[] input = CreateTestData(10_000); | ||
| long maxLength = GetMaxCompressedLength(input.Length); | ||
| byte[] destination = new byte[maxLength]; | ||
|
|
||
| using EncoderAdapter encoder = CreateEncoder(); | ||
| OperationStatus status = encoder.Compress(input, destination, out _, out int written, isFinalBlock: true); | ||
|
|
||
| Assert.Equal(OperationStatus.Done, status); | ||
| Assert.True(written <= maxLength); | ||
| } |
There was a problem hiding this comment.
Most of the tests here use some variation of byte[] destination = new byte[GetMaxCompressedLength(input.Length)] so having an extra tests feels redundant.
| { | ||
| byte[] input = CreateTestData(); | ||
|
|
||
| for (int quality = 0; quality <= 9; quality++) |
There was a problem hiding this comment.
This should use MinQuality and MaxQuality, since this runs for Brotli and Zstandard as well (and they have different range for quality).
The destination arrays for both compressed and decompressed data can be allocated outside of the loop.
I think this test is redundant with RoundTrip_SuccessfullyCompressesAndDecompresses
| int windowLog = compressionOptions.WindowLog == -1 ? 15 : Math.Max(compressionOptions.WindowLog, 9); | ||
| int windowBits = -windowLog; | ||
|
|
||
| InitializeDeflater(stream, (ZLibNative.CompressionLevel)compressionOptions.CompressionLevel, (CompressionStrategy)compressionOptions.CompressionStrategy, leaveOpen, windowBits); |
There was a problem hiding this comment.
Eventually, we would want to remove the Inflater and Deflater classes and replace them with the Encoder/Decoder pair. You can either do it in this PR or leave it for a follow-up PR.
| public void ZLibEncoder_WithOptions() | ||
| { | ||
| byte[] input = CreateTestData(); | ||
| var options = new ZLibCompressionOptions | ||
| { | ||
| CompressionLevel = 9, | ||
| CompressionStrategy = ZLibCompressionStrategy.Filtered | ||
| }; | ||
|
|
||
| using var encoder = new ZLibEncoder(options); | ||
| byte[] destination = new byte[GetMaxCompressedLength(input.Length)]; | ||
|
|
||
| OperationStatus status = encoder.Compress(input, destination, out int consumed, out int written, isFinalBlock: true); | ||
|
|
||
| Assert.Equal(OperationStatus.Done, status); | ||
| Assert.Equal(input.Length, consumed); | ||
| Assert.True(written > 0); | ||
| } | ||
|
|
||
| #endregion | ||
| } |
There was a problem hiding this comment.
Does this test provide anything that the above one does not?
This PR introduces new span-based, streamless compression and decompression APIs for Deflate, ZLib, and GZip formats, matching the existing
BrotliEncoder/BrotliDecoderpattern.New APIs
DeflateEncoder/DeflateDecoderZLibEncoder/ZLibDecoderGZipEncoder/GZipDecoderThese classes provide:
Compress(),Decompress(), andFlush()TryCompress() andTryDecompress()for simple scenariosGetMaxCompressedLength()to calculate buffer sizesCloses #62113
Closes #39327
Closes #44793