Skip to content

Add span-based Deflate, ZLib and GZip encoder/decoder APIs#123145

Open
iremyux wants to merge 74 commits intodotnet:mainfrom
iremyux:62113-zlib-encoder-decoder
Open

Add span-based Deflate, ZLib and GZip encoder/decoder APIs#123145
iremyux wants to merge 74 commits intodotnet:mainfrom
iremyux:62113-zlib-encoder-decoder

Conversation

@iremyux
Copy link
Contributor

@iremyux iremyux commented Jan 13, 2026

This PR introduces new span-based, streamless compression and decompression APIs for Deflate, ZLib, and GZip formats, matching the existing BrotliEncoder/BrotliDecoder pattern.

New APIs

  • DeflateEncoder / DeflateDecoder
  • ZLibEncoder / ZLibDecoder
  • GZipEncoder / GZipDecoder

These classes provide:

  • Instance-based API for streaming/chunked compression with Compress(), Decompress(), and Flush()
  • Static one-shot API via TryCompress() and TryDecompress() for simple scenarios
  • GetMaxCompressedLength() to calculate buffer sizes

Closes #62113
Closes #39327
Closes #44793

@iremyux iremyux changed the title [WIP] Add span-based ZlibEncoder and ZlibDecoder APIs [WIP] Add span-based Deflate, ZLib and GZip encoder/decoder APIs Jan 19, 2026
CompressionLevel.Fastest => ZLibNative.CompressionLevel.BestSpeed,
CompressionLevel.NoCompression => ZLibNative.CompressionLevel.NoCompression,
CompressionLevel.SmallestSize => ZLibNative.CompressionLevel.BestCompression,
_ => throw new ArgumentOutOfRangeException(nameof(compressionLevel)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would fail on valid native compression levels not covered by the CompressionLevel enum. Instead I think it should check if the value is is < -1 or > 9 to throw out of range instead.

Copy link
Member

@AraHaan AraHaan Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also to add on to the above, now those who want compression levels that just happen to == a value in the CompressionLevel enum will now not be able to use those compression levels either. Perhaps a solution to this is to expose a version of the ctor with CompressionLevel and a version with int that gets casted to ZLibNative.CompressionLevel after a range check.

Copilot AI review requested due to automatic review settings March 11, 2026 20:32
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 6 comments.

Comment on lines +76 to +77
if (value != -1)
{
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to CompressionLevel, the validation here uses ThrowIfLessThan(value, ZLibNative.MinWindowLog) even though -1 is allowed. This makes the exception message/range misleading for invalid negatives (it implies the minimum is 8, not “-1 or 8-15”). Consider an explicit check matching the documented contract (value < -1 || (value > -1 && value < 8) || value > 15).

Suggested change
if (value != -1)
{
if (value < -1)
{
ArgumentOutOfRangeException.ThrowIfLessThan(value, -1, nameof(value));
}
else if (value > -1)
{

Copilot uses AI. Check for mistakes.
<Compile Include="ZLibCompressionOptionsUnitTests.cs" />
<Compile Include="$(CommonTestPath)System\IO\Compression\EncoderDecoderTestBase.cs" Link="Common\System\IO\Compression\EncoderDecoderTestBase.cs" />
<Compile Include="DeflateEncoderDecoderTests.cs" />
<Compile Include="ZlibEncoderDecoderTests.cs" />
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file is included as ZlibEncoderDecoderTests.cs (lowercase l), but the type is ZLibEncoderDecoderTests and other zlib-related files use ZLib... casing. Consider renaming the file (and updating the <Compile Include> entry) for consistency and easier discoverability.

Suggested change
<Compile Include="ZlibEncoderDecoderTests.cs" />
<Compile Include="ZLibEncoderDecoderTests.cs" />

Copilot uses AI. Check for mistakes.
Comment on lines +359 to +365
byte[] destination = new byte[maxLength];

using EncoderAdapter encoder = CreateEncoder();
OperationStatus status = encoder.Compress(input, destination, out _, out int written, isFinalBlock: true);

Assert.Equal(OperationStatus.Done, status);
Assert.True(written <= maxLength);
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maxLength is a long, but array lengths require an int. new byte[maxLength] won’t compile. Cast with checked((int)maxLength) (and/or assert the bound fits in int.MaxValue) before allocating.

Suggested change
byte[] destination = new byte[maxLength];
using EncoderAdapter encoder = CreateEncoder();
OperationStatus status = encoder.Compress(input, destination, out _, out int written, isFinalBlock: true);
Assert.Equal(OperationStatus.Done, status);
Assert.True(written <= maxLength);
Assert.InRange(maxLength, 0, int.MaxValue);
int maxLengthInt = checked((int)maxLength);
byte[] destination = new byte[maxLengthInt];
using EncoderAdapter encoder = CreateEncoder();
OperationStatus status = encoder.Compress(input, destination, out _, out int written, isFinalBlock: true);
Assert.Equal(OperationStatus.Done, status);
Assert.True(written <= maxLengthInt);

Copilot uses AI. Check for mistakes.
Comment on lines +691 to +695
encoder.Compress(input, compressed, out _, out int compressedSize, isFinalBlock: true);

byte[] decompressed = new byte[input.Length];
using var decoder = CreateDecoder();
decoder.Decompress(compressed.AsSpan(0, compressedSize), decompressed, out _, out _);
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new RoundTrip_AllCompressionLevels test ignores the OperationStatus / bytesConsumed / bytesWritten results from Compress and Decompress. This can make the test pass even if the codec returns NeedMoreData or doesn’t consume all input. Capture and assert the expected statuses (e.g., Done) and verify full consumption/writes.

Suggested change
encoder.Compress(input, compressed, out _, out int compressedSize, isFinalBlock: true);
byte[] decompressed = new byte[input.Length];
using var decoder = CreateDecoder();
decoder.Decompress(compressed.AsSpan(0, compressedSize), decompressed, out _, out _);
OperationStatus compressStatus = encoder.Compress(input, compressed, out int inputBytesConsumed, out int compressedSize, isFinalBlock: true);
Assert.Equal(OperationStatus.Done, compressStatus);
Assert.Equal(input.Length, inputBytesConsumed);
byte[] decompressed = new byte[input.Length];
using var decoder = CreateDecoder();
OperationStatus decompressStatus = decoder.Decompress(compressed.AsSpan(0, compressedSize), decompressed, out int decompressBytesConsumed, out int decompressBytesWritten);
Assert.Equal(OperationStatus.Done, decompressStatus);
Assert.Equal(compressedSize, decompressBytesConsumed);
Assert.Equal(decompressed.Length, decompressBytesWritten);

Copilot uses AI. Check for mistakes.
@iremyux iremyux changed the title [WIP] Add span-based Deflate, ZLib and GZip encoder/decoder APIs Add span-based Deflate, ZLib and GZip encoder/decoder APIs Mar 12, 2026
@iremyux iremyux marked this pull request as ready for review March 12, 2026 14:09
@iremyux iremyux requested review from a team and Copilot March 12, 2026 14:09
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 3 comments.

Comment on lines +73 to +81
// Compute windowBits based on the compression format:
// - Deflate: negative windowLog produces raw deflate (no header/trailer)
// - ZLib: positive windowLog produces zlib format
// - GZip: windowLog + 16 produces gzip format
int windowBits = format switch
{
CompressionFormat.Deflate => -resolvedWindowLog,
CompressionFormat.ZLib => resolvedWindowLog,
CompressionFormat.GZip => resolvedWindowLog + 16,
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

windowLog values of 8 are accepted by ZLibCompressionOptions (and the stream constructors clamp 8 -> 9 for raw deflate/gzip to match classic zlib vs zlib-ng behavior), but this encoder path passes resolvedWindowLog through unchanged for Deflate/GZip formats. That can make DeflateEncoder/GZipEncoder throw on zlib-ng when windowLog==8, and is inconsistent with DeflateStream/GZipStream. Consider applying the same clamp (8 -> 9) for CompressionFormat.Deflate and CompressionFormat.GZip when computing windowBits.

Suggested change
// Compute windowBits based on the compression format:
// - Deflate: negative windowLog produces raw deflate (no header/trailer)
// - ZLib: positive windowLog produces zlib format
// - GZip: windowLog + 16 produces gzip format
int windowBits = format switch
{
CompressionFormat.Deflate => -resolvedWindowLog,
CompressionFormat.ZLib => resolvedWindowLog,
CompressionFormat.GZip => resolvedWindowLog + 16,
// Compute windowBits based on the compression format.
// For Deflate and GZip, clamp an effective windowLog of 8 up to 9 to match
// DeflateStream/GZipStream behavior and avoid zlib-ng incompatibilities.
int windowBitsLog = resolvedWindowLog;
if ((format == CompressionFormat.Deflate || format == CompressionFormat.GZip) && windowBitsLog == 8)
{
windowBitsLog = 9;
}
// - Deflate: negative windowBits produces raw deflate (no header/trailer)
// - ZLib: positive windowBits produces zlib format
// - GZip: windowBits + 16 produces gzip format
int windowBits = format switch
{
CompressionFormat.Deflate => -windowBitsLog,
CompressionFormat.ZLib => windowBitsLog,
CompressionFormat.GZip => windowBitsLog + 16,

Copilot uses AI. Check for mistakes.
Comment on lines +99 to +109
// -1 means use the default window log
int windowLog = options.WindowLog == -1 ? ZLibNative.DefaultWindowLog : options.WindowLog;

// Compute windowBits based on the compression format:
int windowBits = format switch
{
CompressionFormat.Deflate => -windowLog,
CompressionFormat.ZLib => windowLog,
CompressionFormat.GZip => windowLog + 16,
_ => throw new ArgumentOutOfRangeException(nameof(format))
};
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same windowLog==8 concern for the options-based constructor: for CompressionFormat.Deflate and CompressionFormat.GZip, passing windowLog 8 directly into windowBits (-8 / 24) may be rejected by zlib-ng, while DeflateStream/GZipStream clamp 8 -> 9. Consider clamping in this path as well so options behave consistently across stream and span-based APIs.

Copilot uses AI. Check for mistakes.

Returns the maximum number of bytes the compressed output could require.
*/
FUNCTIONEXPORT uint32_t FUNCTIONCALLINGCONVENTION CompressionNative_CompressBound(uint32_t sourceLen);
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CompressionNative_CompressBound is declared/returned as uint32_t, but zlib-ng's compressBound uses z_uintmax_t and can return values > 4GB even when sourceLen fits in uint32. Truncating to uint32_t can make GetMaxCompressedLength under-estimate and break the "upper bound" contract. Consider changing the export to use a 64-bit/size_t-sized return type (and corresponding interop), or compute the bound in managed with 64-bit arithmetic and validate overflow.

Suggested change
FUNCTIONEXPORT uint32_t FUNCTIONCALLINGCONVENTION CompressionNative_CompressBound(uint32_t sourceLen);
FUNCTIONEXPORT uint64_t FUNCTIONCALLINGCONVENTION CompressionNative_CompressBound(uint32_t sourceLen);

Copilot uses AI. Check for mistakes.
Copy link
Member

@alinpahontu2912 alinpahontu2912 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nitpicks about styling and a few questions, but lgtm, good job 👍

[InlineData(10)]
[InlineData(15)]
[InlineData(-1)]
public void RoundTrip_WithWindowLog(int windowLog)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test with the inline data is repeated 3 times, could we maybe avoid duplication ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving it to CompressionStreamUnitTestBase so it runs for all three formats without duplication

Comment on lines 269 to 270
if (useDictionary && !SupportsDictionaries)
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test does not use dictionaries, so we can change this to Fact only, also, the name uses "RandomData", but this PR changes the implementation to use 0xFF....

if (useDictionary && !SupportsDictionaries)
return;

// 0xFF is an invalid first byte for all three formats:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment relates to GZip, Deflate and Zlib, but the test suite is shared for Brotli and Zstandard as well.

Comment on lines 353 to +366

[Fact]
public void GetMaxCompressedLength_FitsActualOutput()
{
byte[] input = CreateTestData(10_000);
long maxLength = GetMaxCompressedLength(input.Length);
byte[] destination = new byte[maxLength];

using EncoderAdapter encoder = CreateEncoder();
OperationStatus status = encoder.Compress(input, destination, out _, out int written, isFinalBlock: true);

Assert.Equal(OperationStatus.Done, status);
Assert.True(written <= maxLength);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the tests here use some variation of byte[] destination = new byte[GetMaxCompressedLength(input.Length)] so having an extra tests feels redundant.

{
byte[] input = CreateTestData();

for (int quality = 0; quality <= 9; quality++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use MinQuality and MaxQuality, since this runs for Brotli and Zstandard as well (and they have different range for quality).

The destination arrays for both compressed and decompressed data can be allocated outside of the loop.

I think this test is redundant with RoundTrip_SuccessfullyCompressesAndDecompresses

int windowLog = compressionOptions.WindowLog == -1 ? 15 : Math.Max(compressionOptions.WindowLog, 9);
int windowBits = -windowLog;

InitializeDeflater(stream, (ZLibNative.CompressionLevel)compressionOptions.CompressionLevel, (CompressionStrategy)compressionOptions.CompressionStrategy, leaveOpen, windowBits);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, we would want to remove the Inflater and Deflater classes and replace them with the Encoder/Decoder pair. You can either do it in this PR or leave it for a follow-up PR.

Comment on lines +130 to +150
public void ZLibEncoder_WithOptions()
{
byte[] input = CreateTestData();
var options = new ZLibCompressionOptions
{
CompressionLevel = 9,
CompressionStrategy = ZLibCompressionStrategy.Filtered
};

using var encoder = new ZLibEncoder(options);
byte[] destination = new byte[GetMaxCompressedLength(input.Length)];

OperationStatus status = encoder.Compress(input, destination, out int consumed, out int written, isFinalBlock: true);

Assert.Equal(OperationStatus.Done, status);
Assert.Equal(input.Length, consumed);
Assert.True(written > 0);
}

#endregion
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this test provide anything that the above one does not?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[API Proposal]: Add Deflate, ZLib and GZip encoder/decoder APIs Add static compression helper methods Span-based (non-stream) compression APIs

5 participants