Feature/zip archive forward read#126646
Conversation
Adds a new ForwardRead mode to ZipArchive that enables forward-only sequential reading of ZIP entries from non-seekable streams, using local file headers instead of the central directory. Changes: - ZipArchiveMode.cs: Add ForwardRead = 3 enum value - ZipCustomStreams.cs: Add BoundedReadOnlyStream and ReadAheadStream helper stream classes - ZipArchive.cs: Add GetNextEntry()/GetNextEntryAsync() methods, ForwardRead constructor case, ValidateMode/DecideArchiveStream support, data descriptor parsing, and property guards - ZipArchive.Async.cs: Add ForwardRead cases to CreateAsync and DisposeAsyncCore - ZipArchiveEntry.cs: Add forward-read constructor, ForwardReadDataStream property, UpdateFromDataDescriptor method, OpenInForwardReadMode, and property setter guards - Strings.resx: Add ForwardRead error message strings - ref/System.IO.Compression.cs: Add public API surface - Tests: Add comprehensive zip_ForwardReadTests covering deflate, stored, data descriptors, non-seekable streams, empty archives, partial reads, error cases, and async operations Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR introduces an experimental forward-only streaming read mode for System.IO.Compression.ZipArchive to support sequential iteration over ZIP entries via local file headers (instead of loading the central directory up-front), including async iteration and new unit tests.
Changes:
- Add
ZipArchiveMode.ForwardReadplus publicZipArchive.GetNextEntry()/GetNextEntryAsync(...)APIs for forward-only entry iteration. - Implement forward-read parsing and stream wrappers (
ReadAheadStream,BoundedReadOnlyStream) to enable non-seekable streaming scenarios. - Add new unit tests validating basic ForwardRead behavior and unsupported operations.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| src/libraries/System.IO.Compression/tests/ZipArchive/zip_ForwardReadTests.cs | Adds new test coverage for ForwardRead iteration, sync/async behavior, and unsupported APIs. |
| src/libraries/System.IO.Compression/tests/System.IO.Compression.Tests.csproj | Includes the new ForwardRead test file in the test project. |
| src/libraries/System.IO.Compression/src/System/IO/Compression/ZipCustomStreams.cs | Adds BoundedReadOnlyStream and ReadAheadStream used by ForwardRead mode. |
| src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveMode.cs | Introduces ZipArchiveMode.ForwardRead enum value with documentation. |
| src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs | Adds ForwardRead entry initialization and Open() support for ForwardRead mode. |
| src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs | Implements ForwardRead iteration, local header parsing, data-descriptor handling, and non-seekable wrapping. |
| src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.Async.cs | Enables ForwardRead setup for async factory and disposal paths. |
| src/libraries/System.IO.Compression/src/Resources/Strings.resx | Adds new SR strings for ForwardRead error messages. |
| src/libraries/System.IO.Compression/ref/System.IO.Compression.cs | Updates public surface area (new mode + new ZipArchive methods). |
| // Known size, not encrypted | ||
| Stream bounded = new BoundedReadOnlyStream(_archiveStream, compressedSize); | ||
| Stream decompressor = CreateForwardReadDecompressor(bounded, compressionMethod, uncompressedSize, leaveOpen: false); | ||
| dataStream = new CrcValidatingReadStream(decompressor, crc32, uncompressedSize); |
There was a problem hiding this comment.
I wonder if we should postpone DataStream creation and do it lazily like we do for the other entries.
There was a problem hiding this comment.
I think I can defer that for the on data descriptor entries
| if (isDirectory || isEmptyEntry) | ||
| return null; | ||
|
|
There was a problem hiding this comment.
In ForwardRead mode on a non-seekable input, empty file entries currently return null from BuildForwardReadDataStream, which makes ZipArchiveEntry.Open() throw (ForwardReadNoDataStream). Empty files are valid entries and should be openable (returning an empty stream) so callers can treat them the same as in Read mode; only directory entries should be non-openable.
| if (isDirectory || isEmptyEntry) | |
| return null; | |
| if (isDirectory) | |
| return null; | |
| if (isEmptyEntry) | |
| return Stream.Null; |
| // Known size, not encrypted — store lightweight SubReadStream as a bookmark; | ||
| // decompressor + CRC wrapper are created lazily in OpenInForwardReadMode. | ||
| return new SubReadStream(_archiveStream, _archiveStream.Position, data.CompressedSize); | ||
| } |
There was a problem hiding this comment.
BuildForwardReadDataStream returns a SubReadStream over _archiveStream even when _archiveStream is the ReadAheadStream wrapper (used for non-seekable sources). SubReadStream's Seek(Current, -n) rewinds by calling its super-stream Seek(…, Begin), but ReadAheadStream only supports rewinding via SeekOrigin.Current with a negative offset. This prevents DeflateStream from rewinding unconsumed bytes at end-of-stream, which can leave the archive stream position past the end of the entry and corrupt subsequent header / data-descriptor reads.
| set | ||
| { | ||
| ThrowIfDisposed(); | ||
| throw new NotSupportedException(); |
There was a problem hiding this comment.
ReadAheadStream reports CanSeek=true, but Seek only supports SeekOrigin.Current with a negative offset and Position's setter throws NotSupportedException. This violates the Stream seeking contract and is already causing issues when combined with SubReadStream/DeflateStream rewind logic (it expects seeking to actually work when CanSeek is true). Consider either implementing the additional seek forms you rely on (e.g., SeekOrigin.Begin for backwards seeks within the history and Position setter via Seek), or avoiding passing this stream through code paths that expect a fully seekable stream.
| throw new NotSupportedException(); | |
| if (value < 0) | |
| { | |
| throw new ArgumentOutOfRangeException(nameof(value)); | |
| } | |
| long offset = value - _position; | |
| if (offset == 0) | |
| { | |
| return; | |
| } | |
| if (offset > 0) | |
| { | |
| throw new NotSupportedException(); | |
| } | |
| Seek(offset, SeekOrigin.Current); |
Fixes #1550
Experiment with new Forward only reading mode for ziparchvie, that relies on the local headers of entries instead of the central directory, avoiding loading everything into memory at once