Skip to content

Feature/zip archive forward read#126646

Draft
alinpahontu2912 wants to merge 7 commits intodotnet:mainfrom
alinpahontu2912:feature/zip-archive-forward-read
Draft

Feature/zip archive forward read#126646
alinpahontu2912 wants to merge 7 commits intodotnet:mainfrom
alinpahontu2912:feature/zip-archive-forward-read

Conversation

@alinpahontu2912
Copy link
Copy Markdown
Member

Fixes #1550

Experiment with new Forward only reading mode for ziparchvie, that relies on the local headers of entries instead of the central directory, avoiding loading everything into memory at once

alinpahontu2912 and others added 3 commits April 7, 2026 16:01
Adds a new ForwardRead mode to ZipArchive that enables forward-only
sequential reading of ZIP entries from non-seekable streams, using
local file headers instead of the central directory.

Changes:
- ZipArchiveMode.cs: Add ForwardRead = 3 enum value
- ZipCustomStreams.cs: Add BoundedReadOnlyStream and ReadAheadStream
  helper stream classes
- ZipArchive.cs: Add GetNextEntry()/GetNextEntryAsync() methods,
  ForwardRead constructor case, ValidateMode/DecideArchiveStream
  support, data descriptor parsing, and property guards
- ZipArchive.Async.cs: Add ForwardRead cases to CreateAsync and
  DisposeAsyncCore
- ZipArchiveEntry.cs: Add forward-read constructor, ForwardReadDataStream
  property, UpdateFromDataDescriptor method, OpenInForwardReadMode,
  and property setter guards
- Strings.resx: Add ForwardRead error message strings
- ref/System.IO.Compression.cs: Add public API surface
- Tests: Add comprehensive zip_ForwardReadTests covering deflate,
  stored, data descriptors, non-seekable streams, empty archives,
  partial reads, error cases, and async operations

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an experimental forward-only streaming read mode for System.IO.Compression.ZipArchive to support sequential iteration over ZIP entries via local file headers (instead of loading the central directory up-front), including async iteration and new unit tests.

Changes:

  • Add ZipArchiveMode.ForwardRead plus public ZipArchive.GetNextEntry() / GetNextEntryAsync(...) APIs for forward-only entry iteration.
  • Implement forward-read parsing and stream wrappers (ReadAheadStream, BoundedReadOnlyStream) to enable non-seekable streaming scenarios.
  • Add new unit tests validating basic ForwardRead behavior and unsupported operations.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
src/libraries/System.IO.Compression/tests/ZipArchive/zip_ForwardReadTests.cs Adds new test coverage for ForwardRead iteration, sync/async behavior, and unsupported APIs.
src/libraries/System.IO.Compression/tests/System.IO.Compression.Tests.csproj Includes the new ForwardRead test file in the test project.
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipCustomStreams.cs Adds BoundedReadOnlyStream and ReadAheadStream used by ForwardRead mode.
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveMode.cs Introduces ZipArchiveMode.ForwardRead enum value with documentation.
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs Adds ForwardRead entry initialization and Open() support for ForwardRead mode.
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs Implements ForwardRead iteration, local header parsing, data-descriptor handling, and non-seekable wrapping.
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.Async.cs Enables ForwardRead setup for async factory and disposal paths.
src/libraries/System.IO.Compression/src/Resources/Strings.resx Adds new SR strings for ForwardRead error messages.
src/libraries/System.IO.Compression/ref/System.IO.Compression.cs Updates public surface area (new mode + new ZipArchive methods).

Comment thread src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs Outdated
Comment thread src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs Outdated
Comment thread src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs Outdated
Comment thread src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs Outdated
Comment thread src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs Outdated
Comment thread src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs Outdated
Comment thread src/libraries/System.IO.Compression/tests/ZipArchive/zip_ForwardReadTests.cs Outdated
Comment thread src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.Async.cs Outdated
Comment thread src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs Outdated
Comment thread src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs Outdated
Comment thread src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs Outdated
Comment thread src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs Outdated
Comment thread src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs Outdated
Comment thread src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs Outdated
Comment on lines +695 to +698
// Known size, not encrypted
Stream bounded = new BoundedReadOnlyStream(_archiveStream, compressedSize);
Stream decompressor = CreateForwardReadDecompressor(bounded, compressionMethod, uncompressedSize, leaveOpen: false);
dataStream = new CrcValidatingReadStream(decompressor, crc32, uncompressedSize);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should postpone DataStream creation and do it lazily like we do for the other entries.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I can defer that for the on data descriptor entries

Comment thread src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs Outdated
Copilot AI review requested due to automatic review settings April 21, 2026 10:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Comment thread src/libraries/System.IO.Compression/tests/ZipArchive/zip_ForwardReadTests.cs Outdated
Comment thread src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs Outdated
Copilot AI review requested due to automatic review settings April 22, 2026 09:42
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Comment on lines +534 to +536
if (isDirectory || isEmptyEntry)
return null;

Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ForwardRead mode on a non-seekable input, empty file entries currently return null from BuildForwardReadDataStream, which makes ZipArchiveEntry.Open() throw (ForwardReadNoDataStream). Empty files are valid entries and should be openable (returning an empty stream) so callers can treat them the same as in Read mode; only directory entries should be non-openable.

Suggested change
if (isDirectory || isEmptyEntry)
return null;
if (isDirectory)
return null;
if (isEmptyEntry)
return Stream.Null;

Copilot uses AI. Check for mistakes.
Comment on lines +556 to +559
// Known size, not encrypted — store lightweight SubReadStream as a bookmark;
// decompressor + CRC wrapper are created lazily in OpenInForwardReadMode.
return new SubReadStream(_archiveStream, _archiveStream.Position, data.CompressedSize);
}
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BuildForwardReadDataStream returns a SubReadStream over _archiveStream even when _archiveStream is the ReadAheadStream wrapper (used for non-seekable sources). SubReadStream's Seek(Current, -n) rewinds by calling its super-stream Seek(…, Begin), but ReadAheadStream only supports rewinding via SeekOrigin.Current with a negative offset. This prevents DeflateStream from rewinding unconsumed bytes at end-of-stream, which can leave the archive stream position past the end of the entry and corrupt subsequent header / data-descriptor reads.

Copilot uses AI. Check for mistakes.
set
{
ThrowIfDisposed();
throw new NotSupportedException();
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReadAheadStream reports CanSeek=true, but Seek only supports SeekOrigin.Current with a negative offset and Position's setter throws NotSupportedException. This violates the Stream seeking contract and is already causing issues when combined with SubReadStream/DeflateStream rewind logic (it expects seeking to actually work when CanSeek is true). Consider either implementing the additional seek forms you rely on (e.g., SeekOrigin.Begin for backwards seeks within the history and Position setter via Seek), or avoiding passing this stream through code paths that expect a fully seekable stream.

Suggested change
throw new NotSupportedException();
if (value < 0)
{
throw new ArgumentOutOfRangeException(nameof(value));
}
long offset = value - _position;
if (offset == 0)
{
return;
}
if (offset > 0)
{
throw new NotSupportedException();
}
Seek(offset, SeekOrigin.Current);

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add a Forward-only API for System.IO.Compression

3 participants