Fix TarReader: implement full GNU sparse format 1.0 (PAX) reading with DataStream expansion #125283
…rse.name and GNU.sparse.realsize Co-authored-by: lewing <24063+lewing@users.noreply.github.com>
…sertions for data stream integrity Co-authored-by: lewing <24063+lewing@users.noreply.github.com>
@copilot What happens when we attempt to extract such archives? If we are to transparently handle the real name and size, then we should also expand the contents when reading the entry's DataStream.
Extracting GNU sparse format 1.0 PAX entries with the current code produces a file with the correct name (from our fix) but incorrect content. Here's what happens: the
When Properly expanding the DataStream would require:
This is significantly more complex than the current fix. The current PR is scoped to fix the metadata issues ( I can revert the
Tagging subscribers to this area: @dotnet/area-system-formats-tar
@copilot Simply fixing the exposed metadata does not address the originally reported issue. You need to implement complete handling of the PAX 1.0 sparse format (only for reading, no need to be able to produce such archives). I think it is okay to avoid implementing the older sparse file formats, though.
…SparseStream Co-authored-by: rzikm <32671551+rzikm@users.noreply.github.com>
…ed offsets, fix CanSeek order, fix async allocation Co-authored-by: rzikm <32671551+rzikm@users.noreply.github.com>
Done in commit
Code review fixes applied: |
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Read.cs
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarReader.cs
src/libraries/System.Formats.Tar/tests/TarReader/TarReader.GetNextEntry.Tests.cs
src/libraries/System.Formats.Tar/src/System/Formats/Tar/GnuSparseStream.cs
Rent the 1024-byte parsing buffer from ArrayPool<byte>.Shared instead of allocating a new array on each call, reducing GC pressure. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Reject segments that extend past realSize or are not in ascending offset order during InitializeFromParsedMap. Add test cases for out-of-bounds and misordered segments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…c and async Replace InlineData with MemberData generator that yields every corrupted map string with both useAsync=false and useAsync=true, ensuring full sync/async coverage for all 11 corruption scenarios. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Read.cs
src/libraries/System.Formats.Tar/src/System/Formats/Tar/GnuSparseStream.cs
…eStream
- GnuSparseStream.Dispose now disposes the underlying raw stream
- Replace Debug.Assert with runtime check for negative skipBytes on non-seekable streams in ReadFromPackedData/ReadFromPackedDataAsync
- Restore AdvanceStreamAsync in the async ProcessDataBlock path (was incorrectly changed to sync AdvanceStream)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rse-format-handling
src/libraries/System.Formats.Tar/src/System/Formats/Tar/GnuSparseStream.cs
@copilot Address the code review comments. For the segments binary search, preserve the behavior that we don't perform unnecessary searches when doing a sequential read of the entire entry.
…tial scan for forward reads; fix overflow check Co-authored-by: rzikm <32671551+rzikm@users.noreply.github.com>
Addressed in
    }
}

protected override void Dispose(bool disposing)
There is no DisposeAsync override.
// _size retains the archive data section length needed for correct stream positioning.
if (TarHelpers.TryGetStringAsBaseTenLong(ExtendedAttributes, PaxEaGnuSparseRealSize, out long gnuSparseRealSize))
{
    _gnuSparseRealSize = gnuSparseRealSize;
Can the size be negative here?
TarReader was not handling GNU sparse format 1.0 PAX entries, causing ~46% of entries from bsdtar-created archives (e.g., .NET SDK tarballs built on macOS/APFS) to expose internal placeholder paths like GNUSparseFile.0/real-file.dll, incorrect sizes, and corrupted extracted content.

Changes

Added read-only support for GNU sparse format 1.0 (PAX). When TarReader encounters PAX extended attributes GNU.sparse.major=1 and GNU.sparse.minor=0, it resolves the real file name from GNU.sparse.name, reports the expanded size from GNU.sparse.realsize, and wraps the raw data stream with GnuSparseStream, which presents the expanded virtual file content (zeros for holes, packed data at correct offsets).

The sparse map embedded in the data section is parsed lazily on first Read, so _dataStream remains unconsumed during entry construction. This allows TarWriter.WriteEntry to round-trip the condensed sparse data correctly for both seekable and non-seekable source archives.

Older GNU sparse formats (0.0, 0.1) and write support are not addressed.
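To make the expansion step concrete, here is a minimal in-memory sketch of how a format 1.0 data section can be turned back into the real file content. It relies on the layout documented for GNU tar's PAX sparse format 1.0: newline-terminated decimal numbers (segment count, then offset/length pairs), NUL padding up to the next 512-byte block, then the packed data. `SparseMapSketch`, `ParseMap`, and `Expand` are hypothetical names for illustration only; this is not the `GnuSparseStream` code from the PR, which works on streams and parses the map lazily.

```csharp
using System;
using System.Buffers.Text;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not the GnuSparseStream code from this PR.
public static class SparseMapSketch
{
    private const int BlockSize = 512;

    // Parses the newline-delimited decimal map at the start of the data section.
    // Returns the segments and the offset at which the packed data begins.
    public static (List<(long Offset, long Length)> Segments, int DataStart) ParseMap(ReadOnlySpan<byte> data)
    {
        int pos = 0;
        long count = ReadNumber(data, ref pos);
        var segments = new List<(long Offset, long Length)>();
        for (long i = 0; i < count; i++)
        {
            long offset = ReadNumber(data, ref pos);
            long length = ReadNumber(data, ref pos);
            segments.Add((offset, length));
        }

        // The map is NUL-padded to the next 512-byte block boundary.
        int dataStart = (pos + BlockSize - 1) / BlockSize * BlockSize;
        return (segments, dataStart);
    }

    // Reads one non-negative decimal number terminated by '\n' and advances pos past it.
    private static long ReadNumber(ReadOnlySpan<byte> data, ref int pos)
    {
        int newline = data.Slice(pos).IndexOf((byte)'\n');
        if (newline <= 0 ||
            !Utf8Parser.TryParse(data.Slice(pos, newline), out long value, out int consumed) ||
            consumed != newline ||
            value < 0)
        {
            throw new InvalidDataException("Malformed sparse map.");
        }

        pos += newline + 1;
        return value;
    }

    // Builds the expanded content: holes stay zero, packed data is copied to its real offset.
    // This sketch does no bounds validation; malformed maps surface as slicing exceptions.
    public static byte[] Expand(ReadOnlySpan<byte> data, long realSize)
    {
        var (segments, dataStart) = ParseMap(data);
        byte[] expanded = new byte[realSize]; // zero-initialized, so holes need no work
        int packedPos = dataStart;
        foreach (var (offset, length) in segments)
        {
            data.Slice(packedPos, (int)length).CopyTo(expanded.AsSpan((int)offset, (int)length));
            packedPos += (int)length;
        }

        return expanded;
    }
}
```

For example, a data section whose map is `2\n0\n3\n8\n2\n`, padded to 512 bytes and followed by the five packed bytes `abcxy`, expands with `SparseMapSketch.Expand(data, 10)` to `abc`, five zero bytes, then `xy`. Materializing the whole file in an array is only reasonable for a sketch; a stream wrapper avoids allocating the holes.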
Segment search improvements in GnuSparseStream:
- _currentSegmentIndex and scan forward: O(1) amortized.
- InitializeFromParsedMap uses overflow-safe arithmetic (offset > _realSize || length > _realSize - offset) to reject malformed maps that would cause integer overflow.

Testing

All existing tests pass. New TarReader.SparseFile.Tests.cs covers:
- copyData × sync/async
- pax-nil-sparse-data, pax-nil-sparse-hole, pax-sparse-big
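The bounds check and forward scan listed under the segment search improvements could look roughly like the sketch below. The overflow-safe comparison is taken from the description (`offset > _realSize || length > _realSize - offset`); everything else is an assumption made for illustration, including the `SegmentCursorSketch` type, the rejection of overlapping segments, and the fallback to binary search when the position moves backwards. It is not the PR's actual implementation.

```csharp
using System;
using System.IO;

// Hypothetical type for illustration; not the GnuSparseStream code from this PR.
public sealed class SegmentCursorSketch
{
    private readonly (long Offset, long Length)[] _segments;
    private int _currentSegmentIndex;

    public SegmentCursorSketch((long Offset, long Length)[] segments, long realSize)
    {
        long previousEnd = 0;
        foreach (var (offset, length) in segments)
        {
            // Compare against realSize - offset so that offset + length is never
            // computed before it is known to fit in a long.
            if (offset < 0 || length < 0 || offset > realSize || length > realSize - offset)
            {
                throw new InvalidDataException("Segment lies outside the real file size.");
            }

            // Assumption: overlapping segments are rejected along with misordered ones.
            if (offset < previousEnd)
            {
                throw new InvalidDataException("Segments must be ascending and non-overlapping.");
            }

            previousEnd = offset + length; // safe: bounded by realSize
        }

        _segments = segments;
    }

    private long End(int index) => _segments[index].Offset + _segments[index].Length;

    // Returns the index of the first segment that ends after 'position',
    // or the segment count when 'position' is past all packed data.
    public int FindSegment(long position)
    {
        int i = _currentSegmentIndex;
        if (i > 0 && position < End(i - 1))
        {
            // Position moved backwards: binary search among the earlier segments.
            int lo = 0, hi = i - 1;
            while (lo < hi)
            {
                int mid = lo + (hi - lo) / 2;
                if (End(mid) > position) hi = mid; else lo = mid + 1;
            }

            i = lo;
        }
        else
        {
            // Sequential reads resume from the cached index, so a full pass over
            // the entry visits each segment once.
            while (i < _segments.Length && End(i) <= position) i++;
        }

        _currentSegmentIndex = i;
        return i;
    }
}
```

With segments `(0, 3)` and `(8, 2)` and a real size of 10, positions 0 through 2 map to segment 0, positions 3 through 9 map to segment 1 (positions 3 through 7 being the hole in front of it), and position 10 returns 2, meaning no further data. A map such as `(1, long.MaxValue)` is rejected by the constructor without overflowing.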