Fix deterministic MVID and add PdbChecksum#31
Merged
marek-safar merged 5 commits intodotnet:mainfrom Nov 11, 2021
Merged
Conversation
Deterministic MVID So far the deterministic MVID was computed as a hash of the IL metadata. It didn't include PE headers. This means that if there's an assembly which only differs in the PE architecture it would have the same MVID. This can happen relatively easily for facades (assemblies with only type forwarders) as those are typically identical across platforms, possibly except for the architecture. The correct way to calculate MVID is to hash the content of the entire file with the MVID itself and the strong name signature (if presenet) zeroed out. This change modified the MVID computation to work that way: Write the entire image with MVID all zeroes, compute the hash, write the MVID and then compute the strong name. PdbChecksum In order for the produced PDB to be fully verifiable the assembly must contain a PdbChecksum debug header entry which has a cryptographic hash of the associated PDB. The wayt to calculate the checksum is prescribed as other programs must be able to validate it. See https://github.com/dotnet/runtime/blob/main/docs/design/specs/PE-COFF.md#pdb-checksum-debug-directory-entry-type-19. This change implements that behavior. Symbols are now written explicitly once all metadata is processed but before the final assembly image is written. For portable PDBs the full file is written with PDB ID set to all zeroes. Then the crypto hash is calculated (using SHA256). The hash is then written into the PdbChecksum debug header entry of the assembly image. Deterministic Portable PDB ID So far the PDB ID was set to the same value as MVID. With the fix for MVID described above, this doesn't work anymore. In case of an embedded PDB, it's not possible to calculate MVID using the hash with the PDB embedded, as the PDB would now contain the MVID (cycle). The recommended way to calculate PDB ID is to use the same mechanism as for calculating PDB checksum and use the first 20 bytes of the has as the PDB ID. This change implements this by using the first 20 bytes of the hash and sets the PDB ID as the last change of the portable PDB. The calculated PDB ID is also then set into the CodeView debug header entry of the assembly. Added tests for reading and writing both portable and embedded portable PDBs with checksums. Added test to validate stability of PDB ID across multiple writes. Added test to validate stability of MVID across multiple writes and the fact that it changes when just the PE architecture is changed.
marek-safar
requested changes
Nov 8, 2021
Mono.Cecil.Cil/PortablePdb.cs
Outdated
|
|
||
| return new EmbeddedPortablePdbReader ( | ||
| (PortablePdbReader) new PortablePdbReaderProvider ().GetSymbolReader (module, GetPortablePdbStream (entry))); | ||
| (PortablePdbReader)new PortablePdbReaderProvider ().GetSymbolReader (module, GetPortablePdbStream (entry))); |
There was a problem hiding this comment.
Please undo any line damage or formating changes, they make upstream merges pain
Member
Author
There was a problem hiding this comment.
Apparently I hit something in VS which reformatted the entire file. I'm so glad we have lint in dotnet/linker 😉.
Anyway - hopefully I was able to revert all of these now.
This should mostly get rid of any breaking changes made by this change.
Member
Author
|
This should be fully ready now. |
marek-safar
approved these changes
Nov 10, 2021
marek-safar
left a comment
There was a problem hiding this comment.
I'd suggest to make upstream PR after you merge it here
Member
Author
|
Ported this to jbevain/cecil as jbevain#810. |
vitek-karas
added a commit
to vitek-karas/linker
that referenced
this pull request
Nov 18, 2021
The new Cecil fixes the deterministic MVID and PDB checksum. Details in dotnet/cecil#31.
vitek-karas
added a commit
to dotnet/linker
that referenced
this pull request
Nov 18, 2021
The new Cecil fixes the deterministic MVID and PDB checksum. Details in dotnet/cecil#31.
marek-safar
pushed a commit
that referenced
this pull request
Jan 20, 2022
* Fix deterministic MVID and add PdbChecksum (#31) * Fix how pdb path is calculated in the tests * Fix portable PDB stamp in CodeView header (#32) * Introduce ISymbolWriter.Write This mostly cleans up the code to make it easier to understand. `ISymbolWriter.GetDebugHeader` no longer actually writes the symbols, there's a new `Write` method for just that. The assembly writer calls `Write` first and then the image writer calls `GetDebugHeader` when it's needed. This is partially taken from jbevain#617.
marek-safar
added a commit
that referenced
this pull request
Jan 20, 2022
This reverts commit ff616bf.
agocke
pushed a commit
to dotnet/runtime
that referenced
this pull request
Nov 16, 2022
…otnet/linker#2384) The new Cecil fixes the deterministic MVID and PDB checksum. Details in dotnet/cecil#31. Commit migrated from dotnet/linker@4f4a670
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Make deterministic MVID unique even when the only change is in PE headers. Implement deterministic PDB ID generation independent of MVID and implement writing of PdbCheckum debug header entry.
Observable differences (other than the above):
GetDebugHeader(well except for MPDB writer which I didn't change as it's really tricky). Callers should not notice much of a difference if they assumed onlyDisposewould actually write.Module.Write(streamnow requires the stream to be read/write. It already did so for cases where strong name was written, now it requires it also when deterministic MVID is turned on.Deterministic MVID
So far the deterministic MVID was computed as a hash of the IL metadata. It didn't include PE headers.
This means that if there's an assembly which only differs in the PE architecture it would have the same MVID.
This can happen relatively easily for facades (assemblies with only type forwarders) as those are typically
identical across platforms, possibly except for the architecture.
The correct way to calculate MVID is to hash the content of the entire file with the MVID itself and
the strong name signature (if presenet) zeroed out.
This change modified the MVID computation to work that way: Write the entire image with MVID all zeroes,
compute the hash, write the MVID and then compute the strong name.
PdbChecksum
In order for the produced PDB to be fully verifiable the assembly must contain a PdbChecksum
debug header entry which has a cryptographic hash of the associated PDB. The wayt to calculate the checksum
is prescribed as other programs must be able to validate it. See https://github.com/dotnet/runtime/blob/main/docs/design/specs/PE-COFF.md#pdb-checksum-debug-directory-entry-type-19.
This change implements that behavior. Symbols are now written explicitly once all metadata is processed
but before the final assembly image is written. For portable PDBs the full file is written with PDB ID
set to all zeroes. Then the crypto hash is calculated (using SHA256).
The hash is then written into the PdbChecksum debug header entry of the assembly image.
Deterministic Portable PDB ID
So far the PDB ID was set to the same value as MVID. With the fix for MVID described above, this doesn't
work anymore. In case of an embedded PDB, it's not possible to calculate MVID using the hash
with the PDB embedded, as the PDB would now contain the MVID (cycle).
The recommended way to calculate PDB ID is to use the same mechanism as for calculating PDB checksum
and use the first 20 bytes of the has as the PDB ID.
This change implements this by using the first 20 bytes of the hash and sets the PDB ID as the last
change of the portable PDB.
The calculated PDB ID is also then set into the CodeView debug header entry of the assembly.
Added tests for reading and writing both portable and embedded portable PDBs with checksums.
Added test to validate stability of PDB ID across multiple writes.
Added test to validate stability of MVID across multiple writes and the fact that it changes
when just the PE architecture is changed.
This is a fix for dotnet/linker#2203.