Add ELF Symbol Resolution for Linux .nettrace Traces#2397
Add ELF Symbol Resolution for Linux .nettrace Traces#2397brianrob merged 10 commits intomicrosoft:mainfrom
Conversation
Introduce a TraceModuleFileSymbolInfo type hierarchy (PESymbolInfo, ElfSymbolInfo) to cleanly separate Windows PE and Linux ELF module metadata. Migrate existing PDB and R2R fields from TraceModuleFile into PESymbolInfo with MatchOrInit pattern for type-safe access. Add end-to-end ELF symbol resolution: - ElfSymbolModule: add ReadBuildId, fix RVA computation using page-aligned p_vaddr with SystemPageSize from trace headers - SymbolReader: add FindElfSymbolFilePath with SSQP symbol server support, ElfBuildIdMatches for build-id validation, OpenElfSymbolFile - TraceLog: add OpenElfSymbolsForModuleFile, update LookupSymbolsForModule to dispatch by ModuleBinaryFormat - Populate ELF metadata (BuildId, VirtualAddress, FileOffset, PageSize) from trace events Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Refactor symbol resolution so all three formats (PDB, R2R, ELF) follow the same Find -> Matches -> Open pipeline: - Add R2RPerfMapMatches and ElfBuildIdMatches mirroring PdbMatches - Move validation from Open methods into Matches classes - Add lightweight R2R header probe to avoid double-parsing - Make CheckSecurity private (only used internally) - Simplify TraceLog callers (fallback logic now in Find methods) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add ReadDebugLink static method to ElfSymbolModule that parses ELF section headers to find the .gnu_debuglink section and extract the debug file name. Update FindElfSymbolFilePath to use debuglink-based probing (same directory and .debug/ subdirectory) instead of only trying hardcoded suffixes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Hoist 20+ magic numbers to named constants for all ELF struct field offsets, sizes, and limits - Cap PT_NOTE allocation at 64KB (MaxNoteSizeBytes) to prevent OOM - Add overflow guard in ExtractBuildId note alignment arithmetic - Make FindNameForRva thread-safe: parallel string[] with Volatile.Read/Interlocked.CompareExchange for lazy name cache - Use ThreadLocal<T> for demanglers (mutable parser state) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The binary itself (which typically only has .dynsym exported symbols) was being tried as a fallback before consulting symbol servers that could provide proper debug symbol files (with .symtab full symbols). Restructure the search into three ordered phases: Phase 1: Local debug files adjacent to binary (.debug, .dbg, debuglink) Phase 2: Symbol servers and symbol path directories Phase 3: Binary itself as absolute last resort This ensures we prefer debug symbols from symbol servers over the stripped binary whenever they are available. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
leculver
left a comment
There was a problem hiding this comment.
I think this style of reading structs is a bit hard to reason about and understand when there are bugs and issues. Parsing fields by calculating offsets feels error prone or at least hard to follow/reason about.
What would you think of modifying this to define copies of structs with the right layout, then simply reading those structs? Something similar to:
https://github.com/microsoft/clrmd/blob/main/src/Microsoft.Diagnostics.Runtime/Linux/Structs/ElfHeader64.cs
https://github.com/microsoft/clrmd/blob/main/src/Microsoft.Diagnostics.Runtime/Linux/ElfProgramHeader.cs#L54
Replace manual offset constants and ReadUXX helpers with [StructLayout(Sequential, Pack=1)] structs matching the ELF binary format. Use MemoryMarshal.Read<T> to read structs from byte arrays, following the same pattern as PEFile.cs. Each struct has a SwapEndian() method for big-endian support via BinaryPrimitives.ReverseEndianness. This eliminates ~35 offset constants, 8 ReadUXX helper methods, and collapses 4 symbol parsing branches into 2. Structs defined: Elf32/64_Ehdr, Elf32/64_Phdr, Elf32/64_Shdr, Elf32/64_Sym, Elf_Nhdr. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…lpers - Add IElfStruct interface implemented by all 9 ELF structs - ReadStruct<T> now accepts bigEndian param and calls SwapEndian() internally, eliminating ~15 caller-side endian checks - Add TryReadElfHeader helper consolidating ELF header validation and field extraction duplicated across ReadBuildId, ReadDebugLink, and ParseElf - Add ReadProgramHeader and ReadSymbolEntry helpers parallel to existing ReadSectionHeader - Extend ReadSectionHeader with sh_name output; replace inline Shdr extraction in ReadDebugLink and ParseElf extended section count - Net reduction of ~40 lines with significantly less code duplication Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Thanks for this feedback @leculver. I'm on-board for this kind of change. Please see the latest commits. |
leculver
left a comment
There was a problem hiding this comment.
Some minor feedback here.
src/TraceEvent/TraceLog.cs
Outdated
| // ELF RVA = (address - ImageBase) + FileOffset, matching ElfSymbolModule's | ||
| // (st_value - pVaddr) + pOffset formula. | ||
| ulong fileOffset = moduleFile.ElfInfo.FileOffset; | ||
| computeRva = (address) => (uint)(address - moduleFile.ImageBase) + (uint)fileOffset; |
There was a problem hiding this comment.
Need to check for overflow of uint here. Either for bad or malicious input.
| if (string.IsNullOrEmpty(expectedBuildId)) | ||
| { | ||
| m_log.WriteLine("FindElfSymbolFilePath: No expected build-id provided, assuming unsafe match for {0}", filePath); | ||
| return true; |
There was a problem hiding this comment.
What happens if this binary actually doesn't match? Do we have any way to know that "this trace came from my local computer" vs "this trace is from another computer"? If so we could be smarter in whether we return true or false here. Because the default case when moving a nettrace file to a new machine is that we should have returned false for not matching.
There was a problem hiding this comment.
This is a bug. It should be false. If we need to handle cases where binaries don't have buildids, then we'll likely want to add an option similiar to /UnsafePDBMatch which allows the user to override the check. Thanks for catching this.
| /// <param name="fileName">The simple filename of the ELF module (e.g., "libcoreclr.so")</param> | ||
| /// <param name="buildId">The GNU build-id as a lowercase hex string</param> | ||
| /// <returns>The local file path to the downloaded symbol file, or null if not found.</returns> | ||
| public string FindElfSymbolFilePath(string fileName, string buildId, string elfFilePath = null) |
There was a problem hiding this comment.
We unconditionally use buildId.ToLowerInvariant(). We need to either:
if (buildId is null) throw new ArgumentNullException(...);
Or below, buildId?.ToLowerInvariant();.
(Actually, all of this code needs to be updated to be nullable someday. That might be a good job for copilot cli with a ralph loop. These issues would be clearer.)
| // Used as the key to the m_elfPathCache. | ||
| private struct ElfBuildIdSignature : IEquatable<ElfBuildIdSignature> | ||
| { | ||
| public override int GetHashCode() { return FileName.GetHashCode() + BuildId.GetHashCode(); } |
…h, HashCode.Combine - TraceLog.cs: Wrap RVA computation in checked context for uint overflow - SymbolReader.cs: Add null guard for buildId parameter - SymbolReader.cs: Return false when no expected build-id (safer default) - SymbolReader.cs: Use HashCode.Combine for ElfBuildIdSignature/ElfModuleSignature - Add Microsoft.Bcl.HashCode NuGet package with PerfView DLL embedding Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
10d3cbf to
1cdfc02
Compare
|
Thanks @leculver. Feedback should be addressed now. |
Summary
Wires up end-to-end ELF debug symbol resolution so PerfView can symbolize native Linux modules (e.g.
libcoreclr.so) in.nettracetraces. This builds on the previous PR that added ELF symbol parsing by integrating it into the TraceEvent symbol resolution pipeline, introducing a type hierarchy to cleanly separate PE and ELF module metadata, and aligning all three symbol formats (PDB, R2R, ELF) to follow a consistent Find→Matches→Open pattern.During integration, an RVA computation bug was discovered and fixed: the ELF symbol lookup did not account for the Linux loader's page-alignment of the executable PT_LOAD segment, producing incorrect symbols when
p_vaddr ≠ p_offset(as inlibcoreclr.so).Changes
Commit 1: ELF Symbol Resolution with SymbolInfo Type Hierarchy
TraceModuleFileSymbolInfohierarchy (PESymbolInfo,ElfSymbolInfo) to separate Windows PE and Linux ELF module metadataTraceModuleFileintoPESymbolInfo; addMatchOrInitPE/MatchOrInitELFfor type-safe accessp_vaddrwithSystemPageSizefrom trace headersReadBuildId,FindElfSymbolFilePath(with SSQP symbol server),ElfBuildIdMatches,OpenElfSymbolFileLookupSymbolsForModuleto dispatch byModuleBinaryFormatCommit 2: Align R2R and ELF with PDB Find→Matches→Open Pattern
R2RPerfMapMatchesandElfBuildIdMatchesmirroringPdbMatchesCommit 3: Read
.gnu_debuglinkfrom ELF BinariesReadDebugLinkto parse the.gnu_debuglinksection for the actual debug file nameFindElfSymbolFilePathto probe debuglink paths (same dir,.debug/subdir) instead of only guessing suffixesCommit 4: Harden ElfSymbolModule
ExtractBuildIdalignment arithmeticVolatile.Read/Interlocked.CompareExchange)ThreadLocal<T>demanglers for concurrentFindNameForRvaUnit Testing
93 tests across
ElfSymbolModuleTests(48) andSymbolReaderTests(45), covering:All tests pass on both
net8.0andnet462.Real-World Symbol Testing
100% pass rate for symbol file parsing across x64 and arm64 covering a random assortment of distro-hosted symbol files (500+ per distro per architecture) and all .NET SDKs.
Linux Distro Datasets
Total: 6,497 files, 774,081 symbols
.NET Version Datasets
Fixes #2382