Skip to content

Add ELF Symbol Resolution for Linux .nettrace Traces#2397

Merged
brianrob merged 10 commits intomicrosoft:mainfrom
brianrob:brianrob/elf-symbols-squashed
Mar 30, 2026
Merged

Add ELF Symbol Resolution for Linux .nettrace Traces#2397
brianrob merged 10 commits intomicrosoft:mainfrom
brianrob:brianrob/elf-symbols-squashed

Conversation

@brianrob
Copy link
Copy Markdown
Member

@brianrob brianrob commented Mar 25, 2026

Summary

Wires up end-to-end ELF debug symbol resolution so PerfView can symbolize native Linux modules (e.g. libcoreclr.so) in .nettrace traces. This builds on the previous PR that added ELF symbol parsing by integrating it into the TraceEvent symbol resolution pipeline, introducing a type hierarchy to cleanly separate PE and ELF module metadata, and aligning all three symbol formats (PDB, R2R, ELF) to follow a consistent Find→Matches→Open pattern.

During integration, an RVA computation bug was discovered and fixed: the ELF symbol lookup did not account for the Linux loader's page-alignment of the executable PT_LOAD segment, producing incorrect symbols when p_vaddr ≠ p_offset (as in libcoreclr.so).

Changes

Commit 1: ELF Symbol Resolution with SymbolInfo Type Hierarchy

  • Introduce TraceModuleFileSymbolInfo hierarchy (PESymbolInfo, ElfSymbolInfo) to separate Windows PE and Linux ELF module metadata
  • Migrate existing PDB/R2R fields from TraceModuleFile into PESymbolInfo; add MatchOrInitPE/MatchOrInitELF for type-safe access
  • Fix RVA computation: use page-aligned p_vaddr with SystemPageSize from trace headers
  • Add ReadBuildId, FindElfSymbolFilePath (with SSQP symbol server), ElfBuildIdMatches, OpenElfSymbolFile
  • Update LookupSymbolsForModule to dispatch by ModuleBinaryFormat

Commit 2: Align R2R and ELF with PDB Find→Matches→Open Pattern

  • Add R2RPerfMapMatches and ElfBuildIdMatches mirroring PdbMatches
  • Move validation into Matches classes; simplify Open methods and TraceLog callers
  • Add lightweight R2R header probe to avoid double-parsing

Commit 3: Read .gnu_debuglink from ELF Binaries

  • Add ReadDebugLink to parse the .gnu_debuglink section for the actual debug file name
  • Update FindElfSymbolFilePath to probe debuglink paths (same dir, .debug/ subdir) instead of only guessing suffixes

Commit 4: Harden ElfSymbolModule

  • Hoist 20+ magic numbers to named constants
  • Cap PT_NOTE allocation at 64KB to prevent OOM from crafted ELF
  • Add overflow guard in ExtractBuildId alignment arithmetic
  • Thread-safe lazy name resolution (Volatile.Read/Interlocked.CompareExchange)
  • ThreadLocal<T> demanglers for concurrent FindNameForRva

Unit Testing

93 tests across ElfSymbolModuleTests (48) and SymbolReaderTests (45), covering:

  • ELF parsing (32/64-bit, LE/BE, error handling, symbol lookup)
  • ReadBuildId and ReadDebugLink (valid, corrupt, stripped binaries)
  • MatchOrInit type safety assertions
  • FindElfSymbolFilePath (local, SSQP, debuglink, cache behavior)
  • R2R and PDB Find/Matches/Open paths

All tests pass on both net8.0 and net462.

Real-World Symbol Testing

100% pass rate for symbol file parsing across x64 and arm64 covering a random assortment of distro-hosted symbol files (500+ per distro per architecture) and all .NET SDKs.

Linux Distro Datasets

Dataset Files Symbols
Alpine (x64) 551 115,261
Alpine (arm64) 555 114,778
CentOS (x64) 550 219,067
CentOS (arm64) 550 137,407
Debian (x64) 541 12,716
Debian (arm64) 541 12,980
Debian (i386) 541 11,833
Fedora (x64) 500 5,807
Fedora (arm64) 500 4,967
openSUSE (x64) 550 13,378
openSUSE (arm64) 550 16,594
Ubuntu (x64) 568 109,293

Total: 6,497 files, 774,081 symbols

.NET Version Datasets

Dataset Versions
.NET 8 (x64) 25
.NET 8 (arm64) 25
.NET 9 (x64) 15
.NET 9 (arm64) 15
.NET 10 (x64) 6
.NET 10 (arm64) 6

Fixes #2382

brianrob and others added 5 commits March 24, 2026 16:59
Introduce a TraceModuleFileSymbolInfo type hierarchy (PESymbolInfo,
ElfSymbolInfo) to cleanly separate Windows PE and Linux ELF module
metadata. Migrate existing PDB and R2R fields from TraceModuleFile
into PESymbolInfo with MatchOrInit pattern for type-safe access.

Add end-to-end ELF symbol resolution:
- ElfSymbolModule: add ReadBuildId, fix RVA computation using
  page-aligned p_vaddr with SystemPageSize from trace headers
- SymbolReader: add FindElfSymbolFilePath with SSQP symbol server
  support, ElfBuildIdMatches for build-id validation, OpenElfSymbolFile
- TraceLog: add OpenElfSymbolsForModuleFile, update
  LookupSymbolsForModule to dispatch by ModuleBinaryFormat
- Populate ELF metadata (BuildId, VirtualAddress, FileOffset,
  PageSize) from trace events

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Refactor symbol resolution so all three formats (PDB, R2R, ELF)
follow the same Find -> Matches -> Open pipeline:
- Add R2RPerfMapMatches and ElfBuildIdMatches mirroring PdbMatches
- Move validation from Open methods into Matches classes
- Add lightweight R2R header probe to avoid double-parsing
- Make CheckSecurity private (only used internally)
- Simplify TraceLog callers (fallback logic now in Find methods)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add ReadDebugLink static method to ElfSymbolModule that parses ELF
section headers to find the .gnu_debuglink section and extract the
debug file name. Update FindElfSymbolFilePath to use debuglink-based
probing (same directory and .debug/ subdirectory) instead of only
trying hardcoded suffixes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Hoist 20+ magic numbers to named constants for all ELF struct
  field offsets, sizes, and limits
- Cap PT_NOTE allocation at 64KB (MaxNoteSizeBytes) to prevent OOM
- Add overflow guard in ExtractBuildId note alignment arithmetic
- Make FindNameForRva thread-safe: parallel string[] with
  Volatile.Read/Interlocked.CompareExchange for lazy name cache
- Use ThreadLocal<T> for demanglers (mutable parser state)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The binary itself (which typically only has .dynsym exported symbols)
was being tried as a fallback before consulting symbol servers that
could provide proper debug symbol files (with .symtab full symbols).

Restructure the search into three ordered phases:
  Phase 1: Local debug files adjacent to binary (.debug, .dbg, debuglink)
  Phase 2: Symbol servers and symbol path directories
  Phase 3: Binary itself as absolute last resort

This ensures we prefer debug symbols from symbol servers over the
stripped binary whenever they are available.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@brianrob brianrob marked this pull request as ready for review March 26, 2026 22:33
@brianrob brianrob requested a review from a team as a code owner March 26, 2026 22:33
Copy link
Copy Markdown
Collaborator

@leculver leculver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this style of reading structs is a bit hard to reason about and understand when there are bugs and issues. Parsing fields by calculating offsets feels error prone or at least hard to follow/reason about.

What would you think of modifying this to define copies of structs with the right layout, then simply reading those structs? Something similar to:

https://github.com/microsoft/clrmd/blob/main/src/Microsoft.Diagnostics.Runtime/Linux/Structs/ElfHeader64.cs
https://github.com/microsoft/clrmd/blob/main/src/Microsoft.Diagnostics.Runtime/Linux/ElfProgramHeader.cs#L54

brianrob and others added 2 commits March 27, 2026 08:11
Replace manual offset constants and ReadUXX helpers with
[StructLayout(Sequential, Pack=1)] structs matching the ELF binary
format. Use MemoryMarshal.Read<T> to read structs from byte arrays,
following the same pattern as PEFile.cs. Each struct has a SwapEndian()
method for big-endian support via BinaryPrimitives.ReverseEndianness.

This eliminates ~35 offset constants, 8 ReadUXX helper methods,
and collapses 4 symbol parsing branches into 2.

Structs defined: Elf32/64_Ehdr, Elf32/64_Phdr, Elf32/64_Shdr,
Elf32/64_Sym, Elf_Nhdr.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…lpers

- Add IElfStruct interface implemented by all 9 ELF structs
- ReadStruct<T> now accepts bigEndian param and calls SwapEndian()
  internally, eliminating ~15 caller-side endian checks
- Add TryReadElfHeader helper consolidating ELF header validation
  and field extraction duplicated across ReadBuildId, ReadDebugLink,
  and ParseElf
- Add ReadProgramHeader and ReadSymbolEntry helpers parallel to
  existing ReadSectionHeader
- Extend ReadSectionHeader with sh_name output; replace inline Shdr
  extraction in ReadDebugLink and ParseElf extended section count
- Net reduction of ~40 lines with significantly less code duplication

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@brianrob
Copy link
Copy Markdown
Member Author

I think this style of reading structs is a bit hard to reason about and understand when there are bugs and issues. Parsing fields by calculating offsets feels error prone or at least hard to follow/reason about.

What would you think of modifying this to define copies of structs with the right layout, then simply reading those structs? Something similar to:

https://github.com/microsoft/clrmd/blob/main/src/Microsoft.Diagnostics.Runtime/Linux/Structs/ElfHeader64.cs https://github.com/microsoft/clrmd/blob/main/src/Microsoft.Diagnostics.Runtime/Linux/ElfProgramHeader.cs#L54

Thanks for this feedback @leculver. I'm on-board for this kind of change. Please see the latest commits.

Copy link
Copy Markdown
Collaborator

@leculver leculver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor feedback here.

// ELF RVA = (address - ImageBase) + FileOffset, matching ElfSymbolModule's
// (st_value - pVaddr) + pOffset formula.
ulong fileOffset = moduleFile.ElfInfo.FileOffset;
computeRva = (address) => (uint)(address - moduleFile.ImageBase) + (uint)fileOffset;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check for overflow of uint here. Either for bad or malicious input.

if (string.IsNullOrEmpty(expectedBuildId))
{
m_log.WriteLine("FindElfSymbolFilePath: No expected build-id provided, assuming unsafe match for {0}", filePath);
return true;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if this binary actually doesn't match? Do we have any way to know that "this trace came from my local computer" vs "this trace is from another computer"? If so we could be smarter in whether we return true or false here. Because the default case when moving a nettrace file to a new machine is that we should have returned false for not matching.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bug. It should be false. If we need to handle cases where binaries don't have buildids, then we'll likely want to add an option similiar to /UnsafePDBMatch which allows the user to override the check. Thanks for catching this.

/// <param name="fileName">The simple filename of the ELF module (e.g., "libcoreclr.so")</param>
/// <param name="buildId">The GNU build-id as a lowercase hex string</param>
/// <returns>The local file path to the downloaded symbol file, or null if not found.</returns>
public string FindElfSymbolFilePath(string fileName, string buildId, string elfFilePath = null)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We unconditionally use buildId.ToLowerInvariant(). We need to either:

if (buildId is null) throw new ArgumentNullException(...);

Or below, buildId?.ToLowerInvariant();.

(Actually, all of this code needs to be updated to be nullable someday. That might be a good job for copilot cli with a ralph loop. These issues would be clearer.)

// Used as the key to the m_elfPathCache.
private struct ElfBuildIdSignature : IEquatable<ElfBuildIdSignature>
{
public override int GetHashCode() { return FileName.GetHashCode() + BuildId.GetHashCode(); }
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use HashCode.Combine

brianrob and others added 3 commits March 27, 2026 16:33
…h, HashCode.Combine

- TraceLog.cs: Wrap RVA computation in checked context for uint overflow
- SymbolReader.cs: Add null guard for buildId parameter
- SymbolReader.cs: Return false when no expected build-id (safer default)
- SymbolReader.cs: Use HashCode.Combine for ElfBuildIdSignature/ElfModuleSignature
- Add Microsoft.Bcl.HashCode NuGet package with PerfView DLL embedding

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@brianrob brianrob force-pushed the brianrob/elf-symbols-squashed branch from 10d3cbf to 1cdfc02 Compare March 27, 2026 23:38
@brianrob
Copy link
Copy Markdown
Member Author

Thanks @leculver. Feedback should be addressed now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement ELF Symbol Download and Resolution

2 participants