Skip to content

SymbolizableOjbectFile: Convert Wasm file offset to section offset for DWARF#191068

Merged
dschuff merged 6 commits intollvm:mainfrom
dschuff:symblize-offset
Apr 9, 2026
Merged

SymbolizableOjbectFile: Convert Wasm file offset to section offset for DWARF#191068
dschuff merged 6 commits intollvm:mainfrom
dschuff:symblize-offset

Conversation

@dschuff
Copy link
Copy Markdown
Member

@dschuff dschuff commented Apr 8, 2026

Wasm's object and linking format lacks virtual addresses like ELF et al.
As a result, linked files generally use file offsets as "addresses", whereas
objects and DWARF sections use code section offsets.
This has led to incorrect interpretation of addresses in llvm-objdump
and llvm-symbolizer for linked files.

This change to SerializableObjectFile checks the input ModuleOffset, and
if it falls within a wasm code section, adjusts it to a section offset
before querying the DwarfContext. (For object files, Sec.getAddress() is
0 so it works for object files too). It extends the existing DWARF test
for llvm-symbolizer to include a linked file, and also adds an equivalent
for objdump.

Fixes #129523

dschuff added 2 commits April 8, 2026 22:37
…r DWARF

Wasm's object and linking format lacks virtual addresses like ELF et al.
As a result, linked files generally use file offsets as "addresses", whereas
objects and DWARF sections use code section offsets.
This has led to incorrect interpretation of addresses in llvm-objdump
and llvm-symbolizer for linked files.

This change to SerializableObjectFile checks the input ModuleOffset, and
if it falls within a wasm code section, adjusts it to a section offset
before querying the DwarfContext. (For object files, Sec.getAddress() is
0 so it works for object files too). It extends the existing DWARF test
for llvm-symbolizer to include a linked file, and also adds an equivalent
for objdump.

Fixes llvm#129523
@llvmbot
Copy link
Copy Markdown
Member

llvmbot commented Apr 8, 2026

@llvm/pr-subscribers-llvm-binary-utilities

@llvm/pr-subscribers-debuginfo

Author: Derek Schuff (dschuff)

Changes
  • SymbolizableOjbectFile: Convert Wasm file offset to section offset for DWARF
  • use assembly for tests

Full diff: https://github.com/llvm/llvm-project/pull/191068.diff

6 Files Affected:

  • (modified) llvm/include/llvm/DebugInfo/Symbolize/SymbolizableObjectFile.h (+7)
  • (modified) llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp (+23-3)
  • (added) llvm/test/tools/llvm-objdump/lit.local.cfg (+4)
  • (added) llvm/test/tools/llvm-objdump/wasm/line-numbers.s (+74)
  • (added) llvm/test/tools/llvm-symbolizer/lit.local.cfg (+4)
  • (modified) llvm/test/tools/llvm-symbolizer/wasm-basic.s (+1-1)
diff --git a/llvm/include/llvm/DebugInfo/Symbolize/SymbolizableObjectFile.h b/llvm/include/llvm/DebugInfo/Symbolize/SymbolizableObjectFile.h
index 5ef513f570b03..c31c3ac353fb0 100644
--- a/llvm/include/llvm/DebugInfo/Symbolize/SymbolizableObjectFile.h
+++ b/llvm/include/llvm/DebugInfo/Symbolize/SymbolizableObjectFile.h
@@ -74,6 +74,13 @@ class SymbolizableObjectFile : public SymbolizableModule {
   std::unique_ptr<DIContext> DebugInfoContext;
   bool UntagAddresses;
 
+  /// WebAssembly linked files use file offsets for code symbol addresses, but
+  /// DWARF uses section-relative offsets. This helper method converts a module
+  /// file offset into its corresponding section-relative offset, but only if
+  /// the address falls within a Wasm code section.
+  object::SectionedAddress
+  getWasmCodeDwarfOffset(object::SectionedAddress ModuleOffset) const;
+
   struct SymbolDesc {
     uint64_t Addr;
     // If size is 0, assume that symbol occupies the whole memory range up to
diff --git a/llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp b/llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp
index 29fd4d9fda7ad..39c3c1c96f83d 100644
--- a/llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp
+++ b/llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp
@@ -269,6 +269,23 @@ bool SymbolizableObjectFile::shouldOverrideWithSymbolTable(
          isa<DWARFContext>(DebugInfoContext.get());
 }
 
+object::SectionedAddress SymbolizableObjectFile::getWasmCodeDwarfOffset(
+    object::SectionedAddress ModuleOffset) const {
+  if (!Module->isWasm())
+    return ModuleOffset;
+  if (ModuleOffset.SectionIndex == object::SectionedAddress::UndefSection)
+    ModuleOffset.SectionIndex =
+        getModuleSectionIndexForAddress(ModuleOffset.Address);
+  for (const SectionRef &Sec : Module->sections()) {
+    if (Sec.getIndex() == ModuleOffset.SectionIndex) {
+      if (Sec.isText())
+        ModuleOffset.Address -= Sec.getAddress();
+      break;
+    }
+  }
+  return ModuleOffset;
+}
+
 DILineInfo
 SymbolizableObjectFile::symbolizeCode(object::SectionedAddress ModuleOffset,
                                       DILineInfoSpecifier LineInfoSpecifier,
@@ -277,8 +294,9 @@ SymbolizableObjectFile::symbolizeCode(object::SectionedAddress ModuleOffset,
     ModuleOffset.SectionIndex =
         getModuleSectionIndexForAddress(ModuleOffset.Address);
   DILineInfo LineInfo;
+  object::SectionedAddress DWARFOffset = getWasmCodeDwarfOffset(ModuleOffset);
   std::optional<DILineInfo> DBGLineInfo =
-      DebugInfoContext->getLineInfoForAddress(ModuleOffset, LineInfoSpecifier);
+      DebugInfoContext->getLineInfoForAddress(DWARFOffset, LineInfoSpecifier);
   if (DBGLineInfo)
     LineInfo = *DBGLineInfo;
 
@@ -305,8 +323,9 @@ DIInliningInfo SymbolizableObjectFile::symbolizeInlinedCode(
   if (ModuleOffset.SectionIndex == object::SectionedAddress::UndefSection)
     ModuleOffset.SectionIndex =
         getModuleSectionIndexForAddress(ModuleOffset.Address);
+  object::SectionedAddress DWARFOffset = getWasmCodeDwarfOffset(ModuleOffset);
   DIInliningInfo InlinedContext = DebugInfoContext->getInliningInfoForAddress(
-      ModuleOffset, LineInfoSpecifier);
+      DWARFOffset, LineInfoSpecifier);
 
   // Make sure there is at least one frame in context.
   bool EmptyFrameAdded = false;
@@ -358,7 +377,8 @@ std::vector<DILocal> SymbolizableObjectFile::symbolizeFrame(
   if (ModuleOffset.SectionIndex == object::SectionedAddress::UndefSection)
     ModuleOffset.SectionIndex =
         getModuleSectionIndexForAddress(ModuleOffset.Address);
-  return DebugInfoContext->getLocalsForAddress(ModuleOffset);
+  object::SectionedAddress DWARFOffset = getWasmCodeDwarfOffset(ModuleOffset);
+  return DebugInfoContext->getLocalsForAddress(DWARFOffset);
 }
 
 std::vector<object::SectionedAddress>
diff --git a/llvm/test/tools/llvm-objdump/lit.local.cfg b/llvm/test/tools/llvm-objdump/lit.local.cfg
new file mode 100644
index 0000000000000..21771693b720e
--- /dev/null
+++ b/llvm/test/tools/llvm-objdump/lit.local.cfg
@@ -0,0 +1,4 @@
+from lit.llvm import llvm_config
+
+if llvm_config.use_lld(required=False):
+    config.available_features.add("lld")
diff --git a/llvm/test/tools/llvm-objdump/wasm/line-numbers.s b/llvm/test/tools/llvm-objdump/wasm/line-numbers.s
new file mode 100644
index 0000000000000..79c533ddb69dc
--- /dev/null
+++ b/llvm/test/tools/llvm-objdump/wasm/line-numbers.s
@@ -0,0 +1,74 @@
+# REQUIRES: webassembly-registered-target, lld
+# RUN: llvm-mc -triple=wasm32-unknown-unknown -filetype=obj %s -o %t.o -g
+# RUN: wasm-ld %t.o -o %t.wasm --no-entry --export=foo --export=bar
+# RUN: llvm-objdump -d --line-numbers %t.o | FileCheck --check-prefix=OBJ %s
+# RUN: llvm-objdump -d --line-numbers %t.wasm | FileCheck --check-prefix=LINKED %s
+
+# This test mirrors test/tools/llvm-symbolizer/wasm-basic.s and tests that line
+# numbers are correctly printed from DWARF information.
+
+.globl foo
+foo:
+    .functype foo () -> ()
+    nop
+    return
+    end_function
+
+.globl bar
+bar:
+    .functype bar (i32) -> (i32)
+    local.get 0
+    nop
+    return
+    end_function
+
+# OBJ:      <foo>:
+# OBJ-EMPTY:
+# OBJ-NEXT: ; foo():
+# OBJ-NEXT: ; {{.*}}line-numbers.s:10
+# OBJ-NEXT:        3: 01            nop
+# OBJ-NEXT: ; {{.*}}line-numbers.s:11
+# OBJ-NEXT:        4: 0f            return
+# OBJ-NEXT: ; {{.*}}line-numbers.s:12
+# OBJ-NEXT:        5: 0b            end
+
+# OBJ:      <bar>:
+# OBJ-EMPTY:
+# OBJ-NEXT: ; bar():
+# OBJ-NEXT: ; {{.*}}line-numbers.s:17
+# OBJ-NEXT:        8: 20 00         local.get 0
+# OBJ-NEXT: ; {{.*}}line-numbers.s:18
+# OBJ-NEXT:        a: 01            nop
+# OBJ-NEXT: ; {{.*}}line-numbers.s:19
+# OBJ-NEXT:        b: 0f            return
+# OBJ-NEXT: ; {{.*}}line-numbers.s:20
+# OBJ-NEXT:        c: 0b            end
+
+
+# Note: The linked version of this test currently bakes in the absolute binary
+# offsets in the code section (for the beginning of each function) because they
+# are relevant. We want to make sure that file offsets are used rather than
+# section offsets. But this does mean that if the output of the linker for the
+# sections that come before Code changes, then this test will have to be updated.
+
+# LINKED:      <foo>:
+# LINKED-EMPTY:
+# LINKED-NEXT: ; foo():
+# LINKED-NEXT: ; {{.*}}line-numbers.s:10
+# LINKED-NEXT:        44: 01            nop
+# LINKED-NEXT: ; {{.*}}line-numbers.s:11
+# LINKED-NEXT:        {{.*}}: 0f            return
+# LINKED-NEXT: ; {{.*}}line-numbers.s:12
+# LINKED-NEXT:        {{.*}}: 0b            end
+
+# LINKED:      <bar>:
+# LINKED-EMPTY:
+# LINKED-NEXT: ; bar():
+# LINKED-NEXT: ; {{.*}}line-numbers.s:17
+# LINKED-NEXT:        49: 20 00         local.get 0
+# LINKED-NEXT: ; {{.*}}line-numbers.s:18
+# LINKED-NEXT:        {{.*}}: 01            nop
+# LINKED-NEXT: ; {{.*}}line-numbers.s:19
+# LINKED-NEXT:        {{.*}}: 0f            return
+# LINKED-NEXT: ; {{.*}}line-numbers.s:20
+# LINKED-NEXT:        {{.*}}: 0b            end
diff --git a/llvm/test/tools/llvm-symbolizer/lit.local.cfg b/llvm/test/tools/llvm-symbolizer/lit.local.cfg
new file mode 100644
index 0000000000000..21771693b720e
--- /dev/null
+++ b/llvm/test/tools/llvm-symbolizer/lit.local.cfg
@@ -0,0 +1,4 @@
+from lit.llvm import llvm_config
+
+if llvm_config.use_lld(required=False):
+    config.available_features.add("lld")
diff --git a/llvm/test/tools/llvm-symbolizer/wasm-basic.s b/llvm/test/tools/llvm-symbolizer/wasm-basic.s
index 1f425e5259316..864961093bf24 100644
--- a/llvm/test/tools/llvm-symbolizer/wasm-basic.s
+++ b/llvm/test/tools/llvm-symbolizer/wasm-basic.s
@@ -1,4 +1,4 @@
-# REQUIRES: webassembly-registered-target
+# REQUIRES: webassembly-registered-target, lld
 # RUN: llvm-mc -triple=wasm32-unknown-unknown -filetype=obj %s -o %t.o -g
 # RUN: llvm-symbolizer --basenames --output-style=GNU -e %t.o 1 2 3 4 5 6 7 8 9 10 11 12 13 | FileCheck %s
 

@dschuff dschuff changed the title symblize offset SymbolizableOjbectFile: Convert Wasm file offset to section offset for DWARF Apr 8, 2026
@dschuff dschuff requested a review from aheejin April 8, 2026 22:49
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 8, 2026

🐧 Linux x64 Test Results

  • 193312 tests passed
  • 5025 tests skipped

✅ The build succeeded and all tests passed.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 8, 2026

🪟 Windows x64 Test Results

  • 133249 tests passed
  • 3078 tests skipped

✅ The build succeeded and all tests passed.

Comment thread llvm/test/tools/llvm-symbolizer/wasm-basic.s Outdated
Comment thread llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp Outdated
Comment thread llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp
ModuleOffset.SectionIndex =
getModuleSectionIndexForAddress(ModuleOffset.Address);
DILineInfo LineInfo;
object::SectionedAddress DWARFOffset = getWasmCodeDwarfOffset(ModuleOffset);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this getWasmCodeDwarfOffset is called in the common code, how about renaming this to something like getWasmCodeDwarfOffsetIfWasm or something? You can look into the function and see it just returns the offset as is if it's not wasm, but it'd be clearer if you don't need to look into it

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, or maybe just getDwarfOffset, or getOffsetForDwarf? Then we can say that for non-wasm the dwarf offset is just the module offset and for wasm it's special.

Copy link
Copy Markdown
Member

@aheejin aheejin Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not strongly opinionated but I prefer something with Wasm than general names like getDwarfOffset because the purpose of that function is for Wasm, and I think it will be less confusing to (non-Wasm) readers. The reason I proposed the change was just to make it extra clear to non-Wasm readers. If you prefer the original name, I'm fine with that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe convertDwarfOffsetForWasm?

@dschuff dschuff enabled auto-merge (squash) April 9, 2026 19:47
@dschuff dschuff merged commit 3106002 into llvm:main Apr 9, 2026
9 of 10 checks passed
dschuff added a commit to dschuff/emscripten that referenced this pull request Apr 9, 2026
dschuff added a commit to dschuff/llvm-project that referenced this pull request Apr 10, 2026
…or Wasm

A fix after llvm#191068: The offset adjustment should not be applied for object
files, because section-relative addresses are always used (and all the
SectionIndex values are 0). And, for linked files, invalidate any address that
is outside the text section to prevent it from being matched in DWARF as a
section-relative address.
Add test cases that cover the distinction (e.g. address 3 should match in an
object file but not in a linked file).
Also, fix the comments in the test to match the updated line numbers.
dschuff added a commit to emscripten-core/emscripten that referenced this pull request Apr 10, 2026
dschuff added a commit that referenced this pull request Apr 10, 2026
…e code section (#191329)

A fix after #191068: For linked files, invalidate any address that
is outside the text section to prevent it from being matched in DWARF as
a section-relative address.

Add test cases that cover the distinction (e.g. address 3 should match
in an object file but not in a linked file).
Also, fix the comments in the test to match the updated line numbers.
@zeroomega
Copy link
Copy Markdown
Contributor

For reasons I don't know, this patch somehow broke tools/llvm-driver/passthrough-lld.test test on our clang-linux-x64 builder.

******************** TEST 'LLVM :: tools/llvm-driver/passthrough-lld.test' FAILED ********************
Exit Code: 2

Command Output (stdout):
--
# RUN: at line 5
/b/s/w/ir/x/w/llvm_build/bin/llvm lld -flavor /b/s/w/ir/x/w/llvm_build/bin/ld.lld --help | /b/s/w/ir/x/w/llvm_build/bin/FileCheck /b/s/w/ir/x/w/llvm-llvm-project/llvm/test/tools/llvm-driver/passthrough-lld.test
# executed command: /b/s/w/ir/x/w/llvm_build/bin/llvm lld -flavor /b/s/w/ir/x/w/llvm_build/bin/ld.lld --help
# .---command stderr------------
# | Unknown flavor: /b/s/w/ir/x/w/llvm_build/bin/ld.lld
# `-----------------------------

First failed task: http://cr-buildbucket.appspot.com/build/8684967812711940465

It might have triggered a bug in the lit config.

@@ -0,0 +1,74 @@
# REQUIRES: webassembly-registered-target, lld
# RUN: llvm-mc -triple=wasm32-unknown-unknown -filetype=obj %s -o %t.o -g
# RUN: wasm-ld %t.o -o %t.wasm --no-entry --export=foo --export=bar
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm/ is the base layer; lld/ builds on top of it. Tests of LLVM should not require a higher-level tool for validation.
In this case cross-project-tests/tools/llvm-objdump/ should be used

@jh7370

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What @MaskRay said.

If testing on a fully-linked generated-at-runtime object is necessary, testing should be in cross-project-tests, because llvm-objdump testing must not rely on elemets outside the core LLVM project. Alternatively, you could craft the input using some combination of assembly/llvm-objcopy/yaml2obj, assuming it's possible to do so.

For clarity, I'm only talking about the new test case. The existing test case that doesn't use lld can remain where it is, with the lld requirement removed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can convert this to an obj2yaml test, or just use a precompiled binary. This seemed cleaner than using a precompiled binary since it came from the same source as the existing test, but it doesn't actually have to be generated at runtime.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

llvm-objdump gives wrong line numbers for WebAssembly

6 participants