SymbolizableOjbectFile: Convert Wasm file offset to section offset for DWARF#191068
SymbolizableOjbectFile: Convert Wasm file offset to section offset for DWARF#191068
Conversation
…r DWARF Wasm's object and linking format lacks virtual addresses like ELF et al. As a result, linked files generally use file offsets as "addresses", whereas objects and DWARF sections use code section offsets. This has led to incorrect interpretation of addresses in llvm-objdump and llvm-symbolizer for linked files. This change to SerializableObjectFile checks the input ModuleOffset, and if it falls within a wasm code section, adjusts it to a section offset before querying the DwarfContext. (For object files, Sec.getAddress() is 0 so it works for object files too). It extends the existing DWARF test for llvm-symbolizer to include a linked file, and also adds an equivalent for objdump. Fixes llvm#129523
|
@llvm/pr-subscribers-llvm-binary-utilities @llvm/pr-subscribers-debuginfo Author: Derek Schuff (dschuff) Changes
Full diff: https://github.com/llvm/llvm-project/pull/191068.diff 6 Files Affected:
diff --git a/llvm/include/llvm/DebugInfo/Symbolize/SymbolizableObjectFile.h b/llvm/include/llvm/DebugInfo/Symbolize/SymbolizableObjectFile.h
index 5ef513f570b03..c31c3ac353fb0 100644
--- a/llvm/include/llvm/DebugInfo/Symbolize/SymbolizableObjectFile.h
+++ b/llvm/include/llvm/DebugInfo/Symbolize/SymbolizableObjectFile.h
@@ -74,6 +74,13 @@ class SymbolizableObjectFile : public SymbolizableModule {
std::unique_ptr<DIContext> DebugInfoContext;
bool UntagAddresses;
+ /// WebAssembly linked files use file offsets for code symbol addresses, but
+ /// DWARF uses section-relative offsets. This helper method converts a module
+ /// file offset into its corresponding section-relative offset, but only if
+ /// the address falls within a Wasm code section.
+ object::SectionedAddress
+ getWasmCodeDwarfOffset(object::SectionedAddress ModuleOffset) const;
+
struct SymbolDesc {
uint64_t Addr;
// If size is 0, assume that symbol occupies the whole memory range up to
diff --git a/llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp b/llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp
index 29fd4d9fda7ad..39c3c1c96f83d 100644
--- a/llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp
+++ b/llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp
@@ -269,6 +269,23 @@ bool SymbolizableObjectFile::shouldOverrideWithSymbolTable(
isa<DWARFContext>(DebugInfoContext.get());
}
+object::SectionedAddress SymbolizableObjectFile::getWasmCodeDwarfOffset(
+ object::SectionedAddress ModuleOffset) const {
+ if (!Module->isWasm())
+ return ModuleOffset;
+ if (ModuleOffset.SectionIndex == object::SectionedAddress::UndefSection)
+ ModuleOffset.SectionIndex =
+ getModuleSectionIndexForAddress(ModuleOffset.Address);
+ for (const SectionRef &Sec : Module->sections()) {
+ if (Sec.getIndex() == ModuleOffset.SectionIndex) {
+ if (Sec.isText())
+ ModuleOffset.Address -= Sec.getAddress();
+ break;
+ }
+ }
+ return ModuleOffset;
+}
+
DILineInfo
SymbolizableObjectFile::symbolizeCode(object::SectionedAddress ModuleOffset,
DILineInfoSpecifier LineInfoSpecifier,
@@ -277,8 +294,9 @@ SymbolizableObjectFile::symbolizeCode(object::SectionedAddress ModuleOffset,
ModuleOffset.SectionIndex =
getModuleSectionIndexForAddress(ModuleOffset.Address);
DILineInfo LineInfo;
+ object::SectionedAddress DWARFOffset = getWasmCodeDwarfOffset(ModuleOffset);
std::optional<DILineInfo> DBGLineInfo =
- DebugInfoContext->getLineInfoForAddress(ModuleOffset, LineInfoSpecifier);
+ DebugInfoContext->getLineInfoForAddress(DWARFOffset, LineInfoSpecifier);
if (DBGLineInfo)
LineInfo = *DBGLineInfo;
@@ -305,8 +323,9 @@ DIInliningInfo SymbolizableObjectFile::symbolizeInlinedCode(
if (ModuleOffset.SectionIndex == object::SectionedAddress::UndefSection)
ModuleOffset.SectionIndex =
getModuleSectionIndexForAddress(ModuleOffset.Address);
+ object::SectionedAddress DWARFOffset = getWasmCodeDwarfOffset(ModuleOffset);
DIInliningInfo InlinedContext = DebugInfoContext->getInliningInfoForAddress(
- ModuleOffset, LineInfoSpecifier);
+ DWARFOffset, LineInfoSpecifier);
// Make sure there is at least one frame in context.
bool EmptyFrameAdded = false;
@@ -358,7 +377,8 @@ std::vector<DILocal> SymbolizableObjectFile::symbolizeFrame(
if (ModuleOffset.SectionIndex == object::SectionedAddress::UndefSection)
ModuleOffset.SectionIndex =
getModuleSectionIndexForAddress(ModuleOffset.Address);
- return DebugInfoContext->getLocalsForAddress(ModuleOffset);
+ object::SectionedAddress DWARFOffset = getWasmCodeDwarfOffset(ModuleOffset);
+ return DebugInfoContext->getLocalsForAddress(DWARFOffset);
}
std::vector<object::SectionedAddress>
diff --git a/llvm/test/tools/llvm-objdump/lit.local.cfg b/llvm/test/tools/llvm-objdump/lit.local.cfg
new file mode 100644
index 0000000000000..21771693b720e
--- /dev/null
+++ b/llvm/test/tools/llvm-objdump/lit.local.cfg
@@ -0,0 +1,4 @@
+from lit.llvm import llvm_config
+
+if llvm_config.use_lld(required=False):
+ config.available_features.add("lld")
diff --git a/llvm/test/tools/llvm-objdump/wasm/line-numbers.s b/llvm/test/tools/llvm-objdump/wasm/line-numbers.s
new file mode 100644
index 0000000000000..79c533ddb69dc
--- /dev/null
+++ b/llvm/test/tools/llvm-objdump/wasm/line-numbers.s
@@ -0,0 +1,74 @@
+# REQUIRES: webassembly-registered-target, lld
+# RUN: llvm-mc -triple=wasm32-unknown-unknown -filetype=obj %s -o %t.o -g
+# RUN: wasm-ld %t.o -o %t.wasm --no-entry --export=foo --export=bar
+# RUN: llvm-objdump -d --line-numbers %t.o | FileCheck --check-prefix=OBJ %s
+# RUN: llvm-objdump -d --line-numbers %t.wasm | FileCheck --check-prefix=LINKED %s
+
+# This test mirrors test/tools/llvm-symbolizer/wasm-basic.s and tests that line
+# numbers are correctly printed from DWARF information.
+
+.globl foo
+foo:
+ .functype foo () -> ()
+ nop
+ return
+ end_function
+
+.globl bar
+bar:
+ .functype bar (i32) -> (i32)
+ local.get 0
+ nop
+ return
+ end_function
+
+# OBJ: <foo>:
+# OBJ-EMPTY:
+# OBJ-NEXT: ; foo():
+# OBJ-NEXT: ; {{.*}}line-numbers.s:10
+# OBJ-NEXT: 3: 01 nop
+# OBJ-NEXT: ; {{.*}}line-numbers.s:11
+# OBJ-NEXT: 4: 0f return
+# OBJ-NEXT: ; {{.*}}line-numbers.s:12
+# OBJ-NEXT: 5: 0b end
+
+# OBJ: <bar>:
+# OBJ-EMPTY:
+# OBJ-NEXT: ; bar():
+# OBJ-NEXT: ; {{.*}}line-numbers.s:17
+# OBJ-NEXT: 8: 20 00 local.get 0
+# OBJ-NEXT: ; {{.*}}line-numbers.s:18
+# OBJ-NEXT: a: 01 nop
+# OBJ-NEXT: ; {{.*}}line-numbers.s:19
+# OBJ-NEXT: b: 0f return
+# OBJ-NEXT: ; {{.*}}line-numbers.s:20
+# OBJ-NEXT: c: 0b end
+
+
+# Note: The linked version of this test currently bakes in the absolute binary
+# offsets in the code section (for the beginning of each function) because they
+# are relevant. We want to make sure that file offsets are used rather than
+# section offsets. But this does mean that if the output of the linker for the
+# sections that come before Code changes, then this test will have to be updated.
+
+# LINKED: <foo>:
+# LINKED-EMPTY:
+# LINKED-NEXT: ; foo():
+# LINKED-NEXT: ; {{.*}}line-numbers.s:10
+# LINKED-NEXT: 44: 01 nop
+# LINKED-NEXT: ; {{.*}}line-numbers.s:11
+# LINKED-NEXT: {{.*}}: 0f return
+# LINKED-NEXT: ; {{.*}}line-numbers.s:12
+# LINKED-NEXT: {{.*}}: 0b end
+
+# LINKED: <bar>:
+# LINKED-EMPTY:
+# LINKED-NEXT: ; bar():
+# LINKED-NEXT: ; {{.*}}line-numbers.s:17
+# LINKED-NEXT: 49: 20 00 local.get 0
+# LINKED-NEXT: ; {{.*}}line-numbers.s:18
+# LINKED-NEXT: {{.*}}: 01 nop
+# LINKED-NEXT: ; {{.*}}line-numbers.s:19
+# LINKED-NEXT: {{.*}}: 0f return
+# LINKED-NEXT: ; {{.*}}line-numbers.s:20
+# LINKED-NEXT: {{.*}}: 0b end
diff --git a/llvm/test/tools/llvm-symbolizer/lit.local.cfg b/llvm/test/tools/llvm-symbolizer/lit.local.cfg
new file mode 100644
index 0000000000000..21771693b720e
--- /dev/null
+++ b/llvm/test/tools/llvm-symbolizer/lit.local.cfg
@@ -0,0 +1,4 @@
+from lit.llvm import llvm_config
+
+if llvm_config.use_lld(required=False):
+ config.available_features.add("lld")
diff --git a/llvm/test/tools/llvm-symbolizer/wasm-basic.s b/llvm/test/tools/llvm-symbolizer/wasm-basic.s
index 1f425e5259316..864961093bf24 100644
--- a/llvm/test/tools/llvm-symbolizer/wasm-basic.s
+++ b/llvm/test/tools/llvm-symbolizer/wasm-basic.s
@@ -1,4 +1,4 @@
-# REQUIRES: webassembly-registered-target
+# REQUIRES: webassembly-registered-target, lld
# RUN: llvm-mc -triple=wasm32-unknown-unknown -filetype=obj %s -o %t.o -g
# RUN: llvm-symbolizer --basenames --output-style=GNU -e %t.o 1 2 3 4 5 6 7 8 9 10 11 12 13 | FileCheck %s
|
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed. |
🪟 Windows x64 Test Results
✅ The build succeeded and all tests passed. |
| ModuleOffset.SectionIndex = | ||
| getModuleSectionIndexForAddress(ModuleOffset.Address); | ||
| DILineInfo LineInfo; | ||
| object::SectionedAddress DWARFOffset = getWasmCodeDwarfOffset(ModuleOffset); |
There was a problem hiding this comment.
Given that this getWasmCodeDwarfOffset is called in the common code, how about renaming this to something like getWasmCodeDwarfOffsetIfWasm or something? You can look into the function and see it just returns the offset as is if it's not wasm, but it'd be clearer if you don't need to look into it
There was a problem hiding this comment.
Hm, or maybe just getDwarfOffset, or getOffsetForDwarf? Then we can say that for non-wasm the dwarf offset is just the module offset and for wasm it's special.
There was a problem hiding this comment.
I'm not strongly opinionated but I prefer something with Wasm than general names like getDwarfOffset because the purpose of that function is for Wasm, and I think it will be less confusing to (non-Wasm) readers. The reason I proposed the change was just to make it extra clear to non-Wasm readers. If you prefer the original name, I'm fine with that.
There was a problem hiding this comment.
maybe convertDwarfOffsetForWasm?
After llvm/llvm-project#191068 it is no longer necessary (or correct).
…or Wasm A fix after llvm#191068: The offset adjustment should not be applied for object files, because section-relative addresses are always used (and all the SectionIndex values are 0). And, for linked files, invalidate any address that is outside the text section to prevent it from being matched in DWARF as a section-relative address. Add test cases that cover the distinction (e.g. address 3 should match in an object file but not in a linked file). Also, fix the comments in the test to match the updated line numbers.
After llvm/llvm-project#191068 it is no longer necessary (or correct).
…e code section (#191329) A fix after #191068: For linked files, invalidate any address that is outside the text section to prevent it from being matched in DWARF as a section-relative address. Add test cases that cover the distinction (e.g. address 3 should match in an object file but not in a linked file). Also, fix the comments in the test to match the updated line numbers.
|
For reasons I don't know, this patch somehow broke First failed task: http://cr-buildbucket.appspot.com/build/8684967812711940465 It might have triggered a bug in the lit config. |
| @@ -0,0 +1,74 @@ | |||
| # REQUIRES: webassembly-registered-target, lld | |||
| # RUN: llvm-mc -triple=wasm32-unknown-unknown -filetype=obj %s -o %t.o -g | |||
| # RUN: wasm-ld %t.o -o %t.wasm --no-entry --export=foo --export=bar | |||
There was a problem hiding this comment.
llvm/ is the base layer; lld/ builds on top of it. Tests of LLVM should not require a higher-level tool for validation.
In this case cross-project-tests/tools/llvm-objdump/ should be used
There was a problem hiding this comment.
What @MaskRay said.
If testing on a fully-linked generated-at-runtime object is necessary, testing should be in cross-project-tests, because llvm-objdump testing must not rely on elemets outside the core LLVM project. Alternatively, you could craft the input using some combination of assembly/llvm-objcopy/yaml2obj, assuming it's possible to do so.
For clarity, I'm only talking about the new test case. The existing test case that doesn't use lld can remain where it is, with the lld requirement removed.
There was a problem hiding this comment.
I can convert this to an obj2yaml test, or just use a precompiled binary. This seemed cleaner than using a precompiled binary since it came from the same source as the existing test, but it doesn't actually have to be generated at runtime.
Wasm's object and linking format lacks virtual addresses like ELF et al.
As a result, linked files generally use file offsets as "addresses", whereas
objects and DWARF sections use code section offsets.
This has led to incorrect interpretation of addresses in llvm-objdump
and llvm-symbolizer for linked files.
This change to SerializableObjectFile checks the input ModuleOffset, and
if it falls within a wasm code section, adjusts it to a section offset
before querying the DwarfContext. (For object files, Sec.getAddress() is
0 so it works for object files too). It extends the existing DWARF test
for llvm-symbolizer to include a linked file, and also adds an equivalent
for objdump.
Fixes #129523