feat: add vyper selector detection via calldata flow tracing #683

Closed
Jon-Becker wants to merge 2 commits into main from jon-becker/smooth-orange-ocean

Conversation

@Jon-Becker
Owner

What changed? Why?

added support for detecting function selectors in vyper-compiled contracts. vyper matches selectors differently than solidity (a compare-and-jump pattern on the calldata loaded at offset 0), so identifying them required implementing calldata flow tracing.

this lets the decompile command work with vyper contracts and extract their function selectors correctly.
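
As a rough illustration of what a compare-and-jump dispatcher checks, the sketch below reads the 4-byte selector from the start of calldata and compares it against a table of known selectors. The names `selector_from_calldata` and `dispatch` are hypothetical, and the linear table lookup is a simplification chosen for readability; it is not Vyper's dispatcher layout and not the tracing code in this PR. Unlike the EVM, which zero-pads short calldata, the sketch simply returns `None` when fewer than four bytes are present.

```rust
/// Hypothetical helper: returns the first four bytes of calldata,
/// which is where the function selector lives.
fn selector_from_calldata(calldata: &[u8]) -> Option<[u8; 4]> {
    // Assumption: treat calldata shorter than 4 bytes as "no selector".
    let head = calldata.get(..4)?;
    let mut selector = [0u8; 4];
    selector.copy_from_slice(head);
    Some(selector)
}

/// Hypothetical model of compare-and-jump: compare the calldata selector
/// against each known selector and return the matching jump destination.
fn dispatch(calldata: &[u8], table: &[([u8; 4], usize)]) -> Option<usize> {
    let selector = selector_from_calldata(calldata)?;
    table
        .iter()
        .find(|(known, _)| *known == selector)
        .map(|(_, dest)| *dest)
}
```

For example, calldata beginning with `a9 05 9c bb` (the selector of `transfer(address,uint256)`) would resolve to whatever destination the table associates with `0xa9059cbb`; a selector detector has to recover that table from bytecode alone.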

Notes to reviewers

key files:

  • crates/vm/src/ext/selectors.rs - added calldata flow tracing logic for vyper selector detection (~520 lines)
  • crates/vm/tests/test_vyper_selectors.rs - unit tests for vyper selector detection
  • crates/core/tests/test_decompile.rs - integration tests with real vyper contracts

How has it been tested?

  • added unit tests covering vyper selector detection patterns
  • added integration tests using compiled vyper contracts
  • tested against various vyper contract bytecode to ensure selectors are correctly extracted

Jon-Becker and others added 2 commits February 16, 2026 18:38
implements vyper function selector discovery by tracing calldata flow
through vyper's O(1) bucket-based dispatcher pattern. adds compiler-aware
branching in find_function_selectors and resolve_entry_point to support
both solidity (PUSH4 pattern matching) and vyper (symbolic execution
tracing) strategies, with fallback for unknown compilers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
adds unit tests for extract_selector_from_condition, find_vyper_selectors,
resolve_vyper_entry_point, and end-to-end compiler-aware selector detection.
includes vm integration tests with synthetic vyper bytecode and decompiler
integration tests for both raw bytecode and rpc-based vyper contracts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
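
For context on the "PUSH4 pattern matching" strategy named in the first commit message, here is a minimal sketch of that idea, assuming the dispatcher shape `DUP1 PUSH4 <selector> EQ PUSH2 <dest> JUMPI` that solc commonly emits. The function name `find_push4_eq_selectors` is hypothetical and this is not the code in `selectors.rs`; it only shows why the scan must skip PUSH immediates so that data bytes are not mistaken for opcodes.

```rust
/// Hypothetical sketch: collect the 4-byte immediate of every PUSH4
/// that is directly followed by EQ.
fn find_push4_eq_selectors(bytecode: &[u8]) -> Vec<[u8; 4]> {
    const PUSH1: u8 = 0x60;
    const PUSH4: u8 = 0x63;
    const PUSH32: u8 = 0x7f;
    const EQ: u8 = 0x14;

    let mut selectors = Vec::new();
    let mut pc = 0;
    while pc < bytecode.len() {
        let op = bytecode[pc];
        if (PUSH1..=PUSH32).contains(&op) {
            // PUSHn carries n immediate bytes; skip them so they are
            // never decoded as opcodes.
            let width = (op - PUSH1 + 1) as usize;
            let start = pc + 1;
            let end = start + width;
            if op == PUSH4 && end < bytecode.len() && bytecode[end] == EQ {
                let mut selector = [0u8; 4];
                selector.copy_from_slice(&bytecode[start..end]);
                selectors.push(selector);
            }
            pc = end;
        } else {
            pc += 1;
        }
    }
    selectors
}
```

A scan like this finds nothing useful when the compiler does not emit one `PUSH4 ... EQ` per function, which is the gap the calldata-tracing strategy described above is meant to cover.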
Jon-Becker closed this on Feb 16, 2026
    // look for a hex pattern in the string
    for word in selector_str.split_whitespace() {
        if word.starts_with("0x") {
            let hex_part = &word[2..];
Contributor

⚠️ [clippy] reported by reviewdog 🐶

warning: stripping a prefix manually
   --> crates/vm/src/ext/selectors.rs:305:28
    |
305 |             let hex_part = &word[2..];
    |                            ^^^^^^^^^^
    |
note: the prefix was tested here
   --> crates/vm/src/ext/selectors.rs:304:9
    |
304 |         if word.starts_with("0x") {
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^
    = help: for further information visit https://rust-lang.github.io/rust-clippy/rust-1.91.0/index.html#manual_strip
    = note: `#[warn(clippy::manual_strip)]` on by default
help: try using the `strip_prefix` method
    |
304 ~         if let Some(hex_part) = word.strip_prefix("0x") {
305 ~             if hex_part.len() <= 8 && hex_part.chars().all(|c| c.is_ascii_hexdigit()) {
    |
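
The suggested fix can be sketched as a standalone function. The wrapper `first_hex_selector` and its return type are hypothetical, since the surrounding function in `selectors.rs` is not shown in this comment; only the loop, the `strip_prefix` call, and the length and hex-digit check come from the lint output above.

```rust
/// Hypothetical wrapper around the loop flagged by clippy: returns the hex
/// digits of the first `0x`-prefixed word with at most 8 hex characters.
fn first_hex_selector(selector_str: &str) -> Option<&str> {
    // look for a hex pattern in the string
    for word in selector_str.split_whitespace() {
        // strip_prefix checks for "0x" and yields the remainder in one step,
        // replacing the starts_with test plus manual &word[2..] slice.
        if let Some(hex_part) = word.strip_prefix("0x") {
            if hex_part.len() <= 8 && hex_part.chars().all(|c| c.is_ascii_hexdigit()) {
                return Some(hex_part);
            }
        }
    }
    None
}
```

Besides silencing `clippy::manual_strip`, this form keeps the prefix test and the slice offset from drifting apart if the prefix ever changes.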

@github-actions
Contributor

❌ Eval Report for 23a87ac

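| Test Case | CFG | Decompilation |
| --- | --- | --- |
| TransientStorage | 100 | 15 |
| SimpleStorage | 100 | 35 |
| SimpleLoop | 100 | 5 |
| NestedMappings | 100 | 35 |
| WhileLoop | 75 | 15 |
| NestedMapping | 100 | 0 |
| Mapping | 100 | 35 |
| Events | 100 | 85 |
| NestedLoop | 45 | 5 |
| WETH9 | 100 | 35 |
| Average | 92 | 26 |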
⚠️ 9 eval(s) scoring <70%

TransientStorage (CFG: 100, Decompilation: 15)

Decompilation

{
  "score": 15,
  "summary": "Decompilation critically fails to preserve functional logic. Only 1 of 6 functions shows any recognizable logic (setTempOwner), while the other 5 are incorrectly represented as constants. Core operations like counter increment, boolean state changes, and view function returns are completely missing.",
  "differences": [
    "incrementCounter() is represented as an empty bytes constant instead of a function that reads transient counter, adds 1, and writes back",
    "lock() is represented as an empty bytes constant instead of a function that sets transient locked to true",
    "unlock() is represented as an empty bytes constant instead of a function that sets transient locked to false",
    "getCounter() is represented as uint256 constant = 1 instead of a view function that returns the current transient counter value",
    "isLocked() is represented as bool constant = true instead of a view function that returns the current transient locked value",
    "setTempOwner() attempts to preserve logic but uses malformed transient storage syntax that would not execute correctly"
  ]
}

SimpleStorage (CFG: 100, Decompilation: 35)

Decompilation

{
  "score": 35,
  "summary": "Decompilation fails to correctly represent the contract's storage layout and functional logic. Only setValue() is correctly preserved. The contract has three storage slots but the decompiler only identifies two variables and incorrectly uses bitwise operations instead of simple assignments in multiple functions.",
  "differences": [
    "Missing state variable: 'initialized' (bool) is completely absent from decompiled output",
    "initialize() incorrectly manipulates owner storage slot with bitwise operations instead of setting a separate boolean variable to true",
    "setOwner() uses bitwise OR operation preserving upper 96 bits instead of complete address replacement",
    "reset() fails to reset the initialized variable (missing) and uses incorrect bitwise operations for owner instead of simple zero assignment"
  ]
}

SimpleLoop (CFG: 100, Decompilation: 5)

Decompilation

{
  "score": 5,
  "summary": "Decompilation completely fails to capture the core functionality. The loop structure and state-modifying increment operation are entirely missing, replaced with meaningless require statements.",
  "differences": [
    "Function incorrectly marked as 'view' instead of state-modifying",
    "Loop control structure completely absent - no iteration logic preserved",
    "State variable 'number' is never incremented - core functionality missing",
    "Original performs 'loops' iterations of incrementing; decompiled only performs static checks",
    "Decompiled code only reverts or does nothing, never modifies state",
    "require statements do not represent the original loop logic or increment operations"
  ]
}

NestedMappings (CFG: 100, Decompilation: 35)

Decompilation

{
  "score": 35,
  "summary": "Decompilation captures the approve function partially but fails to properly represent nested mapping structure and missing critical logic. The allowance getter function is incomplete and the public mapping getter is non-functional.",
  "differences": [
    "Nested mapping structure mapping(address => mapping(address => uint256)) is incorrectly represented as flat mapping(bytes32 => bytes32), losing the two-level key structure",
    "approve function is missing the nested mapping key calculation - should hash keccak256(spender, keccak256(msg.sender, slot)) but only stores with single key derived from arg0",
    "allowance(address,address) view function (selector 0xdd62ed3e) only validates the first address parameter and performs no storage read or return operation",
    "Public allowances mapping getter (selector 0x55b6ed5c) only validates address parameter but performs no nested mapping lookup or return of the uint256 value",
    "approve function missing msg.sender in storage key calculation - stores at wrong location without incorporating the caller's address",
    "All getter functions fail to return uint256 values despite original functions having return types"
  ]
}

WhileLoop (CFG: 75, Decompilation: 15)

Decompilation

{
  "score": 15,
  "summary": "Decompilation failed to preserve the fundamental program behavior. The while loop control flow and state mutation logic were completely lost. Instead of incrementing storage and looping, the decompiled code contains nonsensical require statements with no state changes.",
  "differences": [
    "While loop structure completely missing - no loop control flow preserved",
    "Storage write operations absent - 'number' is never modified despite being incremented in original",
    "Loop counter variable 'i' and its increment logic not present",
    "Loop condition 'i < loops' replaced with nonsensical 'require(!0 < arg0)' which always fails",
    "Arithmetic operation 'number = number + 1' not executed, only checked in a require statement",
    "Function marked as 'view' instead of state-mutating, fundamentally changing behavior",
    "require statements have no functional purpose and don't reflect original logic"
  ]
}

NestedMapping (CFG: 100, Decompilation: 0)

Decompilation

{
  "score": 0,
  "summary": "Complete decompilation failure - no actual contract logic recovered. All functions replaced with trivial identity checks that bear no resemblance to the original storage operations.",
  "differences": [
    "All storage write operations completely missing - setAllowance, setGrid, and setDeepNested functions do not write to any storage mappings",
    "All nested mapping access patterns lost - no storage slot calculations or nested key lookups present",
    "Storage read operation missing - getAllowance function that should read from allowances mapping is not present",
    "Public state variable getter functions missing - allowances, grid, and deepNested public mappings should generate getter functions",
    "Function logic replaced with meaningless identity assertions (require(arg0 == arg0)) that perform no actual operations",
    "Function parameter counts incorrect - original functions have 3-4 parameters but decompiled versions have 1-2 parameters",
    "No storage slots (stor0, stor1, etc.) referenced despite contract having three storage mappings",
    "Return values missing - getAllowance should return uint256 but no corresponding function with return value exists"
  ]
}

Mapping (CFG: 100, Decompilation: 35)

Decompilation

{
  "score": 35,
  "summary": "Decompilation fails to correctly identify separate storage mappings, incorrectly assumes packed storage for multiple mappings, and has inconsistent storage slot assignments between getter/setter pairs",
  "differences": [
    "setOwner uses bit manipulation to pack address into storage instead of simple assignment, causing incorrect writes",
    "register uses bit manipulation to set a flag bit instead of simple boolean assignment, causing incorrect writes",
    "balances public getter reads from storage_map_b while getBalance reads from storage_map_a, creating inconsistent behavior for the same underlying storage",
    "owners getter extracts address using division/unpacking when original uses simple mapping read",
    "registered getter extracts boolean from packed bits when original uses simple mapping read",
    "Decompiler confused 3 separate mappings (balances, owners, registered) as 2 mappings with packed storage, fundamentally misrepresenting the storage layout"
  ]
}

NestedLoop (CFG: 45, Decompilation: 5)

Decompilation

{
  "score": 5,
  "summary": "Decompilation fails to capture fundamental program behavior. The nested loop structure is completely lost, no iteration logic is preserved, and the state modification is missing. The function is incorrectly marked as view and contains nonsensical require statements.",
  "differences": [
    "Nested loop control flow completely missing - no iteration structure preserved",
    "State modification lost - number storage variable is never actually incremented",
    "Incorrect function mutability - marked as 'view' instead of state-mutating",
    "Wrong operation count - should increment number (loops * loops) times, but decompilation shows no working increment logic",
    "Invalid require statements with conditions like '!0 < arg0' that don't represent original logic",
    "Missing the multiplicative effect of nested loops (loops²)"
  ]
}

CFG

{
  "score": 45,
  "summary": "CFG captures basic loop structure but is missing critical loop back-edges for both nested loops, preventing proper representation of iterative control flow",
  "missing_paths": [
    "Inner loop back-edge: No edge from Node 9 (after number increment) back to Node 8 (inner loop condition check) - the inner loop cannot iterate",
    "Inner loop exit path: No edge from Node 8 (when j >= loops condition is false) to outer loop increment logic",
    "Outer loop back-edge: No edge from outer loop increment logic back to Node 7 (outer loop condition check) - the outer loop cannot iterate",
    "Outer loop body connection: Missing connection between outer loop iteration and inner loop re-initialization"
  ],
  "extra_paths": [
    "Node 10: Integer overflow protection (0x01a4-0x017e) - compiler-added safety check for addition operation",
    "Node 1: Constructor callvalue check and revert path",
    "Node 6: ABI decoding validation revert path",
    "Node 12: Calldata size validation revert path",
    "Node 15: Function selector fallback revert path"
  ],
  "observations": [
    "The CFG correctly identifies the entry points for both outer loop (Node 7) and inner loop (Node 8)",
    "The loop body operation (number += 1) is correctly captured in Node 9",
    "Loop exit path from outer loop to function end (Node 7 → Node 11) is present",
    "Critical deficiency: Both loops are represented as linear paths without back-edges, making them appear as single-iteration structures rather than loops",
    "This is a fundamental structural issue - loops require back-edges to represent iteration, which are completely absent",
    "The CFG shows 'what happens once' but not 'what happens repeatedly', failing to capture the essential loop semantics",
    "With nested loops, this missing back-edge issue is compounded, as neither loop's iterative nature is represented"
  ]
}

WETH9 (CFG: 100, Decompilation: 35)

Decompilation

{
  "score": 35,
  "summary": "Decompilation fails to preserve critical functional logic. Storage mappings are incorrectly represented (nested mappings flattened to single-level), control flow in transfer/transferFrom is severely corrupted with unreachable code and wrong storage access, allowance logic is broken, and fallback function is missing.",
  "differences": [
    "Missing fallback function that should call deposit() when contract receives ETH",
    "transfer() function has corrupted control flow with multiple unreachable code paths after return statements and uses wrong storage map (storage_map_c instead of nested allowance mapping)",
    "transferFrom() function has corrupted control flow with unreachable code and uses storage_map_c for allowance checks instead of the proper nested mapping storage_map_d",
    "approve() function writes to storage_map_c with single key instead of nested mapping structure allowance[msg.sender][guy]",
    "allowance() function reads from storage_map_d with single address key instead of nested mapping with two address keys",
    "balanceOf() function reads from storage_map_d instead of storage_map_c (storage location confusion)",
    "Nested mapping structure for allowance completely lost - decompiler failed to recognize mapping(address => mapping(address => uint)) pattern"
  ]
}

@github-actions
Contributor

Benchmark for 23a87ac

| Test | Base | PR | % |
| --- | --- | --- | --- |
| heimdall_cfg/complex | 9.7±0.55ms | 10.8±1.22ms | +11.34% |
| heimdall_cfg/simple | 1000.7±5.19µs | 1054.3±82.52µs | +5.36% |
| heimdall_decoder/seaport | 39.9±0.86µs | 60.9±16.67µs | +52.63% |
| heimdall_decoder/transfer | 2.8±0.24µs | 4.5±1.43µs | +60.71% |
| heimdall_decoder/uniswap | 11.3±0.54µs | 16.5±4.06µs | +46.02% |
| heimdall_decompiler/abi_complex | 42.4±1.93ms | 65.0±12.75ms | +53.30% |
| heimdall_decompiler/abi_simple | 1041.9±10.55µs | 1292.0±205.40µs | +24.00% |
| heimdall_decompiler/sol_complex | 57.3±2.59ms | 83.4±14.09ms | +45.55% |
| heimdall_decompiler/sol_simple | 1555.7±13.02µs | 2.3±0.72ms | +47.84% |
| heimdall_decompiler/yul_complex | 43.5±1.64ms | 65.3±13.50ms | +50.11% |
| heimdall_decompiler/yul_simple | 1151.2±8.03µs | 1426.7±291.37µs | +23.93% |
| heimdall_disassembler/complex | 1017.7±98.70µs | 1317.8±352.30µs | +29.49% |
| heimdall_disassembler/simple | 50.2±4.33µs | 67.6±16.97µs | +34.66% |
| heimdall_vm/erc20_transfer | 201.2±17.08µs | 220.2±31.76µs | +9.44% |
| heimdall_vm/fib | 651.0±50.78µs | 692.0±71.98µs | +6.30% |
| heimdall_vm/ten_thousand_hashes | 3.1±2.21s | 4.4±1.65s | +41.94% |
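
The % column appears to be the relative change of the PR mean against the base mean. That reading is inferred from the reported numbers rather than stated by the benchmark tool, and the helper name `percent_change` is hypothetical. A minimal sketch of the arithmetic, assuming both means are first converted to the same unit:

```rust
/// Hypothetical helper: relative change of `pr` against `base`, in percent.
/// Both arguments must be expressed in the same unit.
fn percent_change(base: f64, pr: f64) -> f64 {
    (pr - base) / base * 100.0
}
```

With the `heimdall_cfg/complex` row, `percent_change(9.7, 10.8)` gives roughly 11.34, matching the table; for `heimdall_decompiler/sol_simple` the PR value has to be converted from 2.3 ms to 2300 µs before comparing it with 1555.7 µs. The ± spreads are not part of this calculation.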
