feat: add vyper selector detection support#685

Open
Jon-Becker wants to merge 2 commits into main from jon-becker/small-pink-light
Conversation

@Jon-Becker
Owner

What changed? Why?

added support for detecting function selectors from vyper-compiled contracts. vyper uses different dispatch patterns than solidity:

  • solidity: PUSH4 <selector> EQ JUMPI (jump-if-equal)
  • vyper sparse: PUSH4 <selector> XOR/SUB JUMPI (skip-if-not-equal)
  • vyper dense: bucket selection via MOD/AND, then ISZERO(EQ) JUMPI within buckets

the decompiler now recognizes all three patterns when resolving entry points.
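
a minimal sketch of how the three patterns above differ, assuming a simplified opcode stream. the `Op` and `Dispatch` types and the `classify` function are hypothetical names made up for illustration; they are not the types used in selectors.rs, and real bytecode also pushes a jump destination between the comparison and JUMPI, which this sketch ignores.

```rust
/// Simplified opcodes (hypothetical; not the VM's real instruction type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    Push4(u32),
    Eq,
    Xor,
    Sub,
    IsZero,
    JumpI,
    Other,
}

/// Dispatch styles named in the description above (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Dispatch {
    /// PUSH4 <selector> EQ JUMPI: jump into the function when equal.
    JumpIfEqual,
    /// PUSH4 <selector> XOR/SUB JUMPI: a non-zero result means "no match".
    SkipIfNotEqual,
    /// ISZERO(EQ) JUMPI inside a bucket: jump away when not equal.
    BucketSkipIfNotEqual,
}

/// Classify the comparison that follows the first PUSH4 in `ops`.
/// Returns the pushed selector and the dispatch style, or None when
/// no known comparison followed by a JUMPI is found.
fn classify(ops: &[Op]) -> Option<(u32, Dispatch)> {
    // Locate the first PUSH4 and remember the selector it pushes.
    let (start, selector) = ops.iter().enumerate().find_map(|(i, op)| match op {
        Op::Push4(sel) => Some((i, *sel)),
        _ => None,
    })?;
    let rest = &ops[start + 1..];

    // Every pattern ends in a conditional jump.
    if !rest.contains(&Op::JumpI) {
        return None;
    }

    // The more specific EQ + ISZERO case must be checked before plain EQ.
    let style = match rest {
        [Op::Eq, Op::IsZero, ..] => Dispatch::BucketSkipIfNotEqual,
        [Op::Eq, ..] => Dispatch::JumpIfEqual,
        [Op::Xor | Op::Sub, ..] => Dispatch::SkipIfNotEqual,
        _ => return None,
    };
    Some((selector, style))
}
```

for example, `classify(&[Op::Push4(0xa9059cbb), Op::Xor, Op::JumpI])` yields the selector together with `Dispatch::SkipIfNotEqual`. the practical difference is which branch of the JUMPI leads to the function body: the taken branch for solidity, the fall-through for both vyper styles.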

key changes:

  • updated resolve_entry_point() in selectors.rs to detect vyper's XOR, SUB, and ISZERO+EQ dispatch patterns
  • added inline documentation explaining the different compiler dispatch strategies
  • created test data module with 5 verified mainnet vyper contracts (curve pools, yearn vault)
  • added integration tests validating selector detection on live contracts

Notes to reviewers

key files:

  • crates/vm/src/ext/selectors.rs - core logic for vyper dispatch pattern detection
  • crates/core/tests/vyper_contracts.rs - test data for 5 mainnet contracts with known selectors
  • crates/core/tests/test_decompile.rs - integration tests for each contract

contracts tested:

  1. curve 3pool (0xbEbc...F1C7) - vyper 0.2.x, int128 indices
  2. curve steth/eth (0xDC24...0022) - vyper 0.2.8, payable functions
  3. curve susd (0xA540...BfD) - early vyper version
  4. yearn yvdai v2 (0x19D3...5001) - vyper 0.2.x, vault contract
  5. curve tricrypto2 (0xD51a...AE46) - vyper 0.2.12, uint256 indices

How has it been tested?

@github-actions
Contributor

github-actions bot commented Feb 16, 2026

✅ Coverage Report for c910b7e

| Metric | Value |
| --- | --- |
| Base branch | 66.04% |
| PR branch | 66.06% |
| Diff | +0.01% |

@Jon-Becker Jon-Becker marked this pull request as ready for review February 16, 2026 23:46
@Jon-Becker Jon-Becker force-pushed the jon-becker/small-pink-light branch from 026c34a to b587a2e Compare February 17, 2026 14:14
Vyper 0.2.x contracts extract the function selector via
CALLDATALOAD(0) → MSTORE(0x1c) → MLOAD(0) instead of using SHR.
This produces "memory[0]" in solidified output rather than "msg.data[0]",
causing all four selector dispatch pattern matchers to fail.

Introduces condition_references_selector() to match both "msg.data[0]"
(Solidity/newer Vyper) and "memory[0]" (Vyper 0.2.x) patterns.

Fixes decompilation of contracts like vKP3R (0x2FC52C61fB...dC1a2).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
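
A minimal sketch of why the CALLDATALOAD(0) → MSTORE(0x1c) → MLOAD(0) sequence yields the selector, assuming memory is still zeroed when it runs. The helper names (`calldataload`, `mstore`, `mload`, `vyper_02x_selector`, `shr_selector`) are hypothetical and only model the three opcodes on byte arrays; they are not part of the heimdall VM.

```rust
/// CALLDATALOAD: read 32 bytes from calldata, zero-padded past the end.
fn calldataload(calldata: &[u8], offset: usize) -> [u8; 32] {
    let mut word = [0u8; 32];
    for (i, byte) in word.iter_mut().enumerate() {
        *byte = calldata.get(offset + i).copied().unwrap_or(0);
    }
    word
}

/// MSTORE: write a 32-byte word at `offset`, growing memory with zeros.
fn mstore(memory: &mut Vec<u8>, offset: usize, word: [u8; 32]) {
    if memory.len() < offset + 32 {
        memory.resize(offset + 32, 0);
    }
    memory[offset..offset + 32].copy_from_slice(&word);
}

/// MLOAD: read a 32-byte word at `offset`; untouched memory reads as zero.
fn mload(memory: &[u8], offset: usize) -> [u8; 32] {
    let mut word = [0u8; 32];
    for (i, byte) in word.iter_mut().enumerate() {
        *byte = memory.get(offset + i).copied().unwrap_or(0);
    }
    word
}

/// Vyper 0.2.x style: store the calldata word at 0x1c (28), then load word 0.
/// Bytes 0..28 of the loaded word are zero and bytes 28..32 are calldata[0..4],
/// so the word's numeric value equals the 4-byte selector.
fn vyper_02x_selector(calldata: &[u8]) -> u32 {
    let mut memory = Vec::new();
    let word = calldataload(calldata, 0);
    mstore(&mut memory, 0x1c, word);
    let loaded = mload(&memory, 0);
    u32::from_be_bytes([loaded[28], loaded[29], loaded[30], loaded[31]])
}

/// SHR style: shifting the calldata word right by 224 bits keeps its top 4 bytes.
fn shr_selector(calldata: &[u8]) -> u32 {
    let word = calldataload(calldata, 0);
    u32::from_be_bytes([word[0], word[1], word[2], word[3]])
}
```

Both functions return the same value for the same calldata, which is why a condition over "memory[0]" can stand in for one over "msg.data[0]" when matching dispatch patterns.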
jump_condition.contains("msg.data[0]") || jump_condition.contains("memory[0]")
}

pub fn resolve_entry_point(vm: &mut VM, selector: &str) -> u128 {
⚠️ [clippy] reported by reviewdog 🐶

warning: missing documentation for a function
   --> crates/vm/src/ext/selectors.rs:192:1
    |
192 | pub fn resolve_entry_point(vm: &mut VM, selector: &str) -> u128 {
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: requested on the command line with `-W missing-docs`

@github-actions
Contributor

❌ Eval Report for 0b972c3

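| Test Case | CFG | Decompilation |
| --- | --- | --- |
| SimpleStorage | 100 | 35 |
| TransientStorage | 100 | 25 |
| SimpleLoop | 65 | 15 |
| NestedMappings | 100 | 35 |
| NestedLoop | 65 | 0 |
| NestedMapping | 100 | 0 |
| Mapping | 100 | 35 |
| WETH9 | 100 | 45 |
| WhileLoop | 40 | 15 |
| Events | 100 | 75 |
| Average | 87 | 28 |
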
⚠️ 9 eval(s) scoring <70%

SimpleStorage (CFG: 100, Decompilation: 35)

Decompilation

{
  "score": 35,
  "summary": "Decompilation fails to capture critical state variable (initialized boolean) and produces incorrect logic for initialize() and reset() functions. Bitwise operations in setOwner() and reset() do not correctly represent simple address assignments.",
  "differences": [
    "Missing 'initialized' boolean state variable - not declared in decompiled contract",
    "initialize() function incorrectly modifies owner variable with bitwise operations instead of setting initialized boolean to true",
    "reset() function does not properly reset owner to address(0) - uses incorrect bitwise operations",
    "reset() function cannot reset initialized to false since the variable is not tracked",
    "setOwner() uses unnecessary bitwise operations that may not correctly assign address value"
  ]
}

TransientStorage (CFG: 100, Decompilation: 25)

Decompilation

{
  "score": 25,
  "summary": "Decompilation fails to capture most function implementations. Only one function (setTempOwner) is partially decompiled with incorrect transient storage access. Five of six functions are completely missing. Transient storage operations are not properly represented.",
  "differences": [
    "incrementCounter function missing entirely - should read transient storage slot, increment by 1, and write back",
    "lock function missing entirely - should write true to transient storage for locked variable",
    "unlock function missing entirely - should write false to transient storage for locked variable",
    "getCounter function missing entirely - should return uint256 value from transient storage counter",
    "isLocked function missing entirely - should return bool value from transient storage locked",
    "setTempOwner uses incorrect storage access pattern - uses 'transient[0x01]' syntax with bitwise operations instead of proper TSTORE opcode representation",
    "Decompiled constants (unlock, getCounter, incrementCounter, isLocked, lock) appear to be misclassified as constant values rather than function implementations",
    "Transient storage semantics not preserved - original uses transient keyword for storage that resets after transaction, decompilation doesn't capture this behavior"
  ]
}

SimpleLoop (CFG: 65, Decompilation: 15)

Decompilation

{
  "score": 15,
  "summary": "Decompilation fails to capture the fundamental loop behavior and state modifications. The loop iteration and storage write operations are completely missing.",
  "differences": [
    "Missing loop control flow: Original has 'for' loop that iterates from 0 to loops-1, decompiled has no loop structure",
    "Missing state modification: Original increments 'number' storage variable on each iteration, decompiled only reads 'number' but never writes to it",
    "Incorrect function mutability: Original function mutates state (should be non-view), decompiled is marked as 'view' which prevents state changes",
    "Incorrect logic operations: Decompiled contains nonsensical requires like 'require(arg0 == arg0)' and 'require(!0 < arg0)' which don't match original logic",
    "Missing storage writes: Original performs 'loops' number of storage writes (number++), decompiled performs zero storage writes",
    "Incorrect termination condition: 'require(!0 < arg0)' evaluates to 'require(false)' when arg0 > 0, causing revert when original would execute loop"
  ]
}

CFG

{
  "score": 65,
  "summary": "CFG captures most loop control flow but is missing the critical back edge that represents loop iteration from loop body back to condition check",
  "missing_paths": [
    "Back edge from loop body (node 8) to loop condition check (node 7) - this is the fundamental loop iteration path that allows the loop to execute multiple times"
  ],
  "extra_paths": [
    "Overflow check for number++ increment (node 9, 0x0135-0x0161) - compiler-added safety check",
    "Function dispatcher and selector matching logic (nodes 3, 12)",
    "Parameter validation and decoding paths (nodes 4-7)",
    "Contract initialization checks (nodes 0-2)"
  ],
  "observations": [
    "Loop initialization, condition check, body, and exit are all present as nodes",
    "The increment operation with overflow protection is properly captured",
    "The missing back edge is a significant structural omission - without it, the CFG suggests the loop body executes at most once",
    "All other fundamental control flow elements (function entry/exit, conditional branching to/from loop) are correctly represented",
    "Function selector dispatch for both loop() and number() getter are present"
  ]
}

NestedMappings (CFG: 100, Decompilation: 35)

Decompilation

{
  "score": 35,
  "summary": "Decompilation fails to capture nested mapping structure and loses critical functionality. The approve function's storage operation is incorrect, and the allowance function logic is completely missing.",
  "differences": [
    "Nested mapping structure not preserved: 'mapping(address => mapping(address => uint256))' decompiled as flat 'mapping(bytes32 => bytes32)' losing the two-key access pattern",
    "approve function storage operation incorrect: uses single key 'var_a' derived from arg0 (spender) instead of computing nested mapping slot using both msg.sender and arg0",
    "allowance function (selector 0xdd62ed3e) returns nothing instead of reading and returning allowances[owner][spender]",
    "Public mapping getter 'allowances' function (selector 0x55b6ed5c) returns nothing instead of reading nested mapping value",
    "approve function missing proper nested mapping slot calculation: should compute keccak256(spender, keccak256(msg.sender, slot)) but only uses address(arg0)"
  ]
}

NestedLoop (CFG: 65, Decompilation: 0)

Decompilation

{
  "score": 0,
  "summary": "Decompilation completely fails to capture the nested loop structure and state modification logic. The function behavior is fundamentally incorrect.",
  "differences": [
    "Nested loop structure is completely missing - no iteration logic preserved",
    "State modification (number += 1) is not present - no increment operation",
    "Function incorrectly marked as 'view' when it should modify state",
    "Loop logic replaced with nonsensical require statements that would always cause revert",
    "Total functional behavior lost - original increments number by loops*loops, decompiled version just reverts"
  ]
}

CFG

{
  "score": 65,
  "summary": "CFG captures outer loop structure but missing clear inner loop back-edge, making nested loop structure incomplete",
  "missing_paths": [
    "Inner loop back-edge: The inner loop (j < loops) should have a back-edge from the loop body back to the condition check, but the CFG shows node 9 (inner loop body with number += 1) only connecting forward to node 10 (overflow check) without a visible path back to node 8 (inner loop condition check at 0x84)",
    "Outer loop back-edge: Similarly, the outer loop should have a back-edge from the inner loop exit (node 8 after JUMPI at 0x8c fails) back to node 7 (outer loop condition at 0x77), but this connection is not evident in the graph structure"
  ],
  "extra_paths": [
    "Compiler overflow check at node 10 (0x01a4) - added by compiler for arithmetic safety",
    "Function dispatcher paths (nodes 3, 13) for routing calls to loop() vs number()",
    "Callvalue check (nodes 0->1) ensuring no ETH sent to non-payable function",
    "Calldatasize validation (node 2->15) for malformed calls",
    "ABI decoding validation (nodes 4, 5, 6, 12) for input parameter checks",
    "Return data encoding paths (node 14) for the number getter function"
  ],
  "observations": [
    "The outer loop entry at node 7 (0x77 JUMPDEST) with condition check (LT at 0x7a) and exit path (JUMPI to 0xbf at node 11) is clearly represented",
    "The inner loop entry at node 8 (0x84 JUMPDEST) with condition check (LT at 0x87) is present",
    "The inner loop body at node 9 performs the number += 1 operation with overflow check (ADD at 0x199, GT check at 0x19e)",
    "However, the critical back-edges that define loop iteration are either missing from the graph or not properly connected",
    "Without back-edges, the CFG represents the loops more as linear paths with conditional exits rather than true cyclic loop structures",
    "The loop increment operations (i++ and j++) are not visibly represented as separate nodes, which may be embedded in the back-edge logic that appears incomplete",
    "Node 8 shows the inner loop structure but only connects forward to node 9 (loop body), missing the continuation/increment path"
  ]
}

NestedMapping (CFG: 100, Decompilation: 0)

Decompilation

{
  "score": 0,
  "summary": "Decompilation completely failed to preserve any functional logic. No storage operations, no nested mapping accesses, no meaningful control flow. All functions only contain trivial self-equality checks that serve no purpose.",
  "differences": [
    "All three nested mapping state variables (allowances, grid, deepNested) are missing",
    "setAllowance function logic completely missing - no storage write to allowances mapping",
    "setGrid function logic completely missing - no storage write to grid mapping",
    "setDeepNested function logic completely missing - no storage write to deepNested mapping",
    "getAllowance function logic completely missing - no storage read from allowances mapping",
    "Public getter functions for state variables are missing",
    "All decompiled functions contain only meaningless require(arg == arg) checks with no storage operations",
    "Function selectors in decompiled output don't match original contract functions",
    "No nested mapping access patterns preserved (keccak256 hashing for storage slot calculation)",
    "Core contract purpose (managing nested mapping data structures) is completely lost"
  ]
}

Mapping (CFG: 100, Decompilation: 35)

Decompilation

{
  "score": 35,
  "summary": "Critical storage mapping confusion causes functional incorrectness. The decompiler conflated three separate mappings into two storage maps, causing setBalance/getBalance to use different storage than the balances getter, and introducing incorrect bit manipulation logic in setOwner and register functions.",
  "differences": [
    "setBalance writes to storage_map_a but the balances getter reads from storage_map_b, breaking the core balance tracking functionality",
    "setOwner uses bit manipulation (| uint96) instead of simple address assignment to storage",
    "register uses bit manipulation (| uint248) instead of simple boolean true assignment",
    "Storage layout does not preserve the three distinct mappings (balances, owners, registered) from the original contract"
  ]
}

WETH9 (CFG: 100, Decompilation: 45)

Decompilation

{
  "score": 45,
  "summary": "Decompilation captures basic structure but contains critical logic errors in transfer/transferFrom functions, incorrect storage access patterns, and missing fallback function. Core operations are recognizable but functionally incorrect.",
  "differences": [
    "transfer() function has severely corrupted logic with unreachable code paths, duplicate operations, and incorrect conditional checks (e.g., 'require(!storage_map_c[var_a] < arg1)' should be 'require(storage_map_c[var_a] >= arg1)')",
    "transfer() incorrectly calls transferFrom logic inline instead of delegating to transferFrom() function",
    "transferFrom() has corrupted control flow with unreachable code blocks and incorrect allowance checking logic (checks storage_map_c instead of storage_map_d for allowance)",
    "transferFrom() incorrectly checks 'address(arg0) == address(msg.sender)' instead of 'address(arg0) != address(msg.sender)' for allowance deduction path",
    "approve() function writes to wrong storage mapping (storage_map_c instead of storage_map_d)",
    "balanceOf() reads from wrong storage mapping (storage_map_d instead of storage_map_c)",
    "allowance() reads from wrong storage mapping (storage_map_d with incorrect key derivation)",
    "withdraw() uses incorrect syntax 'address(msg.sender).transfer(arg0)' with wrong return types (should be call with value, not returning bool and bytes)",
    "Missing fallback function that calls deposit() when contract receives ETH",
    "Storage layout confusion: balances should use storage_map_c consistently, allowances should use nested mapping storage_map_d, but decompiled code mixes these incorrectly",
    "Multiple unreachable code blocks in transfer() and transferFrom() due to unconditional returns in middle of logic paths"
  ]
}

WhileLoop (CFG: 40, Decompilation: 15)

Decompilation

{
  "score": 15,
  "summary": "Decompilation fails to capture the while loop structure and state modification logic. The function is incorrectly marked as view and contains only trivial require statements instead of iterative logic.",
  "differences": [
    "While loop control flow is completely missing - no iteration occurs",
    "State variable 'number' is never incremented in decompiled version",
    "Function is marked as 'view' instead of state-modifying",
    "Loop counter variable and comparison logic not preserved",
    "No relationship between input parameter and number of iterations"
  ]
}

CFG

{
  "score": 40,
  "summary": "CFG captures loop entry and conditional branch but missing critical back-edge from loop body to loop header, failing to represent the iterative nature of the while loop",
  "missing_paths": [
    "Back-edge from loop body (after incrementing i and number) to loop condition check at 0x77 - this is the fundamental path that makes the while loop iterate",
    "The increment of loop variable i is not clearly represented in a separate node or connected properly",
    "Path from successful overflow check (node 8 at 0x0193) back to loop header is absent"
  ],
  "extra_paths": [
    "Callvalue check during contract initialization (nodes 0-1)",
    "Function selector dispatch logic (nodes 2-3, 12)",
    "Parameter decoding and validation paths (nodes 4-7)",
    "Arithmetic overflow detection for addition (nodes 8-9)",
    "Multiple compiler-generated validation and bounds checking paths"
  ],
  "observations": [
    "The CFG correctly identifies the loop condition check (DUP2 DUP2 LT ISZERO at node 7)",
    "The loop exit path (node 7 -> node 10) when condition is false is properly captured",
    "The loop body entry (node 7 -> node 8) when condition is true is present",
    "However, the most critical element of a while loop - the back-edge that creates iteration - is completely missing",
    "Without the back-edge from body to header, the CFG represents at most a single-iteration if statement, not a loop",
    "This is a fundamental structural deficiency that prevents understanding the control flow semantics",
    "Node 8 ends at 0x018a with instructions that should lead back to iteration, but no edge exists",
    "The CFG may be incomplete or truncated, missing nodes between 0x0193 and the loop header"
  ]
}

@github-actions
Contributor

Benchmark for 0b972c3

| Test | Base | PR | % |
| --- | --- | --- | --- |
| heimdall_cfg/complex | 8.7±0.04ms | 9.4±0.20ms | +8.05% |
| heimdall_cfg/simple | 880.1±3.60µs | 946.0±4.11µs | +7.49% |
| heimdall_decoder/seaport | 32.0±0.82µs | 33.6±0.81µs | +5.00% |
| heimdall_decoder/transfer | 3.5±0.16µs | 3.7±0.18µs | +5.71% |
| heimdall_decoder/uniswap | 9.8±0.27µs | 10.4±0.66µs | +6.12% |
| heimdall_decompiler/abi_complex | 37.0±1.40ms | 35.9±0.32ms | -2.97% |
| heimdall_decompiler/abi_simple | 928.1±9.41µs | 1024.7±15.88µs | +10.41% |
| heimdall_decompiler/sol_complex | 47.3±0.28ms | 46.5±0.66ms | -1.69% |
| heimdall_decompiler/sol_simple | 1335.7±7.56µs | 1461.6±12.40µs | +9.43% |
| heimdall_decompiler/yul_complex | 38.7±0.95ms | 36.8±0.20ms | -4.91% |
| heimdall_decompiler/yul_simple | 1009.8±4.24µs | 1106.1±6.38µs | +9.54% |
| heimdall_disassembler/complex | 779.0±26.10µs | 795.2±31.52µs | +2.08% |
| heimdall_disassembler/simple | 36.5±1.74µs | 38.2±1.95µs | +4.66% |
| heimdall_vm/erc20_transfer | 148.6±4.97µs | 149.9±6.38µs | +0.87% |
| heimdall_vm/fib | 519.6±11.41µs | 506.9±6.88µs | -2.44% |
| heimdall_vm/ten_thousand_hashes | 446.4±15.62ms | 440.1±8.03ms | -1.41% |
