feat: add vyper selector detection via symbolic execution #684

Closed
Jon-Becker wants to merge 3 commits into main from jon-becker/bright-violet-desert

@Jon-Becker
Owner

What changed? Why?

added support for detecting function selectors in vyper-compiled contracts. vyper uses a different dispatching mechanism than solidity (branching logic instead of jump tables), so we needed a symbolic execution approach to extract selectors.

the implementation:

  • adds vyper-specific selector detection via symbolic execution
  • follows the approach from evmole for handling vyper's branching dispatch
  • supports detecting selectors from any vyper-compiled contract
  • tested against 5 live vyper contracts from ethereum mainnet
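
As a rough illustration of what following the selector through a dispatcher can look like, here is a minimal sketch. It is not the code from vyper.rs: the `Sym` and `Op` types, `collect_selectors`, and `demo_dispatcher` are hypothetical names, the instruction set is reduced to six operations, and the walk is straight-line only, so the branch forking mentioned in this PR is left out.

```rust
/// Hypothetical symbolic value: just enough to follow the selector.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Sym {
    Const(u64),
    CalldataWord0, // result of CALLDATALOAD(0)
    Selector,      // CalldataWord0 >> 224
    Unknown,
}

/// Hypothetical, heavily reduced instruction set.
#[derive(Clone, Copy, Debug)]
enum Op {
    Push(u64),
    CalldataLoad,
    Shr,
    Dup1,
    Eq,
    Pop,
}

/// Walks a straight-line instruction sequence and returns every constant
/// that is compared against the selector with EQ.
fn collect_selectors(code: &[Op]) -> Vec<u32> {
    let mut stack: Vec<Sym> = Vec::new();
    let mut found = Vec::new();
    for op in code {
        match *op {
            Op::Push(v) => stack.push(Sym::Const(v)),
            Op::CalldataLoad => {
                let offset = stack.pop().unwrap_or(Sym::Unknown);
                stack.push(if offset == Sym::Const(0) {
                    Sym::CalldataWord0
                } else {
                    Sym::Unknown
                });
            }
            Op::Shr => {
                // EVM SHR pops the shift amount first, then the value.
                let shift = stack.pop().unwrap_or(Sym::Unknown);
                let value = stack.pop().unwrap_or(Sym::Unknown);
                stack.push(match (shift, value) {
                    (Sym::Const(224), Sym::CalldataWord0) => Sym::Selector,
                    _ => Sym::Unknown,
                });
            }
            Op::Dup1 => {
                let top = stack.last().copied().unwrap_or(Sym::Unknown);
                stack.push(top);
            }
            Op::Eq => {
                let a = stack.pop().unwrap_or(Sym::Unknown);
                let b = stack.pop().unwrap_or(Sym::Unknown);
                if let (Sym::Selector, Sym::Const(c)) | (Sym::Const(c), Sym::Selector) = (a, b) {
                    // Only constants that fit in 4 bytes can be selectors.
                    if let Ok(selector) = u32::try_from(c) {
                        found.push(selector);
                    }
                }
                stack.push(Sym::Unknown);
            }
            Op::Pop => {
                stack.pop();
            }
        }
    }
    found
}

/// Hypothetical dispatcher prologue followed by two EQ comparisons.
fn demo_dispatcher() -> Vec<Op> {
    vec![
        Op::Push(0), Op::CalldataLoad, Op::Push(224), Op::Shr,
        Op::Dup1, Op::Push(0xa9059cbb), Op::Eq, Op::Pop,
        Op::Dup1, Op::Push(0x70a08231), Op::Eq, Op::Pop,
    ]
}
```

Calling `collect_selectors(&demo_dispatcher())` yields the two compared constants, 0xa9059cbb and 0x70a08231, in order.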

Notes to reviewers

key files:

  • crates/vm/src/ext/selectors/vyper.rs - new vyper symbolic execution logic
  • crates/vm/src/ext/selectors/mod.rs - refactored to support both solidity and vyper
  • crates/core/tests/test_vyper_selectors.rs - integration tests with live contracts

How has it been tested?

tested with cargo run against 5 live vyper contracts from etherscan:

  • verified selector extraction matches abi for each contract
  • integration tests cover multiple vyper versions and patterns
  • all tests pass locally

Jon-Becker and others added 3 commits February 16, 2026 19:16

Implements Vyper-specific selector detection that traces calldata flow
through the dispatcher using symbolic execution. Converts selectors.rs
to a directory module and adds vyper.rs with:

- Calldata flow tracking through CALLDATALOAD + SHR/DIV operations
- Dense/sparse selector section detection via EQ comparisons
- Hash bucket dispatch detection via MOD/AND operations with forking
- Binary search dispatch support via GT/LT branch forking
- Fallback from Solidity to Vyper detection when few selectors found
- Enhanced resolve_entry_point to support Vyper's msg.data[0x00] format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Tests 5 live Vyper contracts on Ethereum mainnet covering:
- Dense selector dispatch (Curve 3Pool Zap)
- Dense selector section pattern (Curve Vyper 2)
- Sparse selector section pattern (Curve Vyper 1)
- Hash bucket dispatch (Curve General)
- Different Vyper version (Lido Curve Pool)
Plus a regression test for Solidity (WETH).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Remove DIV, GT, LT, SHR opcode constants that were imported but
not directly referenced in the vyper selector detection code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

let has_calldata = instruction
    .input_operations
    .iter()
    .any(|op| involves_calldata_wrapped(op));

⚠️ [clippy] reported by reviewdog 🐶

warning: redundant closure
   --> crates/vm/src/ext/selectors/vyper.rs:141:22
    |
141 |                 .any(|op| involves_calldata_wrapped(op));
    |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: replace the closure with the function itself: `involves_calldata_wrapped`
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/rust-1.91.0/index.html#redundant_closure
    = note: `#[warn(clippy::redundant_closure)]` on by default
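
A minimal sketch of the change clippy is asking for, assuming a simplified stand-in for the operation tree. `WrappedOp` and `has_calldata` are hypothetical; only the name `involves_calldata_wrapped` comes from the warning above, and its real signature may differ.

```rust
/// Hypothetical stand-in for the operation tree used in the PR.
#[derive(Debug)]
enum WrappedOp {
    CalldataLoad,
    Constant(u64),
    Shr(Box<WrappedOp>, Box<WrappedOp>),
}

/// Returns true if the tree reads calldata anywhere.
fn involves_calldata_wrapped(op: &WrappedOp) -> bool {
    match op {
        WrappedOp::CalldataLoad => true,
        WrappedOp::Constant(_) => false,
        WrappedOp::Shr(a, b) => involves_calldata_wrapped(a) || involves_calldata_wrapped(b),
    }
}

/// Same check as the flagged snippet, with the function passed by name
/// instead of being wrapped in `|op| involves_calldata_wrapped(op)`.
fn has_calldata(input_operations: &[WrappedOp]) -> bool {
    input_operations.iter().any(involves_calldata_wrapped)
}
```

Passing the function by name works here because the iterator yields `&WrappedOp`, which is exactly the parameter type the function takes.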

// Pattern 3: Check if the condition solidifies to something with msg.data and ==
// This catches cases where the WrappedOpcode tree doesn't exactly match the above patterns
let solidified = condition.solidify();
if solidified.contains("msg.data") {

⚠️ [clippy] reported by reviewdog 🐶

warning: this `if` statement can be collapsed
   --> crates/vm/src/ext/selectors/vyper.rs:204:5
    |
204 | /     if solidified.contains("msg.data") {
205 | |         if solidified.contains("==") || solidified.contains("^ ") {
206 | |             return extract_selector_from_solidified(&solidified);
207 | |         }
208 | |     }
    | |_____^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/rust-1.91.0/index.html#collapsible_if
    = note: `#[warn(clippy::collapsible_if)]` on by default
help: collapse nested if block
    |
204 ~     if solidified.contains("msg.data")
205 ~         && (solidified.contains("==") || solidified.contains("^ ")) {
206 |             return extract_selector_from_solidified(&solidified);
207 ~         }
    |


let hex_len = hex_end - hex_start;
// Selectors are 1-8 hex chars (1 to 4 bytes)
if hex_len >= 1 && hex_len <= 8 {

⚠️ [clippy] reported by reviewdog 🐶

warning: manual `RangeInclusive::contains` implementation
   --> crates/vm/src/ext/selectors/vyper.rs:323:16
    |
323 |             if hex_len >= 1 && hex_len <= 8 {
    |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use: `(1..=8).contains(&hex_len)`
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/rust-1.91.0/index.html#manual_range_contains
    = note: `#[warn(clippy::manual_range_contains)]` on by default
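
To show the suggested `(1..=8).contains(&hex_len)` form in context, here is a small sketch of pulling a selector out of a solidified condition string. The function `first_selector` and its scanning strategy are assumptions made for illustration; the PR's own `extract_selector_from_solidified` is not shown on this page and may work differently.

```rust
/// Hypothetical helper: finds the first `0x`-prefixed literal of 1 to 8 hex
/// digits in `text` and parses it as a 4-byte selector.
fn first_selector(text: &str) -> Option<u32> {
    let bytes = text.as_bytes();
    let mut search_from = 0;
    while let Some(pos) = text[search_from..].find("0x") {
        let hex_start = search_from + pos + 2;
        // Count the run of ASCII hex digits that follows the prefix.
        let hex_len = bytes[hex_start..]
            .iter()
            .take_while(|b| b.is_ascii_hexdigit())
            .count();
        let hex_end = hex_start + hex_len;
        // Selectors are 1-8 hex chars (1 to 4 bytes).
        if (1..=8).contains(&hex_len) {
            return u32::from_str_radix(&text[hex_start..hex_end], 16).ok();
        }
        // Literal was empty or too long for a selector; keep scanning.
        search_from = hex_end;
    }
    None
}
```

With this sketch, `first_selector("msg.data[0] == 0xa9059cbb")` returns `Some(0xa9059cbb)`, while a 16-digit literal is skipped because it cannot be a 4-byte selector.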

@github-actions
Contributor

❌ Eval Report for b3c1385

| Test Case | CFG | Decompilation |
|---|---|---|
| SimpleLoop | 70 | 15 |
| NestedMappings | 100 | 45 |
| NestedLoop | 45 | 15 |
| NestedMapping | 100 | 5 |
| WETH9 | 100 | 45 |
| SimpleStorage | 100 | 35 |
| TransientStorage | 100 | 15 |
| WhileLoop | 100 | 15 |
| Mapping | 100 | 65 |
| Events | 100 | 75 |
| Average | 91 | 33 |
⚠️ 9 eval(s) scoring <70%
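
The summary figures can be reproduced from the per-case scores. The sketch below assumes two things the report does not state: that the averages are truncated to whole numbers, and that a case counts as "scoring <70%" when either of its two scores is below 70. The names `SCORES`, `average`, and `count_below` are made up for this example.

```rust
/// (test case, CFG score, decompilation score) as listed in the report.
const SCORES: [(&str, u32, u32); 10] = [
    ("SimpleLoop", 70, 15),
    ("NestedMappings", 100, 45),
    ("NestedLoop", 45, 15),
    ("NestedMapping", 100, 5),
    ("WETH9", 100, 45),
    ("SimpleStorage", 100, 35),
    ("TransientStorage", 100, 15),
    ("WhileLoop", 100, 15),
    ("Mapping", 100, 65),
    ("Events", 100, 75),
];

/// Integer average, truncating any fractional part (an assumption).
fn average(values: impl Iterator<Item = u32>) -> u32 {
    let (sum, count) = values.fold((0u32, 0u32), |(s, c), v| (s + v, c + 1));
    if count == 0 { 0 } else { sum / count }
}

/// Number of test cases where either score is below `threshold`.
fn count_below(threshold: u32) -> usize {
    SCORES
        .iter()
        .filter(|(_, cfg, dec)| *cfg < threshold || *dec < threshold)
        .count()
}
```

Under those assumptions the CFG column averages to 91 (915 / 10, truncated), the decompilation column to 33, and nine cases fall below 70, which matches the report.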

SimpleLoop (CFG: 70, Decompilation: 15)

Decompilation

{
  "score": 15,
  "summary": "Decompilation fails to preserve core program logic. The loop structure, increment operation, and control flow are completely missing. Function is incorrectly marked as view when it should modify state.",
  "differences": [
    "Loop control flow (for loop from 0 to arg0) is entirely missing",
    "State variable increment operation (number++) is not preserved",
    "Function mutability is incorrect - marked as 'view' instead of state-modifying",
    "Nonsensical require statements replace the actual loop logic",
    "Decompiled version would revert immediately rather than incrementing number multiple times"
  ]
}

NestedMappings (CFG: 100, Decompilation: 45)

Decompilation

{
  "score": 45,
  "summary": "Decompilation captures one function correctly but fails to properly reconstruct the nested mapping structure and the allowance view function. Critical storage access logic is missing or incorrectly represented.",
  "differences": [
    "The nested mapping structure 'mapping(address => mapping(address => uint256))' is decompiled as a simple 'mapping(bytes32 => bytes32)' which loses the two-level nesting required for owner->spender->amount lookups",
    "The allowance() view function (selector 0xdd62ed3e) is decompiled as only validating the address argument but missing the actual storage read and return of allowances[owner][spender]",
    "The approve() function only stores using one key (arg0) when it should compute a nested mapping key based on both msg.sender and arg0 (spender). The storage operation 'storage_map_a[var_a] = arg1' is missing the msg.sender component of the key calculation",
    "The public allowances mapping getter function (selector 0x55b6ed5c) is decompiled with only address validation logic, missing the nested mapping access that should return allowances[arg0][arg1] where arg1 would be the second address parameter",
    "Function signatures are incomplete: allowance() should accept two address parameters (owner, spender) but decompiled version shows only one parameter; the public mapping getter should also accept two address parameters"
  ]
}

NestedLoop (CFG: 45, Decompilation: 15)

Decompilation

{
  "score": 15,
  "summary": "Critical decompilation failure: nested loops completely missing, loop logic not preserved, state mutation incorrectly marked as view function, storage writes absent",
  "differences": [
    "Nested loop structure completely absent - original has two nested for-loops incrementing from 0 to loops, decompiled has no loop constructs",
    "Function marked as 'view' but original mutates state (increments number storage variable loops*loops times)",
    "Storage write operation missing - original performs 'number += 1' in nested loop body, decompiled has no storage writes",
    "Loop iteration logic lost - original increments number by loops^2 total times, decompiled performs no iterations",
    "Control flow completely incorrect - decompiled shows static require statements instead of iterative loops",
    "Functional behavior fundamentally different - calling loop(5) should increment number by 25, decompiled version would not modify state at all"
  ]
}

CFG

{
  "score": 45,
  "summary": "CFG captures loop entry and condition checks but is missing critical back-edges for both nested loops, making the iterative nature of the loops incomplete.",
  "missing_paths": [
    "Inner loop back-edge: Path from inner loop body (node 9) back to inner loop condition check (node 8) for j++ iteration",
    "Outer loop back-edge: Path from inner loop exit (node 8 exit path) back to outer loop condition check (node 7) for i++ iteration",
    "The CFG shows loop structures but without back-edges, the iterative behavior is not properly represented"
  ],
  "extra_paths": [
    "Overflow check path (node 9 → 10): Compiler-added arithmetic overflow protection for number += 1",
    "Callvalue check (node 0 → 1): Compiler-added check that function is not payable",
    "Calldatasize validation (nodes 4-5): Compiler-added parameter validation",
    "Function selector dispatch logic (nodes 3, 13): Compiler-added function routing"
  ],
  "observations": [
    "The CFG correctly identifies both nested loop structures with their condition checks",
    "Loop entry points and exit conditions are properly represented",
    "The inner loop body with state variable increment is captured",
    "However, the graph structure is incomplete without back-edges showing how loops iterate",
    "A complete CFG should show cyclic paths where node 9 connects back to node 8, and node 8 should connect back to node 7 after inner loop completion",
    "The linear structure shown (7→8→9→10, 7→11) only represents one iteration path, not the repeated execution characteristic of loops"
  ]
}
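
The missing back-edges described above can be checked mechanically. This is a minimal sketch, assuming the CFG is available as a list of directed edges between node ids. It uses only the edges quoted in the observations (7→8→9→10, 7→11) rather than the full graph, and `has_back_edge` is a hypothetical helper, not part of the eval tooling.

```rust
use std::collections::HashMap;

/// Returns true if the directed graph given by `edges` contains a cycle,
/// i.e. depth-first search finds an edge to a node on the current path.
fn has_back_edge(edges: &[(u32, u32)]) -> bool {
    // 0 = unvisited, 1 = on the current DFS path, 2 = finished
    fn visit(n: u32, adj: &HashMap<u32, Vec<u32>>, state: &mut HashMap<u32, u8>) -> bool {
        state.insert(n, 1);
        for &next in &adj[&n] {
            match state.get(&next).copied().unwrap_or(0) {
                1 => return true,
                0 => {
                    if visit(next, adj, state) {
                        return true;
                    }
                }
                _ => {}
            }
        }
        state.insert(n, 2);
        false
    }

    // Build adjacency lists; every node gets an entry, even without successors.
    let mut adj: HashMap<u32, Vec<u32>> = HashMap::new();
    for &(from, to) in edges {
        adj.entry(from).or_default().push(to);
        adj.entry(to).or_default();
    }

    let mut state: HashMap<u32, u8> = HashMap::new();
    let nodes: Vec<u32> = adj.keys().copied().collect();
    for n in nodes {
        if state.get(&n).copied().unwrap_or(0) == 0 && visit(n, &adj, &mut state) {
            return true;
        }
    }
    false
}
```

For the quoted edges `[(7, 8), (8, 9), (9, 10), (7, 11)]` the function returns `false`; adding the two edges the report says are missing, `(9, 8)` and `(8, 7)`, makes it return `true`.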

NestedMapping (CFG: 100, Decompilation: 5)

Decompilation

{
  "score": 5,
  "summary": "Decompilation catastrophically failed to preserve contract functionality. None of the original functions, storage operations, or logic are present. Output contains only meaningless tautological require statements that bear no resemblance to the original nested mapping operations.",
  "differences": [
    "All four storage-modifying functions (setAllowance, setGrid, setDeepNested, and their nested mapping writes) are completely missing",
    "The view function getAllowance and its nested mapping read operation is completely missing",
    "All three nested mapping state variables (allowances, grid, deepNested) are absent from the decompiled contract",
    "No storage read operations (SLOAD) are present in decompiled output",
    "No storage write operations (SSTORE) are present in decompiled output",
    "Decompiled functions only contain trivial require statements checking that values equal themselves (e.g., require(arg0 == arg0)), which serve no functional purpose",
    "Function selectors in decompiled output do not match original function selectors",
    "No nested mapping access patterns (double/triple keccak256 hashing for storage slots) are represented",
    "Control flow is completely different: original has storage operations, decompiled has only tautological checks",
    "All function return values are missing (getAllowance should return uint256)"
  ]
}

WETH9 (CFG: 100, Decompilation: 45)

Decompilation

{
  "score": 45,
  "summary": "Decompilation captures basic structure but contains critical control flow errors, incorrect storage access patterns, unreachable code blocks, and missing fallback function logic",
  "differences": [
    "transfer() function has unreachable duplicate code blocks after return statements that should not exist in the control flow",
    "transfer() calls transferFrom(msg.sender, dst, wad) in original but decompiled version inlines different logic with incorrect conditional checks (checking msg.sender == msg.sender which is always true)",
    "transferFrom() function contains malformed allowance check logic - uses storage_map_c instead of nested storage_map_d for allowance mapping lookups",
    "transferFrom() checks 'address(arg0) == address(msg.sender)' but should check 'src != msg.sender' as the condition for allowance consumption",
    "transferFrom() has unreachable duplicate code blocks after return statements",
    "approve() function incorrectly writes to storage_map_c instead of storage_map_d for allowance storage",
    "balanceOf() incorrectly reads from storage_map_d instead of storage_map_c where balances are stored",
    "allowance() incorrectly accesses storage - uses only arg1 as key instead of nested mapping [arg0][arg1]",
    "withdraw() incorrectly uses address.transfer() which returns void, but decompilation attempts to capture (bool success, bytes memory ret0) as if it were a call()",
    "Fallback function that calls deposit() on plain ETH transfers is completely missing from decompiled output",
    "transferFrom() allowance decrement logic is missing from the accessible code path",
    "Control flow structure is fundamentally broken with multiple unreachable return statements suggesting incorrect basic block reconstruction"
  ]
}

SimpleStorage (CFG: 100, Decompilation: 35)

Decompilation

{
  "score": 35,
  "summary": "Decompilation fails to capture critical state variables and function logic. The 'initialized' boolean variable is completely missing, and setOwner/initialize/reset functions contain incorrect bitwise operations instead of simple storage assignments.",
  "differences": [
    "Missing 'initialized' boolean state variable - decompiled contract only has 2 state variables instead of 3",
    "initialize() incorrectly modifies 'owner' with bitwise operations instead of setting 'initialized' to true",
    "setOwner() uses complex bitwise operations '(address(arg0) * 0x01) | (uint96(owner))' instead of simple assignment 'owner = _owner'",
    "reset() missing logic to set 'initialized = false'",
    "reset() uses incorrect bitwise operations for owner assignment instead of 'owner = address(0)'"
  ]
}

TransientStorage (CFG: 100, Decompilation: 15)

Decompilation

{
  "score": 15,
  "summary": "Decompilation severely fails to preserve functional logic. Transient storage operations are mostly missing or incorrectly represented as constants. Only one function (setTempOwner) is partially decompiled, but with incorrect logic. Five of six functions are completely absent as executable code.",
  "differences": [
    "incrementCounter() function missing - represented as empty constant instead of function that reads and writes transient storage",
    "lock() function missing - represented as empty constant instead of function that writes boolean true to transient storage",
    "unlock() function missing - represented as empty constant instead of function that writes boolean false to transient storage",
    "getCounter() function missing - represented as constant '1' instead of view function that reads transient storage slot",
    "isLocked() function missing - represented as constant 'Bool(true)' instead of view function that reads transient storage",
    "setTempOwner() transient storage write logic incorrect - uses complex bit manipulation (arg0 * 0x01 | uint96(transient[0x01])) instead of simple assignment (tempOwner = owner)",
    "setTempOwner() incorrectly marked as 'pure' instead of non-payable (original modifies transient state)",
    "All transient storage read operations (TLOAD) are missing",
    "Most transient storage write operations (TSTORE) are missing or incorrect",
    "Function return values missing for getCounter() and isLocked()",
    "State-changing behavior lost for incrementCounter(), lock(), and unlock() functions"
  ]
}

WhileLoop (CFG: 100, Decompilation: 15)

Decompilation

{
  "score": 15,
  "summary": "Decompilation fails to capture the while loop structure and iterative logic. The loop control flow, counter variable, and repeated state modifications are completely missing.",
  "differences": [
    "While loop structure not preserved - no iteration logic present",
    "Counter variable initialization and increment operations missing",
    "Repeated increment of 'number' state variable not captured",
    "Function incorrectly marked as 'view' instead of state-changing",
    "Decompiled logic would revert for any input > 0 instead of executing loop",
    "No state changes occur in decompiled version, fundamentally changing program behavior"
  ]
}

Mapping (CFG: 100, Decompilation: 65)

Decompilation

{
  "score": 65,
  "summary": "Core storage operations are present but storage layout confusion causes significant functional errors. Read operations use wrong storage map, and write operations use bitwise manipulation instead of direct assignment for some mappings.",
  "differences": [
    "setBalance writes to storage_map_a instead of balances mapping, but balances() reads from storage_map_b - storage slot mismatch breaks balance tracking",
    "setOwner uses bitwise operations (address * 0x01 | uint96(storage)) instead of direct assignment, which could corrupt data or fail",
    "register uses bitwise operations (0x01 * 0x01 | uint248(storage)) instead of simple boolean assignment",
    "owners() reads from storage_map_b with division operation (address(storage_map_b[var_b] / 0x01)) instead of direct read from storage_map_a where setOwner writes",
    "registered() reads from storage_map_b with complex boolean conversion instead of direct read from storage_map_a where register writes",
    "getBalance reads from storage_map_a which matches setBalance writes, but public balances() getter reads from storage_map_b - inconsistent storage access for same logical mapping"
  ]
}

@github-actions
Contributor

Benchmark for b3c1385

| Test | Base | PR | % |
|---|---|---|---|
| heimdall_cfg/complex | 9.8±0.09ms | 9.8±0.24ms | 0.00% |
| heimdall_cfg/simple | 969.3±5.22µs | 1011.5±14.30µs | +4.35% |
| heimdall_decoder/seaport | 40.6±2.50µs | 41.0±2.97µs | +0.99% |
| heimdall_decoder/transfer | 3.0±0.24µs | 3.1±0.26µs | +3.33% |
| heimdall_decoder/uniswap | 11.7±0.77µs | 11.8±0.89µs | +0.85% |
| heimdall_decompiler/abi_complex | 43.3±1.69ms | 41.9±2.59ms | -3.23% |
| heimdall_decompiler/abi_simple | 1063.9±30.43µs | 1075.1±27.76µs | +1.05% |
| heimdall_decompiler/sol_complex | 58.3±1.93ms | 57.7±1.95ms | -1.03% |
| heimdall_decompiler/sol_simple | 1581.7±32.38µs | 1604.6±10.41µs | +1.45% |
| heimdall_decompiler/yul_complex | 47.0±3.83ms | 46.0±3.67ms | -2.13% |
| heimdall_decompiler/yul_simple | 1159.7±15.09µs | 1190.1±5.40µs | +2.62% |
| heimdall_disassembler/complex | 1012.7±103.27µs | 1115.2±30.43µs | +10.12% |
| heimdall_disassembler/simple | 51.1±6.99µs | 55.1±2.69µs | +7.83% |
| heimdall_vm/erc20_transfer | 190.9±12.45µs | 190.0±7.96µs | -0.47% |
| heimdall_vm/fib | 650.2±45.78µs | 637.2±54.20µs | -2.00% |
| heimdall_vm/ten_thousand_hashes | 4.4±1.60s | 4.7±0.84s | +6.82% |
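
The percentage column is consistent with a plain relative change of the PR mean against the base mean. The sketch below assumes that formula and two-decimal rounding; `percent_change` is a hypothetical helper, and the ± spread is ignored.

```rust
/// Relative change of `pr` versus `base`, in percent, rounded to two decimals.
/// Both arguments must use the same time unit.
fn percent_change(base: f64, pr: f64) -> f64 {
    ((pr - base) / base * 100.0 * 100.0).round() / 100.0
}
```

For example, `percent_change(969.3, 1011.5)` gives 4.35 for heimdall_cfg/simple and `percent_change(43.3, 41.9)` gives -3.23 for heimdall_decompiler/abi_complex. Several rows have spreads that are large relative to the change, so small percentages here may well be noise.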

@Jon-Becker closed this Feb 16, 2026