This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Remill is a static binary translator that lifts machine code instructions into LLVM bitcode. It supports x86/amd64 (including AVX/AVX512), AArch64, AArch32, SPARC32/64, and PPC architectures. Remill is designed as a library for binary analysis tools like McSema.
# Build dependencies including LLVM from source
cmake -G Ninja -S dependencies -B dependencies/build
cmake --build dependencies/build
# Or with external LLVM (macOS with Homebrew)
brew install llvm@17
cmake -G Ninja -S dependencies -B dependencies/build -DUSE_EXTERNAL_LLVM=ON \
"-DCMAKE_PREFIX_PATH:PATH=$(brew --prefix llvm@17)"
cmake --build dependencies/buildcmake -G Ninja -B build "-DCMAKE_PREFIX_PATH:PATH=$(pwd)/dependencies/install" \
-DCMAKE_BUILD_TYPE=Release
cmake --build build# Build test dependencies
cmake --build build --target test_dependencies
# Run all tests
cd build && ctest
# Run architecture-specific tests
ctest -R amd64
ctest -R aarch64Intrinsics System: Remill defers memory/control-flow semantics to downstream consumers via intrinsics (__remill_read_memory_*, __remill_write_memory_*, etc.). This allows tools to implement their own memory models.
SEM (Semantics): Generic C++ template functions implementing instruction behavior. Named after the instruction mnemonic (e.g., AND, ADD).
ISEL (Instruction Selection): Specific instantiations of semantics for particular encodings. Named with operand types (e.g., AND_GPRv_MEMv_32).
State Structure: Per-architecture register/flag state (State.h). Uniform size across generations enables mixing translated bitcode.
Basic Block Lifting: Core function __remill_basic_block(State&, addr_t, Memory*) - variables correspond to register names, decoder and lifter synchronize via this naming convention.
lib/Arch/X86/Semantics/- x86/amd64 instruction semantics by category (BINARY.cpp, LOGICAL.cpp, AVX.cpp, SSE.cpp, etc.)lib/BC/InstructionLifter.cpp- Converts decoded instructions to LLVM IRinclude/remill/Arch/Runtime/- State structures, intrinsics, operators, typestests/X86/,tests/AArch64/- Architecture-specific test suites
template <typename D, typename S1, typename S2>
DEF_SEM(INSTR_NAME, D dst, S1 src1, S2 src2) {
auto lhs = Read(src1);
auto rhs = Read(src2);
auto res = UAdd(lhs, rhs); // or UAnd, UMul, etc.
WriteZExt(dst, res);
// Update flags last
return memory;
}- Signed:
SAdd,SSub,SMul,SDiv - Unsigned:
UAdd,USub,UMul,UDiv,UAnd,UOr,UXor - Floating-point:
FAdd,FSub,FMul,FDiv - Vector ops:
UAddV32,FMulV64,UReadV32,FWriteV64
// For SCALABLE instructions, suffix with operand size
DEF_ISEL(AND_GPRv_MEMv_8) = AND<R8W, R8, M8>;
DEF_ISEL(AND_GPRv_MEMv_16) = AND<R16W, R16, M16>;
DEF_ISEL(AND_GPRv_MEMv_32) = AND<R32W, R32, M32>;
IF_64BIT(DEF_ISEL(AND_GPRv_MEMv_64) = AND<R64W, R64, M64>;)
// Or use shorthand macro
DEF_ISEL_RnW_Rn_Mn(AND_GPRv_MEMv, AND);R8,R16,R32,R64- Register readsR8W,R16W,R32W,R64W- Register writesM8,M16,M32,M64- Memory operandsV128W- SIMD destination (always 128-bit for writes)
Read()all source operands first- Perform computations
Write()/WriteZExt()to destinations- Update suppressed operands (flags) last
This ordering limits state mutations if memory access violations occur.
Test files go in tests/{ARCH}/{CATEGORY}/ and must be included in Tests.S.
TEST_BEGIN(ANDr32r32, 2)
TEST_IGNORE_FLAGS(AF)
TEST_INPUTS(
0, 0,
1, 0,
0xFFFFFFFF, 1,
0x7FFFFFFF, 1)
and ARG1_32, ARG2_32
TEST_ENDTEST_BEGIN(name, num_args)- Start test with N input argumentsTEST_IGNORE_FLAGS(...)- Flags left in undefined stateTEST_INPUTS(...)- Comma-separated input values (grouped by num_args)ARG1_32,ARG2_64, etc. - Input argument registers by size
- Format: Google style via
.clang-format(80-char line limit, 2-space indent) - Standard: C++17
- Run
clang-format -i <file>before committing
For x86 instruction details, consult docs/XED/xed.txt. Key fields:
- Instruction category (LOGICAL, BINARY, etc.) → determines which
.cppfile SCALABLEattribute → needs size-suffixed ISELsEXPLICIT/IMPLICIT/SUPPRESSED→ determines semantic function parametersR/W/RW→ operand mutability (RW generates two parameters: write first, read second)