Skip to content

From bytes to CircIR

lkorenc edited this page Apr 6, 2022 · 16 revisions

This page is intended to help developers that are planning to look around the codebase with possible intentions of tweaking things or fixing bugs.

Context

For the duration of the entire lift process a context is present (and should be owned by CircuitSmithy class). Apart for owning all important pieces that are needed for calling into other libraries (llvm::LLVMContext or remill::Arch) it also provides a few helpers that streamline usage of remill lifting utilities.

Context is responsible for determining which registers should appear in the resulting circuit as inputs/outputs (it makes sense to forbid XMM register for example, if it is known that floating point instruction or other instructions using them will not be lifted).

Decoder

First step is to decode bytes (or a stream of bytes) into parsed instructions. This can be split into two parts

  • Disassembly into remill::Instruction - remill exposes an API to do that, so there is no extra work required.
  • Operand fuzzing - for each instruction we try to determine which bits are responsible for which operand. This is currently done by a rather naive heuristic that however performed rather well on inputs so far. For more information see appropriate wiki page and include/circuitous/Fuzz directory.

Instruction encoding

Ordering of instruction bytes in the trace is not enforced from outside - circuitous is free to arrange it as it wishes, however it must be internally consistent + it must hold that decoding checks for contexts are unique and there can never be a case where two contexts are selected.

Currently we order bytes as they are normally - e.g. for 48 8d 08 (which is encoding of some variant of lea instruction) extract from index 0 of size 8 would yield 0x48 and extract from index 0 of size 4 would yield 0x8.

In the codebase std::string or std::bitset is often used to hold the encoding - on contrary to how human would write the number on index 0 the least significant bit of first byte will be stored. This is opposite to how ctor of llvm::APInt expects its argument to be ordered and therefore encoding is often reversed before.

Decoder component synthesis

For each context there is an unique DecoderResult value, that is currently independent from the instruction semantics itself. The DecoderResult computation can be split into two parts, which must both hold:

  • All pieces of encoding that are not part of operands are checked to be fixed (via DecodeCondition).
  • If there is operand encoding that is not saturating - there are combinations of values that are not a valid encoding - it must be checked that the invalid case did not occur. To allow easier analysis later, the format of this check is enforced to be of form: not( or( DecodeCondition( x, a ), DecodeCondition( x, b ), ... ) )

Decoder component is emitted by CircuitBuilder orthogonally to the rest of components (except they are joined by top-level VerifyInstruction).

Instruction lifter

While there is remill::InstructionLifter available to be used it does not provide the functionality circuitous more often than not needs. Original remill::InstructionLifter operates on a mode that "instantiates" the semantics with specific operands whereas in circuitous it is needed to lift a semantics with "generic" operands (think rax vs all 16 gpr registers). Another problem is output values as we cannot have memory operations in the bitcode that is to be lowered. Original remill::InstructionLifter is however still used for type conversion and various resizes.

To combat operand lifting problem, circuitous overrides hooks responsible for operand lifting:

  • Immediate operands are lifted either as
    • input.immediate( value ) if the constant is hardcoded by the semantics.
    • input.immediate( extract.N.M( instruction_bytes() ) ) if the constant is encoded in the instruction bytes.
  • Registers are lifted naturally by selection based on the corresponding encoding bits - e.g. select( extract.N.M, reg1, reg2, reg3, ... ).
  • Memory operands are combination of approaches above applied to each parts of expression that computes final address.

Register operands that are written into are unfortunately more complicated. The approach circuitous currently employs is that we create special alloca that corresponds to destination register and then later, when CircuitBuilder inlines the semantics it fetches this alloca (it has unique metadata) and correctly emits output check against the destination register (which is chosen by a decoder selection again).

Flattener

After circuitous generates semantics function it needs to get rid of control flow in it - this of course requires the function to have no loops. Only thing to keep in mind here is that when store happens, the path condition must be included as well.

SF <- 0

therefore must be translated as

SF <- select( path_condition, 0, SF )

Circuitous LLVM IR Intrinsics

Sometimes we want to capture some high level semantic information that cannot be natively expressed in llvm IR - for example we want to have a semantic alias for equality when dealing with decoder checks, or n-ary select. To achieve this goal circuitous provide its own set of intrinsic calls. Usually these later map to circIR 1:1 but some are used to carry information between various stages of the lifting process or enable/disable some native llvm optimisations that would break the IR.

There is a bunch of helpers that tries to streamline the process of both defining new intrinsics and emitting new calls/parsing information from existing calls. For more see include/circuitous/IR/Intrinsics.hpp as it is documented rather nicely in the source file.

High level view of generated circuit

Currently one context follows roughly following structure:

VerifyInstruction( decoder_check, error_transition, computational_transition, ... various invariants ... )

where under various invariants simple invariant checks are included such as:

  • input_ebit -> output_ebit error bit must be saturated (once set it is never lowered),
  • input_timestamp + 1 == output_timestamp,
  • unused memory hints and unused advices are properly zeroed.

error_transition is simply constructed as follows:

x1 = output.constraint( r1_in, r2_out )
... repeat for each register ...
error_transtition = or( not( current_ebit ), and( x1, ... ) )

therefore either error did not occur (and there is some computation happening which is check somewhere else) or the error did occur and we need to check registers are not changed (including instruction pointer)!

computational_transition can be split into two logical units:

  • Simple check whether the register(s) selected as output one has correct value:
    • A = output.constraint( select( bits reg1_out reg2_out ... ) X )
  • Check that all other values are unchanged. This is a bit more tricky because we can modify one arbitrary register and not a fixed one. Therefore for each register that could have been written into we emit following:
    • B = or( was_decoded_as_dst_operand( rN ), register_constraint( rN_in, rN_out ))

These two parts must be combined with i.e. C = and( A, B ) (since they complement each other). This expression is enough as long as we do not consider error bit, which property states that if it is set nothing can change; therefore we need to add an extra check on top of C:

computational_transition = or( current_ebit, C ).

Either C is true (and values were computed properly) or there is an error therefore it does not matter if values were computed properly - they actually need to remain the same, but that is responsibility of error_transition.

Clone this wiki locally