Skip to content

Overview of generated circuits

Sebastiaan Peters edited this page May 19, 2022 · 3 revisions

This page aims to be a somewhat self-contained page on how to understand and follow graph diagrams, and to give insight into how different aspects of instructions encodings relate to the generated circuits. This page is structured in a similar top-down fashion as the circuits themselves.

The circuit

A circuit is expected to be able to execute the equivalent of a single CPU instruction (or clock assuming each instruction takes just one). The CPU and memory state before execution are given as an input trace and the state post-execution is an output trace. These traces The circuit is represented as an acyclic data-flow-graph in which the leaves are either values from the in/output traces.

The circuit graph consists of a top-level node called circuit. This top-level node is followed by a XOR/only-once node which has all possible contexts (instructions with extra asserts) it can execute as children. Meaning that a circuit only successfully executes if it executed a single instruction, and the instruction executed successfully includes possible extra asserts (like RIP being increased appropriately).

Contexts

Instructions are not converted to circuitry directly,rather they are represented as Context which can be seen as a generalized version of the instruction. An example of this would be where mov rax, imm would be generalized to mov reg, imm. Contexts in the graph are represented as "VerifyInstruction" nodes.

VerifyInstruction nodes contain a child for each condition that needs to hold for the instruction to execute successfully. As such their precise form depends on the instruction it represents.

Conceptional it can be split up into 3 large chucks:

  • Instruction decoding
  • The computation
  • Trace info maintenance

Instruction decoding

Every Context is supposed to check whether the instruction they represent should be executed, the end of this is stored in a Decoding Result node. Each check is represented as a Decoder Condition node with as operands the expected value and the bits it checks.
The input encoding for the circuit is placed by the system at the node instruction_bits.

Whether the values in instruction_bits represent a valid encoding for the represented instruction is in a couple of parts:

  • The Non-register/memory selecting bits from instruction_bits are checked against constant values (i.e do opcodes match)
  • The values of instruction_bits that are not part of the expected encoding are checked if they are all zero up to the architecture's maximum length (15 for x86). i.e a decoding result for a nop (single byte) will check if there are 14 zero bytes. This works as the system provides instructions rather than the first 15 bytes located at RIP
    comment: <> (* And another check that is currently redundant check which probably should be removed?)

It is important to note that bits used to select the register are not part of the decoder subtree.

Extract

The values from instruction_bits are available through an extract node which has operands end_exclusive,beginning_inclusive. Two important things to know is that byte values are in little-endian. So extract.3.0 extracts the 3 least significant bits of the byte. Another thing is since only one context can be true, every decoding always starts at index 0.

The computation

This part is easily recognized as it includes a large and node that specifies all the required post-conditions for each register/memory location.

Registers/Memory locations have their .in and .out connected through a series of nodes in which the desired computation of an instruction occurs.

The registers which are not touched by the instruction have their in and out nodes simply connected through a register_constraint. Meaning the context only holds true if in == out.

Before we continue with how nodes that depend on registers that can be an operand, we will take a quick detour on register selection

Register selection & Advice nodes

To make circuits simpler, we have nodes that can hold values called advice nodes. Think of these as a tmp variable. Typically, x86 instructions which have an operand express which operand gets selected by setting 3 special bits in the encoding. For x86 these bits look like the following:

000 -> eax

001 -> ecx

010 -> ebx

...

The value of these 3 bits is extracted from the instruction_bits and typically saved in a separate advice node.

To check whether eax is selected, all that needs to be done is make a comparison between the value of the advice node and it's encoding constant To retrieve the register itself which is selected, the value in the advice node can be passed as an index to a select node that is connected to the registers and which the indices match the encoding.

Going back to the computation node.
Registers which might be an operand (and thus potentially be changed) are of the following form: Instead of just stating regX.in == regX.out, we state OR(regX.in == regX.out, selected_operand=X) where selected_operand=X is the check between the advice node holding the operand index and the register. Effectively meaning either regX remains their value constant if not selected, and if selected then just return true.

Trace info maintenance

This part keeps track of some important trace info. Notably, it makes sure that :

  • Errors flags never get de-raised
  • Timestamps are increased by 1 with each execution TODO -- EXPAND THIS

Clone this wiki locally