fix(avm)!: bytecode hashing - internal audit#21152
Conversation
MirandaWood
left a comment
Amazing, thank you! I know this is a very complex one.
I'm not fully finished with my review but posting this now so as not to waste any time! I still need to:
- Think over the pil relations using byte length, field length, padding, and rounds_rem
- Look over the relations test based on this^
Sorry there are quite a few pedantic nits - overall this is great 🚀
> // Alternatively, we could have constrained `input_len` based on `size_in_bytes` but this
> // would have required some more costly range checks.
I feel like it should be possible to constrain input_len using size_in_bytes (and it feels like such a waste to have both lengths constrained on the same row, but totally separately! hurts my brain) without range checks since decomp tracks bytes_remaining. But I'm not sure whether this would just add another lookup, which you successfully removed, and/or needs more shifted propagation since we must link to later rows rather than start...
I'll think more on this but submitting this review anyway so other things can be addressed in parallel :)
For any future readers: I did not find a way to use size_in_bytes to define input_len once at start without either a range check or reintroducing the large lookup we removed. For posterity, my line of thinking was utilising decomp's bytes_remaining and our rounds_rem, since we already know they are correct at start. I think the 'closest' I got was hinting padding only at start to be 0, 1, or 2 (avoiding propagating it all the way from end), keeping #[BYTECODE_LENGTH_FIELDS], then constraining it against some r = size_in_bytes - 31 * input_len = size_in_bytes - 31 * (3 * rounds_rem - padding) (note: I'm probably missing some -1 needed for the start field, this is just for illustration!).
But to make sure r is small we either need to propagate something (like `padding`, as in this PR, or `r` itself, comparing it against bytes_remaining on a future row, maybe increasing a lookup tuple), add a new lookup (removed in this PR), or range check it.
So even though we have two length representations of the same bytecode on the same row, it makes the most sense (balancing overhead and being able to reason about correctness) to have them completely unrelated.
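To make the arithmetic above concrete, here is an illustrative numeric model (not the actual circuit code) of how the two length representations relate, assuming bytes are packed 31 per field element and the hash absorbs 3 fields per round, with `padding` in {0, 1, 2} completing the final partial round:

```python
# Hypothetical model of the byte-length vs field-length relation discussed
# above. Names mirror the review (size_in_bytes, input_len, rounds_rem,
# padding, r); the 31-byte packing and 3-fields-per-round absorption are
# assumptions for illustration only.

def lengths(size_in_bytes: int):
    # Field-length representation: ceil(size_in_bytes / 31)
    input_len = (size_in_bytes + 30) // 31
    # Rounds needed at start to absorb input_len fields, 3 at a time
    rounds_rem = (input_len + 2) // 3
    # Padding completes the last round: input_len = 3 * rounds_rem - padding
    padding = 3 * rounds_rem - input_len
    assert padding in (0, 1, 2)
    # The residue the review tries to bound without a range check:
    # r = size_in_bytes - 31 * input_len, which lies in (-31, 0]
    r = size_in_bytes - 31 * input_len
    assert -31 < r <= 0
    return input_len, rounds_rem, padding, r
```

The point of the sketch is that `r` is only known to be small because of the ceiling division; inside the circuit, establishing that bound is exactly what would require the extra range check or lookup the review describes.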
(Rabbit hole aside: another near miss was to extend the multi perms somehow to read bytes when sel_windows_gt_remaining but again since the window size of decomp is bigger than our packed field size (37 > 31) that opened new cans of worms...)
Thanks for the brain power spent on this @MirandaWood. Your conclusion is similar to mine.
MirandaWood
left a comment
Thanks for your patience!! Review ready, overall great 💮
…elations reworked
Co-authored-by: Miranda Wood <miranda@aztec-labs.com>
Force-pushed from ab1d0a0 to a4a35ef
MirandaWood
left a comment
Amazing, thank you! A gargantuan task 🎉
BEGIN_COMMIT_OVERRIDE
chore: contract_instance_retrieval pre-audit avm (#21220)
fix(avm)!: bytecode hashing - internal audit (#21152)
fix(avm)!: precomputed pre-audit (#21313)
fix(avm)!: remove unused dc selector (#21314)
fix(avm)!: Pre audit public data check / squash (#21266)
chore(avm): Pre audit misc opcodes (#21521)
END_COMMIT_OVERRIDE
Linear issue AVM-53