fix(avm)!: bytecode hashing - internal audit#21152
Conversation
MirandaWood
left a comment
Amazing, thank you! I know this is a very complex one.
I'm not fully finished with my review but posting this now so as not to waste any time! I still need to:
- Think over the pil relations using byte length, field length, padding, and rounds_rem
- Look over the relations test based on this^
Sorry there are quite a few pedantic nits - overall this is great 🚀
> // Alternatively, we could have constrained `input_len` based on `size_in_bytes` but this
> // would have required some more costly range checks.
I feel like it should be possible to constrain input_len using size_in_bytes (and it feels like such a waste to have both lengths constrained on the same row, but totally separately! hurts my brain) without range checks since decomp tracks bytes_remaining. But I'm not sure whether this would just add another lookup, which you successfully removed, and/or needs more shifted propagation since we must link to later rows rather than start...
I'll think more on this but submitting this review anyway so other things can be addressed in parallel :)
For any future readers: I did not find a way to use size_in_bytes to define input_len once at start without either a range check or reintroducing the large lookup we removed. For posterity, my line of thinking was utilising decomp's bytes_remaining and our rounds_rem, since we already know they are correct at start. I think the 'closest' I got was hinting padding only at start to be 0, 1, or 2 (avoiding propagating it all the way from end), keeping #[BYTECODE_LENGTH_FIELDS], then constraining it against some r = size_in_bytes - 31 * input_len = size_in_bytes - 31 * (3 * rounds_rem - padding) (note: I'm probably missing some -1 needed for the start field, this is just for illustration!).
But to make sure r is small we either need to propagate something (like `padding`, as in this PR, or `r` itself, comparing it against bytes_remaining on a future row, maybe increasing a lookup tuple), add a new lookup (removed in this PR), or range check it.
So even though we have two length representations of the same bytecode on the same row, it makes the most sense (balancing overhead and being able to reason about correctness) to have them completely unrelated.
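To make the arithmetic above concrete, here is an illustrative numeric model (not the actual circuit code) of how the two length representations relate, assuming bytes are packed 31 per field element and the hash absorbs 3 fields per round, with `padding` in {0, 1, 2} completing the final partial round:

```python
# Hypothetical model of the byte-length vs field-length relation discussed
# above. Names mirror the review (size_in_bytes, input_len, rounds_rem,
# padding, r); the 31-byte packing and 3-fields-per-round absorption are
# assumptions for illustration only.

def lengths(size_in_bytes: int):
    # Field-length representation: ceil(size_in_bytes / 31)
    input_len = (size_in_bytes + 30) // 31
    # Rounds needed at start to absorb input_len fields, 3 at a time
    rounds_rem = (input_len + 2) // 3
    # Padding completes the last round: input_len = 3 * rounds_rem - padding
    padding = 3 * rounds_rem - input_len
    assert padding in (0, 1, 2)
    # The residue the review tries to bound without a range check:
    # r = size_in_bytes - 31 * input_len, which lies in (-31, 0]
    r = size_in_bytes - 31 * input_len
    assert -31 < r <= 0
    return input_len, rounds_rem, padding, r
```

The point of the sketch is that `r` is only known to be small because of the ceiling division; inside the circuit, establishing that bound is exactly what would require the extra range check or lookup the review describes.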
(Rabbit hole aside: another near miss was to extend the multi perms somehow to read bytes when sel_windows_gt_remaining but again since the window size of decomp is bigger than our packed field size (37 > 31) that opened new cans of worms...)
Thanks for the brain power spent on this @MirandaWood. Your conclusion is similar to mine.
MirandaWood
left a comment
Thanks for your patience!! Review ready, overall great 💮
…elations reworked
Co-authored-by: Miranda Wood <miranda@aztec-labs.com>
Force-pushed from ab1d0a0 to a4a35ef
MirandaWood
left a comment
Amazing, thank you! A gargantuan task 🎉
BEGIN_COMMIT_OVERRIDE
chore: contract_instance_retrieval pre-audit avm (#21220)
fix(avm)!: bytecode hashing - internal audit (#21152)
fix(avm)!: precomputed pre-audit (#21313)
fix(avm)!: remove unused dc selector (#21314)
fix(avm)!: Pre audit public data check / squash (#21266)
chore(avm): Pre audit misc opcodes (#21521)
END_COMMIT_OVERRIDE
Linear issue AVM-53