Refactored VM, VMState, and Block + new apply_transaction#247
Conversation
| @property | ||
| def precompiles(self): | ||
| if self._precompiles is None: | ||
| return set() |
There was a problem hiding this comment.
This should be a dict instead of a set
| self.db = ImmutableDB(db) | ||
| else: | ||
| self.db = db | ||
| self.db = TrackedDB(self.db) |
There was a problem hiding this comment.
Instead of overwriting self.db here maybe move this up into the previous if/else
| db = None | ||
|
|
||
| def __init__(self, header, transactions=None, uncles=None): | ||
| self.db = BaseChainDB(MemoryDB()).db # for generating transaction_root and receipt_root |
There was a problem hiding this comment.
I'm not sure I understand the need for this. Where is it used in the code?
There was a problem hiding this comment.
In non-stateless mode, the transaction trie is rebuilt via BaseChainDB.add_transaction( block_header, index_key, transaction):
def add_transaction(self, block_header, index_key, transaction):
transaction_db = Trie(self.db, root_hash=block_header.transaction_root)
transaction_db[index_key] = rlp.encode(transaction)
return transaction_db.root_hashAnd it is triggered in VM.add_transaction(transaction, computation):
https://github.com/hwwhww/py-evm/blob/ec9ff9c63ff96e484150bc62ab862aed5505a03d/evm/vm/base.py#L79
For stateless mode, BaseBlock.db is only used for storing transaction trie and receipt trie in VMState level
during create_block because I'd want to remove chaindb from VMState eventually.
In VMState.add_transaction:
https://github.com/hwwhww/py-evm/blob/ec9ff9c63ff96e484150bc62ab862aed5505a03d/evm/vm_state.py#L223
# Get trie roots and changed key-values.
tx_root_hash, tx_db = cls.add_trie_node_to_db(
block.header.transaction_root,
index_key,
transaction,
block.db,
)
receipt_root_hash, receipt_db = cls.add_trie_node_to_db(
block.header.receipt_root,
index_key,
receipt,
block.db,
)And then insert the transaction to block.db:
https://github.com/hwwhww/py-evm/blob/ec9ff9c63ff96e484150bc62ab862aed5505a03d/evm/vm_state.py#L247-L254
@staticmethod
def add_trie_node_to_db(root_hash, index_key, node, db):
"""
Add transaction or receipt to the given db.
"""
trie_db = Trie(db, root_hash=root_hash)
trie_db[index_key] = rlp.encode(node)
return trie_db.root_hash, trie_db.dbThere was a problem hiding this comment.
I'm wondering if we can take a different approach to decouple some of this.
The mk_receipt_root function in the pyethereum codebase is probably the utility we need to bring over.
Then, rather than keeping track of the trie for receipts and transactions we could very easily just rebuild it on demand each time we add a new transaction/receipt. Thoughts?
There was a problem hiding this comment.
I did rewrite mk_receipt_root for debugging during implementing, it would be much cleaner if we use mk_receipt_root to get root. The trade-offs are:
-
Since we want to get the "current block" after each time we call
VMState.apply_transaction, ifimport_blockis called, the trie-rebuilding will insert all the previous receipts of this block to the trie during eachVMState.apply_transactioncalling. Would it be more inefficient? -
I find that if we keep the current code, maybe we don't need to store
VMState.receiptsinVMStatelevel. I'm still checking if it's true, right now it seems that we only require maintainingVMState.receiptsfor rebuilding receipt trie.
If 2 is true, it won't be a big deal to maintain VMState.receipts.
There was a problem hiding this comment.
I'm comfortable with the overhead with rebuilding the trie each time. If the overhead turns out to be significant we can address it at that time.
There was a problem hiding this comment.
by the way, you do what you think is best. This idea isn't coming from me having a fully formed solution, but rather trying to further simplify the Block object and remove some coupling and database access. It may very well be that it doesn't provide a worthwhile improvement.
There was a problem hiding this comment.
Totally agree your point. Either is fine with me for now, let me figure out it after more refactorings of create_block.
| if unknown_fields: | ||
| raise AttributeError( | ||
| "Unable to set the field(s) {0} on the `BlockHeader` class. " | ||
| "Received the following unexpected fields: {0}.".format( |
There was a problem hiding this comment.
The {0} here should be {1} to correctly format the string.
| def create_block( | ||
| cls, | ||
| transactions, | ||
| transaction_witnesses, |
There was a problem hiding this comment.
not sure if this is an improvement but...
If I understand correctly, we require that len(transaction) == len(transaction_witnesses). Rather than passing these in as separate lists, it might be good to use a data structure like the following:
transactions_and_witnesses = (
(transaction_0, transaction_witness_0),
(transaction_1, transaction_witness_1),
(transaction_2, transaction_witness_2),
...
)
This approach builds in some minimal validation since you can then loop over them like this:
for index, (transaction, transaction_witness) in enumerate(transactions_and_witnesses):
...Not sure this is a huge improvement, but since the two arguments are inherently linked together, it might be nice to accept them already linked.
There was a problem hiding this comment.
Yes, we do require that len(transaction) == len(transaction_witnesses).
From sharding doc:
Transaction package format
[
[nonce, acct, data....], # transaction body (see below for specification)
[node1, node2, node3....] # witness
]I will modify it to the form you suggested, thank you. :)
| """ | ||
| return self.get_block_header_by_hash(block_header.parent_hash) | ||
|
|
||
| def is_key_exsits(self, key): |
| def get_computation(self, message): | ||
| """Return state object | ||
| """ | ||
| if self.computation_class is not None: |
There was a problem hiding this comment.
Slightly more readable version of this.
if self.computation_class is None:
raise AttributeError("No `computation_class` has been set for this VMState")
else:
return self.computation_class(self, message)| vm_state, | ||
| transaction, | ||
| block, | ||
| is_stateless=True, |
There was a problem hiding this comment.
I think there is a route for us to remove the need for this flag. We can have VM.apply_transaction always be pure/stateless, and then modify Chain.apply_transaction to update the local vm state before returning the computation object.
Assuming there is nothing wrong with that approach I like it a lot more than using a flag.
There was a problem hiding this comment.
Agree! Now the non-stateless mode is only for verifying the stateless mode result in case I screw up. Could we remove the non-stateless mode later?
There was a problem hiding this comment.
Yes, doing it separately is fine. Mind opening an issue to track it and linking it here.
| receipt, | ||
| block.db, | ||
| ) | ||
| trie_data = {} |
There was a problem hiding this comment.
You can replace these three lines (and remove the mutation) with:
trie_data = cytoolz.merge(tx_db.wrapped_db.kv_store, receipt_db.wrapped_db.kv_store)
| "rlp==0.4.7", | ||
| "eth-keys==0.1.0b3", | ||
| "trie>=0.3.2", | ||
| "trie==0.3.2", |
There was a problem hiding this comment.
What was the breakage with 1.0.0? I thought it was backwards compatible.
There was a problem hiding this comment.
I haven't dug into the root cause. The error message is like:
https://travis-ci.org/hwwhww/py-evm/jobs/324634985
tests/json-fixtures/test_state.py:247:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
evm/vm/base.py:39: in __init__
self.block = block_class.from_header(header=header, chaindb=self.chaindb)
evm/vm/forks/frontier/blocks.py:94: in from_header
transactions = chaindb.get_block_transactions(header, cls.get_transaction_class())
.tox/py35-native-state-byzantium/lib/python3.5/site-packages/eth_utils/functional.py:22: in inner
return callback(fn(*args, **kwargs))
evm/db/chain.py:163: in get_block_transactions
if transaction_key in transaction_db:
.tox/py35-native-state-byzantium/lib/python3.5/site-packages/trie/hexary.py:397: in __contains__
return self.exists(key)
.tox/py35-native-state-byzantium/lib/python3.5/site-packages/trie/hexary.py:106: in exists
return self.get(key) != BLANK_NODE
.tox/py35-native-state-byzantium/lib/python3.5/site-packages/trie/hexary.py:62: in get
root_node = self._get_node(self.root_hash)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <trie.Trie object at 0x7f6c2cc46908>, node_hash = None
def _get_node(self, node_hash):
if node_hash == BLANK_NODE:
return BLANK_NODE
elif node_hash == BLANK_NODE_HASH:
return BLANK_NODE
> if len(node_hash) < 32:
E TypeError: object of type 'NoneType' has no len()
There was a problem hiding this comment.
That's because Trie.__init__() doesn't upcall -- it just issues a warning
53f1046 to
558bdb4
Compare
Changelog
Seeking for @pipermerriam 's advice and wisdom about mutating
|
|
Both options seem to have comprable trade-offs so I'd say go with whichever seems best. Allowing the mutation to remain for the time being is fince with the understanding that we'll be looking for a clean way to remove it in the future. I'd rather give us time to figure out a solution we like rather than push forward with a solution we're not sure about. |
|
@pipermerriam |
pipermerriam
left a comment
There was a problem hiding this comment.
This looks good. There's some basic cleanup that can be done, but nothing that changes the overall structure significantly. Given the size of this PR, I'd like to have @gsalgado give it a brief review to make sure I didn't miss something. I'll try to give the tests one more look in the morning when I'm fresh because I know I rushed through them.
| return cls.get_block_class().from_header(block_header, db) | ||
|
|
||
| @staticmethod | ||
| def get_parent_header(block_header, db): |
There was a problem hiding this comment.
Given that this method is purely a passthrough what is the reasoning for including it on the VM class?
| return db.get_block_header_by_hash(block_header.parent_hash) | ||
|
|
||
| @staticmethod | ||
| def get_block_header_by_hash(block_hash, db): |
There was a problem hiding this comment.
Given that this method is purely a passthrough what is the reasoning for including it on the VM class?
There was a problem hiding this comment.
Not really, I just left them with @classmethod def get_block_by_header(cls, block_header, db). Should I move @staticmethod def get_parent_header(block_header, db) and @classmethod def get_prev_headers(cls, last_block_hash, db) to evm.utils.db?
| parent_header, | ||
| gas_limit_floor=GENESIS_GAS_LIMIT, | ||
| ), | ||
| timestamp=max(timestamp, parent_header.timestamp + 1), |
There was a problem hiding this comment.
I think that rather than using max here this should throw an exception if timestamp <= parent_header.timestamp. It feels wrong to override the timestamp argument in this way.
There was a problem hiding this comment.
Yes, you're right, it was supposed to be checking current time (time.time()) and parent_header.timestamp + 1 if timestamp is None. I'll update it. Thank you.
| compute_difficulty, | ||
| parent_header, | ||
| timestamp, | ||
| coinbase=b'\x35' * 20, |
There was a problem hiding this comment.
Where does b'\x35 * 20` come from and can we get it moved into a constant variable somewhere?
|
|
||
|
|
||
| def generate_header_from_parent_header( | ||
| compute_difficulty, |
There was a problem hiding this comment.
naming nitpick. Might be a bit clearer that this is a callable if it was named compute_difficulty_fn or something else with a clear indicator that it's a callable.
| ) | ||
| ) | ||
|
|
||
| # XXX: Should these and some other checks be moved into |
There was a problem hiding this comment.
Maybe open an issue to revisit this separately.
There was a problem hiding this comment.
XXX: Should these and some other checks be moved into VM.validate_block(), as they apply to all block flavours?
Ah, these two lines and this function were copied from Block.validate(). VMState.validate_block() became kind of applying to all block flavors now.
Another future issue is that this function would need to be refactored for sharding #256 since there's no Collation.uncle.
| # def chaindb(self): | ||
| # return self._chaindb | ||
|
|
||
| def set_chaindb(self, db): |
There was a problem hiding this comment.
This doesn't appear to be used anywhere.
| # further modifications can occur using the `State` object after | ||
| # leaving the context. | ||
| state.db = None | ||
| state._trie = None |
There was a problem hiding this comment.
Note to self:
There was a decomission() method added to the state database object to remove the need to access this private variable. We should backport that improvement from the sharding branch.
| with self.state_db() as state_db: | ||
| # first revert the database state root. | ||
| state_db.root_hash = state_root | ||
| # now roll the underlying database back |
There was a problem hiding this comment.
This comment placement seems wrong, assuming I'm correct that it applies to the next statement self._chaindb.revert(checkpoint_id).
| """ | ||
| Returns the block header by hash. | ||
| """ | ||
| for value in self.prev_headers: |
There was a problem hiding this comment.
It looks like we could store self.prev_headers as a mapping from hash => header so that we can do O(1) lookups rather than repeatedly enumerating this list to find the necessary header.
There was a problem hiding this comment.
I found that I missed that if we store self.prev_headers as a list is good for access block.blockhash(uint blockNumber) in contract code, we could make VMState.get_ancestor_hash(block_number) get block_hash via block_number directly since the prev_headers should be chained already:
def get_ancestor_hash(self, block_number):
"""
Return the hash of the ancestor with the given block number.
"""
ancestor_depth = self.block_header.block_number - block_number - 1
if (ancestor_depth >= MAX_PREV_HEADER_DEPTH or
ancestor_depth < 0 or
ancestor_depth >= len(self.prev_headers)):
return b''
header = self.prev_headers[ancestor_depth]
return header.hashThen we get O(1) for BLOCKHASH opcode.
And since VMState has its block properties, we can make VMState.get_parent_header(block_header) into @property parent_header() for
VMState.validate_block(block):
@property
def parent_header(self):
return self.prev_headers[0]But, unfortunately, VMState.validate_uncle(uncle) still requires calling VMState.get_block_header_by_hash(uncle.parent_hash):
https://github.com/hwwhww/py-evm/blob/81b4b1ae1c854c97540b63ebd41f196c83a35dad/evm/vm/forks/frontier/vm_state.py#L305
Look on the bright side, MAX_UNCLE_DEPTH is 6.
With these changes, I'd vote for accessing via block_number directly for sharding clients. But for non-sharding clients, they may call validate_uncle more than call BLOCKHASH(many-ancestors-ago).
p.s. EIP 96 would make BLOCKHASH very different: ethereum/EIPs#210
gsalgado
left a comment
There was a problem hiding this comment.
Looks good to me, but I think it might make sense to move some BaseVMState classmethods into standalone functions
| """ | ||
| if chaindb is None: | ||
| chaindb = self.chaindb | ||
| if block_header is None: # TODO: remove |
There was a problem hiding this comment.
Can you maybe explain (in the comment) why we want to remove this?
There was a problem hiding this comment.
Ah sorry, I was thinking about if I can store block_header in prev_headers in some cases, but it turns out because block_header is more like a container object in VMState, it's more reasonable to leave block_header alone now. I'll remove this TODO tag.
Furthermore, there's another TODO issue I mention in this PR and #256, for sharding clients, we need to decouple VMState.block_header the VMState properties (blockhash, timestamp, block_number, difficulty, and gas_limit), in that case, the parameters of VMState.__init__ may become something like:
def __init__(self, chaindb, state_root, block_header, prev_headers, receipts=[])Thanks!
| return cls._state_class | ||
|
|
||
| def get_state(self): | ||
| def get_state(self, chaindb=None, block_header=None, prev_headers=None): |
There was a problem hiding this comment.
No callsites seem to pass a prev_headers argument to this method. Would it make sense to remove it?
| old_receipt = _make_frontier_receipt(vm_state, transaction, computation) | ||
|
|
||
| receipt = Receipt( | ||
| state_root=b'' if computation.is_error else b'\x01', |
There was a problem hiding this comment.
What does b'\x01' mean here? Should it be a constant?
There was a problem hiding this comment.
It was copied from evm.vm.forks.byzantium.blocks:
https://github.com/ethereum/py-evm/blob/master/evm/vm/forks/byzantium/blocks.py#L30
This protocol change is from EIP 98/658 Embedding transaction status code in receipts, 1 for success, 0 for failure.
There was a problem hiding this comment.
I think I support moving these two values into named constants to improve readability.
| @classmethod | ||
| def apply_transaction( | ||
| cls, | ||
| vm_state, |
There was a problem hiding this comment.
It feels weird to have a BaseVMState method that takes a vm_state argument, but maybe I'm missing something?
| :param vm_state: the VMState object | ||
| :param transaction: the transaction need to be applied | ||
| :param block: the block which the transaction applies on | ||
| :param is_stateless: if is_stateless, call VMState.add_transactionto set block |
There was a problem hiding this comment.
It actually calls cls.add_transaction()
| parent_header = copy.deepcopy(prev_headers[0]) | ||
|
|
||
| computation, block, _ = FrontierVMState.apply_transaction( | ||
| vm_state1, |
There was a problem hiding this comment.
This is what I meant; it doesn't feel right to call a classmethod passing an instance of that same class as its first argument. I guess that's because we wanted to make apply_transaction() a pure function, but then maybe we should make it into a standalone function as well, to avoid this weirdness when calling it?
There was a problem hiding this comment.
True!
Actually, I think after the discussion in #247 (comment) and keeping VMState mutable, another way we can do is to change them back to regular instance method def apply_transaction(self, transaction, block, is_stateless=True). And so do VMState.execute_transaction(...) and VMState.make_receipt(...) methods.
@pipermerriam what do you think?
There was a problem hiding this comment.
Yes, going back to regular instance methods seems to make the most sense.
1. Moved Block.add_transaction to VM.add_transaction 2. Removed Block.chaindb and moved all related member functions to VM. (Make Block dumb)
1. Added VMState forks 2. Moved VM.get_computation() to VMState.get_computation() 3. Moved `execute_transaction`, `make_receipt`, `validate_block` and `validate_uncle` from `VM` to `VMState` 4. Configured `opcodes` and `precompiles` in `VMState.__init__(...)` 5. Make sure that VMState only do read-only operations on the chaindb
…reads and writes dicts
1. Added VMState level apply_transaction and add_transaction. 2. Assigned `Block.db = BaseChainDB(MemoryDB()).db`, only for generating transaction_root and receipt_root in `VMState` level. 3. Set `computation_class` in `VMState`. 4. Applied witness data as database.
…teless state transition
1. Fixed VM.create_block() 2. Added more tests of test_vm.py and test_vm_state.py 3. Using copy.deepcopy to prevent VMState.apply_transaction from mutating the given witness_db
…ransaction functions
1. Store receipts in VMState and removed Block.db 2. Rebuild the transaction trie and receipt trie in every apply_transaction
…v_headers(last_block_hash, db) into standalone functions
…et_parent_header(block_header) into "the" VMState.parent_header
81b4b1a to
08c7a84
Compare
Changelog
@pipermerriam @gsalgado |
08c7a84 to
a8b3719
Compare
a8b3719 to
2b6df98
Compare
|
|
What was wrong?
For #195: pure apply_transaction + idea of #236
How was it fixed?
The original goal is to implement:
Now we have:
computation.access_logs.readsandcomputation.access_logs.readsare the reads and writes sets.witness_dbisdb.stateobjinformation would be acquired fromvm_stateandblockblockdatainformation would be be acquired fromblock1. Made
BlockRLP object dumb2. Cherry-picked
TrackedDBfromshardingbranch3. Refactored
VM,VMState, andBlockVMStateforkscomputation_classinVMStateand movedVM.get_computation()toVMState.get_computation()BlocktoVMBlock.mine(**kwargs)intoVM.pack_block(block, *args, **kwargs)BlocktoVMStateVM.execute_transaction(transaction)function intoVMState.execute_transaction(transaction)BlocktoVMStateBlock.make_receipt(transaction, computation)intoVMState.make_receipt(transaction, computation)apply_transactionandadd_transactionVM.apply_transaction(transaction)function intoVMState.apply_transactionn(transaction)Block.add_transaction(transaction, computation)intoVMState. add_transaction(vm_state, transaction, computation)VM._is_statelessflag; ifVM.is_statelessisTrue, use witness data aschaindb; otherwise, useChain.chaindbas before.Block.db = BaseChainDB(MemoryDB()).db, only for generating transaction_root and receipt_root inVMStatelevel.VMState.apply_transactionreturns the updatedblockandtrie_datatoVM.apply_transaction, wheretrie_datais the key-value list of transactions and receipts data that should be store inVM.chaindbfor non-stateless clients.apply_transaction is tiggeredbyChain, the path would be:4. Add
AccessLogsclass to holdreadsandwriteslogs fromTrackedDB.5. Create block (unfinished)
@classmethod create_block(transactions, transaction_witnesses, prev_state_root, parent_header, coinbase)to create a block with the giventransaction_witness.tests/core/vm/test_vm.py::test_create_block.6. Added
VMState.prev_headersclass instance, now all the parent header queries inVMStatelevel are usingprev_headerdata.7. Storing receipts in
VMState.receiptsand rebuilding the trie root in everyapply_transaction8. Refactored reward methods: moved from
base.pyto Frontier9. Moved
VMState.get_parent_header(block_header, db)andVMState.get_prev_headers(last_block_hash, db)into standalone functionsp.s. Sorry for that I reset and squash some commits and then my
git mvturns into mv, so it's unclear to diffevm/vm/vm_state.pyandevm/vm_state.py.Cute Animal Picture
TODOs for other PRs
VMState.block_headertheVMStateproperties (blockhash, timestamp, block_number, difficulty, and gas_limit), because shard chains useperiod_start_prevhashblock properties of main chain in contracts, butVMState.state_db()should useVMState.collation_header's state_root.is_statelessflag, make it only one route.I will open new issues later.