From 551f2e3f8eb663a53bdc24b52448d8f2c01bfee9 Mon Sep 17 00:00:00 2001 From: Alessandro De Blasis Date: Wed, 22 Feb 2023 12:00:11 +0000 Subject: [PATCH 01/13] docs(persistence): Added SAVEPOINTS_ROLLBACKS.md design document --- persistence/docs/SAVEPOINTS_ROLLBACKS.md | 146 +++++++++++++++++++++++ 1 file changed, 146 insertions(+) create mode 100644 persistence/docs/SAVEPOINTS_ROLLBACKS.md diff --git a/persistence/docs/SAVEPOINTS_ROLLBACKS.md b/persistence/docs/SAVEPOINTS_ROLLBACKS.md new file mode 100644 index 000000000..9c6768437 --- /dev/null +++ b/persistence/docs/SAVEPOINTS_ROLLBACKS.md @@ -0,0 +1,146 @@ +# Savepoints and Rollbacks Design + +This document is a design guideline that will be used as a technical reference for the implementation of the `Savepoints` and `Rollbacks`. + +The entry points will be in the `Persistence` module but there are going to be points of integration with `Utility` as well. + + +- [Background](#background) + - [Data Consistency (the "C" in CAP Theorem)](#data-consistency-the-c-in-cap-theorem) +- [Definitions](#definitions) + - [Savepoints](#savepoints) + - [Rollbacks](#rollbacks) + - [Minimum Viable Product](#minimum-viable-product) + - [Improvements over MVP (Ideas)](#improvements-over-mvp-ideas) + - [Savepoints](#savepoints-1) + - [Rollbacks](#rollbacks-1) + - [Further improvements](#further-improvements) +- [Random thoughts](#random-thoughts) + +## Background + +At the time of writing, it seems that we identified the points within the codebase in which we should take some action to support savepoints and rollbacks. +This means that we probably know the **WHEN**s and **WHERE**s, the scope of this document is to identify the **WHAT**s and **HOW**s. + +It might sound simple, but the ability to recover from a failure that would prevent the node from committing a block deterministically is a critical feature and it's probably a non-trivial problem to solve. 
+ +As it stands we use multiple data stores (please refer to [PROTOCOL_STATE_HASH.md](../PROTOCOL_STATE_HASH.md) for additional information about their designs): + +| Component | Data Type | Underlying Storage Engine | +| --------------------- | ------------------------------------- | --------------------------------- | +| Data Tables | SQL Database / Engine | PostgresSQL | +| Transaction Indexer | Key Value Store | BadgerDB | +| Block Store | Key Value Store | BadgerDB | +| Merkle Trees | Merkle Trie backed by Key-Value Store | BadgerDB | + +Something worth mentioning specifically about `Merkle Trees` is the fact that we store separate trees for the individual `Actor` types (i.e. `App`, `Validator`, `Fisherman`, etc.), for the `Accounts`/`Pools` and for the data types such as `Transactions`, `Params` and `Flags`. + +This means that technically each one of these trees is a separate data store. + +### Data Consistency (the "C" in CAP Theorem) + +We cannot really make the assumption, especially in this day and age, but in the simplest case, we could very much have a monolytic setup where the node is running on the same machine as the `PostgresSQL` database and the `BadgerDB` key-value stores but that would not change the fact that we are dealing with a distributed system since each one of these components is a separate process that could fail independently. + +Imagine a scenario in which the state is committed to the `PostgresSQL` database but one/some of the `BadgerDB` key-value stores fails due to storage issues. What would happen next? That's **non-deterministic**. + +Even a single node is a small distributed system because it has multiple separate components that are communicating with each other (on the same machine or via unreliable networks). 
+ +Since a node is part of a distributed network of nodes that are all trying to reach consensus on the "state of the world", we have to make sure that the data that we are storing is consistent internally in the first place. + +In the event of a failure at any level during this process, we cannot commit state non-atomically. That would make the node inconsistent and put it in a non-deterministic state that is not recoverable unless we have a way to rollback to a previous, clean, state with all the implications that this would have on the network. (social coordination comes to mind) + +We either **succeed** across **all** data stores or we **fail** and we have to be able to **recover** from that failure like if **nothing has happened**. + + +The following diagram illustrates the high-level flow: + +```mermaid +flowchart TD + NewBlock[A Block was just committed\nor we are at Height 0] -->CreateSavepoint(Create SavePoint) + CreateSavepoint --> UpdatingState(We are creating\na new block.\nUpdating the State) + UpdatingState --> ShouldCommit + ShouldCommit{Should we commit?} --> |Yes| ReceivedOKFromDatastores{Have all\nthe datastores\ncommitted\nsuccessfully?} + ShouldCommit --> |No| NewBlock + ReceivedOKFromDatastores -->|Yes| Committed[Committed successfully] --> |Repeat with\nthe new Block| NewBlock + ReceivedOKFromDatastores -->|No, we have at least one failure| Rollback[Rollback] --> |Repeat with\nthe rolled back Block | NewBlock + +``` + +## Definitions + +### Savepoints + +A savepoint (also referred as checkpoints/snapshots depending on the implementation) is either the beginning of database transaction (or distributed transaction) or some sort of artifact that was created right after a successful commit happened that allows to recreate a perfect copy of the state at the time it was created. + +### Rollbacks + +A rollback is the process of cleanly reverting the state of the node to a previous state that we have saved in a savepoint. 
+ +### Minimum Viable Product + +After having examined the `Persistence` and `Utility` modules, I have identified the following areas that we can consider as part of the MVP, these could be used as individual Github issues/deliverables potentially: + +- [**State changes invalidation and rollback triggering**] We need some sort of shared and thread-safe reference that is available for us across the whole call-stack that we can use in the event of a failure to flag that we need to abort whatever we are doing and rollback. This could be achieved via the use of the [context](https://pkg.go.dev/context) package. + +- [**Ensure atomicity across data stores**] We need to make sure that we are using transactions correctly and that we are not accidentally committing state ahead of time in any of the data-stores. + +- [**Distributed commits across data-stores**] We need to implement a 2PC (two-phase commit) or 3PC (three-phase commit) protocol that should make sure that the state has been committed safely on **all** data stores before it is considered `valid`. + + > TODO: **Discuss what is appropriate to do here** + +- [**Savepoints**] The simplest version could be the database transaction that can simply be discarded, basically the uncommitted transaction sits in memory until it's flushed to storage and rolling back to a savepoint would be as simple as discarding the non-pristine version of the state in memory. + +- [**Rollbacks**] Rolling back to a savepoint would mean not only that the state has been restored to the previous savepoint but also that the node has to go back into a state that allows it to proceed with its normal operation (i.e. all the modules should behave as if nothing has happened) + +- [**Extensive testing**] We need to make sure that we have a good test coverage for all the above scenarios and that we can simulate failures in a controlled environment. 
+ +### Improvements over MVP (Ideas) + +Apart from internal failures that should resolve themselves automatically whenever possible, nodes might require a way to save their state and restore it later, not necessarily at the previous block height. This could be useful for a number of reasons: + +- To allow nodes to recover from an unforeseen crash (bugs) +- To facilitate socially coordinated rollbacks of the chain to a specific block height in the case of a consensus failure/chain-halt +- To improve operator experience when managing and upgrading their fleet of nodes +- And more... + +Performing operations like: copying datadirs, compressing them, etc. is probably not the best approach. +Having a first-class support for savepoints and rollbacks would be, IMHO, a much better solution. + +A more advanced/flexible version of **Savepoints** could be to efficiently serialize the state into an artifact (non-trivial also because we have multiple data-stores but definitely possible), maybe leveraging previous savepoints to only store the changes that have been made since the last one and/or using some version of a WAL (Write-Ahead log that records the changes that happened). + +#### Savepoints + +A savepoint must have the following properties: + +- It must be able to be created at any block height +- It must be self-contained and/or easy to move around (e.g. a file/archive) +- The action of creating a savepoint must be atomic (i.e. it must be impossible to create a savepoint that is incomplete) +- The action of creating a savepoint must be easy to perform (e.g. a CLI command and/or a flag being passed to the node binary when starting it) +- The action of creating a savepoint must be as fast as possible +- The operator should be informed with meaningful messages about the progress of the savepoint creation process (telemetry, logging and stdout come to mind) +- It should be, as much as possible, compact +- It must have some form of integrity check mechanism (e.g. 
checksum/hash verification and maybe even a signature that could be very useful in the case of a social rollback) + +#### Rollbacks + +The following are some of the properties that a rollback mechanism must have: + +- If the operator requested a rollback, regardless of the internal state of the node, it must be able to roll back to the requested block height, provided a valid savepoint +- The rollback process must be atomic (i.e. it must be impossible to roll back to a block height that has incomplete/invalid state) +- The rollback process must be easy to perform (e.g. a CLI command and/or a flag being passed to the node binary when starting it) +- The rollback process must be as fast as possible +- The operator should be informed in a meaningful way about the progress of the rollback process (telemetry, logging and stdout come to mind) + + +### Further improvements + +- Savepoints could be disseminated/retrieved using other networks like Tor/IPFS/etc.; this would free up bandwidth on the main network and could be used in conjunction with `FastSync` to speed up the process of bootstrapping a new node without overwhelming the `P2P` network with a lot of traffic that is not `Protocol`-related. This could be very important when the network reaches critical mass in terms of scale. Hopefully soon. + + For example: a fresh node could be looking for the latest `Savepoint` signed by PNI available, download it from Tor, apply its state and resume the normal sync process from the other nodes from there. + +- Building on top of the latter, when it comes to socially-coordinated rollbacks, the DAO/PNI could designate a `Savepoint` as the one that node-operators should use to roll back to by advertising it somewhere (this could be done in several ways, using alternative decentralized networks/protocols is preferable for increased resiliency: IPFS, Ethereum smart-contract interaction, etc.). 
This would allow node-operators to be notified, even programmatically, without having to rely on `Discord` (which is not decentralized anyway...) or other ways of coordination in order to "get the news" about the latest `Savepoint` that they should use to rollback to. + +## Random thoughts + +- I wonder if serialized, compressed and signed `Merkle Patricia Trie`s could be leveraged as a media for storing `Savepoint`s in a space-efficient and "blockchain-native" way 🤔. + From 9743e879a94dc73ea771bce7e0a0aca1ff144f41 Mon Sep 17 00:00:00 2001 From: Alessandro De Blasis Date: Wed, 22 Feb 2023 12:00:26 +0000 Subject: [PATCH 02/13] docs(persistence): changelog --- persistence/docs/CHANGELOG.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/persistence/docs/CHANGELOG.md b/persistence/docs/CHANGELOG.md index 378cf3b85..a3b46361e 100644 --- a/persistence/docs/CHANGELOG.md +++ b/persistence/docs/CHANGELOG.md @@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +## [0.0.0.37] - 2023-02-22 + +- Added `SAVEPOINTS_ROLLBACKS.md` design document + ## [0.0.0.36] - 2023-02-17 - Module now embeds `base_modules.IntegratableModule` for DRYness From 6d63f0eb9381a4ef0a4f37511df90ae9e84e2226 Mon Sep 17 00:00:00 2001 From: Alessandro De Blasis Date: Tue, 28 Feb 2023 21:26:24 +0000 Subject: [PATCH 03/13] chore(persistence): grammatical nits --- persistence/docs/SAVEPOINTS_ROLLBACKS.md | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/persistence/docs/SAVEPOINTS_ROLLBACKS.md b/persistence/docs/SAVEPOINTS_ROLLBACKS.md index 9c6768437..5536a3740 100644 --- a/persistence/docs/SAVEPOINTS_ROLLBACKS.md +++ b/persistence/docs/SAVEPOINTS_ROLLBACKS.md @@ -22,7 +22,7 @@ The entry points will be in the `Persistence` module but there are going to be p At the time of writing, it seems that we identified the points within the codebase in which we should take some action to support savepoints and rollbacks. 
This means that we probably know the **WHEN**s and **WHERE**s, the scope of this document is to identify the **WHAT**s and **HOW**s. -It might sound simple, but the ability to recover from a failure that would prevent the node from committing a block deterministically is a critical feature and it's probably a non-trivial problem to solve. +It might sound simple, but the ability to recover from a failure that would prevent the node from committing a block deterministically is a critical feature, and it's a non-trivial problem to solve. As it stands we use multiple data stores (please refer to [PROTOCOL_STATE_HASH.md](../PROTOCOL_STATE_HASH.md) for additional information about their designs): @@ -33,19 +33,19 @@ As it stands we use multiple data stores (please refer to [PROTOCOL_STATE_HASH.m | Block Store | Key Value Store | BadgerDB | | Merkle Trees | Merkle Trie backed by Key-Value Store | BadgerDB | -Something worth mentioning specifically about `Merkle Trees` is the fact that we store separate trees for the individual `Actor` types (i.e. `App`, `Validator`, `Fisherman`, etc.), for the `Accounts`/`Pools` and for the data types such as `Transactions`, `Params` and `Flags`. +Something worth mentioning specifically about `Merkle Trees` is the fact that we store a separate tree for each `Actor` type (i.e. `App`, `Validator`, `Fisherman`, etc.), for `Accounts` & `Pools` and for the data types such as `Transactions`, `Params` and `Flags`. -This means that technically each one of these trees is a separate data store. +This means that each tree is a separate data store. 
### Data Consistency (the "C" in CAP Theorem) -We cannot really make the assumption, especially in this day and age, but in the simplest case, we could very much have a monolytic setup where the node is running on the same machine as the `PostgresSQL` database and the `BadgerDB` key-value stores but that would not change the fact that we are dealing with a distributed system since each one of these components is a separate process that could fail independently. +We cannot make the assumption, especially in this day and age, but in the simplest case, we could very much have a monolithic setup where the node is running on the same machine as the `PostgresSQL` database and the `BadgerDB` key-value stores, but that would not change the fact that we are dealing with a distributed system since each one of these components is a separate process that could fail independently. Imagine a scenario in which the state is committed to the `PostgresSQL` database but one/some of the `BadgerDB` key-value stores fails due to storage issues. What would happen next? That's **non-deterministic**. Even a single node is a small distributed system because it has multiple separate components that are communicating with each other (on the same machine or via unreliable networks). -Since a node is part of a distributed network of nodes that are all trying to reach consensus on the "state of the world", we have to make sure that the data that we are storing is consistent internally in the first place. +Since a node is part of a distributed network of nodes that are all trying to reach consensus on the "World State", we have to make sure that the data that we are storing is consistent internally in the first place. In the event of a failure at any level during this process, we cannot commit state non-atomically. 
That would make the node inconsistent and put it in a non-deterministic state that is not recoverable unless we have a way to rollback to a previous, clean, state with all the implications that this would have on the network. (social coordination comes to mind) @@ -63,14 +63,13 @@ flowchart TD ShouldCommit --> |No| NewBlock ReceivedOKFromDatastores -->|Yes| Committed[Committed successfully] --> |Repeat with\nthe new Block| NewBlock ReceivedOKFromDatastores -->|No, we have at least one failure| Rollback[Rollback] --> |Repeat with\nthe rolled back Block | NewBlock - ``` ## Definitions ### Savepoints -A savepoint (also referred as checkpoints/snapshots depending on the implementation) is either the beginning of database transaction (or distributed transaction) or some sort of artifact that was created right after a successful commit happened that allows to recreate a perfect copy of the state at the time it was created. +A savepoint (also called checkpoints/snapshots, depending on the implementation) is either the beginning of a database transaction (or distributed transaction) or some artifact created right after a successful commit happened that allows recreating a perfect copy of the state at the time it was created. ### Rollbacks From cfac74834e14004b0486f652f945c42ace26bac3 Mon Sep 17 00:00:00 2001 From: Alessandro De Blasis Date: Wed, 1 Mar 2023 16:13:37 +0000 Subject: [PATCH 04/13] chore(persistence): committed -> written --- persistence/docs/SAVEPOINTS_ROLLBACKS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/persistence/docs/SAVEPOINTS_ROLLBACKS.md b/persistence/docs/SAVEPOINTS_ROLLBACKS.md index 5536a3740..1f2d4cc06 100644 --- a/persistence/docs/SAVEPOINTS_ROLLBACKS.md +++ b/persistence/docs/SAVEPOINTS_ROLLBACKS.md @@ -41,7 +41,7 @@ This means that each tree is a separate data store. 
We cannot make the assumption, especially in this day and age, but in the simplest case, we could very much have a monolithic setup where the node is running on the same machine as the `PostgresSQL` database and the `BadgerDB` key-value stores, but that would not change the fact that we are dealing with a distributed system since each one of these components is a separate process that could fail independently. -Imagine a scenario in which the state is committed to the `PostgresSQL` database but one/some of the `BadgerDB` key-value stores fails due to storage issues. What would happen next? That's **non-deterministic**. +Imagine a scenario in which the state is persisted durably (what happens after an sql `COMMIT` statement) to the `PostgresSQL` database but one/some of the `BadgerDB` key-value stores fails due to storage issues. What would happen next? That's **non-deterministic**. Even a single node is a small distributed system because it has multiple separate components that are communicating with each other (on the same machine or via unreliable networks). From dbd741c4fddb2fb9a7728a8827fe5ad5852ca671 Mon Sep 17 00:00:00 2001 From: Alessandro De Blasis Date: Wed, 1 Mar 2023 17:05:46 +0000 Subject: [PATCH 05/13] feat(persistence): tooling --- persistence/docs/SAVEPOINTS_ROLLBACKS.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/persistence/docs/SAVEPOINTS_ROLLBACKS.md b/persistence/docs/SAVEPOINTS_ROLLBACKS.md index 1f2d4cc06..b66ced19b 100644 --- a/persistence/docs/SAVEPOINTS_ROLLBACKS.md +++ b/persistence/docs/SAVEPOINTS_ROLLBACKS.md @@ -93,6 +93,8 @@ After having examined the `Persistence` and `Utility` modules, I have identified - [**Extensive testing**] We need to make sure that we have a good test coverage for all the above scenarios and that we can simulate failures in a controlled environment. +- [**Tooling**] The CLI should provide ways to create savepoints and rollbacks. 
i.e.: `p1 persistence rollback --num_blocks=5` + ### Improvements over MVP (Ideas) Apart from internal failures that should resolve themselves automatically whenever possible, nodes might require a way to save their state and restore it later, not necessarily at the previous block height. This could be useful for a number of reasons: From a7e5f7b6a9cde6d9c2ecd437daf9ea65e1b1b361 Mon Sep 17 00:00:00 2001 From: Alessandro De Blasis Date: Wed, 8 Mar 2023 09:30:18 +0000 Subject: [PATCH 06/13] Update persistence/docs/SAVEPOINTS_ROLLBACKS.md [skip ci] Co-authored-by: Daniel Olshansky --- persistence/docs/SAVEPOINTS_ROLLBACKS.md | 1 + 1 file changed, 1 insertion(+) diff --git a/persistence/docs/SAVEPOINTS_ROLLBACKS.md b/persistence/docs/SAVEPOINTS_ROLLBACKS.md index b66ced19b..365c1f88d 100644 --- a/persistence/docs/SAVEPOINTS_ROLLBACKS.md +++ b/persistence/docs/SAVEPOINTS_ROLLBACKS.md @@ -102,6 +102,7 @@ Apart from internal failures that should resolve themselves automatically whenev - To allow nodes to recover from an unforeseen crash (bugs) - To facilitate socially coordinated rollbacks of the chain to a specific block height in the case of a consensus failure/chain-halt - To improve operator experience when managing and upgrading their fleet of nodes +- Governance transactions that enable rolling back state subsets - And more... Performing operations like: copying datadirs, compressing them, etc. is probably not the best approach. 
From 8e7b199b74776be05bef41288862a69ccfe107d2 Mon Sep 17 00:00:00 2001 From: Alessandro De Blasis Date: Wed, 8 Mar 2023 09:30:50 +0000 Subject: [PATCH 07/13] docs(persistence): removed maybe --- persistence/docs/SAVEPOINTS_ROLLBACKS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/persistence/docs/SAVEPOINTS_ROLLBACKS.md b/persistence/docs/SAVEPOINTS_ROLLBACKS.md index b66ced19b..85a7adeb2 100644 --- a/persistence/docs/SAVEPOINTS_ROLLBACKS.md +++ b/persistence/docs/SAVEPOINTS_ROLLBACKS.md @@ -120,7 +120,7 @@ A savepoint must have the following properties: - The action of creating a savepoint must be as fast as possible - The operator should be informed with meaningful messages about the progress of the savepoint creation process (telemetry, logging and stdout come to mind) - It should be, as much as possible, compact -- It must have some form of integrity check mechanism (e.g. checksum/hash verification and maybe even a signature that could be very useful in the case of a social rollback) +- It must have some form of integrity check mechanism (e.g. checksum/hash verification and even a signature that could be very useful in the case of a social rollback) #### Rollbacks From 0e6af930cab503b993b376a74aad56aa0f117004 Mon Sep 17 00:00:00 2001 From: Alessandro De Blasis Date: Wed, 8 Mar 2023 09:31:42 +0000 Subject: [PATCH 08/13] Update persistence/docs/SAVEPOINTS_ROLLBACKS.md [skip ci] Co-authored-by: Daniel Olshansky --- persistence/docs/SAVEPOINTS_ROLLBACKS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/persistence/docs/SAVEPOINTS_ROLLBACKS.md b/persistence/docs/SAVEPOINTS_ROLLBACKS.md index 365c1f88d..8a6a804ba 100644 --- a/persistence/docs/SAVEPOINTS_ROLLBACKS.md +++ b/persistence/docs/SAVEPOINTS_ROLLBACKS.md @@ -95,7 +95,7 @@ After having examined the `Persistence` and `Utility` modules, I have identified - [**Tooling**] The CLI should provide ways to create savepoints and rollbacks. 
i.e.: `p1 persistence rollback --num_blocks=5` -### Improvements over MVP (Ideas) +### Long-term (🚀 🌔 ) ideas Apart from internal failures that should resolve themselves automatically whenever possible, nodes might require a way to save their state and restore it later, not necessarily at the previous block height. This could be useful for a number of reasons: From 949b4bf2d396128b420c5e35259671a4c3dd8926 Mon Sep 17 00:00:00 2001 From: Alessandro De Blasis Date: Wed, 8 Mar 2023 09:32:09 +0000 Subject: [PATCH 09/13] Update persistence/docs/SAVEPOINTS_ROLLBACKS.md [skip ci] Co-authored-by: Daniel Olshansky --- persistence/docs/SAVEPOINTS_ROLLBACKS.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/persistence/docs/SAVEPOINTS_ROLLBACKS.md b/persistence/docs/SAVEPOINTS_ROLLBACKS.md index 8a6a804ba..3e8ce09f3 100644 --- a/persistence/docs/SAVEPOINTS_ROLLBACKS.md +++ b/persistence/docs/SAVEPOINTS_ROLLBACKS.md @@ -120,7 +120,8 @@ A savepoint must have the following properties: - The action of creating a savepoint must be easy to perform (e.g. a CLI command and/or a flag being passed to the node binary when starting it) - The action of creating a savepoint must be as fast as possible - The operator should be informed with meaningful messages about the progress of the savepoint creation process (telemetry, logging and stdout come to mind) -- It should be, as much as possible, compact +- It should be, as much as possible, compact (i.e. zipped) to reduce the size and cost of disseminating snapshots +- A reduction in the snapshot size should be prioritized over its compression speed since it is an infrequent event - It must have some form of integrity check mechanism (e.g. 
checksum/hash verification and maybe even a signature that could be very useful in the case of a social rollback) #### Rollbacks From 28f8fa58d1088555d1c3077a63c39cb1007a5c14 Mon Sep 17 00:00:00 2001 From: Alessandro De Blasis Date: Thu, 9 Mar 2023 13:25:40 +0000 Subject: [PATCH 10/13] docs(persistence): reworded definitions --- persistence/docs/SAVEPOINTS_ROLLBACKS.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/persistence/docs/SAVEPOINTS_ROLLBACKS.md b/persistence/docs/SAVEPOINTS_ROLLBACKS.md index c68553554..ef5dac478 100644 --- a/persistence/docs/SAVEPOINTS_ROLLBACKS.md +++ b/persistence/docs/SAVEPOINTS_ROLLBACKS.md @@ -9,9 +9,10 @@ The entry points will be in the `Persistence` module but there are going to be p - [Data Consistency (the "C" in CAP Theorem)](#data-consistency-the-c-in-cap-theorem) - [Definitions](#definitions) - [Savepoints](#savepoints) + - [Snapshots](#snapshots) - [Rollbacks](#rollbacks) - [Minimum Viable Product](#minimum-viable-product) - - [Long-term (🚀 🌔 ) ideas](#long-term----ideas) + - [Long-term (🚀 🌔) ideas](#long-term---ideas) - [Savepoints](#savepoints-1) - [Rollbacks](#rollbacks-1) - [Further improvements](#further-improvements) @@ -69,7 +70,11 @@ flowchart TD ### Savepoints -A savepoint (also called checkpoints/snapshots, depending on the implementation) is either the beginning of a database transaction (or distributed transaction) or some artifact created right after a successful commit happened that allows recreating a perfect copy of the state at the time it was created. +A savepoint is the beginning of a database transaction (or distributed transaction), created right after a successful commit, that allows recreating a perfect copy of the state at the time it was created. + +### Snapshots + +A snapshot is an artifact that encapsulates a savepoint. In V0 terms, it would be a shareable copy of the data directory. 
In V1 terms, it's going to be a compressed archive that, once decompressed and loaded into the node, allows us to recover the state of the node at the height at which the snapshot was created. ### Rollbacks @@ -85,8 +90,6 @@ After having examined the `Persistence` and `Utility` modules, I have identified - [**Distributed commits across data-stores**] We need to implement a 2PC (two-phase commit) or 3PC (three-phase commit) protocol that should make sure that the state has been committed safely on **all** data stores before it is considered `valid`. - > TODO: **Discuss what is appropriate to do here** - - [**Savepoints**] The simplest version could be the database transaction that can simply be discarded, basically the uncommitted transaction sits in memory until it's flushed to storage and rolling back to a savepoint would be as simple as discarding the non-pristine version of the state in memory. - [**Rollbacks**] Rolling back to a savepoint would mean not only that the state has been restored to the previous savepoint but also that the node has to go back into a state that allows it to proceed with its normal operation (i.e. all the modules should behave as if nothing has happened) @@ -95,7 +98,7 @@ After having examined the `Persistence` and `Utility` modules, I have identified - [**Tooling**] The CLI should provide ways to create savepoints and rollbacks. i.e.: `p1 persistence rollback --num_blocks=5` -### Long-term (🚀 🌔 ) ideas +### Long-term (🚀 🌔) ideas Apart from internal failures that should resolve themselves automatically whenever possible, nodes might require a way to save their state and restore it later, not necessarily at the previous block height. 
This could be useful for a number of reasons: From 045454b41760c74cd5ca2ddade62a83d986131a6 Mon Sep 17 00:00:00 2001 From: Alessandro De Blasis Date: Thu, 9 Mar 2023 13:43:12 +0000 Subject: [PATCH 11/13] docs(persistence): added mina protocol to moonshot section --- persistence/docs/SAVEPOINTS_ROLLBACKS.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/persistence/docs/SAVEPOINTS_ROLLBACKS.md b/persistence/docs/SAVEPOINTS_ROLLBACKS.md index ef5dac478..ac2eaffd7 100644 --- a/persistence/docs/SAVEPOINTS_ROLLBACKS.md +++ b/persistence/docs/SAVEPOINTS_ROLLBACKS.md @@ -111,7 +111,9 @@ Apart from internal failures that should resolve themselves automatically whenev Performing operations like: copying datadirs, compressing them, etc. is probably not the best approach. Having a first-class support for savepoints and rollbacks would be, IMHO, a much better solution. -A more advanced/flexible version of **Savepoints** could be to efficiently serialize the state into an artifact (non-trivial also because we have multiple data-stores but definitely possible), maybe leveraging previous savepoints to only store the changes that have been made since the last one and/or using some version of a WAL (Write-Ahead log that records the changes that happened). +A more advanced/flexible version of **Savepoints** could be to efficiently serialize the state into an artifact (**Snapshot**) (non-trivial also because we have multiple data-stores but definitely possible), maybe leveraging previous savepoints to only store the changes that have been made since the last one and/or using some version of a WAL (Write-Ahead log that records the changes that happened). 
+ +**Snapshot** hash verification using the `genesis.json` file and [Mina protocol](https://minaprotocol.com/lightweight-blockchain) #### Savepoints From a5f6d18ea3218aacdb870e76a6400642d1b69006 Mon Sep 17 00:00:00 2001 From: Alessandro De Blasis Date: Thu, 9 Mar 2023 13:45:11 +0000 Subject: [PATCH 12/13] chore(persistence): changelog --- persistence/docs/CHANGELOG.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/persistence/docs/CHANGELOG.md b/persistence/docs/CHANGELOG.md index 571ee817f..b25c1e26c 100644 --- a/persistence/docs/CHANGELOG.md +++ b/persistence/docs/CHANGELOG.md @@ -7,7 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] -## [0.0.0.38] - 2023-02-22 +## [0.0.0.38] - 2023-03-09 - Added `SAVEPOINTS_ROLLBACKS.md` design document From 767235d079835590e2a2086d4cb67177bd63a19b Mon Sep 17 00:00:00 2001 From: Alessandro De Blasis Date: Thu, 6 Apr 2023 22:42:40 +0100 Subject: [PATCH 13/13] docs(persistence): GITHUB_WIKI --- persistence/docs/SAVEPOINTS_ROLLBACKS.md | 1 + 1 file changed, 1 insertion(+) diff --git a/persistence/docs/SAVEPOINTS_ROLLBACKS.md b/persistence/docs/SAVEPOINTS_ROLLBACKS.md index ac2eaffd7..0df7b1b9d 100644 --- a/persistence/docs/SAVEPOINTS_ROLLBACKS.md +++ b/persistence/docs/SAVEPOINTS_ROLLBACKS.md @@ -152,3 +152,4 @@ The following are some of the properties that a rollback mechanism must have: - I wonder if serialized, compressed and signed `Merkle Patricia Trie`s could be leveraged as a media for storing `Savepoint`s in a space-efficient and "blockchain-native" way 🤔. +