feat: add draft implementation for canonical raft by niebayes · Pull Request #5933 · jina-ai/serve

niebayes · 2023-06-26T16:07:31Z

Goals:

Add draft implementation for canonical Raft, i.e. all requests, whether read or write requests, go to the Raft layer.

JoanFM

Should we consider the consistency mode as part of the general raft_configuration?

JoanFM · 2023-06-26T19:26:33Z

jina/parsers/orchestrate/pod.py

        nargs='+',
    )

+    gp.add_argument(


This should be at Deployment level, not Pod level.

Actually, there is a specific section where you will find the Raft-related args.

Look for mixin_raft_parser in https://github.com/jina-ai/jina/blob/defd001ab257ac16c7e8b38113d3a4bffd5d226b/jina/parsers/orchestrate/runtimes/runtime.py#L27

JoanFM · 2023-06-26T19:27:21Z

jina/serve/consensus/jina.proto

    repeated string endpoints = 1;
-    repeated string write_endpoints = 2;
+    // TODO(niebayes): regenerate proto files.
+    repeated string read_endpoints = 2;


No need to add an extra list: the difference between all the endpoints and the write endpoints is already the set of read endpoints.

Are there any endpoints that are neither read nor write? For example, endpoints for shutting down an executor or upgrading an executor?
On the other hand, some endpoints may not want to involve Raft at all, i.e. we simply forward them to the executor directly rather than letting the Raft worker forward them.

There are only write and read.
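
A minimal sketch of the rule discussed above, assuming endpoints are plain strings: anything not marked as write is treated as read, so no separate read list needs to be stored. The helper name `split_endpoints` and the endpoint names are hypothetical and not part of the PR.

```python
from typing import Iterable, List, Tuple


def split_endpoints(
    endpoints: Iterable[str], write_endpoints: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (write, read) lists; any endpoint not marked as write is a read."""
    write_set = set(write_endpoints)
    write: List[str] = []
    read: List[str] = []
    for endpoint in endpoints:
        # Preserve the original ordering while partitioning.
        (write if endpoint in write_set else read).append(endpoint)
    return write, read


write, read = split_endpoints(['/index', '/search', '/status'], ['/index'])
print(write, read)
```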

JoanFM · 2023-06-26T19:36:27Z

jina/serve/executors/decorators.py

            else:
                return fn

+        def _unwrap_read_decorator(self, fn):


Forget about the read decorator; everything that is not write is read.

JoanFM · 2023-06-26T21:08:35Z

Are we sure this is what we want? I think it does not make sense to have Read requests go fully into the Raft Layer. This would make the log grow insanely large for no reason. Isn't there a better way to achieve strong consensus?

niebayes · 2023-06-27T02:59:57Z

@JoanFM You are right: if read requests go into the Raft layer, the log will grow very large when the workload is read-skewed. However, if they do not, strong consistency is not guaranteed.
I'm currently adding an optimization called read index, which keeps read requests out of the Raft log. It works as follows:

1. The client sends a read request to the leader.
2. The leader asks the Raft layer for its committed index. This index is stored and associated with the read request. We call it the read index.
3. The leader then broadcasts a checkQuorum RPC to all other nodes.
4. Followers respond by either acknowledging the leadership or rejecting it.
5. If the leader receives acknowledgements from a majority, its leadership is confirmed, which assures that it has the most up-to-date log. The leader then waits for its applied index to reach or pass the read index. At that point, the leader can safely read from the FSM without breaking strong consistency.

This optimization costs only one extra round of RPCs; read requests do not need to go fully through the Raft layer, so the log does not grow.

I've checked that Hashicorp Raft supports querying the committed index and the applied index, but I'm not sure whether it supports invoking a checkQuorum-like RPC. If not, I will have to modify Hashicorp Raft slightly. Don't worry, I can handle it.
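
The read index flow described in this comment can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not the PR's code and not the Hashicorp Raft API: the `Leader` and `Follower` classes and all of their methods are hypothetical, the quorum check is a direct method call instead of an RPC, and the "wait" for the applied index is driven synchronously.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Follower:
    term: int
    reachable: bool = True

    def ack_leader(self, leader_term: int) -> bool:
        # Acknowledge only if reachable and no newer term has been seen.
        return self.reachable and leader_term >= self.term


@dataclass
class Leader:
    term: int
    followers: List[Follower]
    log: List[Tuple[str, str]] = field(default_factory=list)  # committed writes
    commit_index: int = 0
    applied_index: int = 0
    fsm: Dict[str, str] = field(default_factory=dict)

    def commit(self, key: str, value: str) -> None:
        self.log.append((key, value))
        self.commit_index = len(self.log)

    def apply_next(self) -> None:
        """Apply one committed but not yet applied entry to the FSM."""
        if self.applied_index < self.commit_index:
            key, value = self.log[self.applied_index]
            self.fsm[key] = value
            self.applied_index += 1

    def confirm_leadership(self) -> bool:
        # The leader counts itself; a strict majority of the cluster is required.
        acks = 1 + sum(f.ack_leader(self.term) for f in self.followers)
        return acks > (len(self.followers) + 1) // 2

    def read(self, key: str) -> Optional[str]:
        read_index = self.commit_index            # record the read index
        if not self.confirm_leadership():         # checkQuorum-style round
            raise RuntimeError('leadership not confirmed')
        while self.applied_index < read_index:    # wait until applied >= read index
            self.apply_next()
        return self.fsm.get(key)                  # read without appending to the log


leader = Leader(term=3, followers=[Follower(term=3), Follower(term=3, reachable=False)])
leader.commit('doc', 'v1')
print(leader.read('doc'), len(leader.log))
```

Note that the read never appends to `log`, which is the point of the optimization: only the quorum round is added per read.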

niebayes · 2023-06-27T03:10:29Z

@JoanFM About whether to wrap the consistency mode into the general Raft configuration:
If you mean the configuration for Hashicorp Raft, then no, we should not, since the consistency mode controls the behavior of the server layer.
Besides, I'm reading through the Hashicorp Consul codebase to get a better idea of how Consul interacts with Hashicorp Raft.

niebayes · 2023-06-27T03:13:46Z

@JoanFM About the RpcInterface:

Why do we create an RpcInterface instance for each server registration? Could we pass a reference to a single RpcInterface instance instead?

I want to implement the RpcInterface as a coordinator that manages the communication between the Raft layer and the executor FSM. Do you have any advice?

JoanFM · 2023-06-27T06:25:32Z

I am not sure about this; maybe you are right and we could go with a single instance.

JoanFM · 2023-06-27T06:27:10Z

Manually editing Hashicorp Raft is not a good option, as we would then need to maintain two codebases and make sure that upstream Hashicorp updates stay compatible. To me this is only good if Hashicorp accepts a PR.

JoanFM · 2023-06-27T06:33:44Z

general configuration: If you mean the configuration

this can be a useful link:
hashicorp/raft#436

niebayes · 2023-06-27T08:25:06Z

I've opened PR hashicorp/raft#560 to add a CommitIndex API to Hashicorp Raft.
With this, we can request the current commit index from Hashicorp Raft.

niebayes · 2023-06-27T08:29:31Z

I've also noticed that Hashicorp Raft provides a verifyLeader API, which seems equivalent to checkQuorum.

I'm not familiar with the usage of this API or of futures. I think what I need to do is call verifyLeader, which gives me a future, and then call Error on the future to wait for the response. Am I right?

JoanFM

Changes requested, I like the refactoring on the RpcInterface.

Highlights:

Remove the logic about ReadEndpoint, as it is unnecessary.
Processing ReadRequests in Strong consistency mode must be done more efficiently, but without changing the Hashicorp Raft library.

I am seeing good progress and a good understanding of the library and codebase. Thanks a lot and congrats!

JoanFM · 2023-06-27T08:26:12Z

jina/parsers/orchestrate/pod.py

+    gp.add_argument(
+        '--consistency-mode',
+        type=str,
+        default='Strong',


Make Eventual the default. Plus get the string from the Enum
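
A minimal sketch of what this suggestion could look like, assuming a `ConsistencyMode` enum with `Eventual` and `Strong` members. The enum, its member names, and the helper `add_consistency_mode_arg` are hypothetical; the real enum in the codebase may be named and defined differently.

```python
import argparse
from enum import Enum


class ConsistencyMode(str, Enum):
    """Hypothetical enum; the actual definition in the PR may differ."""

    EVENTUAL = 'Eventual'
    STRONG = 'Strong'


def add_consistency_mode_arg(parser: argparse.ArgumentParser) -> None:
    parser.add_argument(
        '--consistency-mode',
        type=str,
        # Derive both the choices and the default from the enum, not from literals.
        choices=[mode.value for mode in ConsistencyMode],
        default=ConsistencyMode.EVENTUAL.value,
        help='Consistency mode used when serving read requests.',
    )


parser = argparse.ArgumentParser()
add_consistency_mode_arg(parser)
print(parser.parse_args([]).consistency_mode)
```

Deriving the strings from the enum keeps the parser and the rest of the code in sync if a mode is renamed.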

JoanFM · 2023-06-27T08:27:58Z

jina/parsers/orchestrate/pod.py

        nargs='+',
    )

+    gp.add_argument(


Look for mixin_raft_parser in https://github.com/jina-ai/jina/blob/defd001ab257ac16c7e8b38113d3a4bffd5d226b/jina/parsers/orchestrate/runtimes/runtime.py#L27

JoanFM · 2023-06-27T08:28:36Z

jina/serve/consensus/jina_raft/fsm.go

    }
+
+    // TODO(niebayes): regenerate proto files.
+    read_endpoints := response.ReadEndpoints


No need for this; the write/read logic was already there, and there is no other set of endpoints.

JoanFM · 2023-06-27T08:30:32Z

jina/serve/consensus/jina_raft/rpc.go

        }
+    }
+
+    if !found {


this does not exist

JoanFM · 2023-06-27T08:30:51Z

jina/serve/executors/__init__.py

            threading.Lock()
        )  # watch because this makes it no serializable

+        overlap_endpoints = set(self.read_endpoints).intersection(self.write_endpoints)


no need for this logic

JoanFM · 2023-06-27T08:30:57Z

jina/serve/executors/__init__.py

            return self._requests

+    @property
+    def read_endpoints(self):


JoanFM · 2023-06-27T08:31:01Z

jina/serve/executors/decorators.py

        _inherit_from_parent_class_inner(cls)


+def read(


JoanFM · 2023-06-27T08:31:12Z

jina/serve/runtimes/worker/request_handling.py

        endpoints_proto = jina_pb2.EndpointsProto()
        endpoints_proto.endpoints.extend(list(self._executor.requests.keys()))
+        # TODO(niebayes): add read_endpoints to the proto.
+        endpoints_proto.read_endpoints.extend(list(self._executor.read_endpoints))


JoanFM · 2023-06-27T08:33:11Z

I've also noticed that Hashicorp Raft provides a verifyLeader API, which seems equivalent to checkQuorum. I'm not familiar with the usage of this API or of futures. I think what I need to do is call verifyLeader, which gives me a future, and then call Error on the future to wait for the response. Am I right?

I believe so yes

niebayes · 2023-06-27T08:38:38Z

As we know, read index depends on the applied index. However, the Raft layer only knows the last applied index, i.e. the index of the log entry it most recently sent to the FSM, and the FSM may not have consumed that entry yet. So the AppliedIndex API may return an index that is ahead of what the FSM has actually applied.

So, to implement the read index optimization, we need to record the last applied index that the executor FSM has actually consumed.
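
One way to record the index the FSM has actually consumed is a small tracker that the FSM updates after each apply and that readers block on. This is a sketch under assumptions: the class `AppliedIndexTracker` and its methods are hypothetical, and it is written in Python for illustration even though the FSM in this PR lives in the Go layer.

```python
import threading
from typing import Optional


class AppliedIndexTracker:
    """Tracks the highest log index the FSM has finished applying."""

    def __init__(self) -> None:
        self._applied = 0
        self._cond = threading.Condition()

    def mark_applied(self, index: int) -> None:
        # Called by the FSM only after the entry has really been consumed.
        with self._cond:
            if index > self._applied:
                self._applied = index
                self._cond.notify_all()

    def wait_for(self, read_index: int, timeout: Optional[float] = None) -> bool:
        # Blocks until applied >= read_index; returns False on timeout.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._applied >= read_index, timeout=timeout
            )


tracker = AppliedIndexTracker()
# Simulate the FSM finishing entry 3 shortly after the read starts waiting.
threading.Timer(0.05, tracker.mark_applied, args=(3,)).start()
print(tracker.wait_for(3, timeout=1.0))
```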

niebayes · 2023-06-27T08:43:20Z

Another issue is the determinism of FSM Apply. As you can see, Hashicorp Raft requires the Apply implementation to be deterministic.

But our Apply implementation has a lot of error handling. Would that make Apply non-deterministic?

JoanFM · 2023-06-27T08:47:48Z

recently sends to the FSM but the FSM maybe not consume the log yet. So th

This is a known issue. Indeed, if there is a connection error to the Executor, it may be problematic for FSM consistency.

This is why I would like to have the Python object wrapped inside the Golang code to avoid having this communication layer.

The unmarshalling error should be deterministic, however.
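
To illustrate why a decoding error can stay deterministic, here is a small sketch in which the outcome of apply depends only on the current state and the entry bytes, so every replica rejects the same malformed entry in the same way. JSON is used purely as a stand-in for the protobuf unmarshalling in the real code, and the `apply` function is hypothetical. A connection error, by contrast, depends on the environment and is not covered by this sketch.

```python
import json
from typing import Dict, List


def apply(state: Dict[str, int], entry: bytes) -> str:
    """Apply one log entry; the result depends only on (state, entry)."""
    try:
        command = json.loads(entry)
        key, value = command['key'], command['value']
    except (ValueError, KeyError, TypeError) as exc:
        # Every replica sees the same bytes, so every replica fails identically.
        return f'rejected: {type(exc).__name__}'
    state[key] = value
    return 'ok'


log: List[bytes] = [b'{"key": "a", "value": 1}', b'not json', b'{"key": "b"}']
replica_1: Dict[str, int] = {}
replica_2: Dict[str, int] = {}
results_1 = [apply(replica_1, entry) for entry in log]
results_2 = [apply(replica_2, entry) for entry in log]
print(replica_1 == replica_2, results_1 == results_2)
```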

add draft implementation for canonical raft

b71c564

niebayes marked this pull request as draft June 26, 2023 16:09

JoanFM suggested changes Jun 26, 2023

View reviewed changes

JoanFM suggested changes Jun 27, 2023

View reviewed changes

Merge branch 'master' into gsoc_raft

f9ac586

niebayes closed this by deleting the head repository Jan 7, 2024
