Skip to content

Comments

Test and fix Raft peer additions#7632

Merged
neilalexander merged 3 commits intomainfrom
raft-peer-add-disjoint-majorities
Dec 11, 2025
Merged

Test and fix Raft peer additions#7632
neilalexander merged 3 commits intomainfrom
raft-peer-add-disjoint-majorities

Conversation

@sciascid
Copy link
Contributor

This PR makes the following changes:

  1. Add test helpers so that we can exercise peer addition in unit tests
  2. Fix a case where a partition and peer addition could lead to disjoint majorities

Signed-off-by: Daniele Sciascia daniele@nats.io

@sciascid sciascid force-pushed the raft-peer-add-disjoint-majorities branch 2 times, most recently from 687a136 to 6326456 Compare December 11, 2025 08:31
Copy link
Member

@MauriceVanVeen MauriceVanVeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Daniele Sciascia <daniele@nats.io>
@sciascid sciascid force-pushed the raft-peer-add-disjoint-majorities branch 2 times, most recently from fa04062 to 5edec33 Compare December 11, 2025 16:09
This commit fixes a bug where disjoint majorities could form
during peer additions. Previously, node would update their
cluster size only after the `EntryPeerAdd` log entry was
committed by the cluster. This led to a situation where a
leader could have a new peer in its peer set, before committing
the corresponding EntryPeerAdd, while keeping the quorum size
unchanged. This created the opportunity for a partitioned
leader to commit EntryPeerAdd using only append entry responses
from newly added nodes. For instance, a partitioned leader in
a three node cluster, could form a majority with a newly added
peer. Resulting in a 4 node membership, now requiring a quorum
of 3. A subsequent peer addition would also succeed, using
with the help of the two newly added nodes. Resulting in a
5 node membership with 3 functioning nodes on one side of the
partition, and the old followers forming a different majority
on the other side of the partition.

The fix ensures that a node adds a new peer and adjusts its
quorum calculation at the same time, right after appending the
corresponding EntryAddPeer to its WAL. This is in line with
the membership change algorithm given by Raft's original
author (section 4.1 of Ongaro's thesis).
With the revised logic, a partitioned leader of a 3 node cluster
is unable to form a majority using the acknowledgement from the
to be added peer. As the quorum now requires at least one
acknowledgement coming from the original followers.

Signed-off-by: Daniele Sciascia <daniele@nats.io>
Signed-off-by: Daniele Sciascia <daniele@nats.io>
@sciascid sciascid force-pushed the raft-peer-add-disjoint-majorities branch from 5edec33 to be7e595 Compare December 11, 2025 16:30
@sciascid sciascid marked this pull request as ready for review December 11, 2025 17:04
@sciascid sciascid requested a review from a team as a code owner December 11, 2025 17:04
Copy link
Member

@neilalexander neilalexander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@neilalexander neilalexander merged commit d936b08 into main Dec 11, 2025
88 of 92 checks passed
@neilalexander neilalexander deleted the raft-peer-add-disjoint-majorities branch December 11, 2025 17:17
neilalexander added a commit that referenced this pull request Dec 11, 2025
Includes the following:

- #7622
- #7619
- #7624
- #7625
- #7627
- #7630
- #7631
- #7632

Signed-off-by: Neil Twigg <neil@nats.io>
neilalexander added a commit that referenced this pull request Jan 6, 2026
Includes the following:

- #7565
- #7589
- #7600
- #7602
- #7609
- #7610
- #7632
- #7649
- #7642
- #7658
- #7659
- #7661
- #7662
- #7663
- #7668
- #7683
- #7685
- #7686
- #7678
- #7691
- #7696
- #7698
- #7699
- #7700

Signed-off-by: Neil Twigg <neil@nats.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants