Skip to content

Comments

Avoid concurrent raft membership changes #7565

Merged
neilalexander merged 2 commits intomainfrom
raft-no-concurrent-membership
Nov 27, 2025
Merged

Avoid concurrent raft membership changes #7565
neilalexander merged 2 commits intomainfrom
raft-no-concurrent-membership

Conversation

@sciascid
Copy link
Contributor

A simple approach based to avoid concurrent membership changes. The leader simply drops a membership change if there is another membership change in progress. A membership is in progress if it has been proposed, but not committed yet.

Signed-off-by: Daniele Sciascia daniele@nats.io

server/raft.go Outdated
return false
}
for _, v := range ae.entries {
if v.Type == EntryAddPeer || v.Type == EntryRemovePeer {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to include EntryPeerState here too?

@sciascid sciascid force-pushed the raft-no-concurrent-membership branch 7 times, most recently from e1f4aa2 to 39c8e6d Compare November 21, 2025 15:49
@sciascid
Copy link
Contributor Author

This could be considered complete.
Possible improvements:

  • Since this is tracking some state (the boolean) we might as well track the entire entry, for example. This would allow for some sanity checks.
  • Could return an error when there's an attempt to add / remove concurrently, instead of dropping the proposal silently
  • Could queue up concurrent proposals instead of dropping them
  • Testing: it's complicated... tried very hard to come up with a test that shows concurrent proposals are not possible, even under leader change. Don't think it is possible right now.

@MauriceVanVeen
Copy link
Member

My primary worry with only disallowing membership in Raft is that we'll be protected against concurrent changes on the lowest level. But the user issuing nats server cluster peer-remove will still observe "success" for all meta peer-remove requests, even if they'd try to do so concurrently or when there's no quorum on the peer-remove. This is because the upper layer will still always respond without waiting for quorum, or knowing that another peer-remove was already in progress.

@sciascid sciascid force-pushed the raft-no-concurrent-membership branch 2 times, most recently from 3bcf2f6 to 53ef284 Compare November 25, 2025 13:36
@sciascid sciascid marked this pull request as ready for review November 25, 2025 14:02
@sciascid sciascid requested a review from a team as a code owner November 25, 2025 14:02
Copy link
Member

@MauriceVanVeen MauriceVanVeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, but could you try removing these loops in monitorStream and monitorConsumer that are used for scale moves and see if the tests still pass? Since that in essence would still rely on being able to do concurrent membership changes.

for _, p := range oldPeers {
	n.ProposeRemovePeer(p)
}

Avoid concurrent peer additions and removals by having the leader
check that there are no uncommitted entries of type EntryAddPeer
or EntryRemovePeer.

Signed-off-by: Daniele Sciascia <daniele@nats.io>
@sciascid sciascid force-pushed the raft-no-concurrent-membership branch from 53ef284 to 434b02b Compare November 25, 2025 15:07
@sciascid
Copy link
Contributor Author

Generally LGTM, but could you try removing these loops in monitorStream and monitorConsumer that are used for scale moves and see if the tests still pass? Since that in essence would still rely on being able to do concurrent membership changes.

for _, p := range oldPeers {
	n.ProposeRemovePeer(p)
}

Removed the loops with a call to ProposeKnownPeers()

ProposeRemovePeer can no longer be used in tight loops without
error checking, because it can't handle more than one membership
change at a time. Replacing those loops in the stream / consumer
move scenarios with an equivalent call to ProposeKnownPeers.

Signed-off-by: Daniele Sciascia <daniele@nats.io>
@sciascid sciascid force-pushed the raft-no-concurrent-membership branch from 30c3907 to 3cb9017 Compare November 26, 2025 17:07
Copy link
Member

@MauriceVanVeen MauriceVanVeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@neilalexander neilalexander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@neilalexander neilalexander merged commit e9be22c into main Nov 27, 2025
69 of 70 checks passed
@neilalexander neilalexander deleted the raft-no-concurrent-membership branch November 27, 2025 17:37
neilalexander added a commit that referenced this pull request Dec 5, 2025
Includes the following:

- #7581
- #7585
- #7586
- #7565
- #7588
- #7593
- #7589
- #7594
- #7595
- #7596
- #7597
- #7598
- #7600
- #7601
- #7602
- #7604
- #7605
- #7607
- #7609
- #7610
- #7616
- #7614

Signed-off-by: Neil Twigg <neil@nats.io>
neilalexander added a commit that referenced this pull request Jan 6, 2026
Includes the following:

- #7565
- #7589
- #7600
- #7602
- #7609
- #7610
- #7632
- #7649
- #7642
- #7658
- #7659
- #7661
- #7662
- #7663
- #7668
- #7683
- #7685
- #7686
- #7678
- #7691
- #7696
- #7698
- #7699
- #7700

Signed-off-by: Neil Twigg <neil@nats.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants