NRG: Fix cluster size drop to 1 on replaying EntryAddPeer after restart by sciascid · Pull Request #7850 · nats-io/nats-server

sciascid · 2026-02-19T10:03:40Z

On restart, replaying EntryAddPeer could incorrectly leave a raft node at cluster size 1 instead of restoring the expected size and quorum from persisted state.
This bug could lead to the following scenario: a node in a 3-node cluster restarts and resets its cluster size to 1. If the node does not receive any message from the other nodes, it can campaign to become leader. Believing it is in a single-node cluster, it wins the election, which results in the original cluster splitting into two clusters (or having two leaders at the same time).
Specifically, if an EntryAddPeer was replayed from the log, it would overwrite the cluster size and quorum to 1. The peer set is now restored before the log is replayed, and it is taken from the snapshot (if no snapshot is present, we fall back to peer.idx).
If a log entry that changes membership is replayed, it will now update the cluster size and quorum correctly.
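
To illustrate the ordering issue, here is a minimal sketch of a simplified model. It is not the code in server/raft.go: the node type, restore, replay, and quorumFor are hypothetical names, and the sketch assumes that quorum is a simple majority of the cluster size and that the cluster size is derived from the known peer set.

```go
package main

import "fmt"

// entryType distinguishes the log entries this sketch cares about.
type entryType int

const (
	entryNormal entryType = iota
	entryAddPeer
	entryRemovePeer
)

// entry is a hypothetical, simplified log entry.
type entry struct {
	kind entryType
	peer string
}

// node is a hypothetical stand-in for a raft node, not the real type.
type node struct {
	peers map[string]struct{}
	csz   int // cluster size
	qn    int // votes needed for quorum
}

// quorumFor returns the simple majority for a cluster of the given size.
func quorumFor(size int) int { return size/2 + 1 }

// restore seeds the peer set from persisted state (assumed to come from a
// snapshot or a peer state file) before any log entry is replayed.
func restore(id string, persisted []string) *node {
	n := &node{peers: map[string]struct{}{id: {}}}
	for _, p := range persisted {
		n.peers[p] = struct{}{}
	}
	n.adjust()
	return n
}

// adjust recomputes cluster size and quorum from the current peer set.
func (n *node) adjust() {
	n.csz = len(n.peers)
	n.qn = quorumFor(n.csz)
}

// replay applies membership changes found in the log on top of the
// peer set that is already known.
func (n *node) replay(entries []entry) {
	for _, e := range entries {
		switch e.kind {
		case entryAddPeer:
			n.peers[e.peer] = struct{}{}
			n.adjust()
		case entryRemovePeer:
			delete(n.peers, e.peer)
			n.adjust()
		}
	}
}

func main() {
	// One replayed entry: an add-peer for the node itself (illustrative).
	entries := []entry{{kind: entryAddPeer, peer: "A"}}

	// Replay without restoring the peer set first: the node only knows itself.
	stale := restore("A", nil)
	stale.replay(entries)
	fmt.Println(stale.csz, stale.qn) // 1 1: its own vote is enough to win

	// Restore the persisted peer set first, then replay.
	fixed := restore("A", []string{"A", "B", "C"})
	fixed.replay(entries)
	fmt.Println(fixed.csz, fixed.qn) // 3 2: needs a vote from another node
}
```

In this model, a node that replays the log on top of an empty peer set ends up with quorum 1 and can elect itself, while a node that restores its peers first keeps quorum 2 and cannot.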

Signed-off-by: Daniele Sciascia <daniele@nats.io>

server/raft.go

server/raft_test.go

sciascid · 2026-02-19T11:45:05Z

Notice that before this PR, initSingleMemRaftNode would start in a weird configuration: the node would have cluster size = 3, but only 1 node in its peer list.

With the changes in this PR we go back to a sane initial state:

    func TestNRGInitSingleMemRaftNodeDefaults(t *testing.T) {
        n, cleanup := initSingleMemRaftNode(t)
        defer cleanup()
        require_Equal(t, n.ID(), "esFhDys3")
        require_Equal(t, len(n.Peers()), 1)
        require_Equal(t, n.Peers()[0].ID, "esFhDys3")
        require_Equal(t, n.ClusterSize(), 1)
        require_True(t, n.Quorum())
    }

Some tests were relying on the old initial configuration, which explains why they need tweaking.

MauriceVanVeen

LGTM!

neilalexander

LGTM

neilalexander · 2026-02-19T12:51:02Z

@sciascid Looks like there's now a merge conflict in the tests, mind rebasing & resolving please?

sciascid · 2026-02-19T13:08:46Z

@neilalexander Done

Includes the following: #7839, #7843, #7824, #7826, #7845, #7844, #7840, #7827, #7846, #7848, #7849, #7855, #7850, #7857, #7856. Signed-off-by: Neil Twigg <neil@nats.io>

sciascid requested a review from a team as a code owner February 19, 2026 10:03

MauriceVanVeen reviewed Feb 19, 2026

sciascid force-pushed the raft-single-node-cluster-after-restart branch 4 times, most recently from 01cd563 to 09f117b, February 19, 2026 11:35

sciascid force-pushed the raft-single-node-cluster-after-restart branch from 09f117b to 3e7f03e, February 19, 2026 12:30

MauriceVanVeen approved these changes Feb 19, 2026

neilalexander approved these changes Feb 19, 2026

sciascid force-pushed the raft-single-node-cluster-after-restart branch from 3e7f03e to b26f715, February 19, 2026 13:07

neilalexander merged commit 0e7df38 into main Feb 19, 2026
48 checks passed

neilalexander deleted the raft-single-node-cluster-after-restart branch February 19, 2026 15:38

neilalexander mentioned this pull request Feb 20, 2026

Cherry-picks for 2.12.5-RC.2 #7861

Merged
