Describe the bug
PR #20221 added backward-compatibility (BWC) rolling upgrade tests for remote cluster state publication. These tests failed intermittently (#20910) because remote cluster state publication does not handle the mixed-version clusters that the tests can produce. The test commit was reverted in #20942.
In traditional (non-remote) cluster state publication, the cluster manager sends the cluster state to each node individually over the transport layer. The transport handshake tells the cluster manager each node's version, so it serializes accordingly: a 3.x cluster manager writes 2.19-format state when sending to a 2.19 node. Old nodes therefore never see a format they cannot understand.
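A minimal Python sketch of why the per-node transport path is safe, assuming a toy model: versions are (major, minor) tuples, and `ClusterState`, `serialize_for`, `publish_per_node`, and the field `new_3x_field` are hypothetical stand-ins, not OpenSearch's `Version` or stream serialization APIs. The point is only that each payload is built for the version the receiving node reported in its handshake.

```python
from dataclasses import dataclass

# Toy versions as (major, minor) tuples; not OpenSearch's Version class.
V_2_19 = (2, 19)
V_3_0 = (3, 0)

@dataclass
class ClusterState:
    term: int
    new_3x_field: str  # hypothetical field that only exists in the 3.x format

def serialize_for(state: ClusterState, target_version: tuple) -> dict:
    """Build the payload in the format the *receiving* node understands."""
    payload = {"term": state.term}
    if target_version >= V_3_0:
        payload["new_3x_field"] = state.new_3x_field
    return payload

def publish_per_node(state: ClusterState, handshake_versions: dict) -> dict:
    """Traditional publication: one payload per node, targeted at its handshake version."""
    return {node: serialize_for(state, v) for node, v in handshake_versions.items()}

# A 3.x cluster manager publishing to a mixed-version cluster.
sent = publish_per_node(ClusterState(term=7, new_3x_field="x"),
                        {"node-a": V_3_0, "node-b": V_2_19})
print(sent["node-b"])  # {'term': 7}
```

The 2.19 node simply never receives the field it cannot parse; that guarantee is what disappears once a single shared blob replaces per-node payloads.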
Remote cluster state changes this model. The cluster manager writes a single blob to remote storage using Version.CURRENT, and every node reads that same blob; there is no per-reader version targeting. When a newer-version node is elected cluster manager in a mixed-version cluster, it writes state that older nodes cannot deserialize, causing:
- IndexMetadata XContent failures: fromXContent throws on unknown fields or changed structures (e.g. "Can't get text on a START_ARRAY").
- DiscoveryNodes binary failures: binary deserialization hits unexpected bytes (e.g. "unexpected byte [0x08]").
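The remote-state failure can be modeled the same way. In this sketch, JSON stands in for XContent, and `CURRENT`, `KNOWN_FIELDS`, `write_remote_blob`, `strict_read`, and `new_3x_field` are all hypothetical: the writer always emits its own CURRENT format into one blob, and an older reader with a strict parser rejects the field it has never seen, analogous to the IndexMetadata fromXContent failures described above.

```python
import json

CURRENT = (3, 0)  # the writer's own version, analogous to Version.CURRENT

# Hypothetical field sets each reader version understands (illustrative only).
KNOWN_FIELDS = {
    (2, 19): {"name", "settings"},
    (3, 0): {"name", "settings", "new_3x_field"},
}

def write_remote_blob(index_name: str) -> str:
    """Remote publication: a single blob, always in the writer's CURRENT format."""
    doc = {"name": index_name, "settings": {"number_of_shards": 1}}
    if CURRENT >= (3, 0):
        doc["new_3x_field"] = ["a", "b"]  # hypothetical 3.x-only structure
    return json.dumps(doc)

def strict_read(blob: str, reader_version: tuple) -> dict:
    """Reader-side parse that, like a strict fromXContent, rejects unknown fields."""
    doc = json.loads(blob)
    unknown = set(doc) - KNOWN_FIELDS[reader_version]
    if unknown:
        raise ValueError(f"unknown field(s) {sorted(unknown)}")
    return doc

blob = write_remote_blob("logs")           # one blob for every node
print(strict_read(blob, (3, 0))["name"])   # logs
try:
    strict_read(blob, (2, 19))             # the same blob breaks the older reader
except ValueError as e:
    print(e)  # unknown field(s) ['new_3x_field']
```

The DiscoveryNodes case is the same mismatch in a binary format, where the older reader fails on an unexpected byte rather than reporting an unknown field name.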
Next steps
- Re-implement the backward-compatibility tests so that they pass reliably
- Clarify upgrade expectations and possibly harden against the forward-compatibility failure (i.e. prevent a newer-version node from being elected cluster manager in a mixed-version cluster)
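One possible shape for the hardening in the second step, sketched under the assumption that every node reports its version: because all nodes (not only cluster-manager-eligible ones) read the remote blob, only eligible nodes running the oldest version present in the cluster may be elected. `Node` and `eligible_leaders` are hypothetical illustrations, not existing OpenSearch code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    node_id: str
    version: tuple  # hypothetical (major, minor) representation
    cluster_manager_eligible: bool = True

def eligible_leaders(nodes):
    """Return the nodes allowed to become cluster manager.

    Every node reads the single remote blob, so the oldest version present
    (across ALL nodes, not just eligible ones) bounds what the leader may write.
    """
    min_version = min(n.version for n in nodes)
    return [n for n in nodes if n.cluster_manager_eligible and n.version == min_version]

# Rolling upgrade in progress: one node already on 3.0, two still on 2.19.
mixed = [Node("a", (3, 0)), Node("b", (2, 19)), Node("c", (2, 19))]
print([n.node_id for n in eligible_leaders(mixed)])  # ['b', 'c']

# Edge case: the only 2.19 node is not cluster-manager-eligible, so nobody qualifies.
stuck = [Node("a", (3, 0)), Node("d", (2, 19), cluster_manager_eligible=False)]
print(eligible_leaders(stuck))  # []
```

The second case shows the trade-off: a strict "oldest version leads" rule can leave the cluster with no electable node, so any real guard would need an explicit policy for that situation.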
Related component
Cluster Manager