[BUG] Remote cluster state backward compatibility tests #20948

@andrross

Describe the bug

PR #20221 added BWC rolling upgrade tests for remote cluster state publication. These tests were found to be intermittently failing (#20910) because remote cluster state does not handle the mixed-version clusters that can be created by the tests. The test commit was reverted in #20942.

In traditional (non-remote) cluster state publication, the cluster manager sends cluster state to each node individually over the transport layer. The transport handshake ensures the cluster manager knows each node's version and serializes accordingly: a 3.x cluster manager writes 2.19-format state when sending to a 2.19 node. Old nodes never see a format they can't understand.
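
As a rough illustration of the per-recipient model described above, here is a minimal sketch. It does not use OpenSearch's real classes: `PerNodeSerializationSketch`, the integer version encoding, and the field name `newIn3x` are all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for version-aware publication; not OpenSearch code.
class PerNodeSerializationSketch {
    // Simplified version encoding, assumed for this sketch: major * 100 + minor.
    static final int V_2_19 = 219;
    static final int V_3_0 = 300;

    // Build the list of fields sent to one recipient, gated on that recipient's version.
    // "newIn3x" is an invented field that only 3.x nodes understand.
    static List<String> serializeFor(int recipientVersion) {
        List<String> fields = new ArrayList<>();
        fields.add("clusterName");
        fields.add("indices");
        if (recipientVersion >= V_3_0) {
            fields.add("newIn3x");
        }
        return fields;
    }

    public static void main(String[] args) {
        // The sender picks the wire format per recipient, so the 2.19 node never sees "newIn3x".
        System.out.println("to 2.19 node: " + serializeFor(V_2_19));
        System.out.println("to 3.0 node:  " + serializeFor(V_3_0));
    }
}
```

The point of the sketch is only that the sender chooses the format once per recipient, which is what the transport handshake makes possible.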

Remote cluster state changes this model. The cluster manager writes a single blob to remote storage using Version.CURRENT, and all nodes read the same blob. There is no per-reader version targeting. When a newer-version node is elected cluster manager in a mixed-version cluster, it writes state that older nodes cannot deserialize, causing:

  1. IndexMetadata XContent failures: fromXContent throws on unknown fields or changed structures (e.g. "Can't get text on a START_ARRAY").
  2. DiscoveryNodes binary failures: binary deserialization hits unexpected bytes (e.g. "unexpected byte [0x08]").
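
The failure mode above can be reproduced in miniature. The sketch below is an assumption-laden toy, not the actual remote cluster state code: `SharedBlobSketch`, the integer versions, and the `newIn3x` field are hypothetical, and the strict reader only imitates a parser that rejects unknown fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "one blob, many readers"; not OpenSearch code.
class SharedBlobSketch {
    // Simplified version encoding, assumed for this sketch: major * 100 + minor.
    static final int V_2_19 = 219;
    static final int V_3_0 = 300;

    // Fields a node at the given version knows about. "newIn3x" is invented.
    static List<String> fieldsKnownTo(int version) {
        List<String> fields = new ArrayList<>();
        fields.add("clusterName");
        fields.add("indices");
        if (version >= V_3_0) {
            fields.add("newIn3x");
        }
        return fields;
    }

    // The writer emits everything it knows at its own version; no reader version is consulted.
    static List<String> writeBlob(int writerVersion) {
        return fieldsKnownTo(writerVersion);
    }

    // Strict reader: rejects any field it does not know, and returns the number parsed.
    static int readBlob(List<String> blob, int readerVersion) {
        List<String> known = fieldsKnownTo(readerVersion);
        int parsed = 0;
        for (String field : blob) {
            if (!known.contains(field)) {
                throw new IllegalStateException("unknown field [" + field + "]");
            }
            parsed++;
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<String> blob = writeBlob(V_3_0); // a 3.0 cluster manager writes the shared blob
        System.out.println("3.0 reader parsed " + readBlob(blob, V_3_0) + " fields");
        try {
            readBlob(blob, V_2_19); // a 2.19 node reads the same blob
        } catch (IllegalStateException e) {
            System.out.println("2.19 reader failed: " + e.getMessage());
        }
    }
}
```

Because the blob is written once and read by everyone, the only version that matters at write time is the writer's, which is why the older reader has no chance to succeed.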

Next steps

  • Re-implement backward compatibility tests that pass reliably
  • Clarify upgrade expectations and possibly harden against the forward compatibility failure (i.e. don't let a new-version leader be elected in a mixed-version cluster)
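
One possible reading of the hardening idea in the second bullet is an eligibility check at election time. The sketch below is purely illustrative and assumes a simplified integer version per node; `ElectionGuardSketch` and `mayBecomeLeader` are hypothetical names, and nothing here reflects how OpenSearch's coordination layer is actually implemented.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical election guard; not OpenSearch's coordination code.
class ElectionGuardSketch {
    // A candidate may lead only if no node in the cluster is older than it,
    // so the state it writes is readable by every current member.
    static boolean mayBecomeLeader(int candidateVersion, List<Integer> nodeVersions) {
        if (nodeVersions.isEmpty()) {
            return true; // nothing to be incompatible with
        }
        int oldest = Collections.min(nodeVersions);
        return candidateVersion <= oldest;
    }

    public static void main(String[] args) {
        List<Integer> mixed = List.of(219, 219, 300); // mid-upgrade cluster
        System.out.println("3.0 candidate, mixed cluster:  " + mayBecomeLeader(300, mixed));
        System.out.println("2.19 candidate, mixed cluster: " + mayBecomeLeader(219, mixed));
        System.out.println("3.0 candidate, upgraded cluster: " + mayBecomeLeader(300, List.of(300, 300, 300)));
    }
}
```

A rule like this trades availability for safety: if all remaining old-version cluster-manager-eligible nodes are down mid-upgrade, no leader could be elected, so the upgrade expectations would need to spell that out.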

Related component

Cluster Manager
