[BUG] Join Failure - Mixed Version cluster #19272

@rajiv-kv

Describe the bug

During a node join, the leader sends the latest cluster state to the joining node for validation. The cluster manager compresses and caches this serialized cluster state (JoinHelper#serializedState) so it can be reused across multiple joining nodes.

Version-specific attributes introduced in 2.17 and 3.1 make the cached serializedState incompatible across OpenSearch versions. In a mixed-version cluster (e.g., 2.19 and 3.1), a joining node can receive a cached state it cannot deserialize, and the join fails.
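To illustrate the failure mode, here is a toy sketch of version-gated serialization. The wire layout, the version ids, and the names `VersionGatedWireDemo`, `write`, and `readNodeCount` are all hypothetical; this is not OpenSearch's actual cluster-state format, only the general pattern of writing a field conditionally on the target version.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical toy wire format, not OpenSearch's real cluster-state serialization.
class VersionGatedWireDemo {
    static final int V_2_19 = 21900; // illustrative version ids (assumption)
    static final int V_3_1 = 30100;

    // Writes the extra attribute only when the target version understands it.
    static byte[] write(int targetVersion, long term, int newAttr, int nodeCount) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeLong(term);
            if (targetVersion >= V_3_1) {
                out.writeInt(newAttr);
            }
            out.writeInt(nodeCount);
        }
        return buf.toByteArray();
    }

    // Reads with the layout the reader's own version expects and returns nodeCount.
    static int readNodeCount(int readerVersion, byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            in.readLong();
            if (readerVersion >= V_3_1) {
                in.readInt();
            }
            return in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] cachedFor219 = write(V_2_19, 7L, 42, 3);
        // Matching writer and reader versions parse cleanly.
        System.out.println(readNodeCount(V_2_19, cachedFor219));
        try {
            // The newer layout expects a field that these bytes do not contain.
            readNodeCount(V_3_1, cachedFor219);
        } catch (EOFException e) {
            System.out.println("newer-layout reader ran out of bytes");
        }
    }
}
```

In this toy format the reverse direction does not throw: bytes written for the newer version are silently misread by the older layout, which returns the extra attribute where it expects the node count.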

Related component

No response

To Reproduce

  • Create a mixed-version cluster (2.19 and 3.1)
  • Bounce the nodes alternately between 2.19 and 3.1 in quick succession
  • Check the logs for join errors

Expected behavior

Before sending the cached state in a validation request, the cluster manager should check that the joining node can deserialize it. If the cached state is not compatible, the cluster state should be re-serialized for the joining node's version.
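One possible shape for this, as a minimal sketch: key the cache by the joining node's version instead of holding a single entry, and serialize again only on a miss. `VersionedStateCache`, the integer version ids, and the serializer callback are hypothetical stand-ins, not the actual `JoinHelper` code or the fix that was eventually merged.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

// Hypothetical sketch, not the actual JoinHelper implementation.
// Keeps one serialized copy of the cluster state per joining-node version,
// so bytes written for one version are never sent to a node on another.
class VersionedStateCache {
    private final Map<Integer, byte[]> bytesByVersion = new ConcurrentHashMap<>();
    private final IntFunction<byte[]> serializer; // stand-in for version-aware serialization

    VersionedStateCache(IntFunction<byte[]> serializer) {
        this.serializer = serializer;
    }

    // Returns the cached bytes for this version, serializing only on a cache miss.
    byte[] serializedStateFor(int nodeVersion) {
        return bytesByVersion.computeIfAbsent(nodeVersion, serializer::apply);
    }

    // Drops all cached bytes, e.g. when a newer cluster state replaces the old one.
    void invalidate() {
        bytesByVersion.clear();
    }

    public static void main(String[] args) {
        AtomicInteger serializations = new AtomicInteger();
        VersionedStateCache cache = new VersionedStateCache(version -> {
            serializations.incrementAndGet();
            return ("state-for-" + version).getBytes();
        });
        int v219 = 21900; // illustrative version ids (assumption)
        int v31 = 30100;
        cache.serializedStateFor(v219); // miss: serializes for 2.19
        cache.serializedStateFor(v219); // hit: reuses the cached bytes
        cache.serializedStateFor(v31);  // miss: serializes separately for 3.1
        System.out.println("serializations: " + serializations.get());
    }
}
```

The trade-off in this sketch is memory: a mixed-version cluster holds one compressed copy per distinct joining version, which should stay small because only a few versions coexist during a rolling upgrade.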

Status

✅ Done
