[BUG] Version upgrades in Remote Store domains can get stuck #19236

@linuxpi

Describe the bug

During an OpenSearch version upgrade from 2.x to 3.x, the upgrade process fails specifically for Remote Store domains. RemoteSegmentMetadata generated by a primary shard running on a 3.x node cannot be read by replica shards running on 2.x nodes. This occurs because NodeVersionAllocationDecider, which is supposed to prevent primaries from moving to new version nodes before replicas, does not work as expected: incorrect index settings are passed to it.

While we observed this issue for RemoteSegmentMetadata, Lucene segments generated on new version nodes by the primary shard copy would also be incompatible with replicas on old version nodes.

Related component

Storage

To Reproduce

  • Have a Remote Store domain running on OpenSearch 2.x
  • Create multiple indices
  • Attempt to upgrade to OpenSearch 3.x
  • During the upgrade, NodeVersionAllocationDecider receives incorrect index settings (node-level plugin settings instead of index-specific settings)
  • Because the replication type is not present in those settings, the decider falls back to the default replication type (DOCUMENT)
  • As a result, primary shards move to new version nodes before replicas
  • Primary shards on 3.x generate metadata that 2.x replicas cannot read
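
The fallback described in the steps above can be illustrated with a minimal, self-contained sketch. This is not the OpenSearch implementation: the class `ReplicationTypeLookup`, the plain `Map` standing in for a settings object, and the `plugins.example.enabled` key are all hypothetical. The only assumption taken from OpenSearch is that the replication type is read from the `index.replication.type` index setting and defaults to DOCUMENT when the key is absent.

```java
import java.util.Map;

// Hypothetical model of a settings lookup with a default; not OpenSearch code.
final class ReplicationTypeLookup {
    enum ReplicationType { DOCUMENT, SEGMENT }

    // Assumed setting key for the index replication type.
    static final String REPLICATION_TYPE_KEY = "index.replication.type";

    // Returns the configured replication type, or DOCUMENT when the key is missing.
    static ReplicationType resolve(Map<String, String> settings) {
        String raw = settings.get(REPLICATION_TYPE_KEY);
        return raw == null ? ReplicationType.DOCUMENT : ReplicationType.valueOf(raw);
    }

    public static void main(String[] args) {
        // Index-specific settings carry the replication type.
        Map<String, String> indexSettings = Map.of(REPLICATION_TYPE_KEY, "SEGMENT");
        // Node-level settings (hypothetical key) do not, so the default is used.
        Map<String, String> nodeSettings = Map.of("plugins.example.enabled", "true");

        System.out.println(resolve(indexSettings)); // prints SEGMENT
        System.out.println(resolve(nodeSettings));  // prints DOCUMENT
    }
}
```

In this model, passing the wrong settings object does not raise an error. The lookup silently yields DOCUMENT, which matches the behavior reported above.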

Expected behavior

  • NodeVersionAllocationDecider should receive correct index settings for the specific index being decided upon
  • For Segment Replication enabled indices, primaries should be the last to move to new version nodes during upgrade
  • Replicas should move to new version nodes first
  • This ensures compatibility as no older version replicas need to read data from newer version primaries
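
The expected ordering can be sketched as a simple rule. This is only an illustration under stated assumptions, not the actual NodeVersionAllocationDecider logic: `UpgradeOrderSketch`, `canMovePrimary`, and the use of integer major versions are hypothetical, and the DOCUMENT branch is deliberately left permissive because this issue does not describe it in detail.

```java
import java.util.List;

// Hypothetical model of the expected upgrade ordering; not OpenSearch code.
final class UpgradeOrderSketch {
    enum ReplicationType { DOCUMENT, SEGMENT }

    // Returns true if the primary may relocate to a node running targetMajor,
    // given the major versions of the nodes currently hosting its replicas.
    static boolean canMovePrimary(ReplicationType type, int targetMajor, List<Integer> replicaMajors) {
        if (type != ReplicationType.SEGMENT) {
            return true; // rules for other replication types are not modeled here
        }
        for (int replicaMajor : replicaMajors) {
            if (replicaMajor < targetMajor) {
                return false; // an older replica would have to read newer segments
            }
        }
        return true; // every replica already runs the target version or newer
    }

    public static void main(String[] args) {
        // One replica is still on 2.x, so the primary has to wait.
        System.out.println(canMovePrimary(ReplicationType.SEGMENT, 3, List.of(2, 3))); // prints false
        // All replicas are on 3.x, so the primary can move last.
        System.out.println(canMovePrimary(ReplicationType.SEGMENT, 3, List.of(3, 3))); // prints true
    }
}
```

Note that the rule only helps if the replication type passed in is the one configured on the index. If it resolves to DOCUMENT by mistake, the check is skipped entirely.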

Labels

  • Storage: Issues and PRs relating to data and metadata storage
  • bug: Something isn't working

Status

🏗 In progress
