Description
Describe the bug
During a rolling restart of our OpenSearch cluster, some replica shards fail to reassign to available nodes. The logs show that the destination node rejects the segment-replication checkpoint sent by the primary shard as stale, because the replica's initial checkpoint is already ahead of it. This suggests that the primary's state is changing during the recovery process, leading to a replication failure.
The shard fails to be assigned after 5 retries, at which point the cluster gives up. The log message explicitly states: shard has exceeded the maximum number of retries [5] on failed allocation attempts. The root cause is a ReplicationFailedException due to a stale checkpoint.
shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2025-09-03T23:50:03.463Z], failed_attempts[5], failed_nodes[[Qm1RnXJQQYqSrlqcBq-X6Q]], delayed=false, details[failed shard on node [Qm1RnXJQQYqSrlqcBq-X6Q]: failed recovery, failure RecoveryFailedException[[logstash-2025.08.22][5]: Recovery failed from {prod-eu2-opensearch-logs-g4nt}{jh9ILZaaQvOZGcrO3MiFwA}{CY6nBxCVSeStO8Tv-oTAuQ}{10.202.0.19}{10.202.0.19:9300}{dimr}{shard_indexing_pressure_enabled=true} into {prod-eu2-opensearch-logs-lntz}{Qm1RnXJQQYqSrlqcBq-X6Q}{6FOVN8scQTWmeIvfSoG8pQ}{10.202.0.17}{10.202.0.17:9300}{dimr}{shard_indexing_pressure_enabled=true} ([logstash-2025.08.22][5]: Recovery failed from {prod-eu2-opensearch-logs-g4nt}{jh9ILZaaQvOZGcrO3MiFwA}{CY6nBxCVSeStO8Tv-oTAuQ}{10.202.0.19}{10.202.0.19:9300}{dimr}{shard_indexing_pressure_enabled=true} into {prod-eu2-opensearch-logs-lntz}{Qm1RnXJQQYqSrlqcBq-X6Q}{6FOVN8scQTWmeIvfSoG8pQ}{10.202.0.17}{10.202.0.17:9300}{dimr}{shard_indexing_pressure_enabled=true})]; nested: RecoveryFailedException[[logstash-2025.08.22][5]: Recovery failed from {prod-eu2-opensearch-logs-g4nt}{jh9ILZaaQvOZGcrO3MiFwA}{CY6nBxCVSeStO8Tv-oTAuQ}{10.202.0.19}{10.202.0.19:9300}{dimr}{shard_indexing_pressure_enabled=true} into {prod-eu2-opensearch-logs-lntz}{Qm1RnXJQQYqSrlqcBq-X6Q}{6FOVN8scQTWmeIvfSoG8pQ}{10.202.0.17}{10.202.0.17:9300}{dimr}{shard_indexing_pressure_enabled=true}]; nested: RemoteTransportException[[prod-eu2-opensearch-logs-g4nt][10.202.0.19:9300][internal:index/shard/recovery/start_recovery]]; nested: RemoteTransportException[[prod-eu2-opensearch-logs-lntz][10.202.0.17:9300][internal:index/shard/replication/segments_sync]]; nested: ReplicationFailedException[Segment Replication failed]; nested: ReplicationFailedException[Rejecting stale metadata checkpoint [ReplicationCheckpoint{shardId=[logstash-2025.08.22][5], primaryTerm=3, 
segmentsGen=171, version=13559, size=32294083233, codec=ZSTD912, timestamp=0}] since initial checkpoint [ReplicationCheckpoint{shardId=[logstash-2025.08.22][5], primaryTerm=3, segmentsGen=1066, version=14449, size=32294083233, codec=ZSTD101, timestamp=1756943403278888666}] is ahead of it]; ], allocation_status[no_attempt]]]
This appears to be a bug where the primary and replica shards get out of sync during the recovery process. The primary's state changes while it's trying to send an old copy of the data, which the new replica correctly rejects.
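As a workaround once the retry counter is exhausted, the log itself points at the manual retry endpoint. A minimal dry-run sketch of the recovery commands (the endpoint `http://localhost:9200` is an assumption; substitute your host and any auth flags, and drop the `echo` prefixes to actually execute):

```shell
# OS_HOST is an assumption; point it at your cluster.
OS_HOST="${OS_HOST:-http://localhost:9200}"

# Retry allocations that exceeded the failed-attempt limit [5],
# as suggested by the log message itself.
echo curl -s -X POST "$OS_HOST/_cluster/reroute?retry_failed=true"

# Then watch shard state until the cluster goes green again.
echo curl -s "$OS_HOST/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason"
```

Note this only retries the allocation; it does not address the underlying stale-checkpoint race described above, so the recovery may fail again the same way.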
Related component
Other
To Reproduce
- Disable shard allocation.
- Restart an OpenSearch node.
- Enable shard allocation.
- The cluster never becomes green, as the shards remain unassigned, preventing subsequent steps in the rolling restart process.
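For reference, the allocation toggle in the steps above looks roughly like this (a dry-run sketch; the endpoint `http://localhost:9200` is an assumption, and the `echo` prefixes print the curl invocations rather than executing them):

```shell
OS_HOST="${OS_HOST:-http://localhost:9200}"

# Step 1: disable replica allocation so shards are not shuffled
# while the node is down ("primaries" keeps primaries assignable).
DISABLE_BODY='{"persistent":{"cluster.routing.allocation.enable":"primaries"}}'
echo curl -s -X PUT "$OS_HOST/_cluster/settings" \
  -H 'Content-Type: application/json' -d "$DISABLE_BODY"

# Step 2: restart the OpenSearch node (out of band).

# Step 3: re-enable allocation once the node has rejoined
# (null resets the setting to its default, "all").
ENABLE_BODY='{"persistent":{"cluster.routing.allocation.enable":null}}'
echo curl -s -X PUT "$OS_HOST/_cluster/settings" \
  -H 'Content-Type: application/json' -d "$ENABLE_BODY"
```

After step 3 the replicas begin recovering, which is where the stale-checkpoint rejection occurs.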
Expected behavior
The shard should successfully reassign to the new node, completing the recovery process, and the cluster should transition back to green.
Additional Details
Environment
- OpenSearch Version: 3.2.0
- JVM Version: OpenJDK Runtime Environment Temurin-24.0.2+12 (build 24.0.2+12)
- OS: Ubuntu 22.04