[BUG] RemoteStoreRestoreIT tests are flaky - Missing cluster-manager, expected nodes #11085

@peternied

Describe the bug

  • org.opensearch.remotestore.RemoteStoreRestoreIT.testRTSRestoreWithRefreshedDataPrimaryReplicaDown
  • org.opensearch.remotestore.RemoteStoreRestoreIT.testRemoteTranslogRestoreWithCommittedData
  • org.opensearch.remotestore.RemoteStoreRestoreIT.testRemoteTranslogRestoreWithRefreshedData
  • org.opensearch.remotestore.RemoteStoreRestoreIT.testRemoteTranslogRestoreWithNoDataPostCommit
  • org.opensearch.remotestore.RemoteStoreRestoreIT.testRTSRestoreWithNoDataPostRefreshPrimaryReplicaDown
  • org.opensearch.remotestore.RemoteStoreRestoreIT.testRTSRestoreDataOnlyInTranslog
  • org.opensearch.remotestore.RemoteStoreRestoreIT.testRTSRestoreWithCommittedDataPrimaryReplicaDown

These tests are flaky.

Stacktrace

Every test except testRTSRestoreWithRefreshedDataPrimaryReplicaDown fails with the same assertion:

java.lang.AssertionError: Missing cluster-manager, expected nodes: [{node_s1}{GD7uyrgiTxSKXQLlfKgK_A}{EpXKR9vyS6mfjExohbCtsA}{127.0.0.1}{127.0.0.1:39151}{d}{shard_indexing_pressure_enabled=true}, {node_s2}{QOKIN5VyR06merR9uOI4bw}{QteIuzBmRZWq6-fJZkbdEg}{127.0.0.1}{127.0.0.1:37443}{d}{shard_indexing_pressure_enabled=true}, {node_s0}{Kr_DbhdESPyQswOYSNJVQw}{f5hbAyrURuOHqIbtC_b8_Q}{127.0.0.1}{127.0.0.1:35393}{m}{shard_indexing_pressure_enabled=true}] and actual cluster states [cluster uuid: AzELHIGZT6uSizEPPUXcuA [committed: true]
version: 3
state uuid: borCjOpOQFWpk6JLeG0TmQ
from_diff: false
meta data version: 2
   coordination_metadata:
      term: 1
      last_committed_config: VotingConfiguration{Kr_DbhdESPyQswOYSNJVQw}
      last_accepted_config: VotingConfiguration{Kr_DbhdESPyQswOYSNJVQw}
      voting tombstones: []
metadata customs:
   repositories: {"test-remote-store-repo-2":{"type":"fs","settings":{"system_repository":"true","location":"/var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.remotestore.RemoteStoreRestoreIT_A166C2386E979736-001/tempDir-002/repos/SbGDsuuFWS"},"generation":-2,"pending_generation":-1},"test-remote-store-repo":{"type":"fs","settings":{"system_repository":"true","location":"/var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.remotestore.RemoteStoreRestoreIT_A166C2386E979736-001/tempDir-002/repos/DVrmjempuo"},"generation":-2,"pending_generation":-1}}   index-graveyard: IndexGraveyard[[]]
nodes: 
   {node_s0}{Kr_DbhdESPyQswOYSNJVQw}{f5hbAyrURuOHqIbtC_b8_Q}{127.0.0.1}{127.0.0.1:35393}{m}{shard_indexing_pressure_enabled=true}, local, cluster-manager
   {node_s2}{QOKIN5VyR06merR9uOI4bw}{QteIuzBmRZWq6-fJZkbdEg}{127.0.0.1}{127.0.0.1:37443}{d}{shard_indexing_pressure_enabled=true}
routing_table (version 1):
routing_nodes:
-----node_id[QOKIN5VyR06merR9uOI4bw][V]
---- unassigned
, cluster uuid: _na_ [committed: false]
version: 0
state uuid: DDhrUTuzS0-TkQaTHp03EQ
from_diff: false
meta data version: 0
   coordination_metadata:
      term: 0
      last_committed_config: VotingConfiguration{}
      last_accepted_config: VotingConfiguration{}
      voting tombstones: []
metadata customs:
   index-graveyard: IndexGraveyard[[]]
blocks: 
   _global_:
      1,state not recovered / initialized, blocks READ,WRITE,METADATA_READ,METADATA_WRITE,CREATE_INDEX      2,no cluster-manager, blocks METADATA_WRITE
nodes: 
   {node_s1}{GD7uyrgiTxSKXQLlfKgK_A}{EpXKR9vyS6mfjExohbCtsA}{127.0.0.1}{127.0.0.1:39151}{d}{shard_indexing_pressure_enabled=true}, local
routing_table (version 0):
routing_nodes:
-----node_id[GD7uyrgiTxSKXQLlfKgK_A][V]
---- unassigned
, cluster uuid: AzELHIGZT6uSizEPPUXcuA [committed: true]
version: 3
state uuid: borCjOpOQFWpk6JLeG0TmQ
from_diff: false
meta data version: 2
   coordination_metadata:
      term: 1
      last_committed_config: VotingConfiguration{Kr_DbhdESPyQswOYSNJVQw}
      last_accepted_config: VotingConfiguration{Kr_DbhdESPyQswOYSNJVQw}
      voting tombstones: []
metadata customs:
   repositories: {"test-remote-store-repo-2":{"type":"fs","settings":{"system_repository":"true","location":"/var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.remotestore.RemoteStoreRestoreIT_A166C2386E979736-001/tempDir-002/repos/SbGDsuuFWS"},"generation":-2,"pending_generation":-1},"test-remote-store-repo":{"type":"fs","settings":{"system_repository":"true","location":"/var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.remotestore.RemoteStoreRestoreIT_A166C2386E979736-001/tempDir-002/repos/DVrmjempuo"},"generation":-2,"pending_generation":-1}}   index-graveyard: IndexGraveyard[[]]
nodes: 
   {node_s0}{Kr_DbhdESPyQswOYSNJVQw}{f5hbAyrURuOHqIbtC_b8_Q}{127.0.0.1}{127.0.0.1:35393}{m}{shard_indexing_pressure_enabled=true}, cluster-manager
   {node_s2}{QOKIN5VyR06merR9uOI4bw}{QteIuzBmRZWq6-fJZkbdEg}{127.0.0.1}{127.0.0.1:37443}{d}{shard_indexing_pressure_enabled=true}, local
routing_table (version 1):
routing_nodes:
-----node_id[QOKIN5VyR06merR9uOI4bw][V]
---- unassigned
]

testRTSRestoreWithRefreshedDataPrimaryReplicaDown fails with a different assertion:

java.lang.AssertionError:  inconsistent generation 

To Reproduce
CI - https://build.ci.opensearch.org/job/gradle-check/29522/testReport/
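
For a local attempt, a minimal sketch of a repro command follows. It rests on assumptions not stated in this report: that the test runs under the `:server:internalClusterTest` Gradle task (inferred from the `server/build/testrun/internalClusterTest` path in the log), and that `A166C2386E979736` in the temp directory name is the run's test seed. Because the failure is intermittent, a single run with this seed may still pass.

```shell
# Hypothetical repro sketch; task name and seed are inferred from the log paths, not confirmed by the report.
TEST_CLASS="org.opensearch.remotestore.RemoteStoreRestoreIT"
TEST_METHOD="testRemoteTranslogRestoreWithCommittedData"
SEED="A166C2386E979736"  # assumed seed, taken from the temp directory name in the log

# Assemble the Gradle invocation as a string so it can be inspected before running.
REPRO_CMD="./gradlew :server:internalClusterTest --tests \"${TEST_CLASS}.${TEST_METHOD}\" -Dtests.seed=${SEED}"
echo "${REPRO_CMD}"

# To actually run it, execute from the root of an OpenSearch checkout:
# eval "${REPRO_CMD}"
```

Swapping `TEST_METHOD` for any other test name listed above targets that test instead.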

Expected behavior
The tests should pass on every run.

Labels: Storage:Remote, bug (Something isn't working), flaky-test (Random test failure that succeeds on second run), lucene