[Remote Store] Exception in RemoteStoreRefreshListener.afterRefresh() during relocation for remote-backed indexes #5844

@ashking94

Description

Is your feature request related to a problem? Please describe.
During primary relocation, the new primary is bootstrapped with NRTReplicationEngine. At that point the check for primary shard routing with remote store enabled already evaluates to true, so RemoteStoreRefreshListener.afterRefresh() can be invoked with either InternalEngine or NRTReplicationEngine. However, within afterRefresh() we cast the engine to InternalEngine without checking the actual implementation:

((InternalEngine) indexShard.getEngine()).lastRefreshedCheckpoint();

Exception thrown:

[2023-01-12T10:01:48,118][ERROR][o.o.i.s.RemoteStoreRefreshListener] [opensearch-node1] Exception in RemoteStoreRefreshListener.afterRefresh()
java.lang.ClassCastException: class org.opensearch.index.engine.NRTReplicationEngine cannot be cast to class org.opensearch.index.engine.InternalEngine (org.opensearch.index.engine.NRTReplicationEngine and org.opensearch.index.engine.InternalEngine are in unnamed module of loader 'app')
	at org.opensearch.index.shard.RemoteStoreRefreshListener.uploadSegmentInfosSnapshot(RemoteStoreRefreshListener.java:191) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.index.shard.RemoteStoreRefreshListener.afterRefresh(RemoteStoreRefreshListener.java:133) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.lucene.search.ReferenceManager.notifyRefreshListenersRefreshed(ReferenceManager.java:275) [lucene-core-9.5.0-snapshot-0878271.jar:9.5.0-snapshot-0878271 08782710435618f15825f777ae2a5bee9b6f681a - runner - 2022-12-27 14:43:13]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:182) [lucene-core-9.5.0-snapshot-0878271.jar:9.5.0-snapshot-0878271 08782710435618f15825f777ae2a5bee9b6f681a - runner - 2022-12-27 14:43:13]
	at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:213) [lucene-core-9.5.0-snapshot-0878271.jar:9.5.0-snapshot-0878271 08782710435618f15825f777ae2a5bee9b6f681a - runner - 2022-12-27 14:43:13]
	at org.opensearch.index.engine.NRTReplicationReaderManager.updateSegments(NRTReplicationReaderManager.java:81) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.index.engine.NRTReplicationEngine.updateSegments(NRTReplicationEngine.java:130) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.index.shard.IndexShard.finalizeReplication(IndexShard.java:1412) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.indices.replication.SegmentReplicationTarget.lambda$finalizeReplication$5(SegmentReplicationTarget.java:217) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.action.ActionListener.completeWith(ActionListener.java:342) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.indices.replication.SegmentReplicationTarget.finalizeReplication(SegmentReplicationTarget.java:202) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.indices.replication.SegmentReplicationTarget.lambda$startReplication$3(SegmentReplicationTarget.java:166) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:80) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.ListenableFuture$1.doRun(ListenableFuture.java:126) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:341) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:120) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.ListenableFuture.lambda$done$0(ListenableFuture.java:112) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at java.util.ArrayList.forEach(ArrayList.java:1511) [?:?]
	at org.opensearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:112) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.BaseFuture.set(BaseFuture.java:160) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.ListenableFuture.onResponse(ListenableFuture.java:141) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.action.StepListener.innerOnResponse(StepListener.java:77) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.action.NotifyOnceListener.onResponse(NotifyOnceListener.java:55) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.action.ActionListener$4.onResponse(ActionListener.java:180) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.action.ActionListener$6.onResponse(ActionListener.java:299) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.action.support.RetryableAction$RetryingListener.onResponse(RetryableAction.java:181) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:69) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1381) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:393) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:387) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
	at java.lang.Thread.run(Thread.java:1589) [?:?]

Describe the solution you'd like
The cast to InternalEngine is used to clean up translogs on the local machine and on the remote store. We need to handle this case by skipping setMinSeqNoToKeep if the underlying engine is NRTReplicationEngine.
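
A minimal sketch of the proposed guard, not the actual patch. The classes below are hypothetical stand-ins that only mirror the names of the OpenSearch types, so the snippet runs on its own. `TranslogCleaner`, `trimTranslog`, and the exact value passed to `setMinSeqNoToKeep` are assumptions made for illustration.

```java
public class RefreshListenerSketch {
    // Hypothetical stand-ins for the OpenSearch engine classes; only the type hierarchy matters here.
    static abstract class Engine {}

    static class InternalEngine extends Engine {
        private final long checkpoint;

        InternalEngine(long checkpoint) {
            this.checkpoint = checkpoint;
        }

        long lastRefreshedCheckpoint() {
            return checkpoint;
        }
    }

    // Has no lastRefreshedCheckpoint(), so casting it to InternalEngine fails at runtime.
    static class NRTReplicationEngine extends Engine {}

    // Hypothetical stand-in for whatever receives setMinSeqNoToKeep in the real code.
    static class TranslogCleaner {
        long minSeqNoToKeep = -1;

        void setMinSeqNoToKeep(long seqNo) {
            this.minSeqNoToKeep = seqNo;
        }
    }

    // Checks the engine type before casting; returns true only when cleanup was performed.
    static boolean trimTranslog(Engine engine, TranslogCleaner cleaner) {
        if (engine instanceof InternalEngine) {
            InternalEngine internal = (InternalEngine) engine;
            // Assumption: the checkpoint is passed through unchanged in this sketch.
            cleaner.setMinSeqNoToKeep(internal.lastRefreshedCheckpoint());
            return true;
        }
        // NRTReplicationEngine (for example during primary relocation): skip the cleanup.
        return false;
    }

    public static void main(String[] args) {
        TranslogCleaner cleaner = new TranslogCleaner();
        System.out.println(trimTranslog(new NRTReplicationEngine(), cleaner) + " " + cleaner.minSeqNoToKeep);
        System.out.println(trimTranslog(new InternalEngine(42L), cleaner) + " " + cleaner.minSeqNoToKeep);
    }
}
```

With a guard of this shape, a refresh triggered through NRTReplicationEngine.updateSegments() would fall through to the skip branch instead of throwing ClassCastException.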

Metadata

Labels

Storage:Durability (Issues and PRs related to the durability framework)
enhancement (Enhancement or improvement to existing feature or request)
v2.6.0 (Issues and PRs related to version v2.6.0)
