Description
Is your feature request related to a problem? Please describe.
During primary relocation, the new primary is bootstrapped with NRTReplicationEngine. Because the check for primary shard routing with remote store enabled evaluates to true during relocation, RemoteStoreRefreshListener.afterRefresh() can be invoked with either InternalEngine or NRTReplicationEngine. However, afterRefresh() casts the engine to InternalEngine without checking its actual implementation:
((InternalEngine) indexShard.getEngine()).lastRefreshedCheckpoint();
Exception thrown:
[2023-01-12T10:01:48,118][ERROR][o.o.i.s.RemoteStoreRefreshListener] [opensearch-node1] Exception in RemoteStoreRefreshListener.afterRefresh()
java.lang.ClassCastException: class org.opensearch.index.engine.NRTReplicationEngine cannot be cast to class org.opensearch.index.engine.InternalEngine (org.opensearch.index.engine.NRTReplicationEngine and org.opensearch.index.engine.InternalEngine are in unnamed module of loader 'app')
at org.opensearch.index.shard.RemoteStoreRefreshListener.uploadSegmentInfosSnapshot(RemoteStoreRefreshListener.java:191) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.index.shard.RemoteStoreRefreshListener.afterRefresh(RemoteStoreRefreshListener.java:133) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.apache.lucene.search.ReferenceManager.notifyRefreshListenersRefreshed(ReferenceManager.java:275) [lucene-core-9.5.0-snapshot-0878271.jar:9.5.0-snapshot-0878271 08782710435618f15825f777ae2a5bee9b6f681a - runner - 2022-12-27 14:43:13]
at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:182) [lucene-core-9.5.0-snapshot-0878271.jar:9.5.0-snapshot-0878271 08782710435618f15825f777ae2a5bee9b6f681a - runner - 2022-12-27 14:43:13]
at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:213) [lucene-core-9.5.0-snapshot-0878271.jar:9.5.0-snapshot-0878271 08782710435618f15825f777ae2a5bee9b6f681a - runner - 2022-12-27 14:43:13]
at org.opensearch.index.engine.NRTReplicationReaderManager.updateSegments(NRTReplicationReaderManager.java:81) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.index.engine.NRTReplicationEngine.updateSegments(NRTReplicationEngine.java:130) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.index.shard.IndexShard.finalizeReplication(IndexShard.java:1412) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.indices.replication.SegmentReplicationTarget.lambda$finalizeReplication$5(SegmentReplicationTarget.java:217) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.ActionListener.completeWith(ActionListener.java:342) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.indices.replication.SegmentReplicationTarget.finalizeReplication(SegmentReplicationTarget.java:202) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.indices.replication.SegmentReplicationTarget.lambda$startReplication$3(SegmentReplicationTarget.java:166) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:80) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.ListenableFuture$1.doRun(ListenableFuture.java:126) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:341) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:120) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.ListenableFuture.lambda$done$0(ListenableFuture.java:112) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at java.util.ArrayList.forEach(ArrayList.java:1511) [?:?]
at org.opensearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:112) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.BaseFuture.set(BaseFuture.java:160) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.ListenableFuture.onResponse(ListenableFuture.java:141) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.StepListener.innerOnResponse(StepListener.java:77) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.NotifyOnceListener.onResponse(NotifyOnceListener.java:55) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.ActionListener$4.onResponse(ActionListener.java:180) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.ActionListener$6.onResponse(ActionListener.java:299) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.support.RetryableAction$RetryingListener.onResponse(RetryableAction.java:181) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:69) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1381) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:393) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:387) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
at java.lang.Thread.run(Thread.java:1589) [?:?]
Describe the solution you'd like
The cast to InternalEngine is used to clean up translogs, both locally and remotely. We need to handle this case by skipping setMinSeqNoToKeep when the underlying engine is NRTReplicationEngine.
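The proposed guard could look roughly like the sketch below. It is a minimal illustration, not the actual OpenSearch code: `Engine`, `InternalEngine`, `NRTReplicationEngine`, and `TranslogCleaner` here are simplified hypothetical stand-ins defined inside the example, and `cleanupTranslog` is an assumed helper name. The only point is that an `instanceof` check replaces the unconditional cast, so the translog cleanup is skipped for any engine that is not an InternalEngine.

```java
public class EngineGuardSketch {
    // Hypothetical stand-ins for the OpenSearch engine types; NOT the real classes.
    interface Engine {}

    static class InternalEngine implements Engine {
        private final long checkpoint;
        InternalEngine(long checkpoint) { this.checkpoint = checkpoint; }
        long lastRefreshedCheckpoint() { return checkpoint; }
    }

    static class NRTReplicationEngine implements Engine {}

    // Hypothetical recorder standing in for whatever owns setMinSeqNoToKeep.
    static class TranslogCleaner {
        long minSeqNoToKeep = -1;
        int calls = 0;
        void setMinSeqNoToKeep(long seqNo) {
            minSeqNoToKeep = seqNo;
            calls++;
        }
    }

    // Assumed helper: performs the cleanup only when the engine really is an
    // InternalEngine. Returns true if cleanup ran, false if it was skipped.
    static boolean cleanupTranslog(Engine engine, TranslogCleaner cleaner) {
        if (engine instanceof InternalEngine) {
            long checkpoint = ((InternalEngine) engine).lastRefreshedCheckpoint();
            cleaner.setMinSeqNoToKeep(checkpoint + 1);
            return true;
        }
        // NRTReplicationEngine (or any other engine): skip instead of throwing
        // ClassCastException.
        return false;
    }

    public static void main(String[] args) {
        TranslogCleaner cleaner = new TranslogCleaner();
        System.out.println(cleanupTranslog(new NRTReplicationEngine(), cleaner));
        System.out.println(cleanupTranslog(new InternalEngine(41L), cleaner));
    }
}
```

Whether `checkpoint + 1` is the right value to pass is also an assumption of this sketch; the real listener should keep whatever value it computes today and only add the type check around it.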