[Segment Replication] [BUG] file handle leak in SegmentFileTransferHandler #4205
Closed
Labels
bug, distributed framework, enhancement
Description
Describe the bug
The IndexInput returned by store.directory().openInput in SegmentFileTransferHandler is not always closed, and the test run then fails with a RuntimeException reporting file handle leaks. Based on the remaining usages, this call can be wrapped in a try-with-resources block, since IndexInput implements the Closeable interface.
file handle leaks: [FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b_Lucene90_0.pos), FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b.fdt), FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b_Lucene90_0.pos), FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b.fdt), FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b.fdt)]
java.lang.RuntimeException: file handle leaks: [FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b_Lucene90_0.pos), FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b.fdt), FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b_Lucene90_0.pos), FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b.fdt), FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b.fdt)]
at __randomizedtesting.SeedInfo.seed([E528474D6EDB4FDE]:0)
at org.apache.lucene.tests.mockfile.LeakFS.onClose(LeakFS.java:63)
at org.apache.lucene.tests.mockfile.FilterFileSystem.close(FilterFileSystem.java:69)
at org.apache.lucene.tests.mockfile.FilterFileSystem.close(FilterFileSystem.java:70)
at org.apache.lucene.tests.util.TestRuleTemporaryFilesCleanup.afterAlways(TestRuleTemporaryFilesCleanup.java:223)
at com.carrotsearch.randomizedtesting.rules.TestRuleAdapter$1.afterAlways(TestRuleAdapter.java:31)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:43)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.Exception
at org.apache.lucene.tests.mockfile.LeakFS.onOpen(LeakFS.java:46)
at org.apache.lucene.tests.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:82)
at org.apache.lucene.tests.mockfile.HandleTrackingFS.newFileChannel(HandleTrackingFS.java:202)
at org.apache.lucene.tests.mockfile.HandleTrackingFS.newFileChannel(HandleTrackingFS.java:171)
at java.base/java.nio.channels.FileChannel.open(FileChannel.java:298)
at java.base/java.nio.channels.FileChannel.open(FileChannel.java:357)
at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
at org.opensearch.index.store.FsDirectoryFactory$HybridDirectory.openInput(FsDirectoryFactory.java:166)
at org.apache.lucene.tests.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:816)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:101)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:101)
at org.opensearch.indices.replication.SegmentFileTransferHandler$1.onNewResource(SegmentFileTransferHandler.java:107)
at org.opensearch.indices.replication.SegmentFileTransferHandler$1.onNewResource(SegmentFileTransferHandler.java:97)
at org.opensearch.indices.recovery.MultiChunkTransfer.getNextRequest(MultiChunkTransfer.java:183)
at org.opensearch.indices.recovery.MultiChunkTransfer.handleItems(MultiChunkTransfer.java:157)
at org.opensearch.indices.recovery.MultiChunkTransfer$1.write(MultiChunkTransfer.java:98)
at org.opensearch.common.util.concurrent.AsyncIOProcessor.processList(AsyncIOProcessor.java:123)
at org.opensearch.common.util.concurrent.AsyncIOProcessor.drainAndProcessAndRelease(AsyncIOProcessor.java:111)
at org.opensearch.common.util.concurrent.AsyncIOProcessor.put(AsyncIOProcessor.java:102)
at org.opensearch.indices.recovery.MultiChunkTransfer.addItem(MultiChunkTransfer.java:109)
at org.opensearch.indices.recovery.MultiChunkTransfer.lambda$handleItems$3(MultiChunkTransfer.java:151)
at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:80)
at org.opensearch.action.ActionListener$6.onResponse(ActionListener.java:299)
at org.opensearch.action.ActionListener$4.onResponse(ActionListener.java:180)
at org.opensearch.action.ActionListener$6.onResponse(ActionListener.java:299)
at org.opensearch.action.support.RetryableAction$RetryingListener.onResponse(RetryableAction.java:161)
at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:69)
at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1369)
at org.opensearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:393)
at org.opensearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:387)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
... 1 more
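The try-with-resources suggestion from the description can be illustrated with a minimal, self-contained sketch. It does not use the real handler code: `LeakSketch`, `TrackedInput`, and `OPEN_HANDLES` are hypothetical names, and `TrackedInput` only stands in for an `IndexInput` as a `Closeable` whose open handles are counted, loosely in the way the leak-detecting test file system in the trace tracks opened channels. Whether the input in SegmentFileTransferHandler can be scoped this tightly is an assumption.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class LeakSketch {
    // Number of handles currently open (hypothetical stand-in for leak tracking).
    public static final AtomicInteger OPEN_HANDLES = new AtomicInteger();

    // Hypothetical stand-in for IndexInput; only the Closeable contract matters here.
    public static final class TrackedInput implements Closeable {
        private boolean closed = false;

        public TrackedInput() {
            OPEN_HANDLES.incrementAndGet();
        }

        public byte readByte(boolean fail) throws IOException {
            if (fail) {
                throw new IOException("simulated read failure");
            }
            return 42;
        }

        @Override
        public void close() {
            // Idempotent: only the first close releases the handle.
            if (!closed) {
                closed = true;
                OPEN_HANDLES.decrementAndGet();
            }
        }
    }

    // Leaky variant: if readByte throws, close() is never reached.
    public static byte readLeaky(boolean fail) throws IOException {
        TrackedInput in = new TrackedInput();
        byte b = in.readByte(fail);
        in.close();
        return b;
    }

    // Safe variant: try-with-resources closes the input on normal and exceptional exit.
    public static byte readSafely(boolean fail) throws IOException {
        try (TrackedInput in = new TrackedInput()) {
            return in.readByte(fail);
        }
    }

    public static void main(String[] args) {
        try {
            readLeaky(true);
        } catch (IOException e) {
            // expected: the simulated failure skips close()
        }
        System.out.println("open after leaky read: " + OPEN_HANDLES.get());

        OPEN_HANDLES.set(0);
        try {
            readSafely(true);
        } catch (IOException e) {
            // expected: the input was still closed before the exception propagated
        }
        System.out.println("open after safe read: " + OPEN_HANDLES.get());
    }
}
```

If the input has to stay open across several chunk reads rather than within a single method, try-with-resources does not apply directly, and the equivalent guarantee is an explicit close on every completion and failure path.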
To Reproduce
Steps to reproduce the behavior:
This issue occurs intermittently when running the SegmentReplicationIT.java tests. I found it while running the new integration test below (created to reproduce a different issue).
public void testPrimaryShardAllocatorUsesFurthestAheadReplica() throws Exception {
    final Settings settings = Settings.builder()
        .put(indexSettings())
        .put(IndexMetadata.SETTING_NUMBER_OF_REPLICAS, 6)
        .put(IndexMetadata.SETTING_REPLICATION_TYPE, ReplicationType.SEGMENT)
        .build();
    final String clusterManagerNode = internalCluster().startClusterManagerOnlyNode(Settings.EMPTY);
    final String primaryNode = internalCluster().startDataOnlyNode(Settings.EMPTY);
    createIndex(INDEX_NAME, settings);
    final String firstReplica = internalCluster().startDataOnlyNode(Settings.EMPTY);
    final String secondReplica = internalCluster().startDataOnlyNode(Settings.EMPTY);
    // Index docs & refresh to bring all replicas to initial checkpoint
    indexDocs(scaledRandomIntBetween(20, 200));
    flushAndRefresh(INDEX_NAME);
    final String thirdReplica = internalCluster().startDataOnlyNode(Settings.EMPTY);
    final String fourthReplica = internalCluster().startDataOnlyNode(Settings.EMPTY);
    final String fifthReplica = internalCluster().startDataOnlyNode(Settings.EMPTY);
    final String sixthReplica = internalCluster().startDataOnlyNode(Settings.EMPTY);
    for (int i = 0; i < 10; i++) {
        logger.info("Iteration {} --> ", i);
        indexDocsAndRefresh(scaledRandomIntBetween(10, 100));
    }
    final Index index = resolveIndex(INDEX_NAME);
    logger.info("--> primaryShard RC {}", getIndexShard(primaryNode).getLatestReplicationCheckpoint());
    internalCluster().stopRandomNode(InternalTestCluster.nameFilter(primaryNode));
    for (int i = 0; i < 5; i++) {
        logger.info("Iteration {} --> ", i);
        indexDocsAndRefresh(scaledRandomIntBetween(10, 100));
    }
    ensureYellow(INDEX_NAME);
}
Expected behavior
The test should not fail with a file handle leak runtime exception.
Host/Environment (please complete the following information):
- OS: macOS