
[Segment Replication] [BUG] file handle leak in SegmentFileTransferHandler #4205

@dreamer-89

Describe the bug

The store.directory().openInput call in SegmentFileTransferHandler leaks file handles, which the test framework reports as a runtime file handle leak exception. Judging by the other usages, this call can be wrapped in a try-with-resources block, since IndexInput implements the Closeable interface.
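
To illustrate the suggested fix, here is a minimal, self-contained sketch of why try-with-resources helps. It does not use Lucene: TrackedInput is a hypothetical stand-in for IndexInput that only counts open handles, and TransferSketch, leakyRead and safeRead are illustrative names, not OpenSearch code. Whether a single lexical scope fits SegmentFileTransferHandler, where the trace shows the input being opened in onNewResource during a multi-chunk transfer, is an assumption this issue does not settle.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an IndexInput-like resource; it only counts open handles.
final class TrackedInput implements Closeable {
    static final AtomicInteger OPEN_HANDLES = new AtomicInteger();
    private final String name;
    private boolean closed;

    TrackedInput(String name) {
        this.name = name;
        OPEN_HANDLES.incrementAndGet();
    }

    // Simulates a read; names ending in ".corrupt" fail to mimic an I/O error.
    int readChunk() throws IOException {
        if (closed) {
            throw new IOException("already closed: " + name);
        }
        if (name.endsWith(".corrupt")) {
            throw new IOException("simulated read failure: " + name);
        }
        return name.length();
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            OPEN_HANDLES.decrementAndGet();
        }
    }
}

final class TransferSketch {
    // Leaky variant: if readChunk throws, close() is never reached.
    static int leakyRead(String name) throws IOException {
        TrackedInput in = new TrackedInput(name);
        int n = in.readChunk();
        in.close();
        return n;
    }

    // Safe variant: try-with-resources closes the input on every exit path.
    static int safeRead(String name) throws IOException {
        try (TrackedInput in = new TrackedInput(name)) {
            return in.readChunk();
        }
    }

    public static void main(String[] args) {
        try {
            leakyRead("_b.corrupt");
        } catch (IOException e) {
            System.out.println("open handles after leaky read: " + TrackedInput.OPEN_HANDLES.get());
        }
        TrackedInput.OPEN_HANDLES.set(0);
        try {
            safeRead("_b.corrupt");
        } catch (IOException e) {
            System.out.println("open handles after safe read: " + TrackedInput.OPEN_HANDLES.get());
        }
    }
}
```

The leaky variant only misbehaves on the failure path, which would be consistent with the leak showing up intermittently rather than on every run.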

file handle leaks: [FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b_Lucene90_0.pos), FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b.fdt), FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b_Lucene90_0.pos), FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b.fdt), FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b.fdt)]
java.lang.RuntimeException: file handle leaks: [FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b_Lucene90_0.pos), FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b.fdt), FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b_Lucene90_0.pos), FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b.fdt), FileChannel(/Users/singhnjb/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.indices.replication.SegmentReplicationIT_E528474D6EDB4FDE-001/tempDir-005/node_t1/d0/nodes/0/indices/m2VhNPQ4SueBsbqz3y6juA/0/index/_b.fdt)]
	at __randomizedtesting.SeedInfo.seed([E528474D6EDB4FDE]:0)
	at org.apache.lucene.tests.mockfile.LeakFS.onClose(LeakFS.java:63)
	at org.apache.lucene.tests.mockfile.FilterFileSystem.close(FilterFileSystem.java:69)
	at org.apache.lucene.tests.mockfile.FilterFileSystem.close(FilterFileSystem.java:70)
	at org.apache.lucene.tests.util.TestRuleTemporaryFilesCleanup.afterAlways(TestRuleTemporaryFilesCleanup.java:223)
	at com.carrotsearch.randomizedtesting.rules.TestRuleAdapter$1.afterAlways(TestRuleAdapter.java:31)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.Exception
	at org.apache.lucene.tests.mockfile.LeakFS.onOpen(LeakFS.java:46)
	at org.apache.lucene.tests.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:82)
	at org.apache.lucene.tests.mockfile.HandleTrackingFS.newFileChannel(HandleTrackingFS.java:202)
	at org.apache.lucene.tests.mockfile.HandleTrackingFS.newFileChannel(HandleTrackingFS.java:171)
	at java.base/java.nio.channels.FileChannel.open(FileChannel.java:298)
	at java.base/java.nio.channels.FileChannel.open(FileChannel.java:357)
	at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
	at org.opensearch.index.store.FsDirectoryFactory$HybridDirectory.openInput(FsDirectoryFactory.java:166)
	at org.apache.lucene.tests.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:816)
	at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:101)
	at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:101)
	at org.opensearch.indices.replication.SegmentFileTransferHandler$1.onNewResource(SegmentFileTransferHandler.java:107)
	at org.opensearch.indices.replication.SegmentFileTransferHandler$1.onNewResource(SegmentFileTransferHandler.java:97)
	at org.opensearch.indices.recovery.MultiChunkTransfer.getNextRequest(MultiChunkTransfer.java:183)
	at org.opensearch.indices.recovery.MultiChunkTransfer.handleItems(MultiChunkTransfer.java:157)
	at org.opensearch.indices.recovery.MultiChunkTransfer$1.write(MultiChunkTransfer.java:98)
	at org.opensearch.common.util.concurrent.AsyncIOProcessor.processList(AsyncIOProcessor.java:123)
	at org.opensearch.common.util.concurrent.AsyncIOProcessor.drainAndProcessAndRelease(AsyncIOProcessor.java:111)
	at org.opensearch.common.util.concurrent.AsyncIOProcessor.put(AsyncIOProcessor.java:102)
	at org.opensearch.indices.recovery.MultiChunkTransfer.addItem(MultiChunkTransfer.java:109)
	at org.opensearch.indices.recovery.MultiChunkTransfer.lambda$handleItems$3(MultiChunkTransfer.java:151)
	at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:80)
	at org.opensearch.action.ActionListener$6.onResponse(ActionListener.java:299)
	at org.opensearch.action.ActionListener$4.onResponse(ActionListener.java:180)
	at org.opensearch.action.ActionListener$6.onResponse(ActionListener.java:299)
	at org.opensearch.action.support.RetryableAction$RetryingListener.onResponse(RetryableAction.java:161)
	at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:69)
	at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1369)
	at org.opensearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:393)
	at org.opensearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:387)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	... 1 more
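
For reading the trace above: the top section is where the leak was detected (LeakFS.onClose, when the test file system is closed), and the "Caused by" section is where a leaked handle was opened (LeakFS.onOpen, reached from SegmentFileTransferHandler$1.onNewResource). The same path appearing several times in the list suggests several distinct handles on the same file. The pattern can be sketched as below. This is an illustrative assumption about the general technique, not Lucene's actual LeakFS code, and HandleLeakTracker and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical leak tracker: remembers where each handle was opened.
final class HandleLeakTracker {
    // Maps each open handle to an Exception whose stack trace marks the open site.
    private final Map<Object, Exception> openHandles = new ConcurrentHashMap<>();

    void onOpen(Object handle) {
        openHandles.put(handle, new Exception("opened here"));
    }

    void onClose(Object handle) {
        openHandles.remove(handle);
    }

    // Called at teardown: fails if anything is still open, attaching one open site as the cause.
    void assertNoLeaks() {
        if (!openHandles.isEmpty()) {
            Exception openSite = openHandles.values().iterator().next();
            throw new RuntimeException("file handle leaks: " + openHandles.keySet(), openSite);
        }
    }
}
```

With a tracker like this, a handle that is opened and never passed to onClose surfaces only at teardown, which is why the failure appears after the test body has finished rather than at the call that leaked.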

To Reproduce
Steps to reproduce the behavior:
This issue occurs intermittently when running the SegmentReplicationIT.java tests. I found it while running the new integration test below (created to reproduce a different issue).

    public void testPrimaryShardAllocatorUsesFurthestAheadReplica() throws Exception {
        final Settings settings = Settings.builder()
            .put(indexSettings()).put(IndexMetadata.SETTING_NUMBER_OF_REPLICAS, 6)
            .put(IndexMetadata.SETTING_REPLICATION_TYPE, ReplicationType.SEGMENT)
            .build();
        final String clusterManagerNode = internalCluster().startClusterManagerOnlyNode(Settings.EMPTY);
        final String primaryNode = internalCluster().startDataOnlyNode(Settings.EMPTY);
        createIndex(INDEX_NAME, settings);
        final String firstReplica = internalCluster().startDataOnlyNode(Settings.EMPTY);
        final String secondReplica = internalCluster().startDataOnlyNode(Settings.EMPTY);

        // Index docs & refresh to bring all replicas to initial checkpoint
        indexDocs(scaledRandomIntBetween(20, 200));
        flushAndRefresh(INDEX_NAME);

        final String thirdReplica = internalCluster().startDataOnlyNode(Settings.EMPTY);
        final String fourthReplica = internalCluster().startDataOnlyNode(Settings.EMPTY);
        final String fifthReplica = internalCluster().startDataOnlyNode(Settings.EMPTY);
        final String sixthReplica = internalCluster().startDataOnlyNode(Settings.EMPTY);

        // Index several more batches while all six replica nodes are running
        for (int i = 0; i < 10; i++) {
            logger.info("Iteration {} --> ", i);
            indexDocsAndRefresh(scaledRandomIntBetween(10, 100));
        }

        final Index index = resolveIndex(INDEX_NAME);
        logger.info("--> primaryShard RC {}", getIndexShard(primaryNode).getLatestReplicationCheckpoint());
        // Stop the node holding the primary shard
        internalCluster().stopRandomNode(InternalTestCluster.nameFilter(primaryNode));

        // Keep indexing after the primary node has been stopped
        for (int i = 0; i < 5; i++) {
            logger.info("Iteration {} --> ", i);
            indexDocsAndRefresh(scaledRandomIntBetween(10, 100));
        }
        ensureYellow(INDEX_NAME);
    }

Expected behavior
The test should not fail with a runtime file handle leak exception.

Host/Environment (please complete the following information):

  • OS: macOS

Labels: bug, distributed framework, enhancement
