Fix flaky slowlog test by dzane17 · Pull Request #20726 · opensearch-project/OpenSearch

dzane17 · 2026-02-25T01:23:36Z

Description

The testMultipleSlowLoggersUseSingleLog4jLogger test was flaky because it compared total logger counts in the LoggerContext before and after creating SearchRequestSlowLog instances. Other code paths (ClusterService creation, test framework, parallel tests) could register additional loggers between measurements, causing intermittent failures.

Fixed by directly verifying that the same logger instance is reused, via assertSame(logger1, logger2), instead of counting total loggers.
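The sketch below shows why a count-based check is fragile while an identity-based check is not. It is self-contained and does not use the Log4j API. `ToyLoggerContext` is a hypothetical stand-in for a registry like Log4j's `LoggerContext`, which keeps one logger per name. `CountVsIdentityDemo` is also hypothetical. In the sketch, "unrelated work" that registers a different logger breaks the count check but not the identity check.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a logger registry such as Log4j's LoggerContext:
// exactly one logger instance per name, created lazily on first lookup.
final class ToyLoggerContext {
    private final Map<String, Object> loggers = new ConcurrentHashMap<>();

    Object getLogger(String name) {
        return loggers.computeIfAbsent(name, n -> new Object());
    }

    int loggerCount() {
        return loggers.size();
    }
}

final class CountVsIdentityDemo {
    // Count-based check: fragile, because any unrelated code that registers a
    // logger between the two measurements changes the total.
    static boolean countBasedCheck(ToyLoggerContext ctx, String name, Runnable unrelatedWork) {
        ctx.getLogger(name);
        int before = ctx.loggerCount();
        unrelatedWork.run(); // e.g. another component or a parallel test
        ctx.getLogger(name);
        return ctx.loggerCount() == before;
    }

    // Identity-based check: only looks at the logger under test.
    static boolean identityBasedCheck(ToyLoggerContext ctx, String name, Runnable unrelatedWork) {
        Object first = ctx.getLogger(name);
        unrelatedWork.run();
        Object second = ctx.getLogger(name);
        return first == second;
    }
}
```

In the real test, the "unrelated work" corresponds to anything else that registers loggers in the shared context while the test runs, such as the code paths listed above.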

Related Issues

Resolves #20665

Check List

Functionality includes testing.
API changes companion pull request created, if applicable.
Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2026-02-25T01:24:45Z

PR Reviewer Guide 🔍

(Review updated until commit `cc223fc`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Test Validity

The test retrieves `logger1` and `logger2` from the same `LoggerContext` using the same logger name after each `SearchRequestSlowLog` creation. Since Log4j always returns the same logger instance for the same name from the same context, `assertSame(logger1, logger2)` will always pass regardless of whether `SearchRequestSlowLog` internally reuses or creates new loggers. The test may not actually validate the intended behavior (that `SearchRequestSlowLog` reuses a single Log4j logger).

Logger logger1 = context.getLogger(loggerName);

ClusterService clusterService2 = ClusterServiceUtils.createClusterService(
    Settings.EMPTY,
    new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS),
    null
);
new SearchRequestSlowLog(clusterService2);
Logger logger2 = context.getLogger(loggerName);

assertSame(logger1, logger2);
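The reviewer's point can be illustrated with a toy model. This is not OpenSearch or Log4j code. `NameRegistry`, `SharingComponent`, `PrivateComponent`, `TautologyDemo`, and the `logger()` accessor are all hypothetical.

Two lookups by the same name in a one-instance-per-name registry are always identical, so that check passes even for a component that holds a private logger. Only a comparison of the loggers the components actually hold can tell the two cases apart.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry: one instance per name, like a logger context.
final class NameRegistry {
    private final Map<String, Object> byName = new HashMap<>();

    Object get(String name) {
        return byName.computeIfAbsent(name, n -> new Object());
    }
}

// Hypothetical component that reuses the shared, registry-managed logger.
final class SharingComponent {
    private final Object logger;

    SharingComponent(NameRegistry registry, String name) {
        this.logger = registry.get(name);
    }

    Object logger() {
        return logger;
    }
}

// Hypothetical component that touches the registry but keeps its own logger.
final class PrivateComponent {
    private final Object logger = new Object();

    PrivateComponent(NameRegistry registry, String name) {
        registry.get(name); // registers the name, but the result is ignored
    }

    Object logger() {
        return logger;
    }
}

final class TautologyDemo {
    // Mirrors the reviewed check: two registry lookups by the same name.
    static boolean lookupCheck(NameRegistry registry, String name) {
        return registry.get(name) == registry.get(name);
    }
}
```

The source does not say whether `SearchRequestSlowLog` exposes its logger in a way a test could inspect like this. The accessor above is purely illustrative.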

github-actions · 2026-02-25T01:24:57Z

PR Code Suggestions ✨

Latest suggestions up to cc223fc

Explore these optional code suggestions:

Category Suggestion Impact

General

Close resources after test assertions

The `ClusterService` instances created in the test are not being closed after use, which may cause resource leaks and interfere with other tests. Both `clusterService1` and `clusterService2` should be closed after the assertions, ideally using a try-finally block or by calling `clusterService1.close()` and `clusterService2.close()` at the end of the test.

server/src/test/java/org/opensearch/action/search/SearchRequestSlowLogTests.java [98-109]

 new SearchRequestSlowLog(clusterService1);
 Logger logger1 = context.getLogger(loggerName);

 ClusterService clusterService2 = ClusterServiceUtils.createClusterService(
     Settings.EMPTY,
     new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS),
     null
 );
 new SearchRequestSlowLog(clusterService2);
 Logger logger2 = context.getLogger(loggerName);
+assertSame(logger1, logger2);
+
+clusterService1.close();
+clusterService2.close();

Suggestion importance[1-10]: 5

Why: The suggestion to close `clusterService1` and `clusterService2` after the test is valid to prevent resource leaks. However, the `improved_code` moves the `assertSame` call before the close calls, which changes the structure slightly but is functionally correct. This is a minor improvement for test hygiene.

Low
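The try-finally approach the suggestion recommends can be sketched with a hypothetical `FakeService` standing in for `ClusterService`. This is not OpenSearch code and does not call `ClusterServiceUtils`.

The assertions run inside `try`, and both services are closed in `finally`. As a result, cleanup happens even when an assertion fails.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ClusterService: records when it is closed.
final class FakeService implements AutoCloseable {
    private final String name;
    private final List<String> closeLog;

    FakeService(String name, List<String> closeLog) {
        this.name = name;
        this.closeLog = closeLog;
    }

    @Override
    public void close() { // never throws in this sketch
        closeLog.add(name);
    }
}

final class TryFinallyDemo {
    // Runs the assertions, then closes both services even if an assertion throws.
    static void runTest(List<String> closeLog, Runnable assertions) {
        FakeService service1 = new FakeService("service1", closeLog);
        FakeService service2 = null;
        try {
            service2 = new FakeService("service2", closeLog);
            assertions.run(); // e.g. assertSame(logger1, logger2)
        } finally {
            // Close in reverse order of creation; skip service2 if creating it failed.
            if (service2 != null) {
                service2.close();
            }
            service1.close();
        }
    }
}
```

Because `logger1` and `logger2` would both be declared in the same `try` block here, the assertion can reference both without any scoping concerns.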

Previous suggestions

Suggestions up to commit 3eb29e8

Category Suggestion Impact

General

Close ClusterService instances properly

The created ClusterService instances are not being closed, which may lead to
resource leaks. Consider using try-with-resources or explicitly closing them after
the test completes to ensure proper cleanup of resources.

server/src/test/java/org/opensearch/action/search/SearchRequestSlowLogTests.java [93-107]

-ClusterService clusterService1 = ClusterServiceUtils.createClusterService(
+try (ClusterService clusterService1 = ClusterServiceUtils.createClusterService(
     Settings.EMPTY,
     new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS),
     null
-);
-new SearchRequestSlowLog(clusterService1);
-Logger logger1 = context.getLogger(loggerName);
+)) {
+    new SearchRequestSlowLog(clusterService1);
+    Logger logger1 = context.getLogger(loggerName);
 
-ClusterService clusterService2 = ClusterServiceUtils.createClusterService(
-    Settings.EMPTY,
-    new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS),
-    null
-);
-new SearchRequestSlowLog(clusterService2);
-Logger logger2 = context.getLogger(loggerName);
+    try (ClusterService clusterService2 = ClusterServiceUtils.createClusterService(
+        Settings.EMPTY,
+        new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS),
+        null
+    )) {
+        new SearchRequestSlowLog(clusterService2);
+        Logger logger2 = context.getLogger(loggerName);
+        
+        assertSame(logger1, logger2);
+    }
+}

Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies that ClusterService instances should be closed to prevent resource leaks. However, the improved code has a structural issue: logger1 would go out of scope before the assertion, making the test fail to compile. A better approach would be to close the services after the assertion or use a different cleanup mechanism.

Low
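The nested try-with-resources layout proposed in these suggestions raises a scoping question, which a small sketch can check. `ScopedResource` and `NestedScopeDemo` are hypothetical stand-ins, not OpenSearch types.

Two Java rules apply here:

- A local variable declared in the outer try block (as `logger1` is in the suggested code) stays in scope inside any block nested within it.
- A single try-with-resources header may declare several resources. These are closed in reverse order of declaration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical AutoCloseable stand-in for a service under test.
final class ScopedResource implements AutoCloseable {
    private final String name;
    private final List<String> events;

    ScopedResource(String name, List<String> events) {
        this.name = name;
        this.events = events;
        events.add("open " + name);
    }

    @Override
    public void close() {
        events.add("close " + name);
    }
}

final class NestedScopeDemo {
    // Nested layout: `first` is declared in the outer block and used in the inner one.
    static boolean nested(List<String> events) {
        try (ScopedResource outer = new ScopedResource("outer", events)) {
            Object first = new Object();
            try (ScopedResource inner = new ScopedResource("inner", events)) {
                Object second = first; // `first` is still in scope here
                return first == second;
            }
        }
    }

    // Flat layout: both resources in one header, closed in reverse order.
    static void singleHeader(List<String> events) {
        try (ScopedResource a = new ScopedResource("a", events);
             ScopedResource b = new ScopedResource("b", events)) {
            events.add("body");
        }
    }
}
```

Using either layout with the real services assumes `ClusterService` can be used as a try-with-resources resource (that is, it implements `AutoCloseable`). The suggestions imply this, but the sketch does not verify it.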

Suggestions up to commit 72d6752

Category Suggestion Impact

General

Close cluster service resources properly

The created ClusterService instances are not being closed, which may lead to
resource leaks. Consider using try-with-resources or explicitly closing these
services in a cleanup method to ensure proper resource management.

server/src/test/java/org/opensearch/action/search/SearchRequestSlowLogTests.java [93-107]

-ClusterService clusterService1 = ClusterServiceUtils.createClusterService(
+try (ClusterService clusterService1 = ClusterServiceUtils.createClusterService(
     Settings.EMPTY,
     new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS),
     null
-);
-new SearchRequestSlowLog(clusterService1);
-...
-ClusterService clusterService2 = ClusterServiceUtils.createClusterService(
-    Settings.EMPTY,
-    new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS),
-    null
-);
-new SearchRequestSlowLog(clusterService2);
+)) {
+    new SearchRequestSlowLog(clusterService1);
+    Logger logger1 = context.getLogger(loggerName);
+    
+    try (ClusterService clusterService2 = ClusterServiceUtils.createClusterService(
+        Settings.EMPTY,
+        new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS),
+        null
+    )) {
+        new SearchRequestSlowLog(clusterService2);
+        Logger logger2 = context.getLogger(loggerName);
+        
+        assertSame(logger1, logger2);
+    }
+}

Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies potential resource leaks with unclosed ClusterService instances. However, the improved code structure with nested try-with-resources blocks would prevent logger1 from being accessible for the assertSame comparison, making the suggested implementation problematic. The concern is valid but the solution needs refinement.

Low

github-actions · 2026-02-25T02:37:34Z

❌ Gradle check result for 72d6752: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-02-25T18:45:15Z

Persistent review updated to latest commit 3eb29e8

github-actions · 2026-02-25T20:10:24Z

❕ Gradle check result for 3eb29e8: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

codecov · 2026-02-25T20:15:43Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.25%. Comparing base (59be6ae) to head (cc223fc).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main   #20726      +/-   ##
============================================
- Coverage     73.29%   73.25%   -0.05%     
- Complexity    72088    72100      +12     
============================================
  Files          5794     5794              
  Lines        329733   329733              
  Branches      47577    47577              
============================================
- Hits         241664   241532     -132     
- Misses        68612    68788     +176     
+ Partials      19457    19413      -44
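The totals in this diff are internally consistent. For both base and head, hits + misses + partials equals the line count (329733).

The percentages can be reproduced under one assumption: that coverage is computed as hits / (hits + misses + partials), which is the usual Codecov definition. That formula is stated here as an assumption, not taken from this report. `CoverageMath` is a hypothetical helper.

```java
import java.util.Locale;

final class CoverageMath {
    // Assumed Codecov-style ratio: hits / (hits + misses + partials), as a percentage.
    static double coveragePercent(long hits, long misses, long partials) {
        return 100.0 * hits / (hits + misses + partials);
    }

    // Two decimal places, locale-independent (always uses '.' as the separator).
    static String format(double percent) {
        return String.format(Locale.ROOT, "%.2f%%", percent);
    }
}
```

With the figures above, this gives 73.29% for `main` (241664 hits) and 73.25% for the PR head (241532 hits), matching the report.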


CHANGELOG.md

Signed-off-by: David Zane <davizane@amazon.com>

github-actions · 2026-02-28T02:37:33Z

Persistent review updated to latest commit cc223fc

github-actions · 2026-02-28T04:07:28Z

❌ Gradle check result for cc223fc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

server/src/test/java/org/opensearch/action/search/SearchRequestSlowLogTests.java

github-actions · 2026-03-04T23:34:08Z

✅ Gradle check result for cc223fc: SUCCESS

This was referenced Feb 25, 2026

[AUTOCUT] Gradle Check Flaky Test Report for ShardsLimitAllocationDeciderIT #19726

Closed

[AUTOCUT] Gradle Check Flaky Test Report for ClusterShardLimitIT #19916

Open

dzane17 force-pushed the slowlog-flaky-test branch from 72d6752 to 3eb29e8 February 25, 2026 18:43

dzane17 marked this pull request as ready for review February 25, 2026 20:21

dzane17 requested a review from a team as a code owner February 25, 2026 20:21

github-actions bot added the >test-failure, autocut, and flaky-test labels Feb 25, 2026

opensearch-ci-bot mentioned this pull request Feb 25, 2026

[AUTOCUT] Gradle Check Flaky Test Report for SegmentReplicationTargetServiceTests #15829

Closed

gaobinlong reviewed Feb 28, 2026


CHANGELOG.md (outdated, resolved)

gaobinlong added the skip-changelog label Feb 28, 2026

Fix flaky slowlog test

cc223fc

Signed-off-by: David Zane <davizane@amazon.com>

dzane17 force-pushed the slowlog-flaky-test branch from 3eb29e8 to cc223fc February 28, 2026 02:36

jainankitk reviewed Mar 3, 2026


server/src/test/java/org/opensearch/action/search/SearchRequestSlowLogTests.java (resolved)

jainankitk approved these changes Mar 4, 2026


jainankitk merged commit 56c726c into opensearch-project:main Mar 4, 2026
38 of 40 checks passed

opensearch-ci-bot mentioned this pull request Mar 9, 2026

[AUTOCUT] Gradle Check Flaky Test Report for DerivedSourceLeafReaderTests #20812

Closed

dzane17 deleted the slowlog-flaky-test branch March 9, 2026 22:25

jainankitk merged 1 commit into opensearch-project:main from dzane17:slowlog-flaky-test
