Fix flaky slowlog test #20726

Merged
jainankitk merged 1 commit into opensearch-project:main from dzane17:slowlog-flaky-test
Mar 4, 2026

Conversation

@dzane17
Member

@dzane17 dzane17 commented Feb 25, 2026

Description

The testMultipleSlowLoggersUseSingleLog4jLogger test was flaky because it compared total logger counts in the LoggerContext before and after creating SearchRequestSlowLog instances. Other code paths (ClusterService creation, test framework, parallel tests) could register additional loggers between measurements, causing intermittent failures.

The fix verifies directly that the same logger instance is reused, via assertSame(logger1, logger2), instead of counting total loggers.
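
To illustrate why a count comparison is fragile while an identity comparison is not, here is a minimal sketch. It does not use Log4j: `FakeLoggerContext` and `CountVersusIdentity` are hypothetical stand-ins, and `unrelatedWork` plays the role of any other code path that registers a logger between the two measurements.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a process-wide logger context; this is not the Log4j API.
final class FakeLoggerContext {
    private final Map<String, Object> loggers = new ConcurrentHashMap<>();

    // Returns the existing logger for the name, creating it on first use.
    Object getLogger(String name) {
        return loggers.computeIfAbsent(name, n -> new Object());
    }

    int loggerCount() {
        return loggers.size();
    }
}

final class CountVersusIdentity {
    // Count-based check: holds only if nothing else registers a logger in between.
    static boolean countUnchanged(FakeLoggerContext ctx, String name, Runnable unrelatedWork) {
        ctx.getLogger(name);
        int before = ctx.loggerCount();
        unrelatedWork.run(); // e.g. another component or a parallel test
        ctx.getLogger(name);
        return before == ctx.loggerCount();
    }

    // Identity-based check: unaffected by unrelated registrations.
    static boolean sameInstance(FakeLoggerContext ctx, String name, Runnable unrelatedWork) {
        Object first = ctx.getLogger(name);
        unrelatedWork.run();
        Object second = ctx.getLogger(name);
        return first == second;
    }
}
```

In this sketch the count check fails as soon as `unrelatedWork` registers any other name, while the identity check keeps passing.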

Related Issues

Resolves #20665

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

github-actions bot commented Feb 25, 2026

PR Reviewer Guide 🔍

(Review updated until commit cc223fc)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Test Validity

The test retrieves logger1 and logger2 from the same LoggerContext using the same logger name after each SearchRequestSlowLog creation. Since Log4j always returns the same logger instance for the same name from the same context, assertSame(logger1, logger2) will always pass regardless of whether SearchRequestSlowLog internally reuses or creates new loggers. The test may not actually validate the intended behavior (that SearchRequestSlowLog reuses a single Log4j logger).

Logger logger1 = context.getLogger(loggerName);

ClusterService clusterService2 = ClusterServiceUtils.createClusterService(
    Settings.EMPTY,
    new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS),
    null
);
new SearchRequestSlowLog(clusterService2);
Logger logger2 = context.getLogger(loggerName);

assertSame(logger1, logger2);
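
The concern above can be sketched with `java.util.logging` as a stand-in for Log4j (an assumption made only so the example runs on the JDK alone). `NamedLogUser` is a hypothetical component, not `SearchRequestSlowLog`, and whether the real class exposes its logger for a field-level comparison is not something this thread confirms.

```java
import java.util.logging.Logger;

// Hypothetical component that either looks up its logger by a fixed name
// or creates a fresh, unregistered one on every construction.
final class NamedLogUser {
    static final String LOGGER_NAME = "example.slowlog";
    final Logger logger;

    NamedLogUser(boolean reuseByName) {
        // Anonymous loggers are never registered, so each call yields a distinct instance.
        this.logger = reuseByName ? Logger.getLogger(LOGGER_NAME) : Logger.getAnonymousLogger();
    }
}

final class LookupTautology {
    // Mirrors the shape of the test: look the name up again after each construction.
    static boolean lookupCheckPasses(boolean reuseByName) {
        NamedLogUser first = new NamedLogUser(reuseByName);
        Logger logger1 = Logger.getLogger(NamedLogUser.LOGGER_NAME);
        NamedLogUser second = new NamedLogUser(reuseByName);
        Logger logger2 = Logger.getLogger(NamedLogUser.LOGGER_NAME);
        return logger1 == logger2;
    }

    // Stronger check: compare the loggers the components actually hold.
    static boolean fieldCheckPasses(boolean reuseByName) {
        NamedLogUser first = new NamedLogUser(reuseByName);
        NamedLogUser second = new NamedLogUser(reuseByName);
        return first.logger == second.logger;
    }
}
```

Here the lookup-based check passes whether or not the component reuses its logger, whereas the field-based check distinguishes the two cases.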

@github-actions
Contributor

github-actions bot commented Feb 25, 2026

PR Code Suggestions ✨

Latest suggestions up to cc223fc

Explore these optional code suggestions:

Category | Suggestion | Impact
General
Close resources after test assertions

The ClusterService instances created in the test are not being closed after use,
which may cause resource leaks and interfere with other tests. Both clusterService1
and clusterService2 should be closed after the assertions, ideally using a
try-finally block or by calling clusterService1.close() and clusterService2.close()
at the end of the test.

server/src/test/java/org/opensearch/action/search/SearchRequestSlowLogTests.java [98-109]

 new SearchRequestSlowLog(clusterService1);
 Logger logger1 = context.getLogger(loggerName);
 
 ClusterService clusterService2 = ClusterServiceUtils.createClusterService(
     Settings.EMPTY,
     new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS),
     null
 );
 new SearchRequestSlowLog(clusterService2);
 Logger logger2 = context.getLogger(loggerName);
 
+assertSame(logger1, logger2);
+
+clusterService1.close();
+clusterService2.close();
+
Suggestion importance [1-10]: 5

Why: The suggestion to close clusterService1 and clusterService2 after the test is valid to prevent resource leaks. However, the improved code places the close calls after the assertSame call, which changes the structure slightly but is functionally correct. This is a minor improvement for test hygiene.

Low
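
The try-finally variant mentioned in this suggestion can be sketched as follows. `TrackedResource` is a hypothetical stand-in for a closeable service, and the `assertionHolds` flag only simulates an assertion outcome; nothing here is taken from the OpenSearch test itself.

```java
import java.util.List;

// Hypothetical closeable that records its name when closed.
final class TrackedResource implements AutoCloseable {
    private final String name;
    private final List<String> closedLog;

    TrackedResource(String name, List<String> closedLog) {
        this.name = name;
        this.closedLog = closedLog;
    }

    @Override
    public void close() {
        closedLog.add(name);
    }
}

final class CloseAfterAssertion {
    // Runs a simulated assertion and closes both resources even if it fails.
    static void runCheck(boolean assertionHolds, List<String> closedLog) {
        TrackedResource service1 = new TrackedResource("service1", closedLog);
        TrackedResource service2 = new TrackedResource("service2", closedLog);
        try {
            if (!assertionHolds) {
                throw new AssertionError("simulated assertion failure");
            }
        } finally {
            // Close in reverse order of creation.
            service2.close();
            service1.close();
        }
    }
}
```

With plain close calls placed after the assertion, a failing assertion would skip the cleanup; the finally block avoids that.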

Previous suggestions

Suggestions up to commit 3eb29e8
Category | Suggestion | Impact
General
Close ClusterService instances properly

The created ClusterService instances are not being closed, which may lead to
resource leaks. Consider using try-with-resources or explicitly closing them after
the test completes to ensure proper cleanup of resources.

server/src/test/java/org/opensearch/action/search/SearchRequestSlowLogTests.java [93-107]

-ClusterService clusterService1 = ClusterServiceUtils.createClusterService(
+try (ClusterService clusterService1 = ClusterServiceUtils.createClusterService(
     Settings.EMPTY,
     new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS),
     null
-);
-new SearchRequestSlowLog(clusterService1);
-Logger logger1 = context.getLogger(loggerName);
+)) {
+    new SearchRequestSlowLog(clusterService1);
+    Logger logger1 = context.getLogger(loggerName);
 
-ClusterService clusterService2 = ClusterServiceUtils.createClusterService(
-    Settings.EMPTY,
-    new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS),
-    null
-);
-new SearchRequestSlowLog(clusterService2);
-Logger logger2 = context.getLogger(loggerName);
+    try (ClusterService clusterService2 = ClusterServiceUtils.createClusterService(
+        Settings.EMPTY,
+        new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS),
+        null
+    )) {
+        new SearchRequestSlowLog(clusterService2);
+        Logger logger2 = context.getLogger(loggerName);
+        
+        assertSame(logger1, logger2);
+    }
+}
Suggestion importance [1-10]: 5

Why: The suggestion correctly identifies that ClusterService instances should be closed to prevent resource leaks. However, the improved code has a structural issue: logger1 would go out of scope before the assertion, making the test fail to compile. A better approach would be to close the services after the assertion or use a different cleanup mechanism.

Low
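
The scoping concern raised above may be worth double-checking. The sketch below uses a hypothetical `NamedResource` in place of ClusterService and a string in place of the logger; it suggests that a local declared in the outer try-with-resources body stays visible inside a nested block, so the comparison can live there.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical closeable that records open and close events.
final class NamedResource implements AutoCloseable {
    final String name;
    private final List<String> events;

    NamedResource(String name, List<String> events) {
        this.name = name;
        this.events = events;
        events.add("open " + name);
    }

    @Override
    public void close() {
        events.add("close " + name);
    }
}

final class NestedScopeSketch {
    static List<String> run() {
        List<String> events = new ArrayList<>();
        try (NamedResource outer = new NamedResource("outer", events)) {
            String fromOuter = outer.name; // declared in the outer try body
            try (NamedResource inner = new NamedResource("inner", events)) {
                // fromOuter is still in scope here.
                events.add("compare " + fromOuter + " with " + inner.name);
            }
        }
        return events;
    }
}
```

The resources close in reverse order after the comparison runs. The extra nesting remains a readability cost, which is a separate consideration from whether the code compiles.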
Suggestions up to commit 72d6752
Category | Suggestion | Impact
General
Close cluster service resources properly

The created ClusterService instances are not being closed, which may lead to
resource leaks. Consider using try-with-resources or explicitly closing these
services in a cleanup method to ensure proper resource management.

server/src/test/java/org/opensearch/action/search/SearchRequestSlowLogTests.java [93-107]

-ClusterService clusterService1 = ClusterServiceUtils.createClusterService(
+try (ClusterService clusterService1 = ClusterServiceUtils.createClusterService(
     Settings.EMPTY,
     new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS),
     null
-);
-new SearchRequestSlowLog(clusterService1);
-...
-ClusterService clusterService2 = ClusterServiceUtils.createClusterService(
-    Settings.EMPTY,
-    new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS),
-    null
-);
-new SearchRequestSlowLog(clusterService2);
+)) {
+    new SearchRequestSlowLog(clusterService1);
+    Logger logger1 = context.getLogger(loggerName);
+    
+    try (ClusterService clusterService2 = ClusterServiceUtils.createClusterService(
+        Settings.EMPTY,
+        new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS),
+        null
+    )) {
+        new SearchRequestSlowLog(clusterService2);
+        Logger logger2 = context.getLogger(loggerName);
+        
+        assertSame(logger1, logger2);
+    }
+}
Suggestion importance [1-10]: 5

Why: The suggestion correctly identifies potential resource leaks with unclosed ClusterService instances. However, the improved code structure with nested try-with-resources blocks would prevent logger1 from being accessible for the assertSame comparison, making the suggested implementation problematic. The concern is valid but the solution needs refinement.

Low

@github-actions
Contributor

❌ Gradle check result for 72d6752: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

Persistent review updated to latest commit 3eb29e8

@github-actions
Contributor

❕ Gradle check result for 3eb29e8: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@codecov

codecov bot commented Feb 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.25%. Comparing base (59be6ae) to head (cc223fc).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20726      +/-   ##
============================================
- Coverage     73.29%   73.25%   -0.05%     
- Complexity    72088    72100      +12     
============================================
  Files          5794     5794              
  Lines        329733   329733              
  Branches      47577    47577              
============================================
- Hits         241664   241532     -132     
- Misses        68612    68788     +176     
+ Partials      19457    19413      -44     


@dzane17 dzane17 marked this pull request as ready for review February 25, 2026 20:21
@dzane17 dzane17 requested a review from a team as a code owner February 25, 2026 20:21
@github-actions github-actions bot added the >test-failure (Test failure from CI, local build, etc.), autocut, and flaky-test (Random test failure that succeeds on second run) labels Feb 25, 2026
Signed-off-by: David Zane <davizane@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit cc223fc

@github-actions
Contributor

❌ Gradle check result for cc223fc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

github-actions bot commented Mar 4, 2026

✅ Gradle check result for cc223fc: SUCCESS

@jainankitk jainankitk merged commit 56c726c into opensearch-project:main Mar 4, 2026
38 of 40 checks passed
@dzane17 dzane17 deleted the slowlog-flaky-test branch March 9, 2026 22:25

Labels

  • autocut
  • flaky-test: Random test failure that succeeds on second run
  • skip-changelog
  • >test-failure: Test failure from CI, local build, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[AUTOCUT] Gradle Check Flaky Test Report for SearchRequestSlowLogTests

3 participants