[Flaky Test] Fix Flaky Test SearchTimeoutIT.testSimpleTimeout #16828
reta merged 1 commit into opensearch-project:main
❌ Gradle check result for d1fc8be: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
SpecificClusterManagerNodesIT.testElectOnlyBetweenClusterManagerNodes #15944
Thanks @kkewwei, I am actually a bit surprised by your findings. The query phase has to be terminated early by the timeout, right? So it should not take much longer than the timeout itself?
❕ Gradle check result for d1fc8be: UNSTABLE. Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
@reta Yes, the query phase has to be terminated early by the timeout, but it may take much longer than … In addition, should we decrease the upper interval (100m)?
Thanks @kkewwei, I think this is the problem, not the test, right? If the timeout does not terminate the query early within a reasonable time margin, it is not very useful.
@reta To keep the query from overrunning the timeout by so much, maybe we should decrease the upper interval (100m), for example to 100k?
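A minimal sketch of why the upper interval bounds how far a query can overrun its timeout. It assumes, hypothetically, that the deadline is checked only between scoring windows, and that the window size starts at an initial interval and doubles up to an upper cap. `WindowedTimeoutSketch`, `INITIAL_INTERVAL`, the per-doc cost, and the 1s timeout are illustrative assumptions, not OpenSearch code or settings; only the two caps (100k and 100m) come from the discussion.

```java
/**
 * Hypothetical model (not OpenSearch code): the timeout is checked only before
 * each scoring window; window sizes start at INITIAL_INTERVAL and double up to maxInterval.
 */
public class WindowedTimeoutSketch {
    // Illustrative starting window size; an assumption, not taken from OpenSearch.
    static final long INITIAL_INTERVAL = 1L << 10;

    /** Returns how many docs are scored before the timeout is noticed (or all docs, if it never is). */
    static long docsScoredBeforeStop(long numDocs, long costPerDocNanos, long timeoutNanos, long maxInterval) {
        long scored = 0;
        long elapsedNanos = 0;
        long interval = INITIAL_INTERVAL;
        while (scored < numDocs) {
            // The deadline is only observed here, between windows.
            if (elapsedNanos > timeoutNanos) {
                return scored;
            }
            long window = Math.min(interval, numDocs - scored);
            scored += window;
            elapsedNanos += window * costPerDocNanos;
            // Windows grow exponentially until they hit the upper cap.
            interval = Math.min(interval << 1, maxInterval);
        }
        return scored;
    }

    public static void main(String[] args) {
        long numDocs = 100_000_000L;          // assumed large shard
        long costPerDocNanos = 100;           // assumed cheap per-doc scoring
        long timeoutNanos = 1_000_000_000L;   // assumed 1s timeout
        for (long cap : new long[] {100_000L, 100_000_000L}) {
            long scored = docsScoredBeforeStop(numDocs, costPerDocNanos, timeoutNanos, cap);
            System.out.printf("cap=%d: stopped after %d docs, ~%d ms of scoring%n",
                cap, scored, scored * costPerDocNanos / 1_000_000);
        }
    }
}
```

Under these assumed numbers, the 100k cap stops after about 1003 ms of scoring, while the 100m cap lets the last window run to about 1677 ms, because the window has doubled to millions of docs by the time the deadline passes.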
@kkewwei thanks for staying with me. I only briefly looked at the overall implementation, and it looks like we may have lost some … Opened up #16882.
@reta Of course, please feel free to go ahead.
server/src/internalClusterTest/java/org/opensearch/search/SearchTimeoutIT.java
Signed-off-by: kkewwei <kkewwei@163.com>
❕ Gradle check result for 3a34c9d: UNSTABLE. Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Signed-off-by: kkewwei <kkewwei@163.com> (cherry picked from commit 7050ecf) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
(cherry picked from commit 7050ecf) Signed-off-by: kkewwei <kkewwei@163.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…16828) Signed-off-by: kkewwei <kkewwei@163.com>
Description
When `numDocs=1000`:
`testSimpleTimeout` takes several minutes. Scoring each doc costs 500ms, so iterating over all the docs in the `query` phase takes a long time (see OpenSearch/server/src/internalClusterTest/java/org/opensearch/search/SearchTimeoutIT.java, line 138 in 5aa6509).
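The cost above is simple arithmetic: if every one of the 1000 docs is scored at 500ms each and nothing stops scoring early, the `query` phase takes 1000 × 500ms = 500s, roughly 8.3 minutes. A tiny sketch; `QueryPhaseCostSketch` and `worstCaseQueryTime` are hypothetical names used only for illustration.

```java
import java.time.Duration;

public class QueryPhaseCostSketch {
    /** Worst-case query-phase time if every doc is scored at a fixed per-doc cost (hypothetical helper). */
    static Duration worstCaseQueryTime(int numDocs, Duration perDocCost) {
        return perDocCost.multipliedBy(numDocs);
    }

    public static void main(String[] args) {
        // 1000 docs scored at 500 ms each, as described for testSimpleTimeout.
        Duration total = worstCaseQueryTime(1000, Duration.ofMillis(500));
        System.out.println(total.getSeconds() + " s (~" + total.toMinutes() + " min, truncated)");
    }
}
```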
`ReaderContext` is created before the `query` phase executes and released after the `fetch` phase. The `ReaderContext` is kept alive for 1min (determined by `search.keep_alive_interval`). If the `query` phase takes too long, the `ReaderContext` may be released before the `fetch` phase, so the `fetch/Id` request fails, which is the failure this test hits.
Related Issues
Resolves #16056 #9401
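The `ReaderContext` race in the description can be illustrated with a toy context registry: a context is created when the `query` phase starts, a periodic reaper drops contexts idle for longer than their keep-alive, and the `fetch` phase looks the context up again when the `query` phase ends. This is a simplified sketch under stated assumptions: `ToyReaderContexts`, `reap`, `fetch`, and `simulate` are hypothetical names, not OpenSearch's `SearchService` API, and like the description it treats the context as expiring once the `query` phase outlasts the keep-alive (it ignores any in-use protection the real implementation may have). The 1min keep-alive is the value quoted above.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of context expiry; hypothetical, not OpenSearch code. */
public class ToyReaderContexts {
    private final Map<Long, Long> lastAccessMillis = new HashMap<>();
    private final long keepAliveMillis;
    private long nextId = 0;

    ToyReaderContexts(long keepAliveMillis) {
        this.keepAliveMillis = keepAliveMillis;
    }

    /** Query phase starts: create a context and record its access time. */
    long create(long nowMillis) {
        long id = nextId++;
        lastAccessMillis.put(id, nowMillis);
        return id;
    }

    /** Periodic reaper: drop contexts idle for longer than the keep-alive. */
    void reap(long nowMillis) {
        lastAccessMillis.values().removeIf(last -> nowMillis - last > keepAliveMillis);
    }

    /** Fetch phase: succeeds only if the context still exists. */
    boolean fetch(long id, long nowMillis) {
        Long last = lastAccessMillis.get(id);
        if (last == null) {
            return false; // context already released, so the fetch fails
        }
        lastAccessMillis.put(id, nowMillis);
        return true;
    }

    /** Runs a query phase of the given length with the reaper firing every reapIntervalMillis. */
    static boolean simulate(long queryMillis, long keepAliveMillis, long reapIntervalMillis) {
        ToyReaderContexts contexts = new ToyReaderContexts(keepAliveMillis);
        long id = contexts.create(0);
        for (long t = reapIntervalMillis; t <= queryMillis; t += reapIntervalMillis) {
            contexts.reap(t);
        }
        return contexts.fetch(id, queryMillis);
    }

    public static void main(String[] args) {
        long oneMinute = 60_000;
        System.out.println("5s query:   fetch ok = " + simulate(5_000, oneMinute, oneMinute));
        System.out.println("500s query: fetch ok = " + simulate(500_000, oneMinute, oneMinute));
    }
}
```

In this model a short query phase leaves the context in place for the fetch, while a 500s query phase (1000 docs at 500ms each) outlives the 1min keep-alive and the fetch finds nothing.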
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.