Skip to content

Search operation test flakiness fix#3602

Merged
stephen-crawford merged 49 commits intoopensearch-project:mainfrom
MaciejMierzwa:SearchOperationTest-flakiness-fix
Dec 14, 2023
Merged

Search operation test flakiness fix#3602
stephen-crawford merged 49 commits intoopensearch-project:mainfrom
MaciejMierzwa:SearchOperationTest-flakiness-fix

Conversation

@MaciejMierzwa
Copy link
Copy Markdown
Contributor

@MaciejMierzwa MaciejMierzwa commented Oct 26, 2023

Description

test PR

Issues Resolved

Similar to this task: #1917
From what I've noticed some logs are duplicated on faster machines. During test creation audit logging results were added based on actual results produced by tests. Now if there are slower moments where logging produces non-duplicated logs, those were marked as failed.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
@codecov
Copy link
Copy Markdown

codecov bot commented Oct 26, 2023

Codecov Report

Merging #3602 (971d26a) into main (5114bb7) will increase coverage by 0.07%.
Report is 25 commits behind head on main.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3602      +/-   ##
==========================================
+ Coverage   65.24%   65.32%   +0.07%     
==========================================
  Files         297      298       +1     
  Lines       21129    21159      +30     
  Branches     3451     3455       +4     
==========================================
+ Hits        13786    13822      +36     
- Misses       5643     5644       +1     
+ Partials     1700     1693       -7     
Files Coverage Δ
...earch/security/resolver/IndexResolverReplacer.java 69.35% <100.00%> (+0.24%) ⬆️

... and 7 files with indirect coverage changes

Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
@MaciejMierzwa MaciejMierzwa force-pushed the SearchOperationTest-flakiness-fix branch from d3a4a2c to 6296d7d Compare November 2, 2023 14:51
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
@MaciejMierzwa MaciejMierzwa force-pushed the SearchOperationTest-flakiness-fix branch from 6296d7d to b3bf553 Compare November 2, 2023 15:04
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: MaciejMierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
@MaciejMierzwa
Copy link
Copy Markdown
Contributor Author

MaciejMierzwa commented Nov 7, 2023

Cople things cause the flakiness.

  1. The way requests are performed on the cluster. When HighLevelRestClient is created depending on availability, nodes are fetched in random order. Then during https calls available nodes fetched in the first step are rotated using round robin. Because of that rotation with each run different nodes might get request which causes different amount of traffic between the nodes.
  2. Another thing causing a lot of internal traffic was internal shard allocation within the cluster there might be different amount of transport messages. For some cases what seems to be helpful is to tightly control index settings and to use fewer shards. Tests containing bulk requests are example of that.
  3. For tests dealing with snapshot creation Awaitility library is used. During check for snapshot creation, the default poll interval is 100 miliseconds, if the call wasn't succesfull in this time it would perform another request, increasing number of audit logs.

Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
peternied
peternied previously approved these changes Dec 1, 2023
Copy link
Copy Markdown
Contributor

@stephen-crawford stephen-crawford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we cannot get a hard count for the transport messages and still have to use the "at Least" version? If so, please add it as documentation somewhere in the change otherwise please swap to the exact versions. Otherwise looks good!

Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
@MaciejMierzwa MaciejMierzwa force-pushed the SearchOperationTest-flakiness-fix branch from 51775e7 to 971d26a Compare December 12, 2023 15:42
@peternied
Copy link
Copy Markdown
Member

@scrawfor99 @DarshitChanpura @cwperks Want to take another look?

Copy link
Copy Markdown
Contributor

@stephen-crawford stephen-crawford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks Maciej!

@stephen-crawford stephen-crawford merged commit 9da4a78 into opensearch-project:main Dec 14, 2023
@opensearch-trigger-bot
Copy link
Copy Markdown
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/security/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/security/backport-2.x
# Create a new branch
git switch --create backport/backport-3602-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 9da4a78f237f238bfebb5ecd2dff58a814927542
# Push it to GitHub
git push --set-upstream origin backport/backport-3602-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/security/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-3602-to-2.x.

@peternied
Copy link
Copy Markdown
Member

@MaciejMierzwa Could you make the backport to 2.x for this change?

MaciejMierzwa added a commit to MaciejMierzwa/security that referenced this pull request Dec 18, 2023
Search operation test flakiness fix
- opensearch-project#3426
- opensearch-project#2141
- opensearch-project#2169

Similar to this task:
opensearch-project#1917
From what I've noticed some logs are duplicated on faster machines.
During test creation audit logging results were added based on actual
results produced by tests. Now if there are slower moments where logging
produces non-duplicated logs, those were marked as failed.

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

---------

Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: MaciejMierzwa <dev.maciej.mierzwa@gmail.com>
(cherry picked from commit 9da4a78)
MaciejMierzwa added a commit to MaciejMierzwa/security that referenced this pull request Dec 18, 2023
Search operation test flakiness fix
- opensearch-project#3426
- opensearch-project#2141
- opensearch-project#2169

Similar to this task:
opensearch-project#1917
From what I've noticed some logs are duplicated on faster machines.
During test creation audit logging results were added based on actual
results produced by tests. Now if there are slower moments where logging
produces non-duplicated logs, those were marked as failed.

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

---------

Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: MaciejMierzwa <dev.maciej.mierzwa@gmail.com>
(cherry picked from commit 9da4a78)
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
MaciejMierzwa added a commit to MaciejMierzwa/security that referenced this pull request Dec 18, 2023
Search operation test flakiness fix
- opensearch-project#3426
- opensearch-project#2141
- opensearch-project#2169

Similar to this task:
opensearch-project#1917
From what I've noticed some logs are duplicated on faster machines.
During test creation audit logging results were added based on actual
results produced by tests. Now if there are slower moments where logging
produces non-duplicated logs, those were marked as failed.

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

---------

Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: MaciejMierzwa <dev.maciej.mierzwa@gmail.com>
(cherry picked from commit 9da4a78)
Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
stephen-crawford pushed a commit that referenced this pull request Dec 19, 2023
### Description
Integration test PR. I followed directions from
#3602 (comment)
not sure if it should look like that though.

### Issues Resolved
This is backport
#3426
#3602

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: MaciejMierzwa <dev.maciej.mierzwa@gmail.com>
dlin2028 pushed a commit to dlin2028/security that referenced this pull request May 1, 2024
### Description
test PR 
### Issues Resolved
- opensearch-project#3426
- opensearch-project#2141
- opensearch-project#2169

Similar to this task:
opensearch-project#1917
From what I've noticed some logs are duplicated on faster machines.
During test creation audit logging results were added based on actual
results produced by tests. Now if there are slower moments where logging
produces non-duplicated logs, those were marked as failed.

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

---------

Signed-off-by: Maciej Mierzwa <dev.maciej.mierzwa@gmail.com>
Signed-off-by: MaciejMierzwa <dev.maciej.mierzwa@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants