Skip to content

Making shard copy count a multiple of attribute count#3462

Merged
Bukhtawar merged 4 commits intoopensearch-project:mainfrom
gbbafna:replica-count
Jul 29, 2022
Merged

Making shard copy count a multiple of attribute count#3462
Bukhtawar merged 4 commits intoopensearch-project:mainfrom
gbbafna:replica-count

Conversation

@gbbafna
Copy link
Copy Markdown
Contributor

@gbbafna gbbafna commented May 27, 2022

Description

A boolean cluster level setting cluster.routing.allocation.awareness.balance which is false by default . When true, we would validate that total copies is always a multiple of awareness attribute value count . If not, we will throw a validation exception. If there are multiple awareness attributes, the balance needs to ensure that largest varying attribute of awareness_attribute is equally balance. For ex, if there are 2 Awareness Attributes, zones and rack ids, each having 2 possible values , total copies will to be multiple of 2.

Tested

  1. Create/Update Index APIs
  2. Snapshot restore
  3. Index Template creation.

Issues Resolved

#3461

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@gbbafna gbbafna requested review from a team and reta as code owners May 27, 2022 10:10
@gbbafna gbbafna changed the title Making all copies a multiple of attribute count Making shard copy count a multiple of attribute count May 27, 2022
@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

❌   Gradle Check failure 5870d5aca952b773ac4632ebf44f0093bc3193bb
Log 5628

Reports 5628

@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

❌   Gradle Check failure 7fb9e069ea6c1bdcc109505f33366645630cd018
Log 5853

Reports 5853

@dblock dblock requested a review from andrross June 13, 2022 23:40
@dblock
Copy link
Copy Markdown
Member

dblock commented Jun 13, 2022

@gbbafna Are you trying to finish this? Care to explain a bit more what this accomplishes above?

@gbbafna gbbafna marked this pull request as draft June 14, 2022 04:00
@gbbafna
Copy link
Copy Markdown
Contributor Author

gbbafna commented Jun 14, 2022

@gbbafna Are you trying to finish this? Care to explain a bit more what this accomplishes above?

Hi @dblock , yes, I am working on this.

Please check the details in #3461 . Let me know on which part I need to give more context on .

@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

❌   Gradle Check failure 0f3ee95a0fa74f3c764aef072ab98f0855e03575
Log 5975

Reports 5975

@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

❌   Gradle Check failure db0cc2be50e27d2eafe05a1a8d88cca16610d159
Log 6020

Reports 6020

@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

❌   Gradle Check failure 870d6dc03ce34ae13537cc45c036cddb04512878
Log 6021

Reports 6021

@gbbafna gbbafna force-pushed the replica-count branch 2 times, most recently from 1e40a15 to 3328553 Compare June 15, 2022 09:15
@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

✅   Gradle Check success 332855374dbfcf2924bfb1c990ba1772a57ee0cc
Log 6023

Reports 6023

@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

✅   Gradle Check success 41e8297a97cbd3bd96c71675e10e99d19fad49b0
Log 6063

Reports 6063

@gbbafna gbbafna marked this pull request as ready for review June 17, 2022 04:12
@gbbafna gbbafna requested a review from Bukhtawar June 17, 2022 06:04
Comment on lines 1170 to 1180
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to mention which attribute zone/rack etc?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It will add the complexity of returning attribute as well from awarenessAttributes function , just for logging purpose. So I prefer not doing it . On seeing the error message with and without the attribute, user would anyways need to revisit awareness attributes values . Do you think this behavior is okay ?

Comment on lines 202 to 203
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put the multiple logic as a Predicate in the AwarenessReplicaBalance

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. But we still need to expose awarenessAttributes() for the validation exception .

@gbbafna gbbafna requested a review from Bukhtawar June 27, 2022 07:50
@opensearch-ci-bot
Copy link
Copy Markdown
Collaborator

✅   Gradle Check success 8b401e5901a19de0bd5c36e7ac0798318e9255aa
Log 6358

Reports 6358

@github-actions
Copy link
Copy Markdown
Contributor

Gradle Check (Jenkins) Run Completed with:

Comment on lines +656 to +659
final AwarenessReplicaBalance awarenessReplicaBalance = new AwarenessReplicaBalance(
settings,
clusterService.getClusterSettings()
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need spl handling for system indices. How does this change modify dashboard, security and other indices

Copy link
Copy Markdown
Contributor Author

@gbbafna gbbafna Jul 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this . We definitely need to skip validation for system indices . However I realized that in OpenSearch , only OpenSearch-Dashboards create system indices. All the plugins don't extend SystemIndexPlugin and extend ActionPlugin instead , hence not creating system index . I have created an issue to discuss this .

For now, the alternate is not to apply the validation on dot indices . By that , we will not be applying it on data streams, but that should be taken care at the time of index template creation . I am implementing that for now. Let me know if you see any concerns .

Comment on lines +104 to +114
public Optional<String> validate(int replicaCount) {
if ((replicaCount + 1) % maxAwarenessAttributes() != 0) {
String errorMessage = "expected total copies needs to be a multiple of total awareness attributes ["
+ maxAwarenessAttributes()
+ "]";
return Optional.of(errorMessage);
}
return Optional.empty();
}

}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this should throw a validation exception if the validation fails and the exception can carry the message as well. The interface looks inconsistent otherwise

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modeled this on lines of checkShardLimit. The good thing about this is it can let the caller collect all validation errors and then throw validation exception.

… path

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Gradle Check (Jenkins) Run Completed with:

@Bukhtawar
Copy link
Copy Markdown
Contributor

Would be great to see rollover tests as well.

@github-actions
Copy link
Copy Markdown
Contributor

Gradle Check (Jenkins) Run Completed with:

@gbbafna
Copy link
Copy Markdown
Contributor Author

gbbafna commented Jul 29, 2022

@github-actions
Copy link
Copy Markdown
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Copy Markdown
Contributor

Gradle Check (Jenkins) Run Completed with:

@Bukhtawar Bukhtawar merged commit 82ff463 into opensearch-project:main Jul 29, 2022
@Bukhtawar Bukhtawar added the backport 2.x Backport to 2.x branch label Jul 29, 2022
@opensearch-trigger-bot
Copy link
Copy Markdown
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-3462-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 82ff463fd1bc07c241c8674b72182f2875a60425
# Push it to GitHub
git push --set-upstream origin backport/backport-3462-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-3462-to-2.x.

gbbafna added a commit to gbbafna/OpenSearch that referenced this pull request Aug 1, 2022
…ject#3462)

* Making all the copies a multiple of attribute count

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
(cherry picked from commit 82ff463)
gbbafna added a commit to gbbafna/OpenSearch that referenced this pull request Aug 2, 2022
…ject#3462)

* Making all the copies a multiple of attribute count

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
(cherry picked from commit 82ff463)
Bukhtawar pushed a commit that referenced this pull request Aug 2, 2022
* [Backport 2.x] Making all the copies a multiple of attribute count

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.x Backport to 2.x branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants