Making shard copy count a multiple of attribute count#3462
Making shard copy count a multiple of attribute count#3462Bukhtawar merged 4 commits intoopensearch-project:mainfrom
Conversation
|
❌ Gradle Check failure 5870d5aca952b773ac4632ebf44f0093bc3193bb |
|
❌ Gradle Check failure 7fb9e069ea6c1bdcc109505f33366645630cd018 |
|
@gbbafna Are you trying to finish this? Care to explain a bit more what this accomplishes above? |
|
❌ Gradle Check failure 0f3ee95a0fa74f3c764aef072ab98f0855e03575 |
|
❌ Gradle Check failure db0cc2be50e27d2eafe05a1a8d88cca16610d159 |
|
❌ Gradle Check failure 870d6dc03ce34ae13537cc45c036cddb04512878 |
1e40a15 to
3328553
Compare
|
✅ Gradle Check success 332855374dbfcf2924bfb1c990ba1772a57ee0cc |
server/src/main/java/org/opensearch/cluster/routing/allocation/AwarenessReplicaBalance.java
Outdated
Show resolved
Hide resolved
|
✅ Gradle Check success 41e8297a97cbd3bd96c71675e10e99d19fad49b0 |
server/src/main/java/org/opensearch/cluster/routing/allocation/AwarenessReplicaBalance.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Do we also need to mention which attribute zone/rack etc?
There was a problem hiding this comment.
Good point. It will add the complexity of returning attribute as well from awarenessAttributes function , just for logging purpose. So I prefer not doing it . On seeing the error message with and without the attribute, user would anyways need to revisit awareness attributes values . Do you think this behavior is okay ?
There was a problem hiding this comment.
Can we put the multiple logic as a Predicate in the AwarenessReplicaBalance
There was a problem hiding this comment.
Done. But we still need to expose awarenessAttributes() for the validation exception .
...er/src/test/java/org/opensearch/cluster/routing/allocation/AwarenessReplicaBalanceTests.java
Outdated
Show resolved
Hide resolved
|
✅ Gradle Check success 8b401e5901a19de0bd5c36e7ac0798318e9255aa |
server/src/main/java/org/opensearch/cluster/routing/allocation/AwarenessReplicaBalance.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/metadata/MetadataUpdateSettingsService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/routing/allocation/AwarenessReplicaBalance.java
Outdated
Show resolved
Hide resolved
Gradle Check (Jenkins) Run Completed with:
|
| final AwarenessReplicaBalance awarenessReplicaBalance = new AwarenessReplicaBalance( | ||
| settings, | ||
| clusterService.getClusterSettings() | ||
| ); |
There was a problem hiding this comment.
Do we need spl handling for system indices. How does this change modify dashboard, security and other indices
There was a problem hiding this comment.
Thanks for catching this . We definitely need to skip validation for system indices . However I realized that in OpenSearch , only OpenSearch-Dashboards create system indices. All the plugins don't extend SystemIndexPlugin and extend ActionPlugin instead , hence not creating system index . I have created an issue to discuss this .
For now, the alternate is not to apply the validation on dot indices . By that , we will not be applying it on data streams, but that should be taken care at the time of index template creation . I am implementing that for now. Let me know if you see any concerns .
| public Optional<String> validate(int replicaCount) { | ||
| if ((replicaCount + 1) % maxAwarenessAttributes() != 0) { | ||
| String errorMessage = "expected total copies needs to be a multiple of total awareness attributes [" | ||
| + maxAwarenessAttributes() | ||
| + "]"; | ||
| return Optional.of(errorMessage); | ||
| } | ||
| return Optional.empty(); | ||
| } | ||
|
|
||
| } |
There was a problem hiding this comment.
Ideally this should throw a validation exception if the validation fails and the exception can carry the message as well. The interface looks inconsistent otherwise
There was a problem hiding this comment.
I have modeled this on lines of checkShardLimit. The good thing about this is it can let the caller collect all validation errors and then throw validation exception.
… path Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
|
Would be great to see rollover tests as well. |
Gradle Check (Jenkins) Run Completed with:
|
|
Both the failing tests are passing locally and are unrelated to the change here. org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testRestoreSnapshotAllocationDoesNotExceedWatermark |
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
|
The backport to To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-3462-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 82ff463fd1bc07c241c8674b72182f2875a60425
# Push it to GitHub
git push --set-upstream origin backport/backport-3462-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.xThen, create a pull request where the |
…ject#3462) * Making all the copies a multiple of attribute count Signed-off-by: Gaurav Bafna <gbbafna@amazon.com> (cherry picked from commit 82ff463)
…ject#3462) * Making all the copies a multiple of attribute count Signed-off-by: Gaurav Bafna <gbbafna@amazon.com> (cherry picked from commit 82ff463)
Description
A boolean cluster level setting
cluster.routing.allocation.awareness.balancewhich is false by default . When true, we would validate that total copies is always a multiple of awareness attribute value count . If not, we will throw a validation exception. If there are multiple awareness attributes, the balance needs to ensure that largest varying attribute of awareness_attribute is equally balance. For ex, if there are 2 Awareness Attributes, zones and rack ids, each having 2 possible values , total copies will to be multiple of 2.Tested
Issues Resolved
#3461
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.