Fail weight update when decommission ongoing and fail decommission when attribute not weighed away#4839
Conversation
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
| WeightedRouting weightedRouting = weightedRoutingMetadata.getWeightedRouting(); | ||
| if (weightedRouting.attributeName().equals(decommissionAttribute.attributeName())) { | ||
| Double attributeValueWeight = weightedRouting.weights().get(decommissionAttribute.attributeValue()); | ||
| if (attributeValueWeight == null || attributeValueWeight != 0.0) { |
There was a problem hiding this comment.
is this value float or double, should we make specific comparisons. For instance Double.valueOf(0.0)
There was a problem hiding this comment.
It is Double. using .equals()
server/src/main/java/org/opensearch/cluster/decommission/DecommissionService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/routing/WeightedRoutingService.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
494a7a0 to
05eb55a
Compare
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Codecov Report
@@ Coverage Diff @@
## main #4839 +/- ##
============================================
- Coverage 70.73% 70.70% -0.03%
- Complexity 57884 57894 +10
============================================
Files 4689 4689
Lines 277296 277314 +18
Branches 40363 40367 +4
============================================
- Hits 196137 196075 -62
- Misses 64873 64968 +95
+ Partials 16286 16271 -15
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
|
|
||
| public void ensureNoOngoingDecommissionAction(ClusterState state) { | ||
| DecommissionAttributeMetadata decommissionAttributeMetadata = state.metadata().decommissionAttributeMetadata(); | ||
| if (decommissionAttributeMetadata != null && decommissionAttributeMetadata.status().equals(DecommissionStatus.FAILED) == false) { |
There was a problem hiding this comment.
What if status is SUCCESSFUL?
There was a problem hiding this comment.
Only when Decommission has failed, we can allow the weight update can go through. Not for any other status
There was a problem hiding this comment.
Isn't SUCCESSFUL a final state? The error message of this if condition says: a decommission action is ongoing with status...
There was a problem hiding this comment.
A succesful decommission state means the transport execution has completed. The nodes are decommissioned in this state (not able to start pre voting or join). Hence DecommissionAction ongoing means a zone is decommissioned and the cluster rejects pings from decommissioned node
|
Build is failing, please fix the build. |
|
|
||
| public void ensureNoOngoingDecommissionAction(ClusterState state) { | ||
| DecommissionAttributeMetadata decommissionAttributeMetadata = state.metadata().decommissionAttributeMetadata(); | ||
| if (decommissionAttributeMetadata != null && decommissionAttributeMetadata.status().equals(DecommissionStatus.FAILED) == false) { |
There was a problem hiding this comment.
If decommission status is FAILED can certain nodes be out of service or decommissioned while others not as in partially failed?
There was a problem hiding this comment.
No, if the decommission is FAILED, joins or prevoting is not blocked on those nodes and hence if some nodes are out of service (which can happen), the weights request can go through
There was a problem hiding this comment.
But then the zone isn't fully equipped to take traffic right like 90% nodes are out of service and we place weights 10:1:1 when 10 is for the zone only operating at 10% capacity?
Bukhtawar
left a comment
There was a problem hiding this comment.
Left a comment, LGTM otherwise
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
…en attribute not weighed away (opensearch-project#4839) * Add checks for decommission before setting weights Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
* Fail weight update when decommission ongoing and fail decommission when attribute not weighed away (#4839) * Add changes for graceful node decommission (#4586) * Add delay timeout for decommission request (#4931) * Integ Tests for Awareness Attribute Decommissioning (#4715) Signed-off-by: Rishab Nahata <rnnahata@amazon.com> Signed-off-by: pranikum <109206473+pranikum@users.noreply.github.com>
* Fail weight update when decommission ongoing and fail decommission when attribute not weighed away (opensearch-project#4839) * Add changes for graceful node decommission (opensearch-project#4586) * Add delay timeout for decommission request (opensearch-project#4931) * Integ Tests for Awareness Attribute Decommissioning (opensearch-project#4715) Signed-off-by: Rishab Nahata <rnnahata@amazon.com> Signed-off-by: pranikum <109206473+pranikum@users.noreply.github.com>
…en attribute not weighed away (opensearch-project#4839) * Add checks for decommission before setting weights Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
Description
Issues Resolved
#4838
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.