[META] Making all copies of shards spread evenly across all Awareness Attribute #3367

@gbbafna

Is your feature request related to a problem? Please describe.

In cloud HA deployments, customers usually deploy across multiple zones, and zone is usually the value of awareness.attributes. However, nothing enforces that all copies of a shard are spread evenly across all zones. This can cause uneven shard distribution and create shard hotspots. If the copies aren't evenly spread out, the failure of a single zone might also cause data loss and unavailability for that shard.

Describe the solution you'd like

There are two possible solutions:

  1. [Chosen Approach] A boolean cluster-level setting routing.allocation.awareness.balance, which is false by default. When true, we would validate that the total number of copies is always a multiple of the awareness attribute value count. If not, we will throw a validation exception. If there are multiple awareness attributes, the balance needs to ensure that every variant of each awareness attribute is equally balanced. For example, if there are 2 awareness attributes, zones and rack IDs, each having 2 possible values, the total number of copies needs to be a multiple of 2.
  2. A boolean cluster-level setting auto_balance_across_awareness_attribute. If this is true, we would increase the total number of copies to be a multiple of the AZ count. For instance, suppose there are 3 AZs and an index creation request comes with 7 replicas. OpenSearch will create 8 replicas, to ensure that there are 9 copies in total (1 primary plus 8 replicas).
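The arithmetic behind both options can be sketched as follows. This is only an illustration, not OpenSearch code: the function names are hypothetical, and the use of the least common multiple for several awareness attributes is an assumption (the issue only gives the case where every attribute has the same number of values).

```python
from math import lcm  # Python 3.9+


def required_multiple(value_counts):
    """Number that the total copy count must be a multiple of.

    Assumption: each awareness attribute must be balanced on its own,
    so the copy count must be a multiple of every attribute's value count.
    """
    return lcm(*value_counts)


def validate_copies(replicas, value_counts):
    """Option 1: reject a replica count that cannot be spread evenly."""
    copies = replicas + 1  # one primary plus its replicas
    multiple = required_multiple(value_counts)
    if copies % multiple != 0:
        raise ValueError(
            f"total copies ({copies}) must be a multiple of {multiple}"
        )


def auto_balanced_replicas(replicas, value_counts):
    """Option 2: round the total copies up to the next multiple."""
    copies = replicas + 1
    multiple = required_multiple(value_counts)
    balanced_copies = -(-copies // multiple) * multiple  # ceiling to a multiple
    return balanced_copies - 1


# Example from the text: 3 AZs and a request with 7 replicas.
print(auto_balanced_replicas(7, [3]))
```

With 3 AZs, a request for 7 replicas (8 copies) fails `validate_copies` under option 1, while option 2 raises the replica count so that the total reaches 9 copies.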

Both solutions take effect only when cluster.routing.allocation.awareness.attributes and cluster.routing.allocation.awareness.force.zone.values are set. Otherwise, the setting has no effect.

Trade offs

First approach: Plugins such as ISM and CCR need to validate proactively when a policy is created or updated. If not, the actions/replication will fail silently at a later point in time. As new policies or index creation paths are added, we will need to keep adding the validation there for a good experience.

Second approach: Since the replica count is adjusted by OpenSearch, the plugins and new index creation/modification paths don't need any handling, so this approach is very low maintenance. However, deviating from the API-supplied parameter may not be a good user experience.

User Experience

  1. User sets cluster.routing.allocation.awareness.attributes and cluster.routing.allocation.awareness.force.zone.values
  2. If the user enables routing.allocation.awareness.balance, the total number of copies needs to be a multiple of the number of possible values of the awareness attribute. If not, we will do one of the following:
  • Reject the create/update index request
  • Auto-expand the replica count as needed.
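The flow above could look roughly like this. It is a minimal sketch under stated assumptions: `resolve_replicas` and its `auto_expand` switch are hypothetical, settings are modeled as a plain dict, and only the zone attribute is considered.

```python
ATTRS = "cluster.routing.allocation.awareness.attributes"
FORCE_ZONE = "cluster.routing.allocation.awareness.force.zone.values"
BALANCE = "routing.allocation.awareness.balance"  # setting name as proposed in this issue


def resolve_replicas(settings, requested_replicas, auto_expand=False):
    """Return the replica count to use for a create/update index request.

    Hypothetical helper: rejects or auto-expands depending on `auto_expand`.
    """
    # Forced zone values are a comma-separated list, e.g. "a,b,c".
    zones = [z.strip() for z in settings.get(FORCE_ZONE, "").split(",") if z.strip()]

    # The balance check applies only when the flag and both prerequisites are set.
    if not (settings.get(BALANCE, False) and settings.get(ATTRS) and zones):
        return requested_replicas

    copies = requested_replicas + 1  # one primary plus its replicas
    remainder = copies % len(zones)
    if remainder == 0:
        return requested_replicas
    if auto_expand:
        # Add just enough replicas to reach the next multiple of the zone count.
        return requested_replicas + (len(zones) - remainder)
    raise ValueError(
        f"total copies ({copies}) must be a multiple of {len(zones)}"
    )


cluster_settings = {
    ATTRS: "zone",
    FORCE_ZONE: "zone-a,zone-b,zone-c",
    BALANCE: True,
}
print(resolve_replicas(cluster_settings, 7, auto_expand=True))
```

When the flag or either prerequisite setting is missing, the requested replica count passes through unchanged, matching the statement that the setting otherwise has no effect.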

Why it should be built

This is to ensure that the OpenSearch cluster remains well balanced as well as resilient to failures of a zone, rack, etc.

What will it take to execute?

Changes in OpenSearch as well as plugins to honor the new flag.

Labels: discuss, enhancement, v2.2.0
