Implement recovery flow for the case when a slot range is completely lost #131

@voltbit

Description

When a leader and all of its followers go down, the data in the slot range assigned to them is lost. The operator currently cannot recover from this scenario.

The manual procedure would be:

1. scale down the operator
2. if the follower of the lost leader is still up but empty, delete its pod
3. get the config of a leader, save it, and change the pod name and the number of Redis nodes
4. apply the config to create the new pod
5. on a working leader: redis-cli cluster meet new-pod-ip port
6. repair the missing slots: redis-cli --cluster fix ip-working-leader:port --cluster-fix-with-unreachable-masters
7. rebalance: redis-cli --cluster rebalance ip-working-leader:port
8. scale up the operator and wait until it recreates the remaining follower too
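
Steps 5 to 7 above could be scripted roughly as follows. This is only a sketch, not how the operator does or should implement it: the function names `recovery_commands` and `run_recovery` are made up for illustration, and addressing the working leader with `-h`/`-p` (instead of running redis-cli inside the leader pod) is an assumption. The Kubernetes steps (1 to 4 and 8) are left out on purpose.

```python
import subprocess
from typing import List


def recovery_commands(leader_ip: str, leader_port: int,
                      new_pod_ip: str, new_pod_port: int) -> List[List[str]]:
    """Build the redis-cli invocations for steps 5-7, in order (hypothetical helper)."""
    leader = f"{leader_ip}:{leader_port}"
    return [
        # Step 5: from a working leader, introduce the new pod to the cluster.
        # Assumption: we reach the leader over the network with -h/-p.
        ["redis-cli", "-h", leader_ip, "-p", str(leader_port),
         "cluster", "meet", new_pod_ip, str(new_pod_port)],
        # Step 6: repair the slots whose owners are unreachable.
        ["redis-cli", "--cluster", "fix", leader,
         "--cluster-fix-with-unreachable-masters"],
        # Step 7: rebalance slots across the leaders.
        ["redis-cli", "--cluster", "rebalance", leader],
    ]


def run_recovery(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Placeholder addresses; a dry run only prints the commands.
    run_recovery(recovery_commands("10.0.0.1", 6379, "10.0.0.9", 6379))
```

Keeping the command list separate from the execution makes it possible to review the exact invocations in a dry run before touching a degraded cluster.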

The recovery could be triggered via an API call or via flags on the RDC resource.

Labels: enhancement (New feature or request)