HA: issue with resync of a replica when journal is not available  #1286

@lvca

Description

If a replica has been out of the cluster for a long time (or the replication log has been removed or is unavailable for any other reason), the replica may never come back online while the cluster is under heavy load.

The underlying reason is that a full backup is needed, but under load the fresh backup sent to the replica is not a snapshot of the database as it was when the backup was requested. It is a live copy, so it can already contain changes made while the backup was being taken.

So the solution is to apply the transactions that were pending during the backup once the backup has been installed on the replica, while ignoring the pages that are already updated. This is a sort of overwrite mode in which MVCC is off until the replica is back online.
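
To make the idea concrete, here is a minimal sketch of what such an overwrite-mode replay could look like. It is only an illustration under assumptions: `Page`, `PageChange`, `Transaction` and `replay_in_overwrite_mode` are hypothetical names, not the project's actual classes or API, and it assumes each page carries a monotonically increasing version number that a change can be compared against.

```python
from dataclasses import dataclass


@dataclass
class Page:
    version: int      # assumed: version increases with every committed change
    content: bytes


@dataclass
class PageChange:
    page_id: int
    new_version: int  # version the page has after this change is applied
    content: bytes


@dataclass
class Transaction:
    tx_id: int
    changes: list


def replay_in_overwrite_mode(pages, pending):
    """Apply transactions buffered during the backup to the restored pages.

    No MVCC conflict check is performed here. A change is skipped when the
    restored page is already at that version or newer (the live copy caught
    it); otherwise the page is overwritten.
    Returns (applied, skipped) counts of page changes.
    """
    applied = skipped = 0
    for tx in sorted(pending, key=lambda t: t.tx_id):
        for change in tx.changes:
            current = pages.get(change.page_id)
            if current is not None and current.version >= change.new_version:
                skipped += 1  # page already updated in the backup
                continue
            pages[change.page_id] = Page(change.new_version, change.content)
            applied += 1
    return applied, skipped


# Hypothetical scenario: page 1 was copied after tx 2 and tx 3 had already
# touched it, while page 2 was copied before tx 2.
restored = {1: Page(3, b"p1-v3"), 2: Page(1, b"p2-v1")}
pending = [
    Transaction(2, [PageChange(1, 2, b"p1-v2"), PageChange(2, 2, b"p2-v2")]),
    Transaction(3, [PageChange(1, 3, b"p1-v3")]),
]
result = replay_in_overwrite_mode(restored, pending)
```

With regular MVCC checks, the changes to page 1 would be rejected as version conflicts and the replica would fail to catch up. In this sketch they are simply skipped, and only the change to page 2 is written.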
