Description
When a CDC mirror is created with multiple tables (e.g. 5 tables) and one table fails during the initial snapshot phase, the entire mirror gets stuck in an infinite retry loop on the failed table. Tables that were already snapshotted successfully are blocked along with it. There is no option to skip the failing table, pause, edit the configuration, or resume, so the only way out is to delete the entire mirror.
To Reproduce
Create a CDC mirror from MariaDB → ClickHouse with 5 tables, all with initial snapshot enabled.
Tables 1 and 2 complete initial snapshot successfully.
Table 3 has a misconfigured partition column (toYYYYMM(crt_dt) — a ClickHouse function — was used as the watermark/partition key, which is invalid on the MariaDB source side).
Table 3 fails with:
failed to get partitions from source: ERROR 1054 (42S22): Unknown column 'toYYYYMM(crt_dt)' in 'SELECT'
The mirror does not proceed to Tables 4 and 5. It retries Table 3 in an infinite loop.
There is no option to pause the mirror, skip the failing table, edit the configuration, or resume from a checkpoint.
Expected Behavior
Option A — Fault isolation per table: If one table's initial snapshot fails, the mirror should skip that table (marking it as failed), continue snapshotting the remaining tables, and report a partial success with clear status per table.
Option B — Pause + Edit + Resume: The mirror should allow the user to:
Pause the mirror mid-snapshot
Edit the failing table's configuration (e.g. fix the partition/watermark column)
Resume only the failed table's snapshot without re-running already-completed tables
Either approach is acceptable. Ideally both should be supported.
Current Behavior
Mirror is permanently stuck in a retry loop on the failed table
Already-completed tables (1 & 2) are not making CDC progress because the mirror never exits the snapshot phase
The only recovery option is to delete the entire mirror and start over, losing all snapshot progress on Tables 1 & 2
No UI option to pause, edit the table mapping, or resume a partial snapshot
Root Cause (in this case)
The partition/watermark expression toYYYYMM(crt_dt) is ClickHouse-specific syntax and was mistakenly used as the source-side partition column. PeerDB sends this expression directly to MariaDB in a SELECT, and MariaDB rejects it with ERROR 1054 because no column by that name exists. Validation at mirror creation time should catch this and reject ClickHouse function expressions in the source partition column field.
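A minimal sketch of the kind of check meant here, written in Python for illustration only. It is not PeerDB code: the function names and the identifier rule are assumptions, and the rule is deliberately conservative (it accepts only simple unquoted or backtick-quoted names, which is narrower than what MariaDB actually allows).

```python
import re

# Assumption: a "plain column name" is letters, digits, underscores or "$",
# not starting with a digit. MariaDB permits more than this; the rule is
# intentionally strict so that function calls and expressions are rejected.
_PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def is_plain_column_name(value: str) -> bool:
    """Return True if value looks like a bare column name, not an expression."""
    name = value.strip()
    # Allow a single pair of surrounding backticks, e.g. `crt_dt`.
    if len(name) >= 2 and name.startswith("`") and name.endswith("`"):
        name = name[1:-1]
    return _PLAIN_IDENTIFIER.fullmatch(name) is not None


def validate_partition_column(table: str, column: str) -> None:
    """Hypothetical pre-creation hook: raise before the mirror is created."""
    if not is_plain_column_name(column):
        raise ValueError(
            f"table {table!r}: partition/watermark column {column!r} "
            "must be a plain source column name, not an expression"
        )
```

With this rule, `crt_dt` passes and `toYYYYMM(crt_dt)` is rejected at creation time instead of failing later on the source.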
Suggested Fixes
Input validation: At mirror creation/edit time, validate that the watermark/partition column is a plain column name and not a function expression. Show an error before the mirror is created.
Per-table error isolation: Don't let one table's failure block the rest of the mirror's snapshot progress.
Pause/Edit/Resume: Allow pausing a mirror that is stuck in snapshot phase, editing individual table configurations, and resuming only failed tables.
Mirror status per table: Show per-table status in the UI (Completed / In Progress / Failed) so users can see exactly which table is the problem.
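To make the per-table isolation and per-table status ideas concrete, here is a rough Python sketch. It does not reflect how PeerDB's snapshot flow is actually implemented; `snapshot_tables`, `failed_tables`, and the `snapshot_one` callback are hypothetical names used only to show the control flow: a failure is recorded against its table and the loop moves on instead of retrying forever.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, List, Tuple


class TableStatus(Enum):
    COMPLETED = "Completed"
    FAILED = "Failed"


def snapshot_tables(
    tables: Iterable[str],
    snapshot_one: Callable[[str], None],  # hypothetical per-table snapshot step
) -> Tuple[Dict[str, TableStatus], Dict[str, str]]:
    """Snapshot each table independently and record a status per table."""
    statuses: Dict[str, TableStatus] = {}
    errors: Dict[str, str] = {}
    for table in tables:
        try:
            snapshot_one(table)
        except Exception as exc:  # isolate the failure to this table
            statuses[table] = TableStatus.FAILED
            errors[table] = str(exc)
            continue
        statuses[table] = TableStatus.COMPLETED
    return statuses, errors


def failed_tables(statuses: Dict[str, TableStatus]) -> List[str]:
    """Tables that would need a fix and a targeted re-snapshot."""
    return [t for t, s in statuses.items() if s is TableStatus.FAILED]
```

After the user fixes the configuration, calling `snapshot_tables(failed_tables(statuses), snapshot_one)` would re-run only the failed tables, which is the resume behaviour described in the Pause/Edit/Resume item.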
Environment
PeerDB Version: 0.36.7
Source: MariaDB
Destination: ClickHouse
Deployment: Docker Compose
