
Option to disable Distributed table creation and use only ReplicatedMergeTree for fully replicated ClickHouse clusters #3957

@nareshntr

Description:
Currently, when PeerDB detects a ClickHouse cluster, it automatically creates two tables:

`sites_shard` — using the ReplicatedMergeTree engine
`sites` — using the Distributed engine as a proxy

Our Use Case / Requirement:
We are running a fully replicated ClickHouse cluster (2 Data Nodes + 3 Keeper Nodes) where 100% of data is available on every node. We do not need data to be sharded/distributed across nodes. Our application connects directly to a single node, and since all data is present locally, there is no need for the Distributed table overhead.
We want PeerDB to create only the ReplicatedMergeTree table without the accompanying Distributed table wrapper.

Expected Behavior:
PeerDB should provide a configuration option (e.g., `disable_distributed_table: true` or `replication_mode: replicated_only`) so that only the following is created:
```sql
-- Expected: Only this table should be created
CREATE TABLE sites ON CLUSTER '{cluster}'
(
    id UInt64,
    name String,
    url String,
    status UInt8,
    _peerdb_synced_at DateTime DEFAULT now(),
    _peerdb_is_deleted UInt8 DEFAULT 0,
    _peerdb_version Int64 DEFAULT 0
)
ENGINE = ReplicatedMergeTree(
    '/clickhouse/tables/{shard}/sites',
    '{replica}'
)
ORDER BY (id);
```


**Expected output when running `ON CLUSTER` DDL:**

```
Query id: fccf05f4-320e-424a-873a-a2cdfe2ff4f3

┌─host──┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ test1 │ 9000 │      0 │       │                   1 │                1 │
│ test2 │ 9000 │      0 │       │                   0 │                0 │
└───────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
```

Both nodes confirm successful table creation with status `0` (no errors), and the table is replicated to both nodes automatically via ClickHouse Keeper, which coordinates replicas using Raft consensus.
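As a sanity check, replica state for the single table can be inspected from any node with a query like the following (database and table names here match the example above; adjust as needed):

```sql
-- Inspect replica state of the sites table across all replicas.
-- clusterAllReplicas fans the query out to every node in the cluster.
SELECT
    hostName() AS host,
    database,
    table,
    replica_name,
    is_readonly
FROM clusterAllReplicas('{cluster}', system.replicas)
WHERE table = 'sites';
```

Both rows should report `is_readonly = 0`, confirming each replica is healthy and writable.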

---

**Current Behavior:**

PeerDB creates **two tables** even in a fully replicated setup:

```
SHOW TABLES

┌─name──────────────────────┐
│ _peerdb_raw_testing       │
│ _peerdb_raw_testing_shard │
│ sites                     │
│ sites_shard               │
└───────────────────────────┘
```

This results in:

Unnecessary Distributed table overhead
A confusing dual-table setup for a non-sharding use case
Queries on `sites_shard` return local data only, while `sites` proxies through the Distributed engine — but since data is 100% replicated, the Distributed layer adds no value
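For reference, the Distributed wrapper PeerDB currently creates on top of the local table looks roughly like this (sharding key and exact DDL are illustrative, not copied from PeerDB's source):

```sql
-- Approximate proxy table PeerDB adds on top of sites_shard.
-- Every query against sites is routed through the Distributed engine,
-- even though each node already holds 100% of the data locally.
CREATE TABLE sites ON CLUSTER '{cluster}' AS sites_shard
ENGINE = Distributed('{cluster}', currentDatabase(), sites_shard, rand());
```

In a fully replicated cluster this indirection only adds a network fan-out step to queries that could be served entirely from local data.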

Cluster Architecture:


2 Data Nodes (`test1`, `test2`) — each holding a full replica of all data
3 ClickHouse Keeper Nodes — handling coordination via Raft consensus
Engine: ReplicatedMergeTree with Keeper path pattern `/clickhouse/tables/{shard}/sites`
Replication confirmed working — both nodes hold identical data
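That "identical data" claim can be verified with a row-count comparison across replicas (using `sites_shard`, the local table name PeerDB creates today, in the `default` database — adjust names to your setup):

```sql
-- Compare row counts of the replicated table on every node;
-- matching counts indicate both replicas hold the same data.
SELECT hostName() AS host, count() AS rows
FROM clusterAllReplicas('{cluster}', default.sites_shard)
GROUP BY host;
```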


Proposed Solution:
Add a PeerDB mirror/peer configuration option such as:
```yaml
clickhouse_config:
  table_engine: ReplicatedMergeTree   # default: Auto (creates Distributed)
  create_distributed_table: false
```
Or expose this as a toggle in the PeerDB UI when setting up a ClickHouse mirror.

Why This Matters:
For teams running ClickHouse in HA/failover mode (not sharding mode), the Distributed table is unnecessary complexity. Direct ReplicatedMergeTree queries are faster, simpler to manage, and indexes/projections are easier to maintain without the dual-table confusion.
