
Conversation

@arthurpassos
Collaborator

@arthurpassos arthurpassos commented Jul 28, 2025

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Implement exporting partitions from MergeTree tables to object storage in a different format (e.g., Parquet). The files are converted to the destination format in memory.

Syntax: ALTER TABLE merge_tree_table EXPORT PARTITION ID 'ABC' TO TABLE 's3_hive_table'.

Related settings: allow_experimental_export_merge_tree_partition.
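A minimal usage sketch (the statement shape and setting name are taken from this PR; both table names are hypothetical):

```sql
-- Enable the experimental feature first (session-level).
SET allow_experimental_export_merge_tree_partition = 1;

-- Export partition 'ABC' of a local MergeTree table to an object storage
-- table (e.g. S3 with hive-style partitioning). The quoted destination
-- name follows the syntax as stated in this PR description.
ALTER TABLE merge_tree_table EXPORT PARTITION ID 'ABC' TO TABLE 's3_hive_table';
```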

  1. The destination file names and paths are, for now, decided by the destination engine (I am only testing and thinking about S3 with Hive-style partitioning, so <table_root>/pkey1=pvalue1/.../pkeyn=pvaluen/<snowflakeid>.parquet; see the layout sketch after this list). Most likely we will not use Snowflake IDs for the filenames in the future.
  2. A commit file is uploaded at the end of the execution to signal the completion of the transaction; its filename is commit_<partition_id>_<transaction_id>. It contains the list of files that were uploaded in that transaction.
  3. A partition cannot be exported twice. The limitation comes from the fact that, upon re-export, we have no reliable way of telling which parts should be exported (we cannot duplicate data): parts might have been merged with not-yet-exported parts, and so on.
  4. The parts selected for an export are not locked at all. We just keep references so they are not deleted from disk; it is totally fine to mutate or merge them in the meantime.
  5. Exports should be able to recover from hard failures/disasters (hard restart or crash). This is controlled using export manifests that are written on disk.
  6. Exports should be able to recover from soft failures (i.e., a given part failed to export but the server did not crash).
  7. Upon restart, exports are scheduled based on when they were created.
  8. For now, exports are scheduled in the same queue as disk moves. I still need to decide whether I'll create yet another queue or reuse one of the existing ones.
  9. Export manifests are being written on anyDisk.
  10. There is some half-baked observability in system.exports and system.part_log.
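To make items 1 and 2 concrete, here is a hypothetical destination layout after exporting one partition. The bucket, partition keys, and IDs are invented; only the Snowflake-ID file naming and the commit_<partition_id>_<transaction_id> pattern come from the notes above, and the exact location of the commit file is an assumption:

```text
s3://bucket/s3_hive_table/year=2025/month=7/7318742918232350720.parquet
s3://bucket/s3_hive_table/year=2025/month=7/7318742918232350721.parquet
s3://bucket/s3_hive_table/commit_202507_42   <- lists the two data files above
```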

Documentation entry for user-facing changes

...

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

@github-actions

github-actions bot commented Jul 28, 2025

Workflow [PR], commit [bad3bc0]

@svb-alt added the enhancement (New feature or request) and tiered storage (Antalya Roadmap: Tiered Storage) labels on Jul 30, 2025
@svb-alt linked an issue on Aug 8, 2025 that may be closed by this pull request
@arthurpassos
Collaborator Author

There is one thing I am not doing yet, but should: somehow handle failures to schedule an export task.

@arthurpassos
Collaborator Author

> There is one thing I am not doing yet, but should: somehow handle failures to schedule an export task.

This needs to be addressed asap

{
    if (areBackgroundMovesNeeded())
        background_moves_assignee.start();
    // if (areBackgroundMovesNeeded())
Collaborator Author


I need to think about this

@arthurpassos
Collaborator Author

One idea is to hold references to the data parts as soon as the request comes in, instead of locking the parts against merges/mutations.

This way we allow parts to be mutated or merged; that's not a problem as long as they remain on disk. Holding references will give us that guarantee.

We just need to make sure we grab those references upon restart as well.
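A self-contained sketch of the reference-holding idea (generic C++, not the PR's actual code; in ClickHouse the parts would be shared_ptrs to IMergeTreeDataPart, and "removed from disk" stands in for the part's cleanup):

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Stand-in for a merge tree data part; its on-disk data is removed
// only when the last shared_ptr to it goes away.
struct Part
{
    explicit Part(std::string name_) : name(std::move(name_)) {}
    ~Part() { std::cout << "part " << name << " removed from disk\n"; }
    std::string name;
};

using PartPtr = std::shared_ptr<const Part>;

int main()
{
    // Parts currently owned by the table.
    std::vector<PartPtr> active{std::make_shared<Part>("202507_1_1_0")};

    // The export grabs references as soon as the request comes in
    // (and would grab them again on restart, from the manifest).
    std::vector<PartPtr> export_refs = active;

    // A merge replaces the active part with a new one...
    active.clear();
    active.push_back(std::make_shared<Part>("202507_1_2_1"));

    // ...but the old part stays on disk, because the export still holds
    // a reference. It is only freed once the export is done with it.
    export_refs.clear();  // only now is 202507_1_1_0 "removed from disk"
}
```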

@arthurpassos
Collaborator Author

arthurpassos commented Sep 7, 2025

List of pending things off the top of my head:

  1. Documentation
  2. No need to capture exceptions in StorageObjectStorageMergeTreePartImporterSink anymore. It is fine for a pipeline to throw an exception; we'll catch it in the task.
  3. Make MergeTreeExportManifest a bit safer by checking JSON field existence before extracting (see the sketch after this list). Consider checksums.
  4. Add fsync support for MergeTreeExportManifest.
  5. Make max_retries configurable.
  6. Persist the attempt count.
  7. Exports throttler.
  8. Disable parallel formatting.
  9. Cancel mechanism?
  10. Correctly set up apply_deleted_mask, read_with_direct_io, and prefetch when reading from merge tree parts.
  11. Validations around https://github.com/Altinity/ClickHouse/pull/939/files#diff-a3d77682f605bf66aacbda72a660aaa789ddc37f064494e9bbe9dc934d59282eR581
  12. Determine the state of parts we are interested in.
  13. Fix commit file paths that contain an extra '/'.
  14. Tests
  15. QA
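For item 3, a sketch of the check-before-extract idea, using nlohmann::json purely for illustration (the manifest's real field names and JSON library are not shown in this PR, so everything below is an assumption):

```cpp
#include <nlohmann/json.hpp>
#include <stdexcept>
#include <string>

// Hypothetical manifest shape; the real MergeTreeExportManifest
// fields are not shown in this PR.
struct ManifestData
{
    std::string partition_id;
    std::string transaction_id;
};

ManifestData parseManifest(const std::string & raw)
{
    const auto json = nlohmann::json::parse(raw);

    // Check field existence up front so a truncated or corrupted manifest
    // produces a clear error instead of an exception deep in extraction.
    // A checksum over `raw` could be verified here as well.
    for (const auto * field : {"partition_id", "transaction_id"})
        if (!json.contains(field))
            throw std::runtime_error(std::string("export manifest is missing field: ") + field);

    return ManifestData{json["partition_id"].get<std::string>(),
                        json["transaction_id"].get<std::string>()};
}
```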

@@ -0,0 +1,61 @@
#pragma once
Collaborator Author


This is a workaround / refactor needed for two things:

  1. Override storage engine filenames (this was used when we wanted to preserve part names)
  2. Compute filenames separately (as opposed to computing them inside the sink) so we are able to build a commit file (sketched below)
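A sketch of the second point: compute destination paths up front so the same list can feed both the sinks and the commit file. All names here are hypothetical, not the PR's actual interfaces:

```cpp
#include <string>
#include <vector>

// Compute destination object paths before any sink runs, instead of
// letting each sink generate its own name internally.
std::vector<std::string> computeDestinationPaths(
    const std::vector<std::string> & file_ids,  // e.g. snowflake ids
    const std::string & table_root)             // e.g. hive-style prefix
{
    std::vector<std::string> paths;
    paths.reserve(file_ids.size());
    for (const auto & id : file_ids)
        paths.push_back(table_root + "/" + id + ".parquet");
    return paths;
}

// Each sink then receives one precomputed path; after all sinks finish,
// the same list is serialized into commit_<partition_id>_<transaction_id>.
```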

@arthurpassos
Collaborator Author

Parking this for now in favor of a simpler version.


Labels

antalya, antalya-25.8, enhancement (New feature or request), tiered storage (Antalya Roadmap: Tiered Storage)


Development

Successfully merging this pull request may close these issues.

ALTER TABLE EXPORT to external table
