-
Notifications
You must be signed in to change notification settings - Fork 13
simple export part #1009
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
simple export part #1009
Conversation
| break; | ||
| case PartitionCommand::EXPORT_PART: | ||
| { | ||
| exportPartToTable(command, query_context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am now confused if I need to grab the DROP lock or not. MergeTreeData::lockForShare. I'll have a look soon
| return; | ||
| } | ||
|
|
||
| /// todo implement these settings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to decide what to do here
| @@ -0,0 +1,59 @@ | |||
| #pragma once | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a hack to be able to override the filenames that would be generated from the storage engine itself.
This does not look good at all, but well, it is what it is
src/Interpreters/PartLog.cpp
Outdated
| "MutatePart — Mutating of a data part has finished, " | ||
| "MovePart — Moving the data part from the one disk to another one."}, | ||
| "MovePart — Moving the data part from the one disk to another one." | ||
| "ExportPart — Exporting the data part from a merge tree table to one (e.g, object storage)."}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "ExportPart — Exporting the data part from a merge tree table to one (e.g, object storage)."}, | |
| "ExportPart — Exporting the data part from a MergeTree table into a target table that represents external storage (e.g., object storage or a data lake)."}, |
| ContextPtr /*context*/, | ||
| bool /*async_insert*/); | ||
|
|
||
| virtual bool supportsImport() const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slightly confusing that to have export you need to do the import in the code :)
would be nice to havbe some comments in the header
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still not sure what's the best name for this set of APIs. The current names (supportsImport and import) sound too generic, I could add mergeTree, but it's not like it can only be used for MergeTree parts.
For now, it is quite similar to the write api, it just won't do any partitioning and allows the filename to be overwritten.
In the future, we'll need to check for covering & intersecting parts. At that point, we'll have two options:
- Polute
StorageObjectStoragewith MergeTree logic; Not ideal, but that's ok tbh - Create a third class that is friend with both
*MergeTreeandStorageObjectStoragethat orchestrate the parts check + simplistic sink creation. By simplistic I mean no partitioning, data lake checks and etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe smth like canAcceptExport / acceptExportedPart / receiveExport etc.? (but i'm not too convinced)
| } | ||
|
|
||
| if (!dest_storage->supportsImport()) | ||
| throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Destination storage {} does not support importing merge tree partitions", dest_storage->getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Destination storage {} does not support importing merge tree partitions", dest_storage->getName()); | |
| throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Destination storage {} does not support importing MergeTree parts or uses unsupported partitioning", dest_storage->getName()); |
| throw Exception(ErrorCodes::INCOMPATIBLE_COLUMNS, "Tables have different structure"); | ||
|
|
||
| if (query_to_string(src_snapshot->getPartitionKeyAST()) != query_to_string(destination_snapshot->getPartitionKeyAST())) | ||
| throw Exception(ErrorCodes::BAD_ARGUMENTS, "Tables have different partition key"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually the scenario of moving from more granular partitioning to less granular should not brake any invariants (like moving part from PARTITION BY (date, customer) to PARTITION BY (date) )
But it's ok for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, agreed
| }, moves_assignee_trigger, getStorageID())); | ||
| return true; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So active moves currently can prevent exports. But i have no strong opinion about what's best here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know either. As a safety measure, I gave preference to the existing functionality over this one. And was hoping to get feedback from users and you on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| status.destination_database = manifest.destination_storage_id.database_name; | ||
| status.destination_table = manifest.destination_storage_id.table_name; | ||
| status.create_time = manifest.create_time; | ||
| status.part_name = manifest.data_part->name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some form of progress would be nice to have here ? like bytes read / bytes total / progress like in system.merges
| {"destination_table", std::make_shared<DataTypeString>(), "Name of the destination table."}, | ||
| {"create_time", std::make_shared<DataTypeDateTime>(), "Date and time when the export command was submitted for execution."}, | ||
| {"part_name", std::make_shared<DataTypeString>(), "Name of the part"} | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some form of progress ?
|
|
||
| DROP TABLE IF EXISTS 03572_mt_table, 03572_invalid_schema_table; | ||
|
|
||
| CREATE TABLE 03572_mt_table (id UInt64, year UInt16) ENGINE = MergeTree() PARTITION BY year ORDER BY tuple(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same test by for ReplicatedMergeTree ?
|
|
||
| bool StorageObjectStorage::supportsImport() const | ||
| { | ||
| return configuration->partition_strategy != nullptr && configuration->partition_strategy_type == PartitionStrategyFactory::StrategyType::HIVE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need a better message for the user (see above)
|
CICD looks ok. Only failure is Todo list from the PR comments:
|
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Very small subset of #939
Documentation entry for user-facing changes