Add {_snowflake_id} wildcard support to object storage #789
Cheap alternative to #697 The |
src/Core/Settings.cpp
Outdated
| Note that initially (24.12) there was a server setting (`send_settings_to_client`), but latter it got replaced with this client setting, for better usability. | ||
| )", 0) \ | ||
| DECLARE(Bool, object_storage_treat_key_wildcard_as_star, false, R"( |
Three options here:
- Off by default
- On by default
- No setting at all, default behavior
ianton-ru
left a comment
Pair of minor comments
| configuration->update(object_storage, local_context); | ||
|
|
||
| if (partition_by && configuration->withPartitionWildcard()) | ||
| auto config_clone = configuration->clone(); |
We make a clone every time, but actually modify it only in specific cases.
Maybe keep a smart pointer to the original config here, and make a clone only when required?
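A minimal sketch of the clone-on-demand idea, not the PR's actual code. `Config`, `configForWrite`, and the `partition_id` parameter are hypothetical stand-ins for `StorageObjectStorage::Configuration` and its caller, and the sketch assumes the clone is only needed to substitute a `{_partition_id}` wildcard in the path:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for StorageObjectStorage::Configuration.
struct Config
{
    std::string path;

    std::shared_ptr<Config> clone() const { return std::make_shared<Config>(*this); }

    // Assumption: the partition wildcard is spelled "{_partition_id}".
    bool withPartitionWildcard() const { return path.find("{_partition_id}") != std::string::npos; }
};

// Returns the original pointer when nothing has to change,
// and a modified copy only when the path needs to be rewritten.
std::shared_ptr<const Config> configForWrite(
    const std::shared_ptr<const Config> & original,
    bool has_partition_by,
    const std::string & partition_id)
{
    assert(original != nullptr);

    if (!(has_partition_by && original->withPartitionWildcard()))
        return original; // shared, no copy made

    auto copy = original->clone();
    const std::string wildcard = "{_partition_id}";
    copy->path.replace(copy->path.find(wildcard), wildcard.size(), partition_id);
    return copy;
}
```

Callers hold a `std::shared_ptr<const Config>` either way, so the common path costs one reference-count increment instead of a full copy.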
|
|
||
| bool StorageObjectStorage::Configuration::withSnowflakeIdWildcard() const | ||
| { | ||
| static const String PARTITION_ID_WILDCARD = "{_snowflake_id}"; |
SNOWFLAKE_ID_WILDCARD
| Note that initially (24.12) there was a server setting (`send_settings_to_client`), but latter it got replaced with this client setting, for better usability. | ||
| )", 0) \ | ||
| DECLARE(Bool, object_storage_treat_key_related_wildcards_as_star, false, R"( |
Three options here:
- make it the default behavior, not behind a setting
- make the setting on by default
- keep it off by default
|
From a usability point of view, ChatGPT suggests considering using |
That sounds like a terrible idea |
Add {_snowflake_id} wildcard support to object storage paths. Upon writing, ClickHouse will generate a snowflake ID on the fly and substitute it for the wildcard. This will help us with parallel and concurrent writes to object storage.
Also introduce a new setting `object_storage_treat_key_related_wildcards_as_star` to allow symmetrical reads & writes using a single table. Why is it needed? Consider the following:
CREATE TABLE ... s3('path_to_table_root/**.parquet')
Ok, we can select from it, but how do we write? How do we name the files? In which directory?
Therefore, we introduced the snowflake id.
CREATE TABLE ... s3('path_to_table_root/{_snowflake_id}.parquet')
We can now write to it because we know the file location and we know how to name it. But how do we read now? The path isn't globbed anymore. That's what the setting is for.
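The read/write symmetry described above can be sketched roughly as follows. This is an illustration, not the PR's implementation: `replaceAllOccurrences`, `resolvePathForWrite`, and `resolvePathForRead` are hypothetical names, the snowflake ID is passed in rather than generated, and the sketch assumes the setting turns the wildcard into a plain `*` glob on reads:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>

inline constexpr std::string_view SNOWFLAKE_ID_WILDCARD = "{_snowflake_id}";

// Hypothetical helper: replace every occurrence of `wildcard` in `path` with `value`.
std::string replaceAllOccurrences(std::string path, std::string_view wildcard, std::string_view value)
{
    assert(!wildcard.empty());
    std::string::size_type pos = 0;
    while ((pos = path.find(wildcard, pos)) != std::string::npos)
    {
        path.replace(pos, wildcard.size(), value);
        pos += value.size(); // continue after the inserted text
    }
    return path;
}

// Write side: the wildcard becomes a concrete ID, so the file name is known.
std::string resolvePathForWrite(const std::string & path, uint64_t snowflake_id)
{
    return replaceAllOccurrences(path, SNOWFLAKE_ID_WILDCARD, std::to_string(snowflake_id));
}

// Read side: with the setting enabled the wildcard is treated as a glob,
// so files written under any ID are matched; otherwise the path is left as is.
std::string resolvePathForRead(const std::string & path, bool treat_wildcards_as_star)
{
    if (!treat_wildcards_as_star)
        return path;
    return replaceAllOccurrences(path, SNOWFLAKE_ID_WILDCARD, "*");
}
```

With these helpers, `path_to_table_root/{_snowflake_id}.parquet` resolves to a unique file per write and to `path_to_table_root/*.parquet` per read, which is what lets a single table definition serve both directions.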
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add {_snowflake_id} wildcard support to object storage paths. Also add a new setting `object_storage_treat_key_related_wildcards_as_star` to allow symmetrical reads & writes using a single table.
Documentation entry for user-facing changes