-
Notifications
You must be signed in to change notification settings - Fork 3k
Hive: Arrange common part of the code for Iceberg View. #10001
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
hive-metastore/src/main/java/org/apache/iceberg/hive/HMSParametersSetter.java
Outdated
Show resolved
Hide resolved
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
Outdated
Show resolved
Hide resolved
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
Outdated
Show resolved
Hide resolved
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveOperationsBase.java
Outdated
Show resolved
Hide resolved
| protected void doCommit(TableMetadata base, TableMetadata metadata) { | ||
| boolean newTable = base == null; | ||
| String newMetadataLocation = writeNewMetadataIfRequired(newTable, metadata); | ||
| boolean hiveEngineEnabled = hiveEngineEnabled(metadata, conf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this common code has been moved to HiveOperationsBase .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment in https://github.com/apache/iceberg/pull/10001/files#r1533430710. Please revert these changes here. Once view support is in, we can revisit what we'll extract out
core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java
Outdated
Show resolved
Hide resolved
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
Outdated
Show resolved
Hide resolved
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
Outdated
Show resolved
Hide resolved
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
Outdated
Show resolved
Hide resolved
| && e.getMessage().contains(String.format("new table %s already exists", to))) { | ||
| throw new org.apache.iceberg.exceptions.AlreadyExistsException( | ||
| "Table already exists: %s", to); | ||
| throwErrorForExistedToContent(fromIdentifier, removeCatalogName(toIdentifier)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to do this here, as this would do another call to retrieve the table. I would leave the code here as it was before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Motivation behind this changes is
ViewCatalogTests expects error message like Cannot rename %s to %s. %s already exists. It should explicitly tell the upstream TO tableType.
Also we are making only one call to retrieve TO content.
Nevertheless, What do you propose to match the error messages for ViewCatalogTests ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what I mean is that we shouldn't be doing whather it's doing in throwErrorForExistedToContent() at this point and just throw the AlreadyExistsException as it was done previously. We don't want do add new functionality in this PR IMO
| Maps.newHashMap(), | ||
| null, | ||
| null, | ||
| properties.getOrDefault(HiveCatalog.VIEW_ORIGINAL_TEXT, null), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this adding view-specific things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might not be specific to Hive-View. This is another field in Hive-Table . IMO, This field either we can ignore for both iceberg-table and view or make it optional with properties.
Ref: https://github.com/apache/hive/blame/df45194268ffb10f9f7aae0ab4e3aec35a7b31d8/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Table.java#L23
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR should be a pure refactoring and not add new things/functionality. The current code on main doesn't look at this property, so why do we need to check this property as part of this PR?
|
@nastra , Please let me know if there are more comments ? |
core/src/main/java/org/apache/iceberg/BaseMetastoreOperations.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/BaseMetastoreOperations.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java
Outdated
Show resolved
Hide resolved
| * @param newMetadataLocation newly written metadata location | ||
| * @return true if the new metadata location presents with current or previous metadata files. | ||
| */ | ||
| protected boolean checkCurrentMetadataLocation(String newMetadataLocation) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this have to be protected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not Private , bcoz It should be available outside of the core-package. Not public as it should be only scoped too child class. Do we need public privilege here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be private. Why would it need to be available for subclasses? I don't think subclasses would call this code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends how the checkCommitStatus is being called. It it is using
BaseMetastoreTableOperations.checkCommitStatus it can be private . it it is using BaseMetastoreOperations.checkCommitStatus it should be protected.
| * Checks the new metadata location presents or not after refreshing the table. | ||
| * | ||
| * @param newMetadataLocation newly written metadata location | ||
| * @return true if the new metadata location presents with current or previous metadata files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * @return true if the new metadata location presents with current or previous metadata files. | |
| * @return true if the new metadata location is the current metadata location or present within previous metadata files. |
please also update the javadoc in L317 with a similar sentence as here
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
Outdated
Show resolved
Hide resolved
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
Outdated
Show resolved
Hide resolved
| TableIdentifier from, | ||
| TableIdentifier originalTo, | ||
| HiveOperationsBase.ContentType contentType) { | ||
| Preconditions.checkArgument(isValidIdentifier(from), "Invalid identifier: %s", from); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is changing existing behavior. Previously it would throw NoSuchTableException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intent here was to have common exception for Table/View. But I think with View impl we will have to revisit this.
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| @SuppressWarnings("checkstyle:CyclomaticComplexity") | ||
| default void commitWithHiveLock( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we revert any changes in this class starting from L276 to L473? I feel like we're extracting too much functionality into a common place and some things are slightly different for tables/views later.
That's why I think it would be better to revert L276 to L473 at this point. In the first iteration of view support it's probably ok if some things are duplicated and once View support for Hive is in, we can do another round of refactoring to extract what's really common
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
Show resolved
Hide resolved
| return null; | ||
| } | ||
| } | ||
| public void setHmsParameters( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been split into two parts , one is specific to Table like snapshot,stats, etc. Other pat is common parameters as schema, location, uuid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please revert this to how it was before
…able/View with other comments.
90e8aaa to
ac26860
Compare
| tableName, | ||
| e); | ||
| commitStatus = checkCommitStatus(newMetadataLocation, metadata); | ||
| commitStatus = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we can use the original method from BaseMetastoreTableOperations.checkCommitStatus too.
…us instead of BaseMetastoreOperations.checkCommitStatus for Hive-Table.
| import org.slf4j.Logger; | ||
| import org.slf4j.LoggerFactory; | ||
|
|
||
| public class BaseMetastoreOperations { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| public class BaseMetastoreOperations { | |
| public abstract class BaseMetastoreOperations { |
| public void fullTableNameWithDifferentValues() { | ||
| String uriTypeCatalogName = "thrift://host:port/db.table"; | ||
| String nameSpaceWithOneLevel = "ns"; | ||
| String nameSpaceWithTwoLevel = "ns.l2"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| String nameSpaceWithTwoLevel = "ns.l2"; | |
| String namespaceWithTwoLevels = "ns.l2"; |
| @Test | ||
| public void fullTableNameWithDifferentValues() { | ||
| String uriTypeCatalogName = "thrift://host:port/db.table"; | ||
| String nameSpaceWithOneLevel = "ns"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| String nameSpaceWithOneLevel = "ns"; | |
| String namespace = "ns"; |
| try { | ||
| Table table = clients.run(client -> client.getTable(fromDatabase, fromName)); | ||
| HiveOperationsBase.validateTableIsIceberg(table, fullTableName(name, from)); | ||
| validateTableIsIcebergTableOrView(contentType, name, table, from); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than passing the catalog name / table / from separately, it's probably better to just pass the table and the fullTableName: fullTableName(name, from)
|
|
||
| private void validateTableIsIcebergTableOrView( | ||
| HiveOperationsBase.ContentType contentType, | ||
| String catalogName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| String catalogName, |
| HiveOperationsBase.ContentType contentType, | ||
| String catalogName, | ||
| Table table, | ||
| TableIdentifier identifier) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| TableIdentifier identifier) { | |
| String fullTableName) { |
|
|
||
| String table(); | ||
|
|
||
| String catalogName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be introducing catalogName()/ contentType() to this API just for printing it in a TRACE log inside loadHmsTable(). Please remove catalogName()/ contentType() from this API
| return metaClients().run(client -> client.getTable(database(), table())); | ||
| } catch (NoSuchObjectException nte) { | ||
| LOG.trace( | ||
| "{} not found {}", contentType(), catalogName() + "." + database() + "." + table(), nte); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "{} not found {}", contentType(), catalogName() + "." + database() + "." + table(), nte); | |
| "{} not found {}", contentType(), database() + "." + table(), nte); |
this is just TRACE logging, so it should be ok to not have the catalog. I'm not even sure that logging stmt should even exist
| Optional.ofNullable(tbl.getParameters()).orElseGet(Maps::newHashMap); | ||
|
|
||
| if (!obsoleteProps.contains(TableProperties.UUID) && uuid != null) { | ||
| parameters.put(TableProperties.UUID, uuid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In https://github.com/apache/iceberg/pull/9852/files#diff-db46657b84d66e084e15f31b8dab21577efb2ae7102863f94c6c9477782de676R206 you're passing the View UUID and I wonder whether this is actually correct to do so?
This is another example where it can be dangerous to have a common method to set something that seems could be common across tables/views.
I would suggest to remove setCommonHmsParameters() from this API
| return maxHiveTablePropertySize() > 0; | ||
| } | ||
|
|
||
| default void setSchema(TableMetadata metadata, Map<String, String> parameters) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to keep the original method (for backwards compatibiliy), but the method can refer to the new setSchema() method:
default void setSchema(TableMetadata metadata, Map<String, String> parameters) {
setSchema(metadata.schema(), parameters);
}
| } | ||
| } | ||
|
|
||
| static StorageDescriptor storageDescriptor(TableMetadata metadata, boolean hiveEngineEnabled) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above, we need to keep StorageDescriptor storageDescriptor(TableMetadata metadata, boolean hiveEngineEnabled) but it can internally call static StorageDescriptor storageDescriptor(Schema schema, String location, boolean hiveEngineEnabled)
| public void setHmsParameters( | ||
| TableMetadata metadata, Table tbl, String newMetadataLocation, Set<String> obsoleteProps) { | ||
|
|
||
| private void setHmsTableParameters( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please revert these changes in this method
| } | ||
|
|
||
| @Override | ||
| public String catalogName() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove these two methods
| .doesNotContainKey(CURRENT_SNAPSHOT_TIMESTAMP); | ||
|
|
||
| ops.setSchema(metadata, parameters); | ||
| ops.setSchema(metadata.schema(), parameters); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ops.setSchema(metadata.schema(), parameters); | |
| ops.setSchema(metadata, parameters); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to change this, as we need to keep the original method signature
…nd HiveOperationsBase.
Hive considers iceberg-view just another type of Hive-Table. So Iceberg-View on Hive should have same strategy for commit as Iceberg-Table on Hive. As part of this refactoring all the common code for table which can be used by view too has been moved to common places. Now
HiveOperationsBasehas all the common code for committing the iceberg-table and iceberg-view.This change also define a new abstract class
BaseMetastoreOperationswhich can contains common code fromBaseMetastoreTableOperationsandBaseViewOperations.Motivation behind this change to have less diff for implementing Iceberg-view on Hive.