[SUPPORT] Hudi partitions not dropped by Hive sync after insert_overwrite_table operation #8114

@Limess

Description

Describe the problem you faced

After overwriting a Hudi table in place using insert_overwrite_table, partitions which no longer exist in the new input data are not removed by Hive sync. This causes some query engines (e.g. AWS Athena) to fail until the old partitions are manually removed.

This is on Hudi 0.12.1, but I'm fairly sure the issue still exists on 0.13.0: #6662 fixes this behaviour for delete_partition operations but doesn't add any handling for insert_overwrite_table.

I'd be happy to be proven wrong if this is fixed in 0.13.0 - I don't have an environment in which to test it easily without working out how to upgrade Hudi on EMR ahead of a release.

To Reproduce

Steps to reproduce the behavior (a PySpark sketch follows the list):

  1. Create a new Hudi table using input data with two partitions, e.g. partition_col=1, partition_col=2
  2. Insert into the table using the operation hoodie.datasource.write.operation=insert_overwrite_table with input data containing 1/2 of the original partitions, e.g. only partition_col=2
  3. Run Hive sync. The stale partition is not dropped by either the Spark writer's built-in sync or the standalone HiveSyncTool
  4. Check the Hive partitions: both partitions still exist
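
For reference, a minimal PySpark sketch of the steps above, assuming Hudi 0.12.1 on Spark 3.3.1. The table name, record key/precombine fields, S3 path, and sync settings are hypothetical placeholders, not the exact job we run:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hudi-overwrite-repro")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

# Hypothetical table/field names; hive_sync.mode "hms" assumes the metastore
# endpoint (Glue on EMR in our case) is reachable from the job.
base_opts = {
    "hoodie.table.name": "hudi_overwrite_test",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.partitionpath.field": "partition_col",
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.database": "default",
    "hoodie.datasource.hive_sync.table": "hudi_overwrite_test",
    "hoodie.datasource.hive_sync.partition_fields": "partition_col",
    "hoodie.datasource.hive_sync.mode": "hms",
}

path = "s3://my-bucket/hudi_overwrite_test"  # hypothetical path

# Step 1: create the table with two partitions (partition_col=1, partition_col=2).
df1 = spark.createDataFrame([(1, 1, 1), (2, 2, 1)], ["id", "partition_col", "ts"])
(df1.write.format("hudi")
    .options(**base_opts)
    .option("hoodie.datasource.write.operation", "insert")
    .mode("overwrite")
    .save(path))

# Step 2: overwrite the whole table with data covering only partition_col=2.
df2 = spark.createDataFrame([(3, 2, 2)], ["id", "partition_col", "ts"])
(df2.write.format("hudi")
    .options(**base_opts)
    .option("hoodie.datasource.write.operation", "insert_overwrite_table")
    .mode("append")
    .save(path))

# Steps 3-4: after sync, partition_col=1 is still listed in the catalog even
# though it holds no live data (assumes the session can see the synced table).
spark.sql("SHOW PARTITIONS default.hudi_overwrite_test").show()
```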

Expected behavior

I'd expect the partition which received no new data to be removed, e.g. only partition_col=2 exists and partition_col=1 is deleted.
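
As noted in the description, the stale partitions currently have to be removed by hand before engines like Athena work again. A hedged sketch of that manual workaround via Spark SQL (database/table/partition values match the hypothetical reproduction sketch above):

```python
# Manually drop the stale partition from the Hive/Glue metastore.
# ALTER TABLE ... DROP PARTITION is standard Hive DDL; names are hypothetical.
spark.sql("""
    ALTER TABLE default.hudi_overwrite_test
    DROP IF EXISTS PARTITION (partition_col=1)
""")

# Only the surviving partition should now be listed.
spark.sql("SHOW PARTITIONS default.hudi_overwrite_test").show()
```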

Environment Description

  • Hudi version : 0.12.1

  • Spark version : 3.3.1

  • Hive version : AWS Glue

  • Hadoop version :

  • Storage (HDFS/S3/GCS..) : S3

  • Running on Docker? (yes/no) : no

Additional context

Running on EMR 6.9.0
