Implement overwrite save mode #26
Conversation
…itioned DML. Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
spotless:apply was not fixing error that spotless:check was reporting. Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
…nner/SpannerWriteBuilder.java Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
… "recreate" Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
…nner/SpannerWriteBuilder.java Temporary code. will be merged with catalog branch. Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…g class. Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
…nner/SpannerWriteBuilder.java Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements the overwrite save mode for the Spanner Spark connector, supporting both truncate and recreate strategies, leveraging Spark's SupportsTruncate interface and including well-designed integration tests. However, a critical security issue was identified where the target table for truncation or recreation is determined by user-supplied write options rather than the SpannerTable instance properties. This could lead to operations being performed on unintended tables if conflicting options are provided, requiring careful validation to ensure destructive operations target the correct table. Additionally, there are suggestions to improve efficiency and exception handling in the implementation.
| String instanceId = SpannerUtils.getRequiredOption(opts, "instanceId"); | ||
| String databaseId = SpannerUtils.getRequiredOption(opts, "databaseId"); | ||
| String tableName = SpannerUtils.getRequiredOption(opts, "table"); |
There was a problem hiding this comment.
In recreateTable, the table name, instance ID, and database ID are retrieved from the user-supplied write options (opts) instead of the properties of the SpannerTable for which the WriteBuilder was created. If a user provides different values for these options in the write call, the recreate operation will be performed on the specified table instead of the intended one. This could lead to accidental or malicious data loss in a different table. It is recommended to use this.properties which contains the validated connection and table details from the SpannerTable instance.
| String projectId = SpannerUtils.getRequiredOption(opts, "projectId"); | ||
| String instanceId = SpannerUtils.getRequiredOption(opts, "instanceId"); | ||
| String databaseId = SpannerUtils.getRequiredOption(opts, "databaseId"); | ||
| String tableName = SpannerUtils.getRequiredOption(opts, "table"); |
There was a problem hiding this comment.
In truncateTable, the project ID, instance ID, database ID, and table name are retrieved from the user-supplied write options (opts) instead of the properties of the SpannerTable. This allows a user to potentially truncate an arbitrary table by providing its details in the write options, bypassing the intended target of the WriteBuilder. Additionally, these values are used without the validation performed in SpannerUtils.connectionFromProperties. It is recommended to use this.properties to ensure the operation is performed on the correct and validated target.
spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerWriteBuilder.java
Outdated
Show resolved
Hide resolved
spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerWriteBuilder.java
Outdated
Show resolved
Hide resolved
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces support for overwriteMode with truncate and recreate options, enhancing the flexibility of data writing operations. The changes include adding a truncateTableDml method in SpannerInformationSchema, updating SpannerTable to support TRUNCATE capability, and implementing the SupportsTruncate interface in SpannerWriteBuilder with logic for both overwrite modes. New integration tests have been added to validate the functionality of these overwrite modes.
spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerWriteBuilder.java
Show resolved
Hide resolved
…nner/SpannerWriteBuilder.java Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
It supports
overwriteModeoption with two possible values:truncate-- All rows in the table will be deleted. incoming dataframe must matching existing schema.recreate-- table will be dropped and recreated based on the schema of the incoming dataframe.