Skip to content

Implement overwrite save mode #26

Merged
MaxKsyunz merged 16 commits intointeg/overwrite_save_modefrom
dev/overwrite_mode_v3
Mar 10, 2026
Merged

Implement overwrite save mode #26
MaxKsyunz merged 16 commits intointeg/overwrite_save_modefrom
dev/overwrite_mode_v3

Conversation

@MaxKsyunz
Copy link
Copy Markdown

It supports overwriteMode option with two possible values:

  • truncate -- All rows in the table will be deleted. incoming dataframe must matching existing schema.
  • recreate -- table will be dropped and recreated based on the schema of the incoming dataframe.

MaxKsyunz and others added 12 commits March 9, 2026 10:32
…itioned DML.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
spotless:apply was not fixing error that spotless:check was reporting.
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
…nner/SpannerWriteBuilder.java

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
… "recreate"

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
…nner/SpannerWriteBuilder.java


Temporary code. will be merged with catalog branch.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…g class.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
…nner/SpannerWriteBuilder.java

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
@MaxKsyunz
Copy link
Copy Markdown
Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the overwrite save mode for the Spanner Spark connector, supporting both truncate and recreate strategies, leveraging Spark's SupportsTruncate interface and including well-designed integration tests. However, a critical security issue was identified where the target table for truncation or recreation is determined by user-supplied write options rather than the SpannerTable instance properties. This could lead to operations being performed on unintended tables if conflicting options are provided, requiring careful validation to ensure destructive operations target the correct table. Additionally, there are suggestions to improve efficiency and exception handling in the implementation.

Comment on lines +76 to +78
String instanceId = SpannerUtils.getRequiredOption(opts, "instanceId");
String databaseId = SpannerUtils.getRequiredOption(opts, "databaseId");
String tableName = SpannerUtils.getRequiredOption(opts, "table");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

In recreateTable, the table name, instance ID, and database ID are retrieved from the user-supplied write options (opts) instead of the properties of the SpannerTable for which the WriteBuilder was created. If a user provides different values for these options in the write call, the recreate operation will be performed on the specified table instead of the intended one. This could lead to accidental or malicious data loss in a different table. It is recommended to use this.properties which contains the validated connection and table details from the SpannerTable instance.

Comment on lines +107 to +110
String projectId = SpannerUtils.getRequiredOption(opts, "projectId");
String instanceId = SpannerUtils.getRequiredOption(opts, "instanceId");
String databaseId = SpannerUtils.getRequiredOption(opts, "databaseId");
String tableName = SpannerUtils.getRequiredOption(opts, "table");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

In truncateTable, the project ID, instance ID, database ID, and table name are retrieved from the user-supplied write options (opts) instead of the properties of the SpannerTable. This allows a user to potentially truncate an arbitrary table by providing its details in the write options, bypassing the intended target of the WriteBuilder. Additionally, these values are used without the validation performed in SpannerUtils.connectionFromProperties. It is recommended to use this.properties to ensure the operation is performed on the correct and validated target.

MaxKsyunz and others added 3 commits March 9, 2026 15:15
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
@MaxKsyunz
Copy link
Copy Markdown
Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for overwriteMode with truncate and recreate options, enhancing the flexibility of data writing operations. The changes include adding a truncateTableDml method in SpannerInformationSchema, updating SpannerTable to support TRUNCATE capability, and implementing the SupportsTruncate interface in SpannerWriteBuilder with logic for both overwrite modes. New integration tests have been added to validate the functionality of these overwrite modes.

…nner/SpannerWriteBuilder.java

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@MaxKsyunz MaxKsyunz merged commit 676dc77 into integ/overwrite_save_mode Mar 10, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant