
Support arrays in SpannerCatalog.createTable. #23

Merged

MaxKsyunz merged 3 commits into integ/save_mode_simple from
dev/save_mode_simple_arrays on Mar 5, 2026

Conversation

@MaxKsyunz

No description provided.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
@MaxKsyunz
Author

/gemini review


@gemini-code-assist (bot) left a comment


Code Review

This pull request adds support for creating Spanner tables with array columns through `SpannerCatalog.createTable`. While the array type handling logic appears sound, a critical security vulnerability has been introduced in `factorySpannerGraph` by simplifying the merging of graph properties.

The change from an allow-list approach to a blind `putAll` of user-supplied properties allows attackers to override catalog-level connection settings via malicious table names, potentially leading to unauthorized access to other Spanner databases. It is strongly recommended to revert to the allow-list approach for merging graph properties.

Additionally, the `sparkTypeToSpannerType` method should be refactored to reduce code duplication in handling array types for different dialects, which will improve maintainability.
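
The allow-list merge the review recommends can be sketched as follows. Class and method names here are hypothetical, not the connector's actual API; the allowed keys are taken from the graph options listed later in this PR series.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hedged sketch: merge user-supplied table properties into catalog
// properties, copying only explicitly allow-listed keys.
public class GraphPropertyMerger {
  // Assumed allow-list; the real connector may permit a different set.
  private static final Set<String> ALLOWED_KEYS =
      Set.of("graph", "type", "configs", "graphQuery", "timestamp", "viewsEnabled");

  public static Map<String, String> merge(
      Map<String, String> catalogProperties, Map<String, String> tableProperties) {
    Map<String, String> merged = new HashMap<>(catalogProperties);
    for (Map.Entry<String, String> e : tableProperties.entrySet()) {
      // Copy only allow-listed keys; connection settings such as
      // instanceId or databaseId cannot be overridden by user input.
      if (ALLOWED_KEYS.contains(e.getKey())) {
        merged.put(e.getKey(), e.getValue());
      }
    }
    return merged;
  }
}
```

This is the property of the allow-list that a blind `putAll` loses: disallowed keys in the user map are silently dropped instead of clobbering catalog-level settings.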

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
@MaxKsyunz MaxKsyunz merged commit 22bd8c5 into integ/save_mode_simple Mar 5, 2026
1 check passed
@MaxKsyunz MaxKsyunz deleted the dev/save_mode_simple_arrays branch March 5, 2026 20:23
MaxKsyunz added a commit that referenced this pull request Mar 9, 2026
…oogleCloudDataproc#170)

* Support for Spark Catalogs and ErrorIfExists save mode. (#19)

Main Changes:
- Connector implements Catalog API.
- ErrorIfExists and Ignore save modes are supported (via Catalog API).
- Separate format (`cloud-spanner-graph`) and table provider for
Spanner Graphs.

Introducing a separate format for graphs was necessary because the current
implementation tightly couples the Spark table definition and the graph query.
Supporting the Catalog API for both Spanner graphs and tables with one Spark
table provider would require significant refactoring of graph support.

Specifically, I ran into a case where loading a graph table and then
querying resulted in type conversion errors because query code casts Id
columns to `String` even though they are defined as `INT64`.

---------

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update scripts/setup-test-db.sh

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Spotless cleanup

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* Addressing Gemini feedback

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* Refactor SpannerCatalog to support graph tables in `cloud-spanner` format (#22)

- `SpannerCatalogTableProviderBase.extractIdentifier` encodes dataframe options supported by graphs into the returned identifier.
- These are: `graph`, `type`, `configs`, `graphQuery`, `timestamp`, `viewsEnabled`.
- `SpannerCatalog.loadTable` extracts these options to correctly instantiate the `SpannerGraph` class.
- Removed classes related to `cloud-spanner-graph`.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
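
The encode-in-identifier, decode-in-loadTable round trip described above could take a shape like the following. This is only an illustrative sketch: the class name and the `;`/`=` wire format are invented here, and the real encoding used by `SpannerCatalogTableProviderBase` is not shown in this PR.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical round trip: fold graph-related dataframe options into a
// table identifier string, and recover them later in loadTable.
public class GraphIdentifierCodec {
  public static String encode(String table, Map<String, String> graphOptions) {
    StringJoiner joiner = new StringJoiner(";", table + ";", "");
    graphOptions.forEach((k, v) -> joiner.add(k + "=" + v));
    return joiner.toString();
  }

  public static Map<String, String> decode(String encoded) {
    Map<String, String> options = new TreeMap<>();
    String[] parts = encoded.split(";");
    // parts[0] is the table name; the rest are key=value pairs.
    for (int i = 1; i < parts.length; i++) {
      String[] kv = parts[i].split("=", 2);
      options.put(kv[0], kv[1]);
    }
    return options;
  }
}
```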

* Support arrays in SpannerCatalog.createTable. (#23)

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
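
The review earlier in this thread suggests factoring the per-dialect array handling in `sparkTypeToSpannerType` to avoid duplication. A minimal sketch of that factoring, with hypothetical names and deliberately tiny type coverage:

```java
// Hedged sketch: map a Spark element type to a Spanner DDL type once,
// then wrap it in the dialect's array syntax in a single place.
public class TypeMapper {
  public enum Dialect { GOOGLE_SQL, POSTGRESQL }

  // Assumed element mapping; the real connector covers many more types.
  static String elementType(String sparkType, Dialect d) {
    switch (sparkType) {
      case "long":
        return d == Dialect.GOOGLE_SQL ? "INT64" : "bigint";
      case "string":
        return d == Dialect.GOOGLE_SQL ? "STRING(MAX)" : "character varying";
      default:
        throw new IllegalArgumentException("Unsupported element type: " + sparkType);
    }
  }

  public static String arrayType(String sparkElementType, Dialect d) {
    String element = elementType(sparkElementType, d);
    // GoogleSQL spells arrays ARRAY<T>; the PostgreSQL dialect appends [].
    return d == Dialect.GOOGLE_SQL ? "ARRAY<" + element + ">" : element + "[]";
  }
}
```

Keeping the array wrapping in one method means adding a new element type touches only `elementType`, not two dialect-specific array branches.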

* get emulatorHost from current properties.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* quote identifiers used to construct create and drop DDL statements.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* Quote table name in constructed SQL statements

Other refactorings and clean-up

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
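
Quoting identifiers before splicing them into CREATE/DROP DDL might look like the sketch below. Cloud Spanner's GoogleSQL dialect quotes identifiers with backticks; the helper names are hypothetical, and the backslash escaping of an embedded backtick is an assumption based on GoogleSQL lexical rules.

```java
// Hedged sketch: quote an identifier for GoogleSQL DDL so table or
// column names with spaces or reserved words are safe to splice in.
public class DdlQuoter {
  public static String quote(String identifier) {
    // Escape any embedded backtick (assumed backslash escaping).
    return "`" + identifier.replace("`", "\\`") + "`";
  }

  public static String dropTableDdl(String table) {
    return "DROP TABLE " + quote(table);
  }
}
```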

* Resolve merge conflicts with main.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* Update examples

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* Validate projectId, databaseId, and instanceId as passed by the user

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
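
Validation of user-supplied IDs could be a simple pattern check before the values are used anywhere. The pattern below reflects Cloud Spanner's instance naming rules as I understand them (2-64 characters, lowercase letters, digits, and hyphens, starting with a letter and ending with a letter or digit); verify the exact rules against the current Spanner documentation, and note the class name is invented.

```java
import java.util.regex.Pattern;

// Hedged sketch: reject malformed resource IDs early with a clear error.
public class ResourceIdValidator {
  // Assumed rule: 2-64 chars, [a-z0-9-], starts with a letter,
  // ends with a letter or digit.
  private static final Pattern INSTANCE_ID =
      Pattern.compile("[a-z][a-z0-9-]{0,62}[a-z0-9]");

  public static void validateInstanceId(String id) {
    if (id == null || !INSTANCE_ID.matcher(id).matches()) {
      throw new IllegalArgumentException("Invalid Spanner instance ID: " + id);
    }
  }
}
```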

* fix copyright year.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* Separating dataframe-based and catalog-based integration tests

- Make sure most integration tests continue to test with dataframe API without catalog
- Add integration tests with Catalog-based write operations specifically
- Store dataframe options for writer as case-insensitive map. Spark options are case-insensitive and sometimes they arrive normalized to all lower case.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
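
A case-insensitive option store of the kind described above can be built on a `TreeMap` with a case-insensitive comparator; this is a minimal sketch, and the wrapper class name is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hedged sketch: keep writer options in a case-insensitive map, since
// Spark option keys are case-insensitive and sometimes arrive
// normalized to all lower case.
public class WriterOptions {
  private final Map<String, String> options =
      new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

  public void put(String key, String value) {
    options.put(key, value);
  }

  // "instanceId" and "instanceid" resolve to the same entry.
  public String get(String key) {
    return options.get(key);
  }
}
```

Spark itself ships `org.apache.spark.sql.util.CaseInsensitiveStringMap` for the same purpose, which is another option when the Spark dependency is already on the classpath.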

* Small improvements to ErrorIfExists and Ignore integration tests

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* Examples of writing using catalog.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* Re-write testErrorIfExistsSaveMode and testIgnoreSaveMode test for clarity

They both perform the same write operation twice with different results.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

---------

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
