
Support for Spark Catalogs and ErrorIfExists save mode.#19

Merged
MaxKsyunz merged 29 commits into integ/save_mode_simple from dev/save_mode_simple_graph_separate on Mar 2, 2026
Conversation

@MaxKsyunz

Changes:

  • Connector implements Catalog API.
  • ErrorIfExists and Ignore save modes are supported (via Catalog API).
  • Separate format (`cloud-spanner-graph`) and table provider for Spanner Graphs.

Introducing a separate format for graphs was necessary because the current implementation tightly couples the Spark table definition and the graph query. Supporting the Catalog API for Spanner graphs with one Spark table provider would require significant refactoring of graph support.

Specifically, I ran into a case where loading a graph table and then querying it resulted in type conversion errors, because the query code treats Id columns as `String` even though they are defined as `INT64`.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
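The two save modes can be illustrated with a minimal sketch of their semantics (plain Java, no Spark dependency; the table name and the `write` helper are hypothetical, but the behavior follows Spark's SaveMode contract: ErrorIfExists fails when the target table already exists, Ignore silently skips the write):

```java
import java.util.HashSet;
import java.util.Set;

// Minimal sketch of the SaveMode semantics the connector now supports via
// the Catalog API. The table name and in-memory "catalog" are hypothetical.
public class SaveModeSketch {
  enum SaveMode { ERROR_IF_EXISTS, IGNORE }

  static final Set<String> existingTables = new HashSet<>();

  // Returns true if a write happened, false if it was skipped.
  static boolean write(String table, SaveMode mode) {
    if (existingTables.contains(table)) {
      switch (mode) {
        case ERROR_IF_EXISTS:
          throw new IllegalStateException("Table already exists: " + table);
        case IGNORE:
          return false; // silently skip the write
      }
    }
    existingTables.add(table); // create the table and write the data
    return true;
  }

  public static void main(String[] args) {
    System.out.println(write("Singers", SaveMode.ERROR_IF_EXISTS)); // first write succeeds
    System.out.println(write("Singers", SaveMode.IGNORE));          // second write is skipped
  }
}
```

In the connector itself the existence check is performed through the Catalog API rather than an in-memory set.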
- Add a script to set up a database for integration tests.
- Add SPANNER_USE_EXISTING_DATABASE env variable to tell test framework to not create a new database.
- Update CONTRIBUTING.md with details on how to run tests.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
This was a case-sensitivity issue (the option was converted to lower case, then put back into a case-sensitive map), so `Map<String, String>` was replaced with `CaseInsensitiveStringMap`.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
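The bug can be reproduced without Spark: a plain map misses a mixed-case lookup once the key has been lower-cased, while a map ordered by `String.CASE_INSENSITIVE_ORDER` (standing in here for Spark's `CaseInsensitiveStringMap`) still finds it. The `graphquery` option value is made up for illustration:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the case-sensitivity issue: Spark may hand the connector option
// keys normalized to lower case, so a lookup with the original mixed-case
// key misses in a case-sensitive map.
public class OptionLookup {
  // Wrap options in a case-insensitive map; a TreeMap with
  // String.CASE_INSENSITIVE_ORDER stands in for Spark's CaseInsensitiveStringMap.
  static Map<String, String> caseInsensitive(Map<String, String> options) {
    Map<String, String> m = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    m.putAll(options);
    return m;
  }

  public static void main(String[] args) {
    // Key arrived lower-cased from Spark.
    Map<String, String> options = Map.of("graphquery", "GRAPH FinGraph MATCH (n) RETURN n");

    System.out.println(options.get("graphQuery"));                  // null: case-sensitive miss
    System.out.println(caseInsensitive(options).get("graphQuery")); // value is found
  }
}
```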
Simplifies option management.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
…ces right now.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
Note that SpannerCatalog.createTable looks for columns with "spanner.primaryKey" metadata set to true and uses those columns to define the primary key on the table it creates.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
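As a rough sketch of that createTable behavior (plain Java instead of Spark's StructType metadata; the helper and column names are hypothetical, and identifier quoting is omitted for brevity):

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: build a Spanner CREATE TABLE statement where columns flagged as
// primary-key columns (via "spanner.primaryKey" metadata in the real
// connector) form the PRIMARY KEY clause.
public class CreateTableSketch {
  record Column(String name, String type, boolean primaryKey) {}

  static String createTableDdl(String table, List<Column> columns) {
    String cols = columns.stream()
        .map(c -> c.name() + " " + c.type())
        .collect(Collectors.joining(", "));
    String pk = columns.stream()
        .filter(Column::primaryKey)
        .map(Column::name)
        .collect(Collectors.joining(", "));
    return "CREATE TABLE " + table + " (" + cols + ") PRIMARY KEY (" + pk + ")";
  }

  public static void main(String[] args) {
    List<Column> cols = List.of(
        new Column("SingerId", "INT64", true),
        new Column("Name", "STRING(MAX)", false));
    System.out.println(createTableDdl("Singers", cols));
    // CREATE TABLE Singers (SingerId INT64, Name STRING(MAX)) PRIMARY KEY (SingerId)
  }
}
```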
- Check whether the database is PostgreSQL and lower-case the table name, since that is what the PostgreSQL server does.
- Include primary key information in the SpannerTable.schema return value so that it is identical to the schema struct passed on creation.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
… PostgreSql databases.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
@MaxKsyunz
Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces significant enhancements by adding support for Spark's Catalog API, enabling DDL operations and new save modes like ErrorIfExists and Ignore. The refactoring to separate graph support into its own data source format (cloud-spanner-graph) is a solid architectural improvement. The code changes, including the new SpannerCatalog and updates to related classes, are well-structured and accompanied by a comprehensive set of new tests. I have a few suggestions to improve the robustness of the new test setup script and a minor style improvement in the catalog implementation.

MaxKsyunz and others added 3 commits February 27, 2026 18:59
…nner/SpannerCatalog.java

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…nner/integration/SparkSpannerTableProviderIntegrationTestBase.java

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@MaxKsyunz MaxKsyunz self-assigned this Feb 28, 2026
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
@MaxKsyunz MaxKsyunz merged commit 034e703 into integ/save_mode_simple Mar 2, 2026
1 check passed
@MaxKsyunz MaxKsyunz deleted the dev/save_mode_simple_graph_separate branch March 2, 2026 21:24
MaxKsyunz added a commit that referenced this pull request Mar 2, 2026
Main Changes:
- Connector implements Catalog API.
- ErrorIfExists and Ignore save modes are supported (via Catalog API).
- Separate format (`cloud-spanner-graph`) and table provider for
Spanner Graphs.

Introducing a separate format for graphs was necessary because the current
implementation tightly couples the Spark table definition and the graph query.
Supporting the Catalog API for Spanner graphs and tables with one Spark
table provider would require significant refactoring of graph support.

Specifically, I ran into a case where loading a graph table and then
querying resulted in type conversion errors because query code casts Id
columns to `String` even though they are defined as `INT64`.

---------

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
MaxKsyunz added a commit that referenced this pull request Mar 9, 2026
…oogleCloudDataproc#170)

* Support for Spark Catalogs and ErrorIfExists save mode. (#19)

Main Changes:
- Connector implements Catalog API.
- ErrorIfExists and Ignore save modes are supported (via Catalog API).
- Separate format (`cloud-spanner-graph`) and table provider for
Spanner Graphs.

Introducing a separate format for graphs was necessary because the current
implementation tightly couples the Spark table definition and the graph query.
Supporting the Catalog API for Spanner graphs and tables with one Spark
table provider would require significant refactoring of graph support.

Specifically, I ran into a case where loading a graph table and then
querying resulted in type conversion errors because query code casts Id
columns to `String` even though they are defined as `INT64`.

---------

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update scripts/setup-test-db.sh

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Spotless cleanup

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* Addressing Gemini feedback

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* Refactor SpannerCatalog to support graph tables in `cloud-spanner` format (#22)

- `SpannerCatalogTableProviderBase.extractIdentifier` encodes the dataframe options supported by graphs into the returned identifier.
- These are: `graph`, `type`, `configs`, `graphQuery`, `timestamp`, `viewsEnabled`.
- `SpannerCatalog.loadTable` extracts these options to correctly instantiate `SpannerGraph` class.
- Removed classes related to `cloud-spanner-graph`.
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
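The round-trip idea behind encoding options into the identifier can be sketched as follows (the `name;key=value` separator scheme here is purely illustrative, not the connector's actual format, and the helper names are hypothetical):

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Sketch of round-tripping graph dataframe options through a table
// identifier string, in the spirit of extractIdentifier/loadTable above.
public class IdentifierCodec {
  // Append options to the table name as ";key=value" pairs.
  static String encode(String table, Map<String, String> options) {
    if (options.isEmpty()) return table;
    return table + ";" + options.entrySet().stream()
        .map(e -> e.getKey() + "=" + e.getValue())
        .collect(Collectors.joining(";"));
  }

  // Recover the options from an encoded identifier (the table name is parts[0]).
  static Map<String, String> decodeOptions(String identifier) {
    Map<String, String> options = new TreeMap<>();
    String[] parts = identifier.split(";");
    for (int i = 1; i < parts.length; i++) {
      String[] kv = parts[i].split("=", 2);
      options.put(kv[0], kv[1]);
    }
    return options;
  }

  public static void main(String[] args) {
    String id = encode("PersonGraph", Map.of("type", "node", "graph", "FinGraph"));
    System.out.println(decodeOptions(id)); // {graph=FinGraph, type=node}
  }
}
```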

* Support arrays in SpannerCatalog.createTable. (#23)

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* get emulatorHost from current properties.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* quote identifiers used to construct create and drop DDL statements.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* Quote table names in constructed SQL statements

Other refactorings and clean-up

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
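A minimal sketch of such identifier quoting (the dialect switch and escaping rules here are assumptions, not the connector's actual code; GoogleSQL quotes identifiers with backticks, PostgreSQL with double quotes):

```java
// Sketch: quote an identifier before splicing it into CREATE/DROP DDL, so
// reserved words and mixed-case names survive. Dialect handling is assumed.
public class IdentifierQuoter {
  enum Dialect { GOOGLE_SQL, POSTGRESQL }

  static String quote(String identifier, Dialect dialect) {
    if (dialect == Dialect.POSTGRESQL) {
      // PostgreSQL escapes an embedded double quote by doubling it.
      return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }
    // GoogleSQL escapes an embedded backtick with a backslash.
    return "`" + identifier.replace("`", "\\`") + "`";
  }

  public static void main(String[] args) {
    System.out.println("DROP TABLE " + quote("Order", Dialect.POSTGRESQL)); // DROP TABLE "Order"
    System.out.println("DROP TABLE " + quote("Order", Dialect.GOOGLE_SQL)); // DROP TABLE `Order`
  }
}
```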

* Resolve merge conflicts with main.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* Update examples

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* Validate projectId, databaseId, and instanceId as passed by the user

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* fix copyright year.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* Separating dataframe-based and catalog-based integration tests

- Make sure most integration tests continue to exercise the dataframe API without a catalog
- Add integration tests specifically for catalog-based write operations
- Store dataframe options for the writer as a case-insensitive map. Spark options are case-insensitive and sometimes arrive normalized to all lower case.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* Small improvements to ErrorIfExists and Ignore integration tests

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* Examples of writing using catalog.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

* Rewrite the testErrorIfExistsSaveMode and testIgnoreSaveMode tests for clarity

They both perform the same write operation twice with different results.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>

---------

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>