Support for Spark Catalogs and ErrorIfExists save mode. (#19)#20
Main Changes:
- Connector implements the Catalog API.
- ErrorIfExists and Ignore save modes are supported (via the Catalog API).
- Separate format (`cloud-spanner-graph`) and table provider for Spanner Graphs.

Introducing a separate format for graphs was necessary because the current implementation tightly couples the Spark table definition and the graph query. Supporting the Catalog API for both Spanner graphs and tables with one Spark table provider would require significant refactoring of graph support. Specifically, I ran into a case where loading a graph table and then querying it resulted in type conversion errors, because the query code casts Id columns to `String` even though they are defined as `INT64`.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
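Implementing the Catalog API is what lets the connector honor ErrorIfExists and Ignore: the catalog can answer "does this table exist?" before the write starts. The sketch below models that decision in plain Java; the class and method names (`SaveModeDemo`, `decide`) are illustrative assumptions, not the connector's actual code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the save-mode decision a catalog-backed writer can
// make once table existence is queryable. Not the connector's real API.
public class SaveModeDemo {
    public enum Action { WRITE, SKIP, FAIL }

    // Mirrors Spark's SaveMode semantics when the target table already exists.
    public static Action decide(String mode, boolean tableExists) {
        if (!tableExists) {
            return Action.WRITE; // no conflict, always safe to write
        }
        switch (mode) {
            case "ErrorIfExists": return Action.FAIL;  // abort the job
            case "Ignore":        return Action.SKIP;  // silently no-op
            case "Append":
            case "Overwrite":     return Action.WRITE;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>();
        existing.add("Students");
        System.out.println(decide("ErrorIfExists", existing.contains("Students")));
        System.out.println(decide("Ignore", existing.contains("Students")));
        System.out.println(decide("ErrorIfExists", existing.contains("Courses")));
    }
}
```

Without a catalog, the existence check has no standard hook, which is why these two modes previously could not be supported.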
…rmat (#22)

- `SpannerCatalogTableProviderBase.extractIdentifier` encodes dataframe options supported by graphs into the returned identifier.
- These are: `graph`, `type`, `configs`, `graphQuery`, `timestamp`, `viewsEnabled`.
- `SpannerCatalog.loadTable` extracts these options to correctly instantiate the `SpannerGraph` class.
- Removed classes related to `cloud-spanner-graph`.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
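The commit describes a round trip: graph-related options are encoded into the identifier on extraction and recovered on `loadTable`. The sketch below shows one way such a round trip could work; the query-string encoding and the `GraphIdentifierDemo` class are assumptions for illustration, not the connector's actual format.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of round-tripping graph-related dataframe options
// through a table identifier. The encoding scheme is an assumption.
public class GraphIdentifierDemo {
    // The graph-relevant options named in the commit message.
    static final Set<String> GRAPH_OPTIONS = new HashSet<>(Arrays.asList(
        "graph", "type", "configs", "graphQuery", "timestamp", "viewsEnabled"));

    // Encode only the graph options into a query-string-like identifier suffix.
    public static String encode(String table, Map<String, String> options) {
        StringBuilder sb = new StringBuilder(table);
        char sep = '?';
        for (Map.Entry<String, String> e : options.entrySet()) {
            if (GRAPH_OPTIONS.contains(e.getKey())) {
                sb.append(sep).append(e.getKey()).append('=').append(e.getValue());
                sep = '&';
            }
        }
        return sb.toString();
    }

    // Decode: recover the table name plus the options needed to rebuild the graph.
    public static Map<String, String> decode(String identifier) {
        Map<String, String> out = new LinkedHashMap<>();
        int q = identifier.indexOf('?');
        out.put("table", q < 0 ? identifier : identifier.substring(0, q));
        if (q >= 0) {
            for (String pair : identifier.substring(q + 1).split("&")) {
                String[] kv = pair.split("=", 2);
                out.put(kv[0], kv[1]);
            }
        }
        return out;
    }
}
```

Whatever the real encoding is, the key property is the one tested here: `decode(encode(t, opts))` returns the table name and exactly the graph options, so `loadTable` can instantiate the graph class with the same configuration the dataframe reader was given.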
Other refactorings and clean-up

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
# Conflicts: # spark-3.1-spanner-lib/src/test/java/com/google/cloud/spark/spanner/integration/WriteIntegrationTest.java
- Make sure most integration tests continue to exercise the dataframe API without the catalog.
- Add integration tests specifically for Catalog-based write operations.
- Store dataframe options for the writer as a case-insensitive map. Spark options are case-insensitive, and sometimes they arrive normalized to all lower case.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
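The last point above is easy to reproduce: an option set as `graphQuery` may arrive as `graphquery`, so an exact-match lookup misses it. Spark itself ships `org.apache.spark.sql.util.CaseInsensitiveStringMap` for this; the sketch below shows the plain-JDK equivalent with a case-insensitive `TreeMap` (`OptionsDemo` is an illustrative name, not the connector's class).

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of storing writer options case-insensitively, so lookups succeed
// regardless of how Spark normalized the option keys.
public class OptionsDemo {
    public static Map<String, String> caseInsensitive(Map<String, String> raw) {
        // TreeMap with CASE_INSENSITIVE_ORDER compares keys ignoring case,
        // so "graphquery" and "graphQuery" address the same entry.
        Map<String, String> m = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        m.putAll(raw);
        return m;
    }
}
```

With a plain `HashMap`, the same lookup would return `null` whenever Spark lower-cased the key on the way in.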
…arity

They both perform the same write operation twice with different results.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>