Support for Spark Catalogs and ErrorIfExists save mode. (#19)#20
Main Changes:
- Connector implements the Catalog API.
- ErrorIfExists and Ignore save modes are supported (via the Catalog API).
- Separate format (`cloud-spanner-graph`) and table provider for Spanner Graphs.

Introducing a separate format for graphs was necessary because the current implementation tightly couples the Spark table definition and the graph query. Supporting the Catalog API for both Spanner graphs and tables with one Spark table provider would require significant refactoring of graph support. Specifically, I ran into a case where loading a graph table and then querying it resulted in type conversion errors, because the query code casts Id columns to `String` even though they are defined as `INT64`.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
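Implementing the Catalog API is what lets the connector honor ErrorIfExists and Ignore: the catalog can answer "does this table exist?" before the write starts. The sketch below models that decision in plain Java; the class and method names (`SaveModeDemo`, `decide`) are illustrative assumptions, not the connector's actual code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the save-mode decision a catalog-backed writer can
// make once table existence is queryable. Not the connector's real API.
public class SaveModeDemo {
    public enum Action { WRITE, SKIP, FAIL }

    // Mirrors Spark's SaveMode semantics when the target table already exists.
    public static Action decide(String mode, boolean tableExists) {
        if (!tableExists) {
            return Action.WRITE; // no conflict, always safe to write
        }
        switch (mode) {
            case "ErrorIfExists": return Action.FAIL;  // abort the job
            case "Ignore":        return Action.SKIP;  // silently no-op
            case "Append":
            case "Overwrite":     return Action.WRITE;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>();
        existing.add("Students");
        System.out.println(decide("ErrorIfExists", existing.contains("Students")));
        System.out.println(decide("Ignore", existing.contains("Students")));
        System.out.println(decide("ErrorIfExists", existing.contains("Courses")));
    }
}
```

Without a catalog, the existence check has no standard hook, which is why these two modes previously could not be supported.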
…rmat (#22)

- `SpannerCatalogTableProviderBase.extractIdentifier` encodes dataframe options supported by graphs into the returned identifier.
- These are: `graph`, `type`, `configs`, `graphQuery`, `timestamp`, `viewsEnabled`.
- `SpannerCatalog.loadTable` extracts these options to correctly instantiate the `SpannerGraph` class.
- Removed classes related to `cloud-spanner-graph`.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
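The commit describes a round trip: graph-related options are encoded into the identifier on extraction and recovered on `loadTable`. The sketch below shows one way such a round trip could work; the query-string encoding and the `GraphIdentifierDemo` class are assumptions for illustration, not the connector's actual format.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of round-tripping graph-related dataframe options
// through a table identifier. The encoding scheme is an assumption.
public class GraphIdentifierDemo {
    // The graph-relevant options named in the commit message.
    static final Set<String> GRAPH_OPTIONS = new HashSet<>(Arrays.asList(
        "graph", "type", "configs", "graphQuery", "timestamp", "viewsEnabled"));

    // Encode only the graph options into a query-string-like identifier suffix.
    public static String encode(String table, Map<String, String> options) {
        StringBuilder sb = new StringBuilder(table);
        char sep = '?';
        for (Map.Entry<String, String> e : options.entrySet()) {
            if (GRAPH_OPTIONS.contains(e.getKey())) {
                sb.append(sep).append(e.getKey()).append('=').append(e.getValue());
                sep = '&';
            }
        }
        return sb.toString();
    }

    // Decode: recover the table name plus the options needed to rebuild the graph.
    public static Map<String, String> decode(String identifier) {
        Map<String, String> out = new LinkedHashMap<>();
        int q = identifier.indexOf('?');
        out.put("table", q < 0 ? identifier : identifier.substring(0, q));
        if (q >= 0) {
            for (String pair : identifier.substring(q + 1).split("&")) {
                String[] kv = pair.split("=", 2);
                out.put(kv[0], kv[1]);
            }
        }
        return out;
    }
}
```

Whatever the real encoding is, the key property is the one tested here: `decode(encode(t, opts))` returns the table name and exactly the graph options, so `loadTable` can instantiate the graph class with the same configuration the dataframe reader was given.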
Other refactorings and clean-up

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
# Conflicts: # spark-3.1-spanner-lib/src/test/java/com/google/cloud/spark/spanner/integration/WriteIntegrationTest.java
- Make sure most integration tests continue to exercise the dataframe API without the catalog.
- Add integration tests specifically for Catalog-based write operations.
- Store dataframe options for the writer as a case-insensitive map. Spark options are case-insensitive, and sometimes they arrive normalized to all lower case.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
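The last point above is easy to reproduce: an option set as `graphQuery` may arrive as `graphquery`, so an exact-match lookup misses it. Spark itself ships `org.apache.spark.sql.util.CaseInsensitiveStringMap` for this; the sketch below shows the plain-JDK equivalent with a case-insensitive `TreeMap` (`OptionsDemo` is an illustrative name, not the connector's class).

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of storing writer options case-insensitively, so lookups succeed
// regardless of how Spark normalized the option keys.
public class OptionsDemo {
    public static Map<String, String> caseInsensitive(Map<String, String> raw) {
        // TreeMap with CASE_INSENSITIVE_ORDER compares keys ignoring case,
        // so "graphquery" and "graphQuery" address the same entry.
        Map<String, String> m = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        m.putAll(raw);
        return m;
    }
}
```

With a plain `HashMap`, the same lookup would return `null` whenever Spark lower-cased the key on the way in.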
…arity

They both perform the same write operation twice with different results.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>