Adding PCNT notation support by davidrabinowitz · Pull Request #1470 · GoogleCloudDataproc/spark-bigquery-connector

davidrabinowitz · 2026-02-21T01:18:24Z

No description provided.

gemini-code-assist · 2026-02-21T01:18:41Z

Summary of Changes

Hello @davidrabinowitz, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces support for a more comprehensive table ID notation, allowing for project.catalog.namespace.table structures within the BigQuery connector. This enhancement improves the flexibility and robustness of table identification by updating parsing logic and streamlining how table paths are constructed for read sessions, ensuring compatibility with more complex naming conventions.

Highlights

- Enhanced Table ID Parsing: The BigQueryUtil class has been updated to support a more flexible table ID notation, specifically allowing for catalog.namespace within the dataset component of a table name. This includes modifications to the DATASET_PATTERN regex and new logic in parseTableId to correctly interpret and separate project, catalog, and namespace elements.
- Refactored Table Path Generation: The ReadSessionCreator class was refactored to directly utilize TableId.getIAMResourceName() for constructing table paths, leading to the removal of the now redundant toTablePath helper method. This streamlines how IAM resource names are generated for BigQuery tables.
- Improved Test Coverage: A new unit test, testProjectCatalogNamespaceTableNotation_independent, was added to BigQueryUtilTest to validate the correct parsing of table IDs that include distinct project, catalog.namespace, and table components, ensuring the new parsing logic functions as expected.

Changelog

bigquery-connector-common/src/main/java/com/google/cloud/bigquery/connector/common/BigQueryUtil.java
- Modified the DATASET_PATTERN regex to allow for catalog.namespace in dataset names.
- Added new conditional logic within the parseTableId method to correctly extract project, catalog, and namespace from a combined string.
bigquery-connector-common/src/main/java/com/google/cloud/bigquery/connector/common/ReadSessionCreator.java
- Removed the static helper method toTablePath.
- Updated the create method to use table.getTableId().getIAMResourceName() directly for table path generation.
bigquery-connector-common/src/test/java/com/google/cloud/bigquery/connector/common/BigQueryUtilTest.java
- Added a new test case testProjectCatalogNamespaceTableNotation_independent to verify the parsing of table IDs with catalog and namespace components.
bigquery-connector-common/src/test/java/com/google/cloud/bigquery/connector/common/ReadSessionCreatorTest.java
- Updated the addCacheEntry method in tests to use table.getTableId().getIAMResourceName().
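
To make the ReadSessionCreator change concrete, the following is a minimal standalone sketch of the kind of resource path involved. It assumes the `projects/{project}/datasets/{dataset}/tables/{table}` layout; the class and method names are hypothetical and exist only for illustration, since the PR itself calls TableId.getIAMResourceName() from the BigQuery client library rather than formatting the string by hand.

```java
// Hypothetical stand-in for illustration; the PR relies on TableId.getIAMResourceName().
final class TablePathSketch {
  private TablePathSketch() {}

  /** Builds a table resource path, assuming the projects/.../datasets/.../tables/... layout. */
  static String tablePath(String project, String dataset, String table) {
    return String.format("projects/%s/datasets/%s/tables/%s", project, dataset, table);
  }

  public static void main(String[] args) {
    // With PCNT notation, the dataset component carries catalog.namespace.
    System.out.println(tablePath("project", "catalog.namespace", "table"));
  }
}
```

Under that assumption, a PCNT table ID needs no special handling at this step, because the dot inside the dataset component is passed through unchanged.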

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

davidrabinowitz · 2026-02-21T01:19:37Z

/gcbrun

gemini-code-assist

Code Review

This pull request adds support for PCNT (Project.Catalog.Namespace.Table) notation for identifying BigQuery tables. The changes include updating the dataset pattern regex and adding logic in BigQueryUtil.parseTableId to correctly parse these complex identifiers. Additionally, the code is refactored to use TableId.getIAMResourceName() for generating table paths, which is a good improvement. The changes are logical, but I have a couple of suggestions: one to fix a minor typo in a comment, and another to enhance test coverage for the new parsing logic.

...-connector-common/src/main/java/com/google/cloud/bigquery/connector/common/BigQueryUtil.java

gemini-code-assist · 2026-02-21T01:20:09Z

...nector-common/src/test/java/com/google/cloud/bigquery/connector/common/BigQueryUtilTest.java

+  public void testProjectCatalogNamespaceTableNotation_independent() {
+    TableId tableId =
+        BigQueryUtil.parseTableId(
+            "table", Optional.of("catalog.namespace"), Optional.of("project"));
+    assertThat(tableId).isEqualTo(TableId.of("project", "catalog.namespace", "table"));
+  }


The added test is good for verifying catalog.namespace as a dataset when components are passed separately. However, it doesn't cover the new parsing logic in parseTableId for a single fully qualified table string. To ensure the new logic is robust and prevent regressions, it would be beneficial to add test cases that specifically exercise parsing PCNT notation from a single string.

For example:

@Test
public void testProjectCatalogNamespaceTableNotation_singleString() {
  TableId tableId = BigQueryUtil.parseTableId("project.catalog.namespace.table");
  assertThat(tableId).isEqualTo(TableId.of("project", "catalog.namespace", "table"));
}

@Test
public void testProjectCatalogNamespaceTableNotation_singleStringMultipleLevels() {
  TableId tableId = BigQueryUtil.parseTableId("project.catalog1.catalog2.namespace.table");
  assertThat(tableId).isEqualTo(TableId.of("project", "catalog1.catalog2.namespace", "table"));
}

davidrabinowitz · 2026-02-21T01:20:23Z

/gemini review

gemini-code-assist

Code Review

The pull request introduces support for the PCNT (Project, Catalog, Namespace, Table) notation, which is a valuable addition for complex BigQuery environments. The implementation correctly handles the parsing of multi-part table identifiers and cleans up the resource name generation in ReadSessionCreator. I have identified a minor typo and a potential issue with domain-scoped projects that should be addressed to ensure full compatibility with all project ID formats.

gemini-code-assist · 2026-02-21T01:22:18Z

...-connector-common/src/main/java/com/google/cloud/bigquery/connector/common/BigQueryUtil.java


  private static final String PROJECT_PATTERN = "\\S+";
-  private static final String DATASET_PATTERN = "\\w+";
+  // The TableId dataset may be `catalog.namesapce`


Minor typo in the comment: namesapce should be namespace.

Suggested change

-  // The TableId dataset may be `catalog.namesapce`
+  // The TableId dataset may be `catalog.namespace`

gemini-code-assist · 2026-02-21T01:22:19Z

...-connector-common/src/main/java/com/google/cloud/bigquery/connector/common/BigQueryUtil.java

+      if (projectStr.contains(".") && !projectStr.contains(":")) {
+        int dotIndex = projectStr.indexOf(".");
+        parsedProject = Optional.of(projectStr.substring(0, dotIndex));
+        parsedDataset = Optional.of(projectStr.substring(dotIndex + 1) + "." + parsedDataset.get());
+      }


The current logic for splitting the project string into project and catalog does not support domain-scoped projects (e.g., example.com:project.catalog). In such cases, the project string contains both a colon and a dot, and the first dot is part of the domain name rather than a separator for the catalog. To support this, you should look for the first dot that appears after the colon (if a colon exists).

Suggested change

-      if (projectStr.contains(".") && !projectStr.contains(":")) {
-        int dotIndex = projectStr.indexOf(".");
-        parsedProject = Optional.of(projectStr.substring(0, dotIndex));
-        parsedDataset = Optional.of(projectStr.substring(dotIndex + 1) + "." + parsedDataset.get());
-      }
+      int dotIndex = projectStr.indexOf(".", projectStr.lastIndexOf(":") + 1);
+      if (dotIndex != -1) {
+        parsedProject = Optional.of(projectStr.substring(0, dotIndex));
+        parsedDataset = Optional.of(projectStr.substring(dotIndex + 1) + "." + parsedDataset.get());
+      }
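
To see how the suggested search behaves on domain-scoped project IDs, here is a small standalone sketch that isolates only the split step. The class and method names are hypothetical and are not part of the connector; the input strings are illustrative.

```java
import java.util.Optional;

// Hypothetical helper that isolates the split proposed in the suggestion above.
final class DomainScopedSplitSketch {
  private DomainScopedSplitSketch() {}

  /**
   * Splits "project.catalog" into {project, catalog}, ignoring dots in a domain prefix.
   * Returns empty when no dot follows the last colon.
   */
  static Optional<String[]> splitProjectAndCatalog(String projectStr) {
    // lastIndexOf returns -1 when there is no colon, so the search then starts at index 0.
    int dotIndex = projectStr.indexOf(".", projectStr.lastIndexOf(":") + 1);
    if (dotIndex == -1) {
      return Optional.empty();
    }
    return Optional.of(
        new String[] {projectStr.substring(0, dotIndex), projectStr.substring(dotIndex + 1)});
  }

  public static void main(String[] args) {
    String[] inputs = {"project.catalog", "example.com:project.catalog", "example.com:project"};
    for (String input : inputs) {
      String result =
          splitProjectAndCatalog(input).map(p -> p[0] + " | " + p[1]).orElse("(no split)");
      System.out.println(input + " -> " + result);
    }
  }
}
```

Starting the search after the last colon keeps the dot in `example.com` inside the project part, while a plain `project.catalog` string is still split at its first dot.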

gemini-code-assist · 2026-02-21T01:22:19Z

...nector-common/src/test/java/com/google/cloud/bigquery/connector/common/BigQueryUtilTest.java

+  public void testProjectCatalogNamespaceTableNotation_independent() {
+    TableId tableId =
+        BigQueryUtil.parseTableId(
+            "table", Optional.of("catalog.namespace"), Optional.of("project"));
+    assertThat(tableId).isEqualTo(TableId.of("project", "catalog.namespace", "table"));
+  }


The added test case testProjectCatalogNamespaceTableNotation_independent uses a simple table name and relies on the Optional parameters. It would be beneficial to add a test case that specifically exercises the new parsing logic for a 4-part qualified table string (e.g., project.catalog.namespace.table) to ensure the regex and splitting logic correctly handle the combined project/catalog string.

@gemini-code-assist check testProjectCatalogNamespaceTableNotation_four_part()

You are absolutely right, testProjectCatalogNamespaceTableNotation_four_part() perfectly covers the scenario I suggested. Thank you for adding it!

davidrabinowitz · 2026-02-21T01:44:31Z

/gcbrun

davidrabinowitz · 2026-02-21T01:44:33Z

/gemini review

sonarqubecloud · 2026-02-21T01:45:24Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

gemini-code-assist

Code Review

The pull request successfully adds support for the PCNT (Project.Catalog.Namespace.Table) notation. This is achieved by updating the DATASET_PATTERN regex to allow for catalog.namespace in the dataset part and introducing new parsing logic in BigQueryUtil.parseTableId to correctly re-interpret table IDs where the project component might implicitly contain a catalog part (e.g., project.catalog.dataset.table). The changes also include a refactoring in ReadSessionCreator.java to use the standard TableId.getIAMResourceName() method, which simplifies the code. Comprehensive unit tests have been added to validate the new parsing logic, including edge cases like illegal nested catalogs. Overall, the changes are well-implemented and improve the flexibility of table ID parsing.

gemini-code-assist · 2026-02-21T01:47:06Z

...-connector-common/src/main/java/com/google/cloud/bigquery/connector/common/BigQueryUtil.java

+    if (parsedProject.isPresent() && parsedDataset.isPresent()) {
+      String projectStr = parsedProject.get();
+      if (projectStr.contains(".") && !projectStr.contains(":")) {
+        int dotIndex = projectStr.indexOf(".");
+        parsedProject = Optional.of(projectStr.substring(0, dotIndex));
+        String newDataset = projectStr.substring(dotIndex + 1) + "." + parsedDataset.get();
+        // The dataset part should not have more than one dot (i.e. catalog.namespace)
+        if (newDataset.indexOf(".") != newDataset.lastIndexOf(".")) {
+          throw new IllegalArgumentException(
+              format("Invalid Table ID '%s'. Must match '%s'", rawTable, QUALIFIED_TABLE_REGEX));
+        }
+        parsedDataset = Optional.of(newDataset);
+      }
+    }


This block of code performs a crucial re-interpretation of the parsed project and dataset, especially when the initial regex might have captured a project.catalog segment as the project. While the logic appears correct, adding a Javadoc comment to this if block would greatly enhance readability and maintainability. It should explain the specific scenario this re-interpretation handles (e.g., project.catalog.dataset.table where project.catalog is initially grouped as the project) and why it's necessary to adjust parsedProject and parsedDataset.
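
As one possible shape for the requested documentation, the sketch below restates the block as a standalone method with explanatory comments. It assumes, as the review comment suggests, that the initial parse may group `project.catalog` as the project; the class, the holder type, and the error message are hypothetical and not taken from the connector.

```java
// Hypothetical standalone restatement of the re-interpretation step, for illustration only.
final class PcntReinterpretSketch {
  private PcntReinterpretSketch() {}

  /** Hypothetical holder for the adjusted (project, dataset) pair. */
  static final class ProjectAndDataset {
    final String project;
    final String dataset;

    ProjectAndDataset(String project, String dataset) {
      this.project = project;
      this.dataset = dataset;
    }
  }

  /**
   * Handles IDs such as project.catalog.namespace.table, where the initial parse is assumed
   * to yield project = "project.catalog" and dataset = "namespace". The catalog is moved
   * into the dataset so that the result is project = "project", dataset = "catalog.namespace".
   */
  static ProjectAndDataset reinterpret(String parsedProject, String parsedDataset) {
    // Domain-scoped projects (containing ':') are left untouched, as in the PR's condition.
    if (parsedProject.contains(".") && !parsedProject.contains(":")) {
      int dotIndex = parsedProject.indexOf(".");
      String newDataset = parsedProject.substring(dotIndex + 1) + "." + parsedDataset;
      // The dataset may hold at most one dot (catalog.namespace); nested catalogs are rejected.
      if (newDataset.indexOf(".") != newDataset.lastIndexOf(".")) {
        throw new IllegalArgumentException("Invalid dataset '" + newDataset + "'");
      }
      return new ProjectAndDataset(parsedProject.substring(0, dotIndex), newDataset);
    }
    return new ProjectAndDataset(parsedProject, parsedDataset);
  }

  public static void main(String[] args) {
    ProjectAndDataset result = reinterpret("project.catalog", "namespace");
    System.out.println(result.project + " / " + result.dataset);
  }
}
```

A five-part input, such as a project string of `project.catalog1.catalog2` with dataset `namespace`, produces a dataset with two dots and is therefore rejected by the check.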

davidrabinowitz · 2026-02-21T03:53:11Z

/gcbrun

yalimu-g · 2026-02-23T17:19:12Z

...-connector-common/src/main/java/com/google/cloud/bigquery/connector/common/BigQueryUtil.java

+
+    if (parsedProject.isPresent() && parsedDataset.isPresent()) {
+      String projectStr = parsedProject.get();
+      if (projectStr.contains(".") && !projectStr.contains(":")) {


Should we create a constant for this, given that it is used multiple times here?

Adding PCNT notation support

7e31534

davidrabinowitz self-assigned this Feb 21, 2026

davidrabinowitz requested review from kmjung and yalimu-g February 21, 2026 01:18

spotless applied

bbd0228

gemini-code-assist bot reviewed Feb 21, 2026

fix 5+ part name

ca035da

gemini-code-assist bot reviewed Feb 21, 2026

yalimu-g approved these changes Feb 23, 2026
