
Adding PCNT notation support #1470

Open
davidrabinowitz wants to merge 3 commits into GoogleCloudDataproc:master from davidrabinowitz:pcnt

Conversation

@davidrabinowitz
Member

No description provided.

@davidrabinowitz davidrabinowitz self-assigned this Feb 21, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @davidrabinowitz, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces support for a more comprehensive table ID notation, allowing for project.catalog.namespace.table structures within the BigQuery connector. This enhancement improves the flexibility and robustness of table identification by updating parsing logic and streamlining how table paths are constructed for read sessions, ensuring compatibility with more complex naming conventions.

Highlights

  • Enhanced Table ID Parsing: The BigQueryUtil class has been updated to support a more flexible table ID notation, specifically allowing for catalog.namespace within the dataset component of a table name. This includes modifications to the DATASET_PATTERN regex and new logic in parseTableId to correctly interpret and separate project, catalog, and namespace elements.
  • Refactored Table Path Generation: The ReadSessionCreator class was refactored to directly utilize TableId.getIAMResourceName() for constructing table paths, leading to the removal of the now redundant toTablePath helper method. This streamlines how IAM resource names are generated for BigQuery tables.
  • Improved Test Coverage: A new unit test, testProjectCatalogNamespaceTableNotation_independent, was added to BigQueryUtilTest to validate the correct parsing of table IDs that include distinct project, catalog.namespace, and table components, ensuring the new parsing logic functions as expected.
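To make the notation itself concrete, here is a simplified, split-based sketch of how a dot-separated identifier maps onto the (project, dataset, table) triple, with the two middle segments of a four-part ID becoming a `catalog.namespace` dataset. It is not the connector's implementation (the PR works through a regex plus a re-interpretation step in `BigQueryUtil.parseTableId`); the class and method names are hypothetical, domain-scoped projects such as `example.com:project` are deliberately ignored, and IDs with nested catalogs are rejected, in line with the "illegal nested catalogs" tests the final review mentions.

```java
/**
 * Hypothetical sketch: maps a dot-separated table identifier onto (project, dataset, table),
 * treating the two middle segments of a four-part ID as a `catalog.namespace` dataset.
 * Simplification: domain-scoped projects such as `example.com:project` are not handled.
 */
final class PcntNotationSketch {

  /** Hypothetical holder for the three TableId components; null means "not specified". */
  static final class Parts {
    final String project;
    final String dataset;
    final String table;

    Parts(String project, String dataset, String table) {
      this.project = project;
      this.dataset = dataset;
      this.table = table;
    }

    @Override
    public String toString() {
      return project + " | " + dataset + " | " + table;
    }
  }

  static Parts parse(String id) {
    // The -1 limit keeps trailing empty strings, so "a.b." is rejected below.
    String[] segments = id.split("\\.", -1);
    for (String segment : segments) {
      if (segment.isEmpty()) {
        throw new IllegalArgumentException("Empty segment in table ID '" + id + "'");
      }
    }
    switch (segments.length) {
      case 1: // table
        return new Parts(null, null, segments[0]);
      case 2: // dataset.table
        return new Parts(null, segments[0], segments[1]);
      case 3: // project.dataset.table
        return new Parts(segments[0], segments[1], segments[2]);
      case 4: // project.catalog.namespace.table
        return new Parts(segments[0], segments[1] + "." + segments[2], segments[3]);
      default: // nested catalogs, e.g. project.catalog1.catalog2.namespace.table
        throw new IllegalArgumentException("Unsupported table ID '" + id + "'");
    }
  }

  public static void main(String[] args) {
    System.out.println(parse("project.catalog.namespace.table"));
    // prints: project | catalog.namespace | table
  }
}
```

Splitting on dots keeps the mapping easy to read; a real parser also has to cope with the colon in domain-scoped project IDs, which is exactly where the regex-based approach discussed below needs extra care.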


Changelog
  • bigquery-connector-common/src/main/java/com/google/cloud/bigquery/connector/common/BigQueryUtil.java
    • Modified the DATASET_PATTERN regex to allow for catalog.namespace in dataset names.
    • Added new conditional logic within the parseTableId method to correctly extract project, catalog, and namespace from a combined string.
  • bigquery-connector-common/src/main/java/com/google/cloud/bigquery/connector/common/ReadSessionCreator.java
    • Removed the static helper method toTablePath.
    • Updated the create method to use table.getTableId().getIAMResourceName() directly for table path generation.
  • bigquery-connector-common/src/test/java/com/google/cloud/bigquery/connector/common/BigQueryUtilTest.java
    • Added a new test case testProjectCatalogNamespaceTableNotation_independent to verify the parsing of table IDs with catalog and namespace components.
  • bigquery-connector-common/src/test/java/com/google/cloud/bigquery/connector/common/ReadSessionCreatorTest.java
    • Updated the addCacheEntry method in tests to use table.getTableId().getIAMResourceName().
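The refactoring relies on `TableId.getIAMResourceName()` from the google-cloud-bigquery client, which to our understanding builds a path of the form `projects/{project}/datasets/{dataset}/tables/{table}`, the same shape the BigQuery Storage Read API expects for a read session's table; that is why a local `toTablePath` helper became redundant. The standalone sketch below reproduces that format so it runs without the client library; `TablePathSketch` and `iamResourceName` are hypothetical names, and the exact output of the real method is an assumption here.

```java
/**
 * Hypothetical stand-in for TableId.getIAMResourceName(), assuming the
 * "projects/{project}/datasets/{dataset}/tables/{table}" path format.
 */
final class TablePathSketch {

  static String iamResourceName(String project, String dataset, String table) {
    return String.format("projects/%s/datasets/%s/tables/%s", project, dataset, table);
  }

  public static void main(String[] args) {
    // With PCNT, the dataset segment simply carries "catalog.namespace".
    System.out.println(iamResourceName("project", "catalog.namespace", "table"));
    // prints: projects/project/datasets/catalog.namespace/tables/table
  }
}
```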
Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@davidrabinowitz
Member Author

/gcbrun

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for PCNT (Project.Catalog.Namespace.Table) notation for identifying BigQuery tables. The changes include updating the dataset pattern regex and adding logic in BigQueryUtil.parseTableId to correctly parse these complex identifiers. Additionally, the code is refactored to use TableId.getIAMResourceName() for generating table paths, which is a good improvement. The changes are logical, but I have a couple of suggestions: one to fix a minor typo in a comment, and another to enhance test coverage for the new parsing logic.

Comment on lines +186 to +191
public void testProjectCatalogNamespaceTableNotation_independent() {
  TableId tableId =
      BigQueryUtil.parseTableId(
          "table", Optional.of("catalog.namespace"), Optional.of("project"));
  assertThat(tableId).isEqualTo(TableId.of("project", "catalog.namespace", "table"));
}
Contributor


medium

The added test is good for verifying catalog.namespace as a dataset when components are passed separately. However, it doesn't cover the new parsing logic in parseTableId for a single fully qualified table string. To ensure the new logic is robust and prevent regressions, it would be beneficial to add test cases that specifically exercise parsing PCNT notation from a single string.

For example:

  @Test
  public void testProjectCatalogNamespaceTableNotation_singleString() {
    TableId tableId = BigQueryUtil.parseTableId("project.catalog.namespace.table");
    assertThat(tableId).isEqualTo(TableId.of("project", "catalog.namespace", "table"));
  }

  @Test
  public void testProjectCatalogNamespaceTableNotation_singleStringMultipleLevels() {
    TableId tableId = BigQueryUtil.parseTableId("project.catalog1.catalog2.namespace.table");
    assertThat(tableId).isEqualTo(TableId.of("project", "catalog1.catalog2.namespace", "table"));
  }

@davidrabinowitz
Member Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request introduces support for the PCNT (Project, Catalog, Namespace, Table) notation, which is a valuable addition for complex BigQuery environments. The implementation correctly handles the parsing of multi-part table identifiers and cleans up the resource name generation in ReadSessionCreator. I have identified a minor typo and a potential issue with domain-scoped projects that should be addressed to ensure full compatibility with all project ID formats.


private static final String PROJECT_PATTERN = "\\S+";
private static final String DATASET_PATTERN = "\\w+";
// The TableId dataset may be `catalog.namesapce`
Contributor


medium

Minor typo in the comment: namesapce should be namespace.

Suggested change
- // The TableId dataset may be `catalog.namesapce`
+ // The TableId dataset may be `catalog.namespace`

Comment on lines +232 to +236
if (projectStr.contains(".") && !projectStr.contains(":")) {
  int dotIndex = projectStr.indexOf(".");
  parsedProject = Optional.of(projectStr.substring(0, dotIndex));
  parsedDataset = Optional.of(projectStr.substring(dotIndex + 1) + "." + parsedDataset.get());
}
Contributor


medium

The current logic for splitting the project string into project and catalog does not support domain-scoped projects (e.g., example.com:project.catalog). In such cases, the project string contains both a colon and a dot, and the first dot is part of the domain name rather than a separator for the catalog. To support this, you should look for the first dot that appears after the colon (if a colon exists).

Suggested change
- if (projectStr.contains(".") && !projectStr.contains(":")) {
-   int dotIndex = projectStr.indexOf(".");
-   parsedProject = Optional.of(projectStr.substring(0, dotIndex));
-   parsedDataset = Optional.of(projectStr.substring(dotIndex + 1) + "." + parsedDataset.get());
- }
+ int dotIndex = projectStr.indexOf(".", projectStr.lastIndexOf(":") + 1);
+ if (dotIndex != -1) {
+   parsedProject = Optional.of(projectStr.substring(0, dotIndex));
+   parsedDataset = Optional.of(projectStr.substring(dotIndex + 1) + "." + parsedDataset.get());
+ }
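The effect of this suggestion can be checked in isolation. With the PR's `!projectStr.contains(":")` guard, a value such as `example.com:project.catalog` is never split; searching for the first dot after the last colon splits it at the catalog boundary instead. The sketch below isolates only that search. `DomainScopedSplitSketch` and `splitProjectAndCatalog` are hypothetical names, and it returns a plain array rather than updating `Optional` fields.

```java
/** Hypothetical sketch of the suggested search: split only on a dot that follows the last colon. */
final class DomainScopedSplitSketch {

  /** Returns {project, catalog}; catalog is null when there is no dot after the last colon. */
  static String[] splitProjectAndCatalog(String projectStr) {
    // lastIndexOf returns -1 when there is no colon, so the search then starts at index 0.
    int dotIndex = projectStr.indexOf('.', projectStr.lastIndexOf(':') + 1);
    if (dotIndex == -1) {
      return new String[] {projectStr, null};
    }
    return new String[] {projectStr.substring(0, dotIndex), projectStr.substring(dotIndex + 1)};
  }

  public static void main(String[] args) {
    String[] inputs = {"project.catalog", "example.com:project.catalog", "example.com:project"};
    for (String input : inputs) {
      String[] parts = splitProjectAndCatalog(input);
      System.out.println(input + " -> project=" + parts[0] + ", catalog=" + parts[1]);
    }
  }
}
```

Note that the dot inside `example.com` is skipped because the search starts after the colon, while a plain domain-scoped project with no catalog is left intact.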

Comment on lines +186 to +191
public void testProjectCatalogNamespaceTableNotation_independent() {
  TableId tableId =
      BigQueryUtil.parseTableId(
          "table", Optional.of("catalog.namespace"), Optional.of("project"));
  assertThat(tableId).isEqualTo(TableId.of("project", "catalog.namespace", "table"));
}
Contributor


medium

The added test case testProjectCatalogNamespaceTableNotation_independent uses a simple table name and relies on the Optional parameters. It would be beneficial to add a test case that specifically exercises the new parsing logic for a 4-part qualified table string (e.g., project.catalog.namespace.table) to ensure the regex and splitting logic correctly handle the combined project/catalog string.

Member Author


@gemini-code-assist check testProjectCatalogNamespaceTableNotation_four_part()

Contributor


You are absolutely right, testProjectCatalogNamespaceTableNotation_four_part() perfectly covers the scenario I suggested. Thank you for adding it!

@davidrabinowitz
Member Author

/gcbrun

@davidrabinowitz
Member Author

/gemini review

@sonarqubecloud

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request successfully adds support for the PCNT (Project.Catalog.Namespace.Table) notation. This is achieved by updating the DATASET_PATTERN regex to allow for catalog.namespace in the dataset part and introducing a new parsing logic in BigQueryUtil.parseTableId to correctly re-interpret table IDs where the project component might implicitly contain a catalog part (e.g., project.catalog.dataset.table). The changes also include a refactoring in ReadSessionCreator.java to use the standard TableId.getIAMResourceName() method, which simplifies the code. Comprehensive unit tests have been added to validate the new parsing logic, including edge cases like illegal nested catalogs. Overall, the changes are well-implemented and improve the flexibility of table ID parsing.

Comment on lines +230 to +243
if (parsedProject.isPresent() && parsedDataset.isPresent()) {
  String projectStr = parsedProject.get();
  if (projectStr.contains(".") && !projectStr.contains(":")) {
    int dotIndex = projectStr.indexOf(".");
    parsedProject = Optional.of(projectStr.substring(0, dotIndex));
    String newDataset = projectStr.substring(dotIndex + 1) + "." + parsedDataset.get();
    // The dataset part should not have more than one dot (i.e. catalog.namespace)
    if (newDataset.indexOf(".") != newDataset.lastIndexOf(".")) {
      throw new IllegalArgumentException(
          format("Invalid Table ID '%s'. Must match '%s'", rawTable, QUALIFIED_TABLE_REGEX));
    }
    parsedDataset = Optional.of(newDataset);
  }
}
Contributor


medium

This block of code performs a crucial re-interpretation of the parsed project and dataset, needed when the initial regex captures a project.catalog segment as the project. While the logic appears correct, adding an explanatory comment above this if block would greatly improve readability and maintainability. It should explain the specific scenario this re-interpretation handles (e.g., project.catalog.dataset.table, where project.catalog is initially grouped as the project) and why parsedProject and parsedDataset must be adjusted.
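One possible shape for such a comment, shown on a standalone version of the block quoted above. It assumes, as the review describes, that the regex has already captured `project.catalog` as the project and `namespace` as the dataset; `ReinterpretSketch` and `reinterpret` are hypothetical names, plain strings replace the `Optional`s, and the error message omits `QUALIFIED_TABLE_REGEX`. The branching and the one-dot check mirror the quoted code.

```java
/** Hypothetical standalone version of the quoted re-interpretation block, with an explanatory comment. */
final class ReinterpretSketch {

  /**
   * Re-splits the project/dataset pair produced by the table-ID regex.
   *
   * <p>For "project.catalog.namespace.table" the project group captures "project.catalog" and
   * the dataset group captures "namespace". Everything after the first dot of the project is
   * moved in front of the dataset, giving project "project" and dataset "catalog.namespace".
   * Domain-scoped projects (containing ':') are left untouched, and a resulting dataset with
   * more than one dot (nested catalogs) is rejected.
   *
   * @return a two-element array {project, dataset}
   */
  static String[] reinterpret(String projectStr, String dataset, String rawTable) {
    if (projectStr.contains(".") && !projectStr.contains(":")) {
      int dotIndex = projectStr.indexOf('.');
      String newDataset = projectStr.substring(dotIndex + 1) + "." + dataset;
      if (newDataset.indexOf('.') != newDataset.lastIndexOf('.')) {
        throw new IllegalArgumentException("Invalid Table ID '" + rawTable + "'");
      }
      return new String[] {projectStr.substring(0, dotIndex), newDataset};
    }
    return new String[] {projectStr, dataset};
  }

  public static void main(String[] args) {
    String[] parts = reinterpret("project.catalog", "namespace", "project.catalog.namespace.table");
    System.out.println(parts[0] + " / " + parts[1]); // prints: project / catalog.namespace
  }
}
```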

@davidrabinowitz
Member Author

/gcbrun


if (parsedProject.isPresent() && parsedDataset.isPresent()) {
  String projectStr = parsedProject.get();
  if (projectStr.contains(".") && !projectStr.contains(":")) {
Contributor


Should we create a constant for this, given that it is used multiple times here?
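Assuming "this" refers to the repeated `"."` literal (and the `":"` beside it), one possible shape is to hoist both separators into named constants so the guard and the split share one definition. The constant, class, and method names below are hypothetical; the quoted code uses the literals inline.

```java
/** Hypothetical sketch: the separator literals hoisted into named constants. */
final class SeparatorConstantsSketch {
  // Hypothetical names; the quoted code uses the literals "." and ":" inline.
  private static final String CATALOG_SEPARATOR = ".";
  private static final String DOMAIN_SEPARATOR = ":";

  /** True when the project string carries a catalog suffix and is not domain-scoped. */
  static boolean hasCatalogSuffix(String projectStr) {
    return projectStr.contains(CATALOG_SEPARATOR) && !projectStr.contains(DOMAIN_SEPARATOR);
  }

  public static void main(String[] args) {
    System.out.println(hasCatalogSuffix("project.catalog"));     // prints: true
    System.out.println(hasCatalogSuffix("example.com:project")); // prints: false
  }
}
```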


Labels

None yet

Projects

None yet


2 participants