[Data] - Remove the calls to Iceberg Catalog Table in write tasks #60476

Merged
alexeykudinkin merged 5 commits into ray-project:master from goutamvenkat-anyscale:goutam/reduce_iceberg_catalog_calls
Jan 26, 2026

Conversation

@goutamvenkat-anyscale
Contributor

Description

Removes the calls to catalog.load_table in write tasks to reduce the likelihood of hitting rate limits on the remote Iceberg catalog. The underlying FileIO and TableMetadata objects now have to be serialized and passed to the writers, so that the writers can write to cloud storage without accessing these objects through the table instance.
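
The pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: StubCatalog, StubTableMetadata, StubFileIO, serialize, write_task, and run_write are all hypothetical stand-ins, JSON stands in for whatever serialization is actually used, and a plain loop stands in for Ray write tasks. The point it shows is that the catalog is contacted once on the driver, and each write task rebuilds the metadata and IO objects from a serialized payload.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StubTableMetadata:
    """Hypothetical stand-in for an Iceberg table's metadata."""
    location: str
    column_names: list


@dataclass(frozen=True)
class StubFileIO:
    """Hypothetical stand-in for the object that writes to storage."""
    root: str

    def write_text(self, relative_path, text):
        path = os.path.join(self.root, relative_path)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        return path


class StubCatalog:
    """Hypothetical remote catalog that counts load_table calls."""

    def __init__(self, root):
        self._root = root
        self.load_table_calls = 0

    def load_table(self, name):
        self.load_table_calls += 1
        metadata = StubTableMetadata(location=name, column_names=["id", "value"])
        return metadata, StubFileIO(root=self._root)


def serialize(metadata, io):
    """Driver side: turn both objects into bytes that can be shipped to tasks."""
    state = {"metadata": asdict(metadata), "io": asdict(io)}
    return json.dumps(state).encode("utf-8")


def write_task(payload, task_index, rows):
    """Worker side: rebuild metadata and IO from bytes; the catalog is never touched."""
    state = json.loads(payload.decode("utf-8"))
    metadata = StubTableMetadata(**state["metadata"])
    io = StubFileIO(**state["io"])
    lines = [",".join(metadata.column_names)]
    lines += [",".join(str(value) for value in row) for row in rows]
    return io.write_text(f"{metadata.location}/part-{task_index}.csv", "\n".join(lines))


def run_write(catalog, table_name, blocks):
    """Load the table once, then fan out writes that only receive the payload."""
    metadata, io = catalog.load_table(table_name)  # the only catalog call
    payload = serialize(metadata, io)
    return [write_task(payload, i, rows) for i, rows in enumerate(blocks)]


def demo():
    catalog = StubCatalog(tempfile.mkdtemp())
    paths = run_write(catalog, "events", [[(1, "a")], [(2, "b")]])
    return catalog.load_table_calls, paths
```

Calling demo() writes one file per block while the stub catalog records a single load_table call, regardless of how many write tasks run. The real FileIO and TableMetadata objects carry far more state than these stubs, so this only illustrates the shape of the change.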

Signed-off-by: Goutam <goutam@anyscale.com>
@goutamvenkat-anyscale requested a review from a team as a code owner January 24, 2026 17:04
@goutamvenkat-anyscale changed the title from "[Data] - Remove the calls to Iceberg Catalog in tasks" to "[Data] - Remove the calls to Iceberg Catalog in write tasks" Jan 24, 2026
@goutamvenkat-anyscale added the data (Ray Data-related issues) and go (add ONLY when ready to merge, run all tests) labels Jan 24, 2026

@gemini-code-assist bot left a comment

Code Review

This pull request effectively removes the need for worker tasks to call the Iceberg catalog by serializing the FileIO and TableMetadata objects on the driver and passing them to the workers. This is a good optimization to prevent potential rate-limiting issues with the remote catalog. The implementation is clean and directly addresses the problem. I have one minor suggestion to improve code readability.

@goutamvenkat-anyscale changed the title from "[Data] - Remove the calls to Iceberg Catalog in write tasks" to "[Data] - Remove the calls to Iceberg Catalog Table in write tasks" Jan 24, 2026
goutamvenkat-anyscale and others added 3 commits January 25, 2026 16:06
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
@alexeykudinkin merged commit 7d33f91 into ray-project:master Jan 26, 2026
6 checks passed
jinbum-kim pushed a commit to jinbum-kim/ray that referenced this pull request Jan 29, 2026
400Ping pushed a commit to 400Ping/ray that referenced this pull request Feb 1, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
Labels

data (Ray Data-related issues)
go (add ONLY when ready to merge, run all tests)

Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

2 participants