[4/n] IPv6 support: Add IPv6 support for sockets #56147

Merged
jjyao merged 33 commits into ray-project:master from Yicheng-Lu-llll:support-ipv6-for-socket on Oct 29, 2025

Yicheng-Lu-llll (Member) commented Sep 2, 2025

Why are these changes needed?

This PR updates all socket-related code to ensure compatibility with both IPv4 and IPv6.

Related issue number

Checks

See all the tests I have done here: https://docs.google.com/document/d/129aZnlNu4U3KaJbbriLHvUizUaZDnx9WgdNh50tp-c8/edit?tab=t.0

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Note

Introduces IPv6-aware networking (address parsing, IP detection, socket binding, and port checks) across C++ and Python, updating services and helpers to select the correct address family and exposing new utilities.

  • Networking Utilities (C++)
    • Add IsIPv6, GetNodeIpAddressFromPerspective, and CheckPortFree(family, port) in ray/util/network_util.{h,cc}; keep BuildAddress/ParseAddress and URL helpers; extend tests.
    • Update build to include gflags; migrate C++ callers to new IP detection and port-check APIs (raylet WorkerPool, tests, redis async context, runtime init paths).
  • Python Networking Helpers
    • Expose C++ functions via Cython (node_ip_address_from_perspective, is_ipv6), add get_localhost_ip, and adjust is_localhost.
    • Replace legacy IP detection and use IPv6-aware logic in services (get_node_ip_address, dashboard bind checks, TLS cert SANs).
  • Socket Binding & Port Utilities
    • Bind sockets with AF_INET6 when host is IPv6 across Serve HTTP, client proxier, rpdb, node startup, Spark utils, AIR/train/collective utilities; make find_free_port accept family.
  • Raylet/Workers
    • Track node address family and use family-aware CheckPortFree; improve port selection robustness.
  • Docs/Examples
    • Simplify LM training address to tcp://{ip}:{port} replacing build_address.
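
As an illustration of the socket-binding bullets above, here is a minimal Python sketch of choosing the address family from a host and passing it to a port finder. The names `is_ipv6` and `find_free_port` mirror the ones mentioned in this summary, but the bodies are assumptions built on the standard library, not Ray's actual implementation.

```python
import ipaddress
import socket


def is_ipv6(host: str) -> bool:
    """Stand-in for the PR's helper: True only for IPv6 literals."""
    try:
        return ipaddress.ip_address(host).version == 6
    except ValueError:
        # Hostnames and malformed input are treated as "not IPv6".
        return False


def find_free_port(family: int = socket.AF_INET) -> int:
    """Ask the OS for an unused TCP port on the loopback address of `family`."""
    loopback = "::1" if family == socket.AF_INET6 else "127.0.0.1"
    with socket.socket(family, socket.SOCK_STREAM) as s:
        s.bind((loopback, 0))  # port 0 lets the OS choose a free port
        return s.getsockname()[1]


host = "127.0.0.1"
family = socket.AF_INET6 if is_ipv6(host) else socket.AF_INET
port = find_free_port(family)
print(f"family={family.name} port={port}")
```

The port is only guaranteed free at the moment of the call; another process may claim it before the caller binds again.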

Written by Cursor Bugbot for commit 3e235d8.

Yicheng-Lu-llll force-pushed the support-ipv6-for-socket branch 2 times, most recently from a54b8e2 to 5cb3e54, on September 3, 2025 19:36
  auto bootstrap_address = ConfigInternal::Instance().bootstrap_ip;
  if (bootstrap_address.empty()) {
-   bootstrap_address = GetNodeIpAddress();
+   bootstrap_address = ray::GetNodeIpAddressFromPerspective();
Member Author:

I renamed the function to GetNodeIpAddressFromPerspective to keep it consistent with the Python function node_ip_address_from_perspective. Additionally, I used Cython bindings to avoid maintaining duplicate implementations in both C++ and Python.

Yicheng-Lu-llll force-pushed the support-ipv6-for-socket branch 2 times, most recently from fa61584 to 697488d, on September 4, 2025 15:41
Yicheng-Lu-llll marked this pull request as ready for review on September 4, 2025 16:21
ray-gardener bot added the core (Issues that should be addressed in Ray Core) and community-contribution (Contributed by the community) labels on Sep 4, 2025
aslonnie (Collaborator) left a comment:

(for core team to review)

Yicheng-Lu-llll force-pushed the support-ipv6-for-socket branch 2 times, most recently from 21b3c6b to ccc093d, on September 12, 2025 02:55
jjyao (Collaborator) left a comment:

Sorry for the late review.

return result;
}

std::string GetNodeIpAddressFromPerspective(const std::string &address) {
Collaborator:

Prefer std::optional over an empty string, since it's more explicit.

std::string GetNodeIpAddressFromPerspective(const std::string &address) {
static const auto detect_ip = [](const std::string &target_address) -> std::string {
std::vector<std::pair<std::string, boost::asio::ip::udp>> test_addresses;
if (!target_address.empty()) {
Collaborator:

Must this target address be an IP? If so, we can check whether it's IPv4 or IPv6 and use the corresponding protocol. We don't need to try both every time.
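
To make the suggestion above concrete, here is a minimal Python sketch of detecting the local IP "from the perspective" of a target while opening only the socket family that matches the target. The function name `local_ip_toward` and the default port are hypothetical, and the sketch assumes the target is an IP literal rather than a hostname; it is not the C++ implementation under review.

```python
import ipaddress
import socket


def local_ip_toward(target_ip: str, port: int = 53) -> str:
    """Return the local address the OS would use to reach `target_ip`.

    Assumes `target_ip` is an IP literal; ipaddress raises ValueError otherwise.
    """
    is_v6 = ipaddress.ip_address(target_ip).version == 6
    family = socket.AF_INET6 if is_v6 else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as s:
        # connect() on a UDP socket only selects a route and a default peer;
        # no packet is sent.
        s.connect((target_ip, port))
        return s.getsockname()[0]


print(local_ip_toward("127.0.0.1"))
```

Because the family is derived from the target, there is a single attempt per call instead of one per protocol.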

"""IP address by which the local node can be reached *from* the `address`.

If no address is given, defaults to public DNS servers for detection. For
performance, the result is cached when using the default address (empty string).
Collaborator:

Why does the performance matter here? I'd imagine that for each Ray node, we only detect it once and use it for all the other places by passing node_ip_address around.

Member Author:

Got it. The reason I did this was that I had a create_socket function that’s called in many places, and it called node_ip_address_from_perspective() to automatically decide which socket to create.

Now I understand your point — I’ll try using node_ip_address directly to decide which socket to create, so node_ip_address_from_perspective() won’t need to be cached.

  allocated_ports = set()

- s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+ s = create_socket(socket.SOCK_STREAM)
Collaborator:

At this point, we should always know the node-ip-address and whether it's IPv4 or IPv6, and we should select the address family based on that, so I think the code should be like

s = socket.socket(socket.AF_INET6 if is_ip_ipv6(self.node_ip_address) else socket.AF_INET, socket.SOCK_STREAM)

We don't need another function for it.

Yicheng-Lu-llll (Member Author), Sep 17, 2025:

I see your point!
My original idea was to add a safe helper, create_socket, for cases where node_ip_address wasn’t set yet or was hard to retrieve (but that assumption seems to have turned out to be false). It would:

    node_ip = get_node_ip_address() # which might call `node_ip_address_from_perspective`
    family = socket.AF_INET6 if is_ipv6_ip(node_ip) else socket.AF_INET

    return socket.socket(family, socket_type)

This lets call sites avoid worrying about IPv4 vs. IPv6.
Since create_socket is used in many places, I cached node_ip_address_from_perspective() to avoid repeated lookups.

I’ll try using node_ip_address directly in all places, which should let us remove the cache entirely for node_ip_address_from_perspective()

  # ray start --node-ip-address. You should instead use
  # get_cached_node_ip_address.
- def get_node_ip_address(address="8.8.8.8:53"):
+ def get_node_ip_address(address=""):
Collaborator:

Suggested change
- def get_node_ip_address(address=""):
+ def get_node_ip_address(address=None):

  start = time.time()
  while time_elapsed <= timeout_ms:
-   s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+   s = create_socket(socket.SOCK_STREAM)
Collaborator:

we should pick family based on whether address is ipv4 or ipv6

Member Author:

Got it — my previous thought was to unify the behavior based on the node IP. I’ll make the change!
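
For reference, a minimal Python sketch of the change discussed in this thread: a connect-with-retry probe that picks the socket family from the address being dialed rather than from the node IP. The function name `wait_for_port` and its parameters are hypothetical, and the sketch assumes `host` is an IP literal.

```python
import ipaddress
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True once a TCP connect to (host, port) succeeds, else False at timeout."""
    is_v6 = ipaddress.ip_address(host).version == 6
    family = socket.AF_INET6 if is_v6 else socket.AF_INET
    deadline = time.monotonic() + timeout_s
    while True:
        # A fresh socket per attempt: a socket whose connect failed is not reused.
        with socket.socket(family, socket.SOCK_STREAM) as s:
            s.settimeout(0.2)
            try:
                s.connect((host, port))
                return True
            except OSError:
                pass  # refused or timed out; retry until the deadline
        if time.monotonic() >= deadline:
            return False
        time.sleep(0.05)
```

Keying the family off the dialed address means an IPv4 target is still reachable from a node whose own address is IPv6, and vice versa.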

Yicheng-Lu-llll (Member Author):

I’m going to reset the commit to b473c14, which is the last commit before I started reverting and debugging. I’ll update the branch once the flaky tests are fixed.

Yicheng-Lu-llll force-pushed the support-ipv6-for-socket branch from a779dec to b473c14 on October 13, 2025 21:23
host = "127.0.0.1"
port = self._get_unused_port(
socket.AF_INET6 if is_ipv6(host) else socket.AF_INET
)

Bug: IPv6 Support Issue in Localhost Handling

The code hardcodes host = "127.0.0.1" (IPv4) and then immediately checks is_ipv6(host) to determine socket family. Since "127.0.0.1" is always IPv4, is_ipv6(host) will always return False, effectively disabling IPv6 support in this code path. In IPv6-only environments, this could prevent proper functionality. The code should use get_localhost_ip() instead to get the appropriate localhost address for the system.
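
A small Python sketch of the point above, under stated assumptions: `localhost_for` is a hypothetical helper that returns the loopback literal in the same family as the node IP. How the PR's `get_localhost_ip()` actually decides is not shown in this conversation, so this is only one plausible shape.

```python
import ipaddress
import socket


def localhost_for(node_ip: str) -> str:
    """Hypothetical helper: loopback literal in the same family as `node_ip`."""
    return "::1" if ipaddress.ip_address(node_ip).version == 6 else "127.0.0.1"


def family_of(ip: str) -> socket.AddressFamily:
    """Map an IP literal to its socket address family."""
    is_v6 = ipaddress.ip_address(ip).version == 6
    return socket.AF_INET6 if is_v6 else socket.AF_INET


# With a hardcoded "127.0.0.1" the IPv6 branch can never be taken;
# deriving the host from the node IP keeps both branches reachable.
for node_ip in ("10.0.0.5", "2001:db8::5"):
    host = localhost_for(node_ip)
    print(node_ip, "->", host, family_of(host).name)
```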


  ip = ray.get(workers[0].get_node_ip.remote())
  port = ray.get(workers[0].find_free_port.remote())
- address = f"tcp://{build_address(ip, port)}"
+ address = f"tcp://{ip}:{port}"

Bug: IPv6 Address Formatting Error

The code was changed from address = f"tcp://{build_address(ip, port)}" to address = f"tcp://{ip}:{port}". This breaks IPv6 address formatting because IPv6 addresses must be wrapped in brackets when combined with a port (e.g., "[::1]:8080" not "::1:8080"). The build_address() function handles this correctly, but the manual string formatting does not, resulting in malformed URLs for IPv6 addresses.
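
The bracket rule described above can be sketched in a few lines of Python. `format_host_port` is a hypothetical stand-in written for this illustration; it is not Ray's `build_address`, whose body is not shown here.

```python
import ipaddress
from urllib.parse import urlsplit


def format_host_port(host: str, port: int) -> str:
    """Join host and port, wrapping IPv6 literals in brackets."""
    try:
        needs_brackets = ipaddress.ip_address(host).version == 6
    except ValueError:
        needs_brackets = False  # a hostname, not an IP literal
    return f"[{host}]:{port}" if needs_brackets else f"{host}:{port}"


url = f"tcp://{format_host_port('::1', 8080)}"
parts = urlsplit(url)
# The bracketed form parses back into a distinct host and port.
print(url, parts.hostname, parts.port)
```

Without the brackets, the colons of the IPv6 literal and the port separator are indistinguishable, so a parser cannot tell where the host ends.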


jjyao (Collaborator) commented Oct 29, 2025

Failed windows and mac tests are unrelated.

@jjyao jjyao merged commit a6bc1fe into ray-project:master Oct 29, 2025
5 of 6 checks passed
edoakes pushed a commit that referenced this pull request Oct 31, 2025
This PR is a follow-up to
#56147 (comment)

The `find_free_port()` function was duplicated across multiple locations
in the codebase. This PR consolidates all implementations into a single
canonical location.

---------

Signed-off-by: Yicheng-Lu-llll <luyc58576@gmail.com>

Labels

  • community-contribution: Contributed by the community
  • core: Issues that should be addressed in Ray Core
  • go: add ONLY when ready to merge, run all tests

7 participants