[4/n] IPv6 support: Add IPv6 support for sockets#56147
[4/n] IPv6 support: Add IPv6 support for sockets#56147jjyao merged 33 commits intoray-project:masterfrom
Conversation
a54b8e2 to
5cb3e54
Compare
| auto bootstrap_address = ConfigInternal::Instance().bootstrap_ip; | ||
| if (bootstrap_address.empty()) { | ||
| bootstrap_address = GetNodeIpAddress(); | ||
| bootstrap_address = ray::GetNodeIpAddressFromPerspective(); |
There was a problem hiding this comment.
I renamed the function to GetNodeIpAddressFromPerspective to keep it consistent with the Python function node_ip_address_from_perspective. Additionally, I used CPython bindings to avoid maintaining duplicate implementations in both C++ and Python.
588d701 to
1d3a95d
Compare
158d418 to
9243d0c
Compare
fa61584 to
697488d
Compare
c3df1df to
4ea0fc7
Compare
094b528 to
ec48d95
Compare
aslonnie
left a comment
There was a problem hiding this comment.
(for core team to review)
21b3c6b to
ccc093d
Compare
jjyao
left a comment
There was a problem hiding this comment.
Sorry for the late review.
src/ray/util/network_util.cc
Outdated
| return result; | ||
| } | ||
|
|
||
| std::string GetNodeIpAddressFromPerspective(const std::string &address) { |
There was a problem hiding this comment.
prefer to use std::optional instead of empty string since it's more explicit.
src/ray/util/network_util.cc
Outdated
| std::string GetNodeIpAddressFromPerspective(const std::string &address) { | ||
| static const auto detect_ip = [](const std::string &target_address) -> std::string { | ||
| std::vector<std::pair<std::string, boost::asio::ip::udp>> test_addresses; | ||
| if (!target_address.empty()) { |
There was a problem hiding this comment.
Is this target address must be ip, if so we can check whether it's ipv4 or ipv6 and use the protocol correspondingly. We don't need to try both everytime?
python/ray/_common/network_utils.py
Outdated
| """IP address by which the local node can be reached *from* the `address`. | ||
|
|
||
| If no address is given, defaults to public DNS servers for detection. For | ||
| performance, the result is cached when using the default address (empty string). |
There was a problem hiding this comment.
Why does the performance matter here? I'd imagine that for each Ray node, we only detect it once and use it for all the other places by passing node_ip_address around.
There was a problem hiding this comment.
Got it. The reason I did this was that I had a create_socket function that’s called in many places, and it called node_ip_address_from_perspective() to automatically decide which socket to create.
Now I understand your point — I’ll try using node_ip_address directly to decide which socket to create, so node_ip_address_from_perspective() won’t need to be cached.
python/ray/_private/node.py
Outdated
| allocated_ports = set() | ||
|
|
||
| s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) | ||
| s = create_socket(socket.SOCK_STREAM) |
There was a problem hiding this comment.
At this point, we should alway know the node-ip-address and whether it's ipv4 or ipv6 and we should select AF_INET based on that so I think the code should be like
s = socket.socket(socket.AF_INET6 if is_ip_ipv6(self.node_ip_address) else socket.AF_INET, socket.SOCK_STREAM)
We don't need another function for it.
There was a problem hiding this comment.
I see your point!
My original idea was to add a safe helper create_socket for cases where node_ip_address wasn’t set yet or was hard to retrieve (but that assumption seems turn out to be false). It would:
node_ip = get_node_ip_address() # which might call `node_ip_address_from_perspective`
family = socket.AF_INET6 if is_ipv6_ip(node_ip) else socket.AF_INET
return socket.socket(family, socket_type)This lets call sites avoid worrying about IPv4 vs. IPv6.
Since create_socket is used in many places, I cached node_ip_address_from_perspective() to avoid repeated lookups.
I’ll try using node_ip_address directly in all places, which should let us remove the cache entirely for node_ip_address_from_perspective()
python/ray/_private/services.py
Outdated
| # ray start --node-ip-address. You should instead use | ||
| # get_cached_node_ip_address. | ||
| def get_node_ip_address(address="8.8.8.8:53"): | ||
| def get_node_ip_address(address=""): |
There was a problem hiding this comment.
| def get_node_ip_address(address=""): | |
| def get_node_ip_address(address=None): |
python/ray/_private/test_utils.py
Outdated
| start = time.time() | ||
| while time_elapsed <= timeout_ms: | ||
| s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) | ||
| s = create_socket(socket.SOCK_STREAM) |
There was a problem hiding this comment.
we should pick family based on whether address is ipv4 or ipv6
There was a problem hiding this comment.
Got it — my previous thought was to unify the behavior based on the node IP. I’ll make the change!
|
I’m going to reset the commit to b473c14, which is the last commit before I started reverting and debugging. I’ll update the branch once the flaky tests are fixed. |
a779dec to
b473c14
Compare
| host = "127.0.0.1" | ||
| port = self._get_unused_port( | ||
| socket.AF_INET6 if is_ipv6(host) else socket.AF_INET | ||
| ) |
There was a problem hiding this comment.
Bug: IPv6 Support Issue in Localhost Handling
The code hardcodes host = "127.0.0.1" (IPv4) and then immediately checks is_ipv6(host) to determine socket family. Since "127.0.0.1" is always IPv4, is_ipv6(host) will always return False, effectively disabling IPv6 support in this code path. In IPv6-only environments, this could prevent proper functionality. The code should use get_localhost_ip() instead to get the appropriate localhost address for the system.
| ip = ray.get(workers[0].get_node_ip.remote()) | ||
| port = ray.get(workers[0].find_free_port.remote()) | ||
| address = f"tcp://{build_address(ip, port)}" | ||
| address = f"tcp://{ip}:{port}" |
There was a problem hiding this comment.
Bug: IPv6 Address Formatting Error
The code was changed from address = f"tcp://{build_address(ip, port)}" to address = f"tcp://{ip}:{port}". This breaks IPv6 address formatting because IPv6 addresses must be wrapped in brackets when combined with a port (e.g., "[::1]:8080" not "::1:8080"). The build_address() function handles this correctly, but the manual string formatting does not, resulting in malformed URLs for IPv6 addresses.
|
Failed windows and mac tests are unrelated. |
This PR is a follow-up to #56147 (comment) The `find_free_port()` function was duplicated across multiple locations in the codebase. This PR consolidates all implementations into a single canonical location. --------- Signed-off-by: Yicheng-Lu-llll <luyc58576@gmail.com>
Signed-off-by: Yicheng-Lu-llll <luyc58576@gmail.com> Signed-off-by: Yicheng-Lu-llll <51814063+Yicheng-Lu-llll@users.noreply.github.com> Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
This PR is a follow-up to ray-project#56147 (comment) The `find_free_port()` function was duplicated across multiple locations in the codebase. This PR consolidates all implementations into a single canonical location. --------- Signed-off-by: Yicheng-Lu-llll <luyc58576@gmail.com>
Signed-off-by: Yicheng-Lu-llll <luyc58576@gmail.com> Signed-off-by: Yicheng-Lu-llll <51814063+Yicheng-Lu-llll@users.noreply.github.com> Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
This PR is a follow-up to ray-project#56147 (comment) The `find_free_port()` function was duplicated across multiple locations in the codebase. This PR consolidates all implementations into a single canonical location. --------- Signed-off-by: Yicheng-Lu-llll <luyc58576@gmail.com>
Signed-off-by: Yicheng-Lu-llll <luyc58576@gmail.com> Signed-off-by: Yicheng-Lu-llll <51814063+Yicheng-Lu-llll@users.noreply.github.com> Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com> Signed-off-by: Aydin Abiar <aydin@anyscale.com>
This PR is a follow-up to ray-project#56147 (comment) The `find_free_port()` function was duplicated across multiple locations in the codebase. This PR consolidates all implementations into a single canonical location. --------- Signed-off-by: Yicheng-Lu-llll <luyc58576@gmail.com> Signed-off-by: Aydin Abiar <aydin@anyscale.com>
This PR is a follow-up to ray-project#56147 (comment) The `find_free_port()` function was duplicated across multiple locations in the codebase. This PR consolidates all implementations into a single canonical location. --------- Signed-off-by: Yicheng-Lu-llll <luyc58576@gmail.com>
Signed-off-by: Yicheng-Lu-llll <luyc58576@gmail.com> Signed-off-by: Yicheng-Lu-llll <51814063+Yicheng-Lu-llll@users.noreply.github.com> Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com> Signed-off-by: Future-Outlier <eric901201@gmail.com>
This PR is a follow-up to ray-project#56147 (comment) The `find_free_port()` function was duplicated across multiple locations in the codebase. This PR consolidates all implementations into a single canonical location. --------- Signed-off-by: Yicheng-Lu-llll <luyc58576@gmail.com> Signed-off-by: Future-Outlier <eric901201@gmail.com>
Signed-off-by: Yicheng-Lu-llll <luyc58576@gmail.com> Signed-off-by: Yicheng-Lu-llll <51814063+Yicheng-Lu-llll@users.noreply.github.com> Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com> Signed-off-by: peterxcli <peterxcli@gmail.com>
This PR is a follow-up to ray-project#56147 (comment) The `find_free_port()` function was duplicated across multiple locations in the codebase. This PR consolidates all implementations into a single canonical location. --------- Signed-off-by: Yicheng-Lu-llll <luyc58576@gmail.com> Signed-off-by: peterxcli <peterxcli@gmail.com>
Why are these changes needed?
This PR updates all socket-related code to ensure compatibility with both IPv4 and IPv6.
Related issue number
Checks
See all the tests I have done here: https://docs.google.com/document/d/129aZnlNu4U3KaJbbriLHvUizUaZDnx9WgdNh50tp-c8/edit?tab=t.0
git commit -s) in this PR.scripts/format.shto lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/under thecorresponding
.rstfile.Note
Introduces IPv6-aware networking (address parsing, IP detection, socket binding, and port checks) across C++ and Python, updating services and helpers to prefer correct AF and exposing new utilities.
IsIPv6,GetNodeIpAddressFromPerspective, andCheckPortFree(family, port)inray/util/network_util.{h,cc}; keepBuildAddress/ParseAddressand URL helpers; extend tests.WorkerPool, tests, redis async context, runtime init paths).node_ip_address_from_perspective,is_ipv6), addget_localhost_ip, and adjustis_localhost.get_node_ip_address, dashboard bind checks, TLS cert SANs).find_free_portaccept family.CheckPortFree; improve port selection robustness.tcp://{ip}:{port}replacingbuild_address.Written by Cursor Bugbot for commit 3e235d8. This will update automatically on new commits. Configure here.