network: various fixes for OS X socket behavior#1381
network: various fixes for OS X socket behavior#1381mattklein123 merged 11 commits intoenvoyproxy:masterfrom
Conversation
| return; | ||
| } | ||
| } | ||
| #endif |
There was a problem hiding this comment.
@PiotrSikora you mentioned kq_errno but I'm not familiar with that and google only seems to be showing me references to nginx source code. It looks like there the event flags from libevent are preserved. Does that enable it to detect this condition?
There was a problem hiding this comment.
Ooops, sorry! kq_errno is indeed just NGINX's field name for that.
What you want is to check flags (if it's EV_EOF) and fflags (for errno) fields from struct kevent, see kqueue(2).
test/common/network/utility_test.cc
Outdated
| INSTANTIATE_TEST_CASE_P(IpVersions, NetworkUtilityGetOriginalDst, | ||
| testing::ValuesIn(TestEnvironment::getIpVersionsForTest())); | ||
|
|
||
| TEST_P(NetworkUtilityGetOriginalDst, GetOriginalDst) { |
There was a problem hiding this comment.
This test is a bit disingenuous since the connection isn't really being forwarded from another port.
However, I don't think it's even possible to do that kind of port redirection in OS X, so perhaps I should change getOriginalDst to always return nullptr in OS X and instead consider disabling the use_original_dst feature instead.
There was a problem hiding this comment.
I think it's reasonable to disable this feature altogether on OS X for now, and bring it back via separate PR at some point in the future, if it's possible at all (I think it should be possible to redirect traffic via either divert(2) or pf.conf on OS X, but I didn't verify it).
There was a problem hiding this comment.
I agree. I would just revert all of this IMO and only do the SOL_IP call on linux, and on apple just set error to ENOTSUPPORTED (or whatever) so that the subsequence if check fails with a TODO.
| UNREFERENCED_PARAMETER(rc); | ||
|
|
||
| #ifdef __APPLE__ | ||
| // When a connect() is performed on a non-blocking socket, it'll return -1/EINPROGRESS |
There was a problem hiding this comment.
This whole thing is absolutely nutty. Is there some documentation or stack overflow or something we can link to?
There was a problem hiding this comment.
This PR LGTM with this exception as well. Just some better citations would be sufficient if there is no way to improve on this.
test/common/network/utility_test.cc
Outdated
| signal(); | ||
| thread->join(); | ||
|
|
||
| #if !defined(__APPLE__) |
There was a problem hiding this comment.
I don't quite understand what this test is doing. Just making sure the calls actually work? And we don't support it on IPv6 basically? Per above I would just disable all of this on OSX and just leave the code as is before. If we keep this test I have some other comments but I can hold off for now until we decide.
| RELEASE_ASSERT(fd != -1); | ||
|
|
||
| #ifdef __APPLE__ | ||
| // Cannot set SOCK_NONBLOCK as a ::socket flag |
There was a problem hiding this comment.
Can we make Linux do what is needed for OS X here, and reduce the special casing?
There was a problem hiding this comment.
There are perf implications for this (multiple system calls vs. 1), so in this case I would be in favor of the ifdef. If there is a way we can make it less visually annoying that's fine.
There was a problem hiding this comment.
I switched flagsFromSocketType to be socketFromSocketType and do all the flag setting, fcntl, and socket creation there. I think this is better, since it's all contained in one block.
|
|
||
| #ifdef __APPLE__ | ||
| // Cannot set SOCK_NONBLOCK as a ::socket flag | ||
| RELEASE_ASSERT(fcntl(fd, F_SETFL, O_NONBLOCK) != -1); |
There was a problem hiding this comment.
This overwrites existing flags... It's probably safe, since we've just created the socket, but for my sanity, could you verify that fcntl(fd, F_GETFL) returns 0 at this point? Just one-off experiment, not in the PR.
There was a problem hiding this comment.
Hmm... I'm confused. Apparently, it doesn't overwrite flags on OS X.
fcntl(fd, F_GETFL) on a new socket returns 0x0002 (O_RDWR), but it returns 0x0006 (O_RDWR|O_NONBLOCK) after fcntl(fd, F_SETFL, O_NONBLOCK), which is not something I'd expect.
There was a problem hiding this comment.
From http://pubs.opengroup.org/onlinepubs/9699919799/functions/fcntl.html:
F_SETFL
Set the file status flags, defined in <fcntl.h>, for the file description associated with fildes from the corresponding bits in the third argument, arg, taken as type int. Bits corresponding to the file access mode and the file creation flags, as defined in <fcntl.h>, that are set in arg shall be ignored. If any bits in arg other than those mentioned here are changed by the application, the result is unspecified. If fildes does not support non-blocking operations, it is unspecified whether the O_NONBLOCK flag will be ignored.
which explains why O_RDWR isn't cleared.
mattklein123
left a comment
There was a problem hiding this comment.
This LGTM beyond my ask for some additional comments on how you knew about the crazy SO_ERROR stuff on OS X. @htuch @PiotrSikora ?
|
@zuercher ping on the |
|
Ah maybe @PiotrSikora knows how to get rid of the crazy hack. Nice. |
|
Sorry, I was deep in bazel internals all day trying to figure out a better solution for my other open PR. I plan to revisit this and get rid of that ifdef if possible. I'm off Monday, so not much will happen till Tuesday. Empirically, that block does get triggered but when it does the read ends up returning EAGAIN. Then the next call to onWriteReady gets no errors. Taking the block out crashes some integration tests. It seems like the socket isn't fully connected somehow, but that seems crazy as well. I'm hopeful that I can inspect the event earlier and detect the EAGAIN case before we ever call onWriteReady. |
|
So, I can't directly look at EV_EOF (as @PiotrSikora suggested) because the kqueue event is hidden inside libevent. I believe the problem occurs when |
…ckets and set flags as needed
|
I eventually realized that ENOTCONN in both the original
The original ifdef code worked by ignoring the ENOTCONN error and waiting for the real write event that occurs when the connection completes. You can get a similar effect by ignoring the ENOTCONN from Instead, I added an |
| static const uint32_t Connecting = 0x02; | ||
| static const uint32_t CloseWithFlush = 0x04; | ||
| static const uint32_t ImmediateConnectionError = 0x08; | ||
| static const uint32_t Unconnected = 0x10; |
There was a problem hiding this comment.
What if we just start client connections in the Connecting state? Is there any downside to that? Do we need Unconnected state for anything?
There was a problem hiding this comment.
That should work. Just me being pedantic.
| int new_value = enable; | ||
| rc = setsockopt(fd_, IPPROTO_TCP, TCP_NODELAY, &new_value, sizeof(new_value)); | ||
| #ifdef __APPLE__ | ||
| if (-1 == rc && errno == EINVAL) { |
There was a problem hiding this comment.
When does this happen? Is this needed with the connected fixes?
There was a problem hiding this comment.
Hm. This still happens and I think it's basically the same reason.
We call ConnectionImpl::connect() and then immediately ConnectionImpl::noDelay(true), but the socket isn't connected yet, so OS X returns an error.
I added some code to read the value of TCP_NODELAY and it does get set despite the error.
There was a problem hiding this comment.
I think the way to get rid of this #ifdef is to only set TCP_NODELAY after the socket is connected. That means either changing all the call sites to invoke this from a callback or else tracking the desired setting and applying in onWriteReady.
There was a problem hiding this comment.
This seems like an OS X bug to me, but I digress. Hmm. In this case I would be in favor of just putting in more of a comment on why this is done/breaks and maybe a TODO?
There was a problem hiding this comment.
I would also file an issue on this and reference it here. I'm imagining some point in the future where we're looking at performance on OS X, and the TCP_NODELAY fail ends up being due to some reason that does not actually set the option. We will then have non-deterministic performance characteristics. This is only slightly hypothetical - we've recently been chasing down a TCP proxy overhead issue that relates somewhat to TCP_NODELAY on envoy-users.
| server_connection_ = std::move(conn); | ||
| server_connection_->addConnectionCallbacks(server_callbacks_); | ||
| server_connection_->addReadFilter(read_filter_); | ||
| printf("argh!\n"); |
There was a problem hiding this comment.
oops nevermind. You did del. :)
There was a problem hiding this comment.
I did. It snuck in via a different PR of mine. 😬
| : ConnectionImpl(dispatcher, address->socket(Address::SocketType::Stream), address, | ||
| getNullLocalAddress(*address), false) {} | ||
| getNullLocalAddress(*address), false) { | ||
| markConnecting(); |
There was a problem hiding this comment.
nit: IMO it would be better to just pass a boolean to ConnectionImpl constructor saying whether to mark connecting or not vs. having a method.
mattklein123
left a comment
There was a problem hiding this comment.
Sorry two more nits. LGTM. @PiotrSikora @dnoe @htuch ?
| if (version == IpVersion::v6) { | ||
| domain = AF_INET6; | ||
| } else { | ||
| RELEASE_ASSERT(version == IpVersion::v4); |
There was a problem hiding this comment.
nit: I would make this normal ASSERT
| domain = AF_INET; | ||
| } | ||
| } else { | ||
| RELEASE_ASSERT(type() == Type::Pipe); |
There was a problem hiding this comment.
nit: I would make this normal ASSERT
htuch
left a comment
There was a problem hiding this comment.
LGTM if you can file that issue on TCP_NODELAY. Thanks.
|
Seeing #1446 also reference illumos, I'm curious if at least some of the fixes here may need to apply to illumos (and possibly Oracle Solaris) as well? |
Signed-off-by: Jose Nino <jnino@lyft.com> Signed-off-by: JP Simard <jp@jpsim.com>
Signed-off-by: Jose Nino <jnino@lyft.com> Signed-off-by: JP Simard <jp@jpsim.com>
**Description** This PR adds the Anthropic logo to the LLM providers section and sorts all providers alphabetically for better organization. **Changes:** - Added Anthropic SVG logo to static assets - Updated provider list in both README.md and website component - Implemented alphabetical sorting for the provider list on homepage - Updated website description to clarify that providers are displayed alphabetically **Related Issues/PRs:** Related PR: #1369 --------- Signed-off-by: Erica Hughberg <erica.sundberg.90@gmail.com>
Some network socket changes for OS X support:
getpeernamereturns a larger size in itsaddrlenparameter for an anonymous peer than on Linux.fcntlto setO_NONBLOCK. While making this change I made all the socket creation paths RELEASE_ASSERT on the fd being valid.setsockoptif it's invoked on an already-closed socket.getsockopt(,,SOL_ORIGINAL_DST,)-- usegetsocknameinstead and work around the fact that the latter works for IPv6 where the Linux getsockopt call does not. Added direct testing ofNetwork::Utility::getOriginalDst.(Split out from #1348, in support of #128.)