-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osx support (fixes #128) #1348
osx support (fixes #128) #1348
Conversation
When a connect() is performed on a non-blocking socket, it'll return -1/EINPROGRESS immediatly and when select()'ed for writability it'll succeed or fail but when it fails the reason is hidden. `getsockopt(,,SO_ERROR,,)` solves this issue on some sytems but on some cases such as FreeBSD getsockopt returns 0 even on `EAGAIN`. The following trick issues a `getpeername()` which will return -1/ENOTCONN if the socket is not connected and read(,,1) will produce the right errno.
Note that this implementation doesn't follow RFC4122. It's a fully random generated UUID instead of time-based.
osx: invoke fcntl to set O_NONBLOCK (cannot be set via socket(2))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a quick skim so far. This is great, thanks for the contribution.
patches/freebsd11_osymlink.patch
Outdated
@@ -0,0 +1,95 @@ | |||
--- ./lib/libc/sys/open.2.orig 2016-09-28 16:25:57.000000000 -0700 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this file intended for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was in @Reflejo's original branch. I gather it's a FreeBSD path required for the kqueue WatcherImpl to work. I'm fine to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a patch for freebsd11's kernel to enable O_SYMLINK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we watch the directory (similarly to the inotify()
implementation on Linux)? I don't think patching FreeBSD kernel and adding custom features to it is a viable prerequisite for Envoy on FreeBSD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I didn't intended this to go to upstream. Honestly I don't think there is a way to do what envoy is doing for the symlink observation without this. But I didn't investigate this longer enough for a conclusive answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To spin this differently... @mattklein123: now, that we have LDS and overall better xDS support, do you think that we still need the runtime configuration (and therfore Watcher
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need it. We will also support loading xDS APIs off of filesystem. Either way, I think we can ignore BSD for the purpose of this PR. Let's just get OSX working and move in. If someone like @Reflejo wants to add in BSD support after and can test on current master, that's fine.
#include "common/filesystem/watcher_impl.h" | ||
// NOLINT(namespace-envoy) | ||
#if defined(__linux__) | ||
#include "common/filesystem/watcher_impl_linux.cc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use select based Bazel target srcs/hdrs to avoid this kind of compile time conditioning?
@@ -78,7 +82,14 @@ InstanceConstSharedPtr peerAddressFromFd(int fd) { | |||
} | |||
|
|||
int InstanceBase::flagsFromSocketType(SocketType type) const { | |||
#if defined(__FreeBSD__) | |||
int flags = O_NONBLOCK; | |||
#elif defined(__APPLE__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In various places you condition on both FreeBSD and APPLE. Why is this? Are you adding FreeBSD and OS X support? Is the FreeBSD part of the OS X support (but then why special APPLE conditions?)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should keep it since I got freeBSD to pass all tests as well. I'd double check that it's still the case. The delta between FreeBSD support and OSX is actually pretty small
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
going to remove the partial FreeBSD support per gitter conversation
int fd = ::socket(AF_INET, flagsFromSocketType(type), 0); | ||
#ifdef __APPLE__ | ||
if (fd != -1) { | ||
RELEASE_ASSERT(fcntl(fd, F_SETFL, O_NONBLOCK) != -1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment explaining why this is a special case for APPLE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
// reason is hidden. `getsockopt(,,SO_ERROR,,)` solves this issue on some systems but on some | ||
// cases such as FreeBSD getsockopt returns 0 even on `EAGAIN`. The following trick issues a | ||
// `getpeername()` which will return -1/ENOTCONN if the socket is not connected and read(,,1) | ||
// will produce the right errno. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems sketchy. Is there some reference to where this is shown to be accurate for all errnos?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a good summary here: https://cr.yp.to/docs/connect.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that connection error will be available in kq_errno
on OSX & FreeBSD and it should be used instead of getsockopt(SO_ERROR)
on those platforms, so this might not be necessary at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can only offer that it works, empirically. This case is triggered regularly in testing on OS X. I think it makes sense to keep it disabled under Linux since it'll only add overhead and doesn't seem to be a problem there.
@@ -438,6 +450,27 @@ void ConnectionImpl::onWriteReady() { | |||
ASSERT(0 == rc); | |||
UNREFERENCED_PARAMETER(rc); | |||
|
|||
#if !defined(__linux__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true for all non-Linux or only for OS X?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's true for FreeBSD and OS X. I'll just change it to #ifdef __APPLE__.
uuid_unparse(uuid, generated_uuid); | ||
#else | ||
sprintf(generated_uuid, "%08lx-%04x-%04x-%04x-%04x%08lx", | ||
static_cast<unsigned long>(arc4random()), static_cast<unsigned>(arc4random() & 0xffff), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably best to generate the minimum number of random numbers required here before the sprintf
. Also, would prefer snprintf
if using C formatters for safety.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll just remove this since it's not used for Linux or OS X.
#else // __linux__ | ||
|
||
/** | ||
* No-op implementation of HotRestart for everybody else. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We (@ Google) would find this useful even on Linux. @mrice32. Maybe there is some way to condition this at build time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On this point, can we potentially just revert all the changes to hot_restart_impl.h/cc, and just add a new hot_restart_nop_impl.h/cc, and do select() compile? This will keep existing code cleaner and make it possible to compile out hot restart support. (I think MSFT people will want this too initially).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still see build errors like this:
(on my merged HEAD from 1346 and 1348).
source/common/http/async_client_impl.cc:17:73: error: default initialization of an object of const type 'const AsyncStreamImpl::NullRetryPolicy' without a user-provided default constructor
const AsyncStreamImpl::NullRetryPolicy AsyncStreamImpl::RouteEntryImpl::retry_policy_;
^
{}
@@ -179,7 +180,7 @@ def envoy_cc_test(name, | |||
copts = envoy_copts(repository, test = True), | |||
# TODO(mattklein123): It's not great that we universally link against the following libs. | |||
# In particular, -latomic is not needed on all platforms. Make this more granular. | |||
linkopts = ["-pthread", "-latomic"], | |||
linkopts = ["-pthread", "-latomic", "-luuid"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will need to install libuuid-dev
for this in build container, and also add this to doc.
@@ -11,6 +11,10 @@ | |||
#include "spdlog/spdlog.h" | |||
#include "tclap/CmdLine.h" | |||
|
|||
#if defined(__APPLE__) | |||
#define uint64_t unsigned long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need this? uint64_t
should just exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be in header <cstdint>
included above. You might need to do using std::uint64_t;
however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uint64_t
exists but is typedef unsigned long long
. Without this you get this error:
In file included from source/server/options_impl.cc:12:
In file included from external/envoy_deps/thirdparty/tclap/include/tclap/CmdLine.h:27:
In file included from external/envoy_deps/thirdparty/tclap/include/tclap/SwitchArg.h:30:
In file included from external/envoy_deps/thirdparty/tclap/include/tclap/Arg.h:54:
external/envoy_deps/thirdparty/tclap/include/tclap/ArgTraits.h:80:22: error: type 'unsigned long long' cannot be used prior to '::' because it has no members
typedef typename T::ValueCategory ValueCategory;
^
external/envoy_deps/thirdparty/tclap/include/tclap/ValueArg.h:403:37: note: in instantiation of template class 'TCLAP::ArgTraits<unsigned long long>' requested here
ExtractValue(_value, val, typename ArgTraits<T>::ValueCategory());
^
external/envoy_deps/thirdparty/tclap/include/tclap/ValueArg.h:363:5: note: in instantiation of member function 'TCLAP::ValueArg<unsigned long long>::_extractValue' requested here
_extractValue( args[*i] );
^
source/server/options_impl.cc:31:29: note: in instantiation of member function 'TCLAP::ValueArg<unsigned long long>::processArg' requested here
TCLAP::ValueArg<uint64_t> base_id(
^
1 error generated.
Swapping out the typedef I added for using std::uint64_t;
produces an identical error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrrm, this is unfortunate. I don't understand why the define doesn't cause the same problem.
If we can't figure out a better solution please do add a detailed comment explaining why it's done this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally figured this out. TCLAP provides implementations of ArgTraits<T>
for various types, but the ones for unsigned long long
and long long
are gated by #ifdef HAVE_LONG_LONG
, which isn't set. I think they were expecting users of libraries to use autoconf which would detect this and make the necessary #define. I'll post it with a few other things tomorrow (split out in a new PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments across stuff I have specific familiarity with.
source/exe/signal_action.cc
Outdated
@@ -16,6 +16,8 @@ void SignalAction::sigHandler(int sig, siginfo_t* info, void* context) { | |||
#ifdef REG_RIP | |||
// x86_64 | |||
error_pc = reinterpret_cast<void*>(ucontext->uc_mcontext.gregs[REG_RIP]); | |||
#elif defined(__APPLE__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to add a conditional for x86 32 bit here? No need to implement it now, but I'm guessing a 32 bit compile will fail because there's no __rip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
source/exe/signal_action.cc
Outdated
int hidx = 0; | ||
for (const auto& sig : FATAL_SIGS) { | ||
RELEASE_ASSERT(sigaction(sig, &previous_handlers_[hidx++], nullptr) == 0); | ||
} | ||
} | ||
|
||
#ifdef __APPLE__ | ||
#define MAP_STACK (0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to add #ifndef MAP_STACK around this in case Darwin adds real map_stack in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
@@ -57,13 +59,24 @@ void SignalAction::installSigHandlers() { | |||
} | |||
|
|||
void SignalAction::removeSigHandlers() { | |||
#if defined(__APPLE__) | |||
// ss_flags contains SS_DISABLE, but Darwin still checks the size, contrary to the man page |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D'oh, good catch!
@@ -47,7 +49,9 @@ namespace Envoy { | |||
class SignalAction : NonCopyable { | |||
public: | |||
SignalAction() | |||
: guard_size_(sysconf(_SC_PAGE_SIZE)), altstack_size_(guard_size_ * 4), altstack_(nullptr) { | |||
: guard_size_(sysconf(_SC_PAGE_SIZE)), | |||
altstack_size_(std::max(guard_size_ * 4, static_cast<size_t>(MINSIGSTKSZ))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if the MINSIGSTKSZ case is hit then protections for the immediately adjacent pages still works OK, there's just a portion at the end which is not protected. The second mprotect call above could have the size adjusted to cover this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's correct. We ultimately mmap altstack_size_ + 2 * guard_size_
bytes (see mapSizeWithGuards
). altstack_size_
doesn't include the guard pages. The second mprotect calls protects guard_size_
bytes at an address that's guard_size_ + altstack_size_
bytes from the start of the mapped memory.
We call sigaltstack
with altstack_ + guard_size_
(to skip the first guard page) and altstack_size_
(which doesn't count the guard pages, and so stops just before the second guard page).
Did I miss something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, all of this looks correct. Guard pages are always outside the altstack_size_, and MINSIGSTKSZ should always be a multiple of page size.
@@ -11,6 +11,10 @@ | |||
#include "spdlog/spdlog.h" | |||
#include "tclap/CmdLine.h" | |||
|
|||
#if defined(__APPLE__) | |||
#define uint64_t unsigned long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be in header <cstdint>
included above. You might need to do using std::uint64_t;
however.
@@ -2,6 +2,12 @@ | |||
|
|||
set -e | |||
|
|||
if [[ `uname` != "Linux" ]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's probably a way this test could be completely disabled with a Bazel rule instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I don't know how.
Normally, I could use select
to choose either no test script or a different one. However, the output of select
is only usable by bazel rules (because only at the point of rule execution is it possible to inspect the configuration to choose which subset of parameters to select).
envoy_sh_test
is a macro and it attempts to access members of its srcs
parameter to produce inputs for several rules. This doesn't work if srcs is assigned from select (whose output has no []
operator). To make it work, envoy_sh_test would have to become a rule and at that point I'm in over my head.
@@ -10,6 +10,15 @@ | |||
|
|||
#include "spdlog/spdlog.h" | |||
|
|||
#ifdef __APPLE__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will end up getting used again elsewhere. Can we put this into common/common/byte_order.h or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a ton for doing this. Some quick comments from a skim. Will do a more full review after the next pass of comment fixes once everyone else has looked.
// Since we are using SOCK_NONBLOCK, there are cases (on FreeBSD for example) where the immediate | ||
// connect response is EINPROGRESS but the socket failed and the calling code might not know about | ||
// it yet. | ||
if (-1 == rc && errno == ECONNRESET) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we guard this with free bsd only then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed it since it doesn't seem to happen on OSX.
return ::socket(AF_INET, flagsFromSocketType(type), 0); | ||
int fd = ::socket(AF_INET, flagsFromSocketType(type), 0); | ||
#ifdef __APPLE__ | ||
if (fd != -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any point in doing fd != -1 check here? (Same in other places). Will just crash anyway in RELEASE_ASSERT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did the check because the original code didn't assert (for AF_INET and AF_UNIX). I'll go ahead and just assert on the fd and then assert on the fcntl call.
return Address::InstanceConstSharedPtr{ | ||
new Address::Ipv4Instance(reinterpret_cast<sockaddr_in*>(&orig_addr))}; | ||
|
||
case AF_INET6: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these new cases and the throw have code coverage on linux CI? If not will need to refactor somehow to get coverage.
#else // __linux__ | ||
|
||
/** | ||
* No-op implementation of HotRestart for everybody else. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On this point, can we potentially just revert all the changes to hot_restart_impl.h/cc, and just add a new hot_restart_nop_impl.h/cc, and do select() compile? This will keep existing code cleaner and make it possible to compile out hot restart support. (I think MSFT people will want this too initially).
@@ -323,14 +323,14 @@ TEST(Ipv6CidrRange, InstanceConstSharedPtrAndLengthCtor) { | |||
|
|||
TEST(Ipv6CidrRange, StringAndLengthCtor) { | |||
CidrRange rng; | |||
rng = CidrRange::create("::ffff", 122); // Assignment operator. | |||
rng = CidrRange::create("ff::ffff", 122); // Assignment operator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we losing any linux test coverage w/ these changes? cc @hennna
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not off the top of my head. I don't think these tests are dependent on the address value. Is the issue with OSX the leading ::?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of. Wikipedia tells me an old version of the IPv6 spec considered ::/96
addresses (that is addresses with 96 bits of leading zeros) to be IPv6-mapped IPv4 addresses. It's been deprecated for a decade, but when OSX converts, say ::ffff
, to a string it comes out as "::0.0.255.255" and the tests fail because they do string comparisons. Linux only provides the "mapped IPv4" style addresses with trailing dotted quads for the ::ffff:0:0/96
range (which replaced ::/96
). It seems to me that these test are concerned with how the low order bits of the address are manipulated and that using a different prefix doesn't change the meaning of the tests.
(I wrote a little program to round trip IP address strings to binary and back. https://gist.github.com/zuercher/0d0bc5a433fda423124c71afa2eb29e1)
@@ -102,6 +103,48 @@ TEST_F(RingHashLoadBalancerTest, Basic) { | |||
TestLoadBalancerContext context(0); | |||
EXPECT_EQ(cluster_.hosts_[5], lb_.chooseHost(&context)); | |||
} | |||
#else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it's really worth doing this. This is already not great and related to my TODO above. Seems like it might be better to just not run these tests on Apple until someone fixes the TODO to compile murmur3 or city into the code (which is not difficult).
… to select between real and no-op HotRestartImpl code
…P_STACK existing in the future
void free(Stats::RawStatData& data) override; | ||
|
||
private: | ||
SharedMemory& shmem_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the NOP hot restart impl should not use shared memory or the process shared mutex stuff. Basically, I would just do exactly what https://github.com/lyft/envoy/blob/master/test/integration/server.cc#L93 does which is to use process local locks, the heap allocator, etc. I think you can probably revert all the changes related to shared memory and just move the test impl into prod code somewhere and use that.
@zuercher general comment. Can we start to split out some of this change into self contained PRs? Some of the test fixes can go in and get reviewed independently, as can some of the trivial header changes, etc. That way we can start merging the not controversial stuff and trim down the size of the main PR. Thank you. |
Going to close this out in lieu of the other PRs that are ongoing. |
Some highlights:
source/common/filesystem: adds a kqueue-based implementation of WatcherImpl. The conditional compiliation is hacky.
source/common/network: handle odd-ball FreeBSD/osx errors on get/setsockopt. Also, SOCK_NONBLOCK doesn't exist on FreeBSD or OSX. OSX has an extra byte field in
struct sockaddr
and derivatives that needs to be accounted for in some places.source/common/runtime: use libuuid to generate UUIDs on Linux and OSX. Freebsd gets a hand-rolled implementation.
source/exe: hot restart is completely disabled for osx and freebsd
test/common/http: on OSX,
struct tm
configured for a date before 1901-12-13 20:45:52 becomes atime_t
that represents 1969-12-31 23:59:59.test/common/network: OSX doesn't like some of IPv6 addresses used in testing. By default only a single IPv4 loopback address exists in OSX (127.0.0.1).
Various and sundry explicit #include statements to pick up definitions that are obtained transitively on Linux.