-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LibOS,PAL] Support EINPROGRESS
on non-blocking sockets connect
#1643
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 20 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
libos/src/sys/libos_socket.c
line 652 at r1 (raw file):
bool has_sendtimeout_set = !!sock->sendtimeout_us; ret = -((ssize_t)sock->last_error);
FYI: I detected this bug while testing this sockets subsystem (incorrect type expansion, leading to ret
having a positive value). It is unrelated to this PR.
libos/src/sys/libos_socket.c
line 815 at r1 (raw file):
lock(&sock->lock); bool has_recvtimeout_set = !!sock->receivetimeout_us; ret = -((ssize_t)sock->last_error);
FYI: I detected this bug while testing this sockets subsystem (incorrect type expansion, leading to ret
having a positive value). It is unrelated to this PR.
9e1e0c8
to
144916b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 20 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
a discussion (no related file):
Use an un-routable address 10.255.255.255 in the test.
a discussion (no related file):
send
and recv
must return EAGAIN on sockets in CONNECTING state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 20 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
a discussion (no related file):
Add test for poll
and for epoll
. Note that select()
is emulated by poll in Gramine anyway, so no sense in testing it.
144916b
to
3fdcd9b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 20 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Use an un-routable address 10.255.255.255 in the test.
Done
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
send
andrecv
must return EAGAIN on sockets in CONNECTING state.
Done
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Add test for
poll
and forepoll
. Note thatselect()
is emulated by poll in Gramine anyway, so no sense in testing it.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 20 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
a discussion (no related file):
FYI: A couple links I found useful:
- https://stackoverflow.com/questions/17769964/linux-sockets-non-blocking-connect
- posix connect() blocks even when O_NONBLOCK is set. microsoft/WSL#2412 (and other links from there)
libos/src/sys/libos_epoll.c
line 690 at r3 (raw file):
*/ enum libos_sock_state state = __atomic_load_n(&items[i]->handle->info.sock.state, __ATOMIC_ACQUIRE);
I'm unsure here, that's a bit of a hack. Technically we should always take the lock sock->lock
before accessing the state
field (it's not supposed to be an atomic).
So I see three solutions here:
- Just take the lock immediately, don't play this dance with "check if state is CONNECTING in atomic manner for performance reasons". But perf will downgrade significantly I think...
- Keep like this. Technically, in real-world apps, the transition CONNECTING -> CONNECTED will happen only in one thread/case, so this hacky atomic load is fine.
- Introduce a new atomic field
bool sock->is_connecting
. It will be set to true when state is moved to CONNECTING, and then we'll do this atomic load but onsock->is_connecting
instead ofsock->state
.
libos/src/sys/libos_poll.c
line 229 at r3 (raw file):
*/ enum libos_sock_state state = __atomic_load_n(&libos_handles[i]->info.sock.state, __ATOMIC_ACQUIRE);
ditto (hacky atomic load?)
libos/test/regression/tcp_einprogress.c
line 123 at r3 (raw file):
CHECK(close(epfd)); pollout = !!(out_event.events & EPOLLOUT); }
I think I need to extract the ret == 0
case outside of these IFs. Otherwise it's complicated to read this one interesting case
.
libos/test/regression/test_libos.py
line 1432 at r3 (raw file):
def test_308_socket_tcp_einprogress_unresponsive_epoll(self): # Test unresponsive peer: first connect() returns EINPROGRESS, then poll/epoll times out # because the connection cannot be established
Oops, must remove this copy-pasted comment, it's redundant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 20 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
a discussion (no related file):
Repeating here, for manual testing of tcp_einprogress
:
-
Drop SYN packets on 127.0.0.1:12345
sudo iptables -A OUTPUT -p tcp --tcp-flags SYN,ACK,FIN,RST SYN -d 127.0.0.1 --dport 12345 -j DROP
-
Re-establish all packets on 127.0.0.1:12345
sudo iptables -D OUTPUT -p tcp --tcp-flags SYN,ACK,FIN,RST SYN -d 127.0.0.1 --dport 12345 -j DROP
Also, change TIMEOUT_MS
to e.g. 10 seconds for manual testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 17 of 20 files at r1, 3 of 3 files at r3, all commit messages.
Reviewable status: all files reviewed, 14 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Repeating here, for manual testing of
tcp_einprogress
:
Drop SYN packets on 127.0.0.1:12345
sudo iptables -A OUTPUT -p tcp --tcp-flags SYN,ACK,FIN,RST SYN -d 127.0.0.1 --dport 12345 -j DROP
Re-establish all packets on 127.0.0.1:12345
sudo iptables -D OUTPUT -p tcp --tcp-flags SYN,ACK,FIN,RST SYN -d 127.0.0.1 --dport 12345 -j DROP
Also, change
TIMEOUT_MS
to e.g. 10 seconds for manual testing.
Should we point readers to these instructions/steps of running manual tests somewhere in the code/comment? (otherwise they could only search for the original issue/PR)
libos/src/sys/libos_epoll.c
line 690 at r3 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I'm unsure here, that's a bit of a hack. Technically we should always take the lock
sock->lock
before accessing thestate
field (it's not supposed to be an atomic).So I see three solutions here:
- Just take the lock immediately, don't play this dance with "check if state is CONNECTING in atomic manner for performance reasons". But perf will downgrade significantly I think...
- Keep like this. Technically, in real-world apps, the transition CONNECTING -> CONNECTED will happen only in one thread/case, so this hacky atomic load is fine.
- Introduce a new atomic field
bool sock->is_connecting
. It will be set to true when state is moved to CONNECTING, and then we'll do this atomic load but onsock->is_connecting
instead ofsock->state
.
I vote for solution #3 which removes the hacky w/o introduing much complexity (IMO).
libos/src/sys/libos_socket.c
line 31 at r3 (raw file):
* | | | (if it was bound) | | * | | connect() | | | * | | + select()/poll()/ | |
why is this change?
Code quote:
+
libos/src/sys/libos_socket.c
line 33 at r3 (raw file):
* | | + select()/poll()/ | | * | | V epoll() + | * | +---------------------> CONNECTING ---------------- CONNECTED <----+
Should we note that the CONNECTING
state only applies to non-blocking sockets? (Otherwise the selcect()
/poll()
/epoll()
transition doesn't make that much sense)
Code quote:
CONNECTING
libos/test/regression/tcp_einprogress.c
line 28 at r3 (raw file):
if (argc != 3) ERR("Usage: %s <IP address> poll|epoll\n(use 127.0.0.1 for responsive peer, 10.255.255.255"
Do we need ERR()
here?
Code quote:
ERR(
libos/test/regression/tcp_einprogress.c
line 32 at r3 (raw file):
if (strcmp(argv[2], "poll") && strcmp(argv[2], "epoll")) ERR("Usage: %s <IP address> poll|epoll\nerror: only poll/epoll allowed", argv[0]);
ditto
libos/test/regression/test_libos.py
line 1415 at r3 (raw file):
def test_305_socket_tcp_einprogress_responsive_poll(self): # Test responsive peer: first connect() returns EINPROGRESS, then poll/epoll immediately
Should we move the comment out of def
so that it serves both tests.
libos/test/regression/test_libos.py
line 1425 at r3 (raw file):
def test_307_socket_tcp_einprogress_unresponsive_poll(self): # Test unresponsive peer: first connect() returns EINPROGRESS, then poll/epoll times out
ditto
pal/src/host/linux-sgx/enclave_ocalls.c
line 1429 at r3 (raw file):
} int ocall_connect_simple(int fd, struct sockaddr_storage* addr, size_t* addrlen) {
I understand why we don't change ocall_connect()
, but still find it a bit weird :).
Code quote:
ocall_connect_simple
pal/src/host/linux-sgx/enclave_ocalls.c
line 1460 at r3 (raw file):
if (ret != -EACCES && ret != -EPERM && ret != -EADDRINUSE && ret != -EADDRNOTAVAIL && ret != -EAFNOSUPPORT && ret != -EAGAIN && ret != -EALREADY && ret != -EBADF && ret != -ECONNREFUSED && ret != -EINPROGRESS && ret != -EISCONN
I think we can remove this now
Code quote:
&& ret != -EINPROGRESS
pal/src/host/linux-sgx/host_ocalls.c
line 514 at r3 (raw file):
/* Connect succeeded or in progress (EINPROGRESS); in both cases retrieve local name -- host * Linux binds the socket to address even in case of EINPROGRESS */
pls add a .
at the end of a sentence
pal/src/host/linux-sgx/pal_sockets.c
line 275 at r3 (raw file):
/* Connect succeeded or in progress (EINPROGRESS); in both cases local name of the socket was * retrieved, must verify it */
diito
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 14 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)
libos/src/sys/libos_epoll.c
line 690 at r3 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
I vote for solution #3 which removes the hacky w/o introduing much complexity (IMO).
oops... sorry for the unintentional link -- I meant the 3rd option
f5f58da
to
725b89b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 11 of 20 files reviewed, 9 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin)
a discussion (no related file):
Previously, kailun-qin (Kailun Qin) wrote…
Should we point readers to these instructions/steps of running manual tests somewhere in the code/comment? (otherwise they could only search for the original issue/PR)
I think it's not necessary. Users can figure out these commands themselves, if they really want to. Otherwise, they will do some GitHubbing and will find these instructions.
I put it here mainly for reviewers convenience, if they want to test these things manually.
libos/src/sys/libos_epoll.c
line 690 at r3 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
oops... sorry for the unintentional link -- I meant the 3rd option
Done. Did solution 3.
libos/src/sys/libos_poll.c
line 229 at r3 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
ditto (hacky atomic load?)
Done
libos/src/sys/libos_socket.c
line 31 at r3 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
why is this change?
Done. That's because I use AsciiFlow to draw these diagrams, and AsciiFlow automatically replaced such places. Fixed manually now.
libos/src/sys/libos_socket.c
line 33 at r3 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
Should we note that the
CONNECTING
state only applies to non-blocking sockets? (Otherwise theselcect()
/poll()
/epoll()
transition doesn't make that much sense)
Done.
libos/test/regression/tcp_einprogress.c
line 28 at r3 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
Do we need
ERR()
here?
Done.
libos/test/regression/tcp_einprogress.c
line 32 at r3 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
ditto
Done.
libos/test/regression/tcp_einprogress.c
line 123 at r3 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I think I need to extract the
ret == 0
case outside of these IFs. Otherwise it's complicated to read thisone interesting case
.
Done. Also refactored other places in this test a bit.
libos/test/regression/test_libos.py
line 1415 at r3 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
Should we move the comment out of
def
so that it serves both tests.
Done.
libos/test/regression/test_libos.py
line 1425 at r3 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
ditto
Done.
libos/test/regression/test_libos.py
line 1432 at r3 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Oops, must remove this copy-pasted comment, it's redundant
Done
pal/src/host/linux-sgx/enclave_ocalls.c
line 1429 at r3 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
I understand why we don't change
ocall_connect()
, but still find it a bit weird :).
Do you mean our use of two connect functions: ocall_connect()
and ocall_connect_simple()
? Yes, it's a bit weird, and maybe in the future we'll refactor to have just one generic OCALL.
ocall_connect()
is currently used only for pipe emulation in Gramine (using host UNIX sockets).ocall_connect_simple()
is currently used by TCP/UDP sockets.
The main difference between them is that ocall_connect()
hides under the hood a bunch of operations: socket()
-> bind()
-> connect()
. This is required because emulated pipes don't have the notion of "create, assign an address and connect to another address", so Gramine has to emulate a single "pipe create" with this series of operations.
pal/src/host/linux-sgx/enclave_ocalls.c
line 1460 at r3 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
I think we can remove this now
Done. I was a bit reluctant, as we could easily miss this check if we ever refactor EINPROGRESS logic again. But I agree, it looks weird to have a dead-code check.
pal/src/host/linux-sgx/host_ocalls.c
line 514 at r3 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
pls add a
.
at the end of a sentence
Done.
pal/src/host/linux-sgx/pal_sockets.c
line 275 at r3 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
diito
Done.
Previously, Gramine transformed `connect()` of non-blocking sockets into a blocking operation and thus never returned -EINPROGRESS. This led to the connect operation being very slow (waiting for a host timeout) if a remote peer is unresponsive. This is a backport of PR #1643 to Gramine v1.5. Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r4, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I think it's not necessary. Users can figure out these commands themselves, if they really want to. Otherwise, they will do some GitHubbing and will find these instructions.
I put it here mainly for reviewers convenience, if they want to test these things manually.
Sure, I'm fine w/ this.
pal/src/host/linux-sgx/enclave_ocalls.c
line 1429 at r3 (raw file):
Do you mean our use of two connect functions:
ocall_connect()
andocall_connect_simple()
? Yes, it's a bit weird, and maybe in the future we'll refactor to have just one generic OCALL.
Yes :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 20 files at r1, 4 of 9 files at r4, all commit messages.
Reviewable status: all files reviewed, 11 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)
a discussion (no related file):
unfortunately the test must be run manually with special network settings to properly test the EINPROGRESS behavior.
This test passes for me without changing anything in iptables. Why? Is this intended? The description of the PR seem to suggest that it should fail then?
libos/include/libos_handle.h
line 103 at r4 (raw file):
unsigned int last_error; /* This field is an atomically-accessed version of the SOCK_CONNECTING state (for perf). */ bool connection_in_progress;
(otherwise it has a slightly different meaning and can be read as "it's still connected")
Suggestion:
bool connecting_in_progress;
libos/src/net/ip.c
line 227 at r4 (raw file):
if (ret < 0) { if (ret == -PAL_ERROR_CONNFAILED) { /* can't use pal_to_unix_errno() because it translates to ECONNRESET */
Why not change pal_to_unix_errno()
translation instead? Does it break some other places?
libos/src/net/ip.c
line 232 at r4 (raw file):
assert(handle->flags & O_NONBLOCK); ret = pal_to_unix_errno(ret); goto out;
This execution path later uses PalSocketConnect()
's output (pal_local_addr
), despite the function failure. Or at least it seems that it can use it? Isn't it uninitialized then?
libos/src/sys/libos_epoll.c
line 676 at r4 (raw file):
} if (pal_ret_events[i] & PAL_WAIT_WRITE && items[i]->handle->type == TYPE_SOCK) {
Only write? What about reading?
libos/src/sys/libos_epoll.c
line 677 at r4 (raw file):
if (pal_ret_events[i] & PAL_WAIT_WRITE && items[i]->handle->type == TYPE_SOCK) { /*
Shouldn't this whole logic be a function in the socket code? We wouldn't have that big copy-paste then, and the socket-related logic would be in socket sources, not in epoll sources.
libos/test/regression/tcp_einprogress.c
line 102 at r4 (raw file):
if (bytes != -1 || errno != EAGAIN) { ERR("[after EINPROGRESS] expected send to fail with EAGAIN but failed with %s", strerror(errno));
This can also be reached if send()
succeeded.
libos/test/regression/tcp_einprogress.c
line 108 at r4 (raw file):
if (bytes != -1 || errno != EAGAIN) { ERR("[after EINPROGRESS] expected recv to fail with EAGAIN but failed with %s", strerror(errno));
ditto
libos/test/regression/test_libos.py
line 1415 at r4 (raw file):
# Two tests for a responsive peer: first connect() returns EINPROGRESS, then poll/epoll # immediately returns because the connection is quickly established
Is it? There's nothing listening on this port on localhost...? The assert message seems to be correct though.
Code quote:
because the connection is quickly established
pal/src/host/linux-sgx/pal_sockets.c
line 270 at r4 (raw file):
int ret = ocall_connect_simple(handle->sock.fd, &sa_storage, &linux_addrlen); if (ret < 0 && ret != -EINPROGRESS) {
This shouldn't be allowed for non-blocking sockets (this probably should be verified in the ocall wrapper, as we do with other return values).
Code quote:
ret != -EINPROGRESS
pal/src/host/linux-sgx/pal_sockets.c
line 277 at r4 (raw file):
* retrieved, must verify it. */ if (out_local_addr) { int verify_ret = verify_ip_addr(handle->sock.domain, &sa_storage, linux_addrlen);
This is pretty bad, everywhere we use the convention that if a function failed, then the arg-returned values are undefined. Here it's the same, except for one specific error... This breaks the global convention we're using everywhere.
Just FYI. We executed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @vasanth-intel ,this is really helpful!
Reviewable status: all files reviewed, 11 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)
a discussion (no related file):
Previously, mkow (Michał Kowalczyk) wrote…
unfortunately the test must be run manually with special network settings to properly test the EINPROGRESS behavior.
This test passes for me without changing anything in iptables. Why? Is this intended? The description of the PR seem to suggest that it should fail then?
Oops, I thought I fixed the PR description everywhere (e.g., I fixed the "How to test this PR?" section). Apparently forgot to change this text too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 27 files reviewed, 11 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Oops, I thought I fixed the PR description everywhere (e.g., I fixed the "How to test this PR?" section). Apparently forgot to change this text too.
Done
libos/include/libos_handle.h
line 103 at r4 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
(otherwise it has a slightly different meaning and can be read as "it's still connected")
Done.
libos/src/net/ip.c
line 227 at r4 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Why not change
pal_to_unix_errno()
translation instead? Does it break some other places?
Yes, it will break other places. Please note that it's not my change -- this was like this before this PR.
See https://github.com/search?q=repo%3Agramineproject%2Fgramine%20PAL_ERROR_CONNFAILED&type=code
In particular, this is the "other differing" code:
Lines 240 to 244 in d052d18
ret = PalStreamOpen(pipe_name, PAL_ACCESS_RDWR, /*share_flags=*/0, PAL_CREATE_IGNORED, options, | |
&pal_handle); | |
if (ret < 0) { | |
return (ret == -PAL_ERROR_CONNFAILED) ? -ENOENT : pal_to_unix_errno(ret); | |
} |
As you can see, PAL_ERROR_CONNFAILED
has two meanings:
ECONNREFUSED
for TCP/UDP socketsENOENT
for UNIX sockets
(Why? That's because UNIX sockets are "like files" in Linux, but we emulate them as pipes inside PAL. So we have a mismatch in what "connection failed" means.)
libos/src/net/ip.c
line 232 at r4 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
This execution path later uses
PalSocketConnect()
's output (pal_local_addr
), despite the function failure. Or at least it seems that it can use it? Isn't it uninitialized then?
Done.
You already realized it yourself upon finishing the review, but just to reiterate: yes, POSIX/Linux considers EINPROGRESS
as a success of the connect operation, even though it returns an error code. This is an odd semantics, and in the first revision of my PR, I inherited this semantics.
Now I modified the PR to have "sane" semantics, and propagate the "connecting-in-progress" condition in a separate boolean variable.
This by the way allowed me to remove the now-unused PAL_ERROR_INPROGRESS
.
libos/src/sys/libos_epoll.c
line 676 at r4 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Only write? What about reading?
The man pages instruct the application to perform select/poll/epoll with the POLLOUT event:
EINPROGRESS
The socket is nonblocking and the connection cannot be
completed immediately. (UNIX domain sockets failed with
EAGAIN instead.) It is possible to select(2) or poll(2)
for completion by selecting the socket for writing. After
select(2) indicates writability, use getsockopt(2) to read
the SO_ERROR option at level SOL_SOCKET to determine
whether connect() completed successfully (SO_ERROR is
zero) or unsuccessfully (SO_ERROR is one of the usual
error codes listed here, explaining the reason for the
failure).
Using the read (POLLIN) event doesn't lead to anything, so we don't take it into account here.
libos/src/sys/libos_epoll.c
line 677 at r4 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Shouldn't this whole logic be a function in the socket code? We wouldn't have that big copy-paste then, and the socket-related logic would be in socket sources, not in epoll sources.
Done, yes, looks much better now.
libos/test/regression/tcp_einprogress.c
line 102 at r4 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
This can also be reached if
send()
succeeded.
Done. I hope I understood you correctly. Fixed also all other such places.
libos/test/regression/tcp_einprogress.c
line 108 at r4 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ditto
Done.
libos/test/regression/test_libos.py
line 1415 at r4 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Is it? There's nothing listening on this port on localhost...? The assert message seems to be correct though.
Done. Oops, a typo.
pal/src/host/linux-sgx/pal_sockets.c
line 270 at r4 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
This shouldn't be allowed for non-blocking sockets (this probably should be verified in the ocall wrapper, as we do with other return values).
Done.
pal/src/host/linux-sgx/pal_sockets.c
line 277 at r4 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
This is pretty bad, everywhere we use the convention that if a function failed, then the arg-returned values are undefined. Here it's the same, except for one specific error... This breaks the global convention we're using everywhere.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 9 files at r4, 23 of 23 files at r5, all commit messages.
Reviewable status: all files reviewed, 11 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)
libos/src/net/ip.c
line 227 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Yes, it will break other places. Please note that it's not my change -- this was like this before this PR.
See https://github.com/search?q=repo%3Agramineproject%2Fgramine%20PAL_ERROR_CONNFAILED&type=code
In particular, this is the "other differing" code:
Lines 240 to 244 in d052d18
ret = PalStreamOpen(pipe_name, PAL_ACCESS_RDWR, /*share_flags=*/0, PAL_CREATE_IGNORED, options, &pal_handle); if (ret < 0) { return (ret == -PAL_ERROR_CONNFAILED) ? -ENOENT : pal_to_unix_errno(ret); } As you can see,
PAL_ERROR_CONNFAILED
has two meanings:
ECONNREFUSED
for TCP/UDP socketsENOENT
for UNIX sockets(Why? That's because UNIX sockets are "like files" in Linux, but we emulate them as pipes inside PAL. So we have a mismatch in what "connection failed" means.)
I see, looks good then.
libos/src/net/ip.c
line 250 at r5 (raw file):
bool inprogress_unused; int ret = PalSocketConnect(sock->pal_handle, &pal_ip_addr, /*local_addr=*/NULL, &inprogress_unused);
I think I forgot it, why do we have this weird way of disconnecting instead of just having PalSocketDisconnect()
with simpler arguments?
libos/src/sys/libos_epoll.c
line 676 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
The man pages instruct the application to perform select/poll/epoll with the POLLOUT event:
EINPROGRESS The socket is nonblocking and the connection cannot be completed immediately. (UNIX domain sockets failed with EAGAIN instead.) It is possible to select(2) or poll(2) for completion by selecting the socket for writing. After select(2) indicates writability, use getsockopt(2) to read the SO_ERROR option at level SOL_SOCKET to determine whether connect() completed successfully (SO_ERROR is zero) or unsuccessfully (SO_ERROR is one of the usual error codes listed here, explaining the reason for the failure).
Using the read (POLLIN) event doesn't lead to anything, so we don't take it into account here.
I'm not sure if this implies that this is the only way of checking it. What if someone connects and then polls for read, because in that specific protocol it's the server which sends the data first?
libos/src/sys/libos_epoll.c
line 678 at r5 (raw file):
if (items[i]->handle->type == TYPE_SOCK) { check_connect_inprogress_on_poll(items[i]->handle, pal_ret_events[i]);
Wouldn't the pal_ret_events[i]
check be clearer and simpler if it was here? This is the place which checks the conditions when to re-check the state anyways, and now this checking is split between two places and you need an additional argument in check_connect_inprogress_on_poll()
.
libos/src/sys/libos_poll.c
line 216 at r5 (raw file):
fds[i].revents |= fds[i].events & (POLLOUT | POLLWRNORM); if (libos_handles[i]->type == TYPE_SOCK) {
ditto
libos/src/sys/libos_socket.c
line 96 at r5 (raw file):
* SOCK_CONNECTED state and clear SO_ERROR. * * This is only relevant if POLLOUT event was requested. See `man 2 connect`, EINPROGRESS case.
Just to not forget: if we decide to wait for more, then this will also need a change (and it's easy to miss).
libos/src/sys/libos_socket.c
line 114 at r5 (raw file):
if (sock->state != SOCK_CONNECTING) { /* theoretically, another thread could be doing another select/poll on this socket and * modify the state; we don't support this unlikely case */
Why not support it? Isn't it just unlock+return here? (and one change below, see my comment)
libos/src/sys/libos_socket.c
line 117 at r5 (raw file):
BUG(); } if (pal_ret_events & (PAL_WAIT_ERROR | PAL_WAIT_HANG_UP)) {
Could you explain what's the logic behind this IF? Why does what we waited for influence sock->last_error
?
libos/src/sys/libos_socket.c
line 121 at r5 (raw file):
} else { sock->last_error = 0; __atomic_store_n(&sock->connecting_in_progress, false, __ATOMIC_RELEASE);
To fix concurrency I think we should move it to be the last step, so the fast-path never returns while the state is only half-updated to say it's connected. Not really dangerous because we have all of this under the lock, but IMO seems like a saner/safer version.
libos/src/sys/libos_socket.c
line 259 at r5 (raw file):
bool inprogress; ret = sock2->ops->connect(handle2, &addr, sizeof(addr), &inprogress); assert(inprogress == false);
inprogress
is initialized only on success.
pal/include/pal/pal.h
line 599 at r5 (raw file):
* Can be NULL, to ignore the result. * \param[out] out_inprogress On success, returns true in special case of an in-progress connection * on non-blocking socket.
Suggestion:
on a non-blocking socket.
pal/src/host/linux-sgx/enclave_ocalls.h
line 66 at r5 (raw file):
int nonblocking
Why not bool
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 17 of 27 files reviewed, 11 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @mkow)
libos/src/net/ip.c
line 250 at r5 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
I think I forgot it, why do we have this weird way of disconnecting instead of just having
PalSocketDisconnect()
with simpler arguments?
This was implemented by @boryspoplawski : #579
I don't see any particular comments from my (or your) side on this.
I think Borys just followed the POSIX standard in this regard, see man 2 connect
, search for AF_UNSPEC
.
libos/src/sys/libos_epoll.c
line 676 at r4 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
I'm not sure if this implies that this is the only way of checking it. What if someone connects and then polls for read, because in that specific protocol it's the server which sends the data first?
Done. You are right, I verified manually (using the test I added in this PR) that the native behavior of Linux works even on reads (POLLIN/EPOLLIN). I added a corresponding change to my test, and updated the logic in LibOS.
libos/src/sys/libos_epoll.c
line 678 at r5 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Wouldn't the
pal_ret_events[i]
check be clearer and simpler if it was here? This is the place which checks the conditions when to re-check the state anyways, and now this checking is split between two places and you need an additional argument incheck_connect_inprogress_on_poll()
.
Done.
Note that we still need an additional argument, because we need to distinguish between success and error. I modified the argument though to be a boolean.
libos/src/sys/libos_poll.c
line 216 at r5 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ditto
Done.
libos/src/sys/libos_socket.c
line 96 at r5 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Just to not forget: if we decide to wait for more, then this will also need a change (and it's easy to miss).
Done.
libos/src/sys/libos_socket.c
line 114 at r5 (raw file):
Done, you're right.
(and one change below, see my comment)
Why do you say that? That change seems to be irrelevant.
libos/src/sys/libos_socket.c
line 117 at r5 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Could you explain what's the logic behind this IF? Why does what we waited for influence
sock->last_error
?
The man page describes this:
EINPROGRESS
The socket is nonblocking and the connection cannot be
completed immediately. (UNIX domain sockets failed with
EAGAIN instead.) It is possible to select(2) or poll(2)
for completion by selecting the socket for writing. After
select(2) indicates writability, use getsockopt(2) to read
the SO_ERROR option at level SOL_SOCKET to determine
whether connect() completed successfully (SO_ERROR is
zero) or unsuccessfully (SO_ERROR is one of the usual
error codes listed here, explaining the reason for the
failure).
In other words, if you are polling on an in-progress-connecting socket, and it returns POLLHUP or POLLERR (these are translated into PAL versions PAL_WAIT_HANG_UP
and PAL_WAIT_ERROR
), you can perform getsockopt(socket_fd, SO_ERROR)
and you must get the error code (ECONNREFUSED
in this case). See also the test that I added in this PR.
libos/src/sys/libos_socket.c
line 121 at r5 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
To fix concurrency I think we should move it to be the last step, so the fast-path never returns while the state is only half-updated to say it's connected. Not really dangerous because we have all of this under the lock, but IMO seems like a saner/safer version.
Done.
libos/src/sys/libos_socket.c
line 259 at r5 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
inprogress
is initialized only on success.
Done.
I first wanted to do a more complicated assert(ret < 0 || inprogress == false)
, but then decide to just remove this assert, as this weak assert didn't deserve too much complexity. If you feel that the complicated assert should stay, I can change.
pal/include/pal/pal.h
line 599 at r5 (raw file):
* Can be NULL, to ignore the result. * \param[out] out_inprogress On success, returns true in special case of an in-progress connection * on non-blocking socket.
Done.
pal/src/host/linux-sgx/enclave_ocalls.h
line 66 at r5 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
int nonblockingWhy not
bool
?
Done.
Initially I had int
because this header file didn't use bool
before my change. But I agree, there is no reason not to have a bool. And maybe we'll fix all other arguments of all other functions in this file one day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 10 of 10 files at r6, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, "fixup! " found in commit messages' one-liners
libos/src/net/ip.c
line 250 at r5 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
This was implemented by @boryspoplawski : #579
I don't see any particular comments from my (or your) side on this.
I think Borys just followed the POSIX standard in this regard, see
man 2 connect
, search forAF_UNSPEC
.
Weird, I'm pretty sure we had a discussion about this, but I don't remember the details... Anyways, it's not for this PR. But I'd change it, it's really weird.
And POSIX is not a good argument, it contains a lot of terrible APIs. But yeah, may explain the origin of this construction.
libos/src/sys/libos_socket.c
line 114 at r5 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Done, you're right.
(and one change below, see my comment)
Why do you say that? That change seems to be irrelevant.
When I wrote this comment I still thought it's required, but then I noticed that it's also ok without it and forgot to update.
pal/src/host/linux-sgx/enclave_ocalls.h
line 66 at r5 (raw file):
this header file didn't use
bool
before my change
Hmm, it did? E.g. in ocall_get_quote()
. Not too many places (2?), but still :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 17 of 23 files at r5, 10 of 10 files at r6, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, "fixup! " found in commit messages' one-liners
524a4b6
to
a2e7d95
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved
libos/src/sys/libos_socket.c
line 98 at r7 (raw file):
* the common case of an already-connected socket doing recv/send. */ assert(handle->type == TYPE_SOCK);
This got indented by 5 spaces. Not blocking as the CI is very slow this week, but if you want then please fix. If not, then I'll add it to my "typo list" and fix someday in a single commit with other trivial stuff.
Previously, Gramine transformed `connect()` of non-blocking sockets into a blocking operation and thus never returned -EINPROGRESS. This led to the connect operation being very slow (waiting for a host timeout) if a remote peer is unresponsive. This commit fixes this and adds a LibOS regression test (4 variants to test poll/epoll and responsive/unresponsive peer). Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
a2e7d95
to
272c889
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 26 of 27 files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin and @mkow)
libos/src/sys/libos_socket.c
line 98 at r7 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
This got indented by 5 spaces. Not blocking as the CI is very slow this week, but if you want then please fix. If not, then I'll add it to my "typo list" and fix someday in a single commit with other trivial stuff.
Done. It's ok, we can wait for CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r8, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from different teams (1 more required, approved so far: Intel)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r8, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved
Description of the changes
Previously, Gramine transformed
connect()
of non-blocking sockets into a blocking operation and thus never returned -EINPROGRESS. This led to the connect operation being very slow (waiting for a host timeout) if a remote peer is unresponsive.This commit fixes this and adds a LibOS regression test
; unfortunately the test must be run manually with special network settings to properly test the EINPROGRESS behavior. UPDATE: @kailun-qin gave suggestions how to test without any special network settings, see the test sources and notes from the meeting #1635.Fixes #1641.
How to test this PR?
A new test was added, but please test it also manually as described in #1641.
This change is