Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LibOS,PAL] Support EINPROGRESS on non-blocking sockets connect #1643

Merged
merged 1 commit into from
Nov 24, 2023

Conversation

dimakuv
Copy link

@dimakuv dimakuv commented Nov 7, 2023

Description of the changes

Previously, Gramine transformed connect() of non-blocking sockets into a blocking operation and thus never returned -EINPROGRESS. This led to the connect operation being very slow (waiting for a host timeout) if a remote peer is unresponsive.

This commit fixes this and adds a LibOS regression test ; unfortunately the test must be run manually with special network settings to properly test the EINPROGRESS behavior. UPDATE: @kailun-qin gave suggestions how to test without any special network settings, see the test sources and notes from the meeting #1635.

Fixes #1641.

How to test this PR?

A new test was added, but please test it also manually as described in #1641.


This change is Reviewable

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 20 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)


libos/src/sys/libos_socket.c line 652 at r1 (raw file):

    bool has_sendtimeout_set = !!sock->sendtimeout_us;

    ret = -((ssize_t)sock->last_error);

FYI: I detected this bug while testing this sockets subsystem (incorrect type expansion, leading to ret having a positive value). It is unrelated to this PR.


libos/src/sys/libos_socket.c line 815 at r1 (raw file):

    lock(&sock->lock);
    bool has_recvtimeout_set = !!sock->receivetimeout_us;
    ret = -((ssize_t)sock->last_error);

FYI: I detected this bug while testing this sockets subsystem (incorrect type expansion, leading to ret having a positive value). It is unrelated to this PR.

@dimakuv dimakuv force-pushed the dimakuv/support-einprogress branch from 9e1e0c8 to 144916b Compare November 7, 2023 15:04
Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 20 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)

a discussion (no related file):
Use an un-routable address 10.255.255.255 in the test.


a discussion (no related file):
send and recv must return EAGAIN on sockets in CONNECTING state.


Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 20 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)

a discussion (no related file):
Add test for poll and for epoll. Note that select() is emulated by poll in Gramine anyway, so no sense in testing it.


@dimakuv dimakuv force-pushed the dimakuv/support-einprogress branch from 144916b to 3fdcd9b Compare November 7, 2023 19:47
Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 20 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Use an un-routable address 10.255.255.255 in the test.

Done


a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

send and recv must return EAGAIN on sockets in CONNECTING state.

Done


a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Add test for poll and for epoll. Note that select() is emulated by poll in Gramine anyway, so no sense in testing it.

Done


Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 20 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)

a discussion (no related file):
FYI: A couple links I found useful:



libos/src/sys/libos_epoll.c line 690 at r3 (raw file):

                 */
                enum libos_sock_state state = __atomic_load_n(&items[i]->handle->info.sock.state,
                                                              __ATOMIC_ACQUIRE);

I'm unsure here, that's a bit of a hack. Technically we should always take the lock sock->lock before accessing the state field (it's not supposed to be an atomic).

So I see three solutions here:

  • Just take the lock immediately, don't play this dance with "check if state is CONNECTING in atomic manner for performance reasons". But perf will downgrade significantly I think...
  • Keep like this. Technically, in real-world apps, the transition CONNECTING -> CONNECTED will happen only in one thread/case, so this hacky atomic load is fine.
  • Introduce a new atomic field bool sock->is_connecting. It will be set to true when state is moved to CONNECTING, and then we'll do this atomic load but on sock->is_connecting instead of sock->state.

libos/src/sys/libos_poll.c line 229 at r3 (raw file):

             */
            enum libos_sock_state state = __atomic_load_n(&libos_handles[i]->info.sock.state,
                                                          __ATOMIC_ACQUIRE);

ditto (hacky atomic load?)


libos/test/regression/tcp_einprogress.c line 123 at r3 (raw file):

        CHECK(close(epfd));
        pollout = !!(out_event.events & EPOLLOUT);
    }

I think I need to extract the ret == 0 case outside of these IFs. Otherwise it's complicated to read this one interesting case.


libos/test/regression/test_libos.py line 1432 at r3 (raw file):

    def test_308_socket_tcp_einprogress_unresponsive_epoll(self):
        # Test unresponsive peer: first connect() returns EINPROGRESS, then poll/epoll times out
        # because the connection cannot be established

Oops, must remove this copy-pasted comment, it's redundant

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 20 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)

a discussion (no related file):
Repeating here, for manual testing of tcp_einprogress:

  • Drop SYN packets on 127.0.0.1:12345

    sudo iptables -A OUTPUT -p tcp --tcp-flags SYN,ACK,FIN,RST SYN -d 127.0.0.1 --dport 12345 -j DROP
    
  • Re-establish all packets on 127.0.0.1:12345

    sudo iptables -D OUTPUT -p tcp --tcp-flags SYN,ACK,FIN,RST SYN -d 127.0.0.1 --dport 12345 -j DROP
    

Also, change TIMEOUT_MS to e.g. 10 seconds for manual testing.


Copy link
Contributor

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 17 of 20 files at r1, 3 of 3 files at r3, all commit messages.
Reviewable status: all files reviewed, 14 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Repeating here, for manual testing of tcp_einprogress:

  • Drop SYN packets on 127.0.0.1:12345

    sudo iptables -A OUTPUT -p tcp --tcp-flags SYN,ACK,FIN,RST SYN -d 127.0.0.1 --dport 12345 -j DROP
    
  • Re-establish all packets on 127.0.0.1:12345

    sudo iptables -D OUTPUT -p tcp --tcp-flags SYN,ACK,FIN,RST SYN -d 127.0.0.1 --dport 12345 -j DROP
    

Also, change TIMEOUT_MS to e.g. 10 seconds for manual testing.

Should we point readers to these instructions/steps of running manual tests somewhere in the code/comment? (otherwise they could only search for the original issue/PR)



libos/src/sys/libos_epoll.c line 690 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I'm unsure here, that's a bit of a hack. Technically we should always take the lock sock->lock before accessing the state field (it's not supposed to be an atomic).

So I see three solutions here:

  • Just take the lock immediately, don't play this dance with "check if state is CONNECTING in atomic manner for performance reasons". But perf will downgrade significantly I think...
  • Keep like this. Technically, in real-world apps, the transition CONNECTING -> CONNECTED will happen only in one thread/case, so this hacky atomic load is fine.
  • Introduce a new atomic field bool sock->is_connecting. It will be set to true when state is moved to CONNECTING, and then we'll do this atomic load but on sock->is_connecting instead of sock->state.

I vote for solution #3 which removes the hacky w/o introduing much complexity (IMO).


libos/src/sys/libos_socket.c line 31 at r3 (raw file):

 *  |     |                        |         (if it was bound)  |            |
 *  |     | connect()              |                            |            |
 *  |     |                        +         select()/poll()/   |            |

why is this change?

Code quote:

+

libos/src/sys/libos_socket.c line 33 at r3 (raw file):

 *  |     |                        +         select()/poll()/   |            |
 *  |     |                        V            epoll()         +            |
 *  |     +---------------------> CONNECTING ---------------- CONNECTED <----+

Should we note that the CONNECTING state only applies to non-blocking sockets? (Otherwise the selcect()/poll()/epoll() transition doesn't make that much sense)

Code quote:

CONNECTING

libos/test/regression/tcp_einprogress.c line 28 at r3 (raw file):

    if (argc != 3)
        ERR("Usage: %s <IP address> poll|epoll\n(use 127.0.0.1 for responsive peer, 10.255.255.255"

Do we need ERR() here?

Code quote:

ERR(

libos/test/regression/tcp_einprogress.c line 32 at r3 (raw file):

    if (strcmp(argv[2], "poll") && strcmp(argv[2], "epoll"))
        ERR("Usage: %s <IP address> poll|epoll\nerror: only poll/epoll allowed", argv[0]);

ditto


libos/test/regression/test_libos.py line 1415 at r3 (raw file):

    def test_305_socket_tcp_einprogress_responsive_poll(self):
        # Test responsive peer: first connect() returns EINPROGRESS, then poll/epoll immediately

Should we move the comment out of def so that it serves both tests.


libos/test/regression/test_libos.py line 1425 at r3 (raw file):

    def test_307_socket_tcp_einprogress_unresponsive_poll(self):
        # Test unresponsive peer: first connect() returns EINPROGRESS, then poll/epoll times out

ditto


pal/src/host/linux-sgx/enclave_ocalls.c line 1429 at r3 (raw file):

}

int ocall_connect_simple(int fd, struct sockaddr_storage* addr, size_t* addrlen) {

I understand why we don't change ocall_connect(), but still find it a bit weird :).

Code quote:

ocall_connect_simple

pal/src/host/linux-sgx/enclave_ocalls.c line 1460 at r3 (raw file):

        if (ret != -EACCES && ret != -EPERM && ret != -EADDRINUSE && ret != -EADDRNOTAVAIL
                && ret != -EAFNOSUPPORT && ret != -EAGAIN && ret != -EALREADY && ret != -EBADF
                && ret != -ECONNREFUSED && ret != -EINPROGRESS && ret != -EISCONN

I think we can remove this now

Code quote:

&& ret != -EINPROGRESS

pal/src/host/linux-sgx/host_ocalls.c line 514 at r3 (raw file):

    /* Connect succeeded or in progress (EINPROGRESS); in both cases retrieve local name -- host
     * Linux binds the socket to address even in case of EINPROGRESS */

pls add a . at the end of a sentence


pal/src/host/linux-sgx/pal_sockets.c line 275 at r3 (raw file):

    /* Connect succeeded or in progress (EINPROGRESS); in both cases local name of the socket was
     * retrieved, must verify it */

diito

Copy link
Contributor

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 14 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)


libos/src/sys/libos_epoll.c line 690 at r3 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

I vote for solution #3 which removes the hacky w/o introduing much complexity (IMO).

oops... sorry for the unintentional link -- I meant the 3rd option

@dimakuv dimakuv force-pushed the dimakuv/support-einprogress branch from f5f58da to 725b89b Compare November 8, 2023 08:39
Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 11 of 20 files reviewed, 9 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin)

a discussion (no related file):

Previously, kailun-qin (Kailun Qin) wrote…

Should we point readers to these instructions/steps of running manual tests somewhere in the code/comment? (otherwise they could only search for the original issue/PR)

I think it's not necessary. Users can figure out these commands themselves, if they really want to. Otherwise, they will do some GitHubbing and will find these instructions.

I put it here mainly for reviewers convenience, if they want to test these things manually.



libos/src/sys/libos_epoll.c line 690 at r3 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

oops... sorry for the unintentional link -- I meant the 3rd option

Done. Did solution 3.


libos/src/sys/libos_poll.c line 229 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

ditto (hacky atomic load?)

Done


libos/src/sys/libos_socket.c line 31 at r3 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

why is this change?

Done. That's because I use AsciiFlow to draw these diagrams, and AsciiFlow automatically replaced such places. Fixed manually now.


libos/src/sys/libos_socket.c line 33 at r3 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

Should we note that the CONNECTING state only applies to non-blocking sockets? (Otherwise the selcect()/poll()/epoll() transition doesn't make that much sense)

Done.


libos/test/regression/tcp_einprogress.c line 28 at r3 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

Do we need ERR() here?

Done.


libos/test/regression/tcp_einprogress.c line 32 at r3 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

ditto

Done.


libos/test/regression/tcp_einprogress.c line 123 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I think I need to extract the ret == 0 case outside of these IFs. Otherwise it's complicated to read this one interesting case.

Done. Also refactored other places in this test a bit.


libos/test/regression/test_libos.py line 1415 at r3 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

Should we move the comment out of def so that it serves both tests.

Done.


libos/test/regression/test_libos.py line 1425 at r3 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

ditto

Done.


libos/test/regression/test_libos.py line 1432 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Oops, must remove this copy-pasted comment, it's redundant

Done


pal/src/host/linux-sgx/enclave_ocalls.c line 1429 at r3 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

I understand why we don't change ocall_connect(), but still find it a bit weird :).

Do you mean our use of two connect functions: ocall_connect() and ocall_connect_simple()? Yes, it's a bit weird, and maybe in the future we'll refactor to have just one generic OCALL.

  • ocall_connect() is currently used only for pipe emulation in Gramine (using host UNIX sockets).
  • ocall_connect_simple() is currently used by TCP/UDP sockets.

The main difference between them is that ocall_connect() hides under the hood a bunch of operations: socket() -> bind() -> connect(). This is required because emulated pipes don't have the notion of "create, assign an address and connect to another address", so Gramine has to emulate a single "pipe create" with this series of operations.


pal/src/host/linux-sgx/enclave_ocalls.c line 1460 at r3 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

I think we can remove this now

Done. I was a bit reluctant, as we could easily miss this check if we ever refactor EINPROGRESS logic again. But I agree, it looks weird to have a dead-code check.


pal/src/host/linux-sgx/host_ocalls.c line 514 at r3 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

pls add a . at the end of a sentence

Done.


pal/src/host/linux-sgx/pal_sockets.c line 275 at r3 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

diito

Done.

dimakuv pushed a commit that referenced this pull request Nov 8, 2023
Previously, Gramine transformed `connect()` of non-blocking sockets into
a blocking operation and thus never returned -EINPROGRESS. This led to
the connect operation being very slow (waiting for a host timeout) if a
remote peer is unresponsive.

This is a backport of PR #1643 to Gramine v1.5.

Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
kailun-qin
kailun-qin previously approved these changes Nov 9, 2023
Copy link
Contributor

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r4, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I think it's not necessary. Users can figure out these commands themselves, if they really want to. Otherwise, they will do some GitHubbing and will find these instructions.

I put it here mainly for reviewers convenience, if they want to test these things manually.

Sure, I'm fine w/ this.



pal/src/host/linux-sgx/enclave_ocalls.c line 1429 at r3 (raw file):

Do you mean our use of two connect functions: ocall_connect() and ocall_connect_simple()? Yes, it's a bit weird, and maybe in the future we'll refactor to have just one generic OCALL.

Yes :).

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 8 of 20 files at r1, 4 of 9 files at r4, all commit messages.
Reviewable status: all files reviewed, 11 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)

a discussion (no related file):

unfortunately the test must be run manually with special network settings to properly test the EINPROGRESS behavior.

This test passes for me without changing anything in iptables. Why? Is this intended? The description of the PR seem to suggest that it should fail then?



libos/include/libos_handle.h line 103 at r4 (raw file):

    unsigned int last_error;
    /* This field is an atomically-accessed version of the SOCK_CONNECTING state (for perf). */
    bool connection_in_progress;

(otherwise it has a slightly different meaning and can be read as "it's still connected")

Suggestion:

bool connecting_in_progress;

libos/src/net/ip.c line 227 at r4 (raw file):

    if (ret < 0) {
        if (ret == -PAL_ERROR_CONNFAILED) {
            /* can't use pal_to_unix_errno() because it translates to ECONNRESET */

Why not change pal_to_unix_errno() translation instead? Does it break some other places?


libos/src/net/ip.c line 232 at r4 (raw file):

            assert(handle->flags & O_NONBLOCK);
            ret = pal_to_unix_errno(ret);
            goto out;

This execution path later uses PalSocketConnect()'s output (pal_local_addr), despite the function failure. Or at least it seems that it can use it? Isn't it uninitialized then?


libos/src/sys/libos_epoll.c line 676 at r4 (raw file):

            }

            if (pal_ret_events[i] & PAL_WAIT_WRITE && items[i]->handle->type == TYPE_SOCK) {

Only write? What about reading?


libos/src/sys/libos_epoll.c line 677 at r4 (raw file):

            if (pal_ret_events[i] & PAL_WAIT_WRITE && items[i]->handle->type == TYPE_SOCK) {
                /*

Shouldn't this whole logic be a function in the socket code? We wouldn't have that big copy-paste then, and the socket-related logic would be in socket sources, not in epoll sources.


libos/test/regression/tcp_einprogress.c line 102 at r4 (raw file):

    if (bytes != -1 || errno != EAGAIN) {
        ERR("[after EINPROGRESS] expected send to fail with EAGAIN but failed with %s",
            strerror(errno));

This can also be reached if send() succeeded.


libos/test/regression/tcp_einprogress.c line 108 at r4 (raw file):

    if (bytes != -1 || errno != EAGAIN) {
        ERR("[after EINPROGRESS] expected recv to fail with EAGAIN but failed with %s",
            strerror(errno));

ditto


libos/test/regression/test_libos.py line 1415 at r4 (raw file):

    # Two tests for a responsive peer: first connect() returns EINPROGRESS, then poll/epoll
    # immediately returns because the connection is quickly established

Is it? There's nothing listening on this port on localhost...? The assert message seems to be correct though.

Code quote:

because the connection is quickly established

pal/src/host/linux-sgx/pal_sockets.c line 270 at r4 (raw file):

    int ret = ocall_connect_simple(handle->sock.fd, &sa_storage, &linux_addrlen);
    if (ret < 0 && ret != -EINPROGRESS) {

This shouldn't be allowed for non-blocking sockets (this probably should be verified in the ocall wrapper, as we do with other return values).

Code quote:

ret != -EINPROGRESS

pal/src/host/linux-sgx/pal_sockets.c line 277 at r4 (raw file):

     * retrieved, must verify it. */
    if (out_local_addr) {
        int verify_ret = verify_ip_addr(handle->sock.domain, &sa_storage, linux_addrlen);

This is pretty bad, everywhere we use the convention that if a function failed, then the arg-returned values are undefined. Here it's the same, except for one specific error... This breaks the global convention we're using everywhere.

@vasanth-intel
Copy link
Contributor

Just FYI. We executed Redis and Memcached workloads on the PR to check if performance is impacted or not. We did not observe any regressions on perf front.

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @vasanth-intel ,this is really helpful!

Reviewable status: all files reviewed, 11 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

unfortunately the test must be run manually with special network settings to properly test the EINPROGRESS behavior.

This test passes for me without changing anything in iptables. Why? Is this intended? The description of the PR seem to suggest that it should fail then?

Oops, I thought I fixed the PR description everywhere (e.g., I fixed the "How to test this PR?" section). Apparently forgot to change this text too.


Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 27 files reviewed, 11 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Oops, I thought I fixed the PR description everywhere (e.g., I fixed the "How to test this PR?" section). Apparently forgot to change this text too.

Done



libos/include/libos_handle.h line 103 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

(otherwise it has a slightly different meaning and can be read as "it's still connected")

Done.


libos/src/net/ip.c line 227 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Why not change pal_to_unix_errno() translation instead? Does it break some other places?

Yes, it will break other places. Please note that it's not my change -- this was like this before this PR.

See https://github.com/search?q=repo%3Agramineproject%2Fgramine%20PAL_ERROR_CONNFAILED&type=code

In particular, this is the "other differing" code:

ret = PalStreamOpen(pipe_name, PAL_ACCESS_RDWR, /*share_flags=*/0, PAL_CREATE_IGNORED, options,
&pal_handle);
if (ret < 0) {
return (ret == -PAL_ERROR_CONNFAILED) ? -ENOENT : pal_to_unix_errno(ret);
}

As you can see, PAL_ERROR_CONNFAILED has two meanings:

  1. ECONNREFUSED for TCP/UDP sockets
  2. ENOENT for UNIX sockets

(Why? That's because UNIX sockets are "like files" in Linux, but we emulate them as pipes inside PAL. So we have a mismatch in what "connection failed" means.)


libos/src/net/ip.c line 232 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This execution path later uses PalSocketConnect()'s output (pal_local_addr), despite the function failure. Or at least it seems that it can use it? Isn't it uninitialized then?

Done.

You already realized it yourself upon finishing the review, but just to reiterate: yes, POSIX/Linux considers EINPROGRESS as a success of the connect operation, even though it returns an error code. This is an odd semantics, and in the first revision of my PR, I inherited this semantics.

Now I modified the PR to have "sane" semantics, and propagate the "connecting-in-progress" condition in a separate boolean variable.

This by the way allowed me to remove the now-unused PAL_ERROR_INPROGRESS.


libos/src/sys/libos_epoll.c line 676 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Only write? What about reading?

The man pages instruct the application to perform select/poll/epoll with the POLLOUT event:

   EINPROGRESS
          The socket is nonblocking and the connection cannot be
          completed immediately.  (UNIX domain sockets failed with
          EAGAIN instead.)  It is possible to select(2) or poll(2)
          for completion by selecting the socket for writing.  After
          select(2) indicates writability, use getsockopt(2) to read
          the SO_ERROR option at level SOL_SOCKET to determine
          whether connect() completed successfully (SO_ERROR is
          zero) or unsuccessfully (SO_ERROR is one of the usual
          error codes listed here, explaining the reason for the
          failure).

Using the read (POLLIN) event doesn't lead to anything, so we don't take it into account here.


libos/src/sys/libos_epoll.c line 677 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Shouldn't this whole logic be a function in the socket code? We wouldn't have that big copy-paste then, and the socket-related logic would be in socket sources, not in epoll sources.

Done, yes, looks much better now.


libos/test/regression/tcp_einprogress.c line 102 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This can also be reached if send() succeeded.

Done. I hope I understood you correctly. Fixed also all other such places.


libos/test/regression/tcp_einprogress.c line 108 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


libos/test/regression/test_libos.py line 1415 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Is it? There's nothing listening on this port on localhost...? The assert message seems to be correct though.

Done. Oops, a typo.


pal/src/host/linux-sgx/pal_sockets.c line 270 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This shouldn't be allowed for non-blocking sockets (this probably should be verified in the ocall wrapper, as we do with other return values).

Done.


pal/src/host/linux-sgx/pal_sockets.c line 277 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This is pretty bad, everywhere we use the convention that if a function failed, then the arg-returned values are undefined. Here it's the same, except for one specific error... This breaks the global convention we're using everywhere.

Done.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 9 files at r4, 23 of 23 files at r5, all commit messages.
Reviewable status: all files reviewed, 11 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)


libos/src/net/ip.c line 227 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Yes, it will break other places. Please note that it's not my change -- this was like this before this PR.

See https://github.com/search?q=repo%3Agramineproject%2Fgramine%20PAL_ERROR_CONNFAILED&type=code

In particular, this is the "other differing" code:

ret = PalStreamOpen(pipe_name, PAL_ACCESS_RDWR, /*share_flags=*/0, PAL_CREATE_IGNORED, options,
&pal_handle);
if (ret < 0) {
return (ret == -PAL_ERROR_CONNFAILED) ? -ENOENT : pal_to_unix_errno(ret);
}

As you can see, PAL_ERROR_CONNFAILED has two meanings:

  1. ECONNREFUSED for TCP/UDP sockets
  2. ENOENT for UNIX sockets

(Why? That's because UNIX sockets are "like files" in Linux, but we emulate them as pipes inside PAL. So we have a mismatch in what "connection failed" means.)

I see, looks good then.


libos/src/net/ip.c line 250 at r5 (raw file):

    bool inprogress_unused;
    int ret = PalSocketConnect(sock->pal_handle, &pal_ip_addr, /*local_addr=*/NULL,
                               &inprogress_unused);

I think I forgot it, why do we have this weird way of disconnecting instead of just having PalSocketDisconnect() with simpler arguments?


libos/src/sys/libos_epoll.c line 676 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

The man pages instruct the application to perform select/poll/epoll with the POLLOUT event:

   EINPROGRESS
          The socket is nonblocking and the connection cannot be
          completed immediately.  (UNIX domain sockets failed with
          EAGAIN instead.)  It is possible to select(2) or poll(2)
          for completion by selecting the socket for writing.  After
          select(2) indicates writability, use getsockopt(2) to read
          the SO_ERROR option at level SOL_SOCKET to determine
          whether connect() completed successfully (SO_ERROR is
          zero) or unsuccessfully (SO_ERROR is one of the usual
          error codes listed here, explaining the reason for the
          failure).

Using the read (POLLIN) event doesn't lead to anything, so we don't take it into account here.

I'm not sure if this implies that this is the only way of checking it. What if someone connects and then polls for read, because in that specific protocol it's the server which sends the data first?


libos/src/sys/libos_epoll.c line 678 at r5 (raw file):

            if (items[i]->handle->type == TYPE_SOCK) {
                check_connect_inprogress_on_poll(items[i]->handle, pal_ret_events[i]);

Wouldn't the pal_ret_events[i] check be clearer and simpler if it was here? This is the place which checks the conditions when to re-check the state anyways, and now this checking is split between two places and you need an additional argument in check_connect_inprogress_on_poll().


libos/src/sys/libos_poll.c line 216 at r5 (raw file):

            fds[i].revents |= fds[i].events & (POLLOUT | POLLWRNORM);

        if (libos_handles[i]->type == TYPE_SOCK) {

ditto


libos/src/sys/libos_socket.c line 96 at r5 (raw file):

     * SOCK_CONNECTED state and clear SO_ERROR.
     *
     * This is only relevant if POLLOUT event was requested. See `man 2 connect`, EINPROGRESS case.

Just to not forget: if we decide to wait for more, then this will also need a change (and it's easy to miss).


libos/src/sys/libos_socket.c line 114 at r5 (raw file):

    if (sock->state != SOCK_CONNECTING) {
        /* theoretically, another thread could be doing another select/poll on this socket and
         * modify the state; we don't support this unlikely case */

Why not support it? Isn't it just unlock+return here? (and one change below, see my comment)


libos/src/sys/libos_socket.c line 117 at r5 (raw file):

        BUG();
    }
    if (pal_ret_events & (PAL_WAIT_ERROR | PAL_WAIT_HANG_UP)) {

Could you explain what's the logic behind this IF? Why does what we waited for influence sock->last_error?


libos/src/sys/libos_socket.c line 121 at r5 (raw file):

    } else {
        sock->last_error = 0;
        __atomic_store_n(&sock->connecting_in_progress, false, __ATOMIC_RELEASE);

To fix concurrency I think we should move it to be the last step, so the fast-path never returns while the state is only half-updated to say it's connected. Not really dangerous because we have all of this under the lock, but IMO seems like a saner/safer version.


libos/src/sys/libos_socket.c line 259 at r5 (raw file):

    bool inprogress;
    ret = sock2->ops->connect(handle2, &addr, sizeof(addr), &inprogress);
    assert(inprogress == false);

inprogress is initialized only on success.


pal/include/pal/pal.h line 599 at r5 (raw file):

 *                             Can be NULL, to ignore the result.
 * \param[out] out_inprogress  On success, returns true in special case of an in-progress connection
 *                             on non-blocking socket.

Suggestion:

on a non-blocking socket.

pal/src/host/linux-sgx/enclave_ocalls.h line 66 at r5 (raw file):

int nonblocking

Why not bool?

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 17 of 27 files reviewed, 11 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @mkow)


libos/src/net/ip.c line 250 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I think I forgot it, why do we have this weird way of disconnecting instead of just having PalSocketDisconnect() with simpler arguments?

This was implemented by @boryspoplawski : #579

I don't see any particular comments from my (or your) side on this.

I think Borys just followed the POSIX standard in this regard, see man 2 connect, search for AF_UNSPEC.


libos/src/sys/libos_epoll.c line 676 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I'm not sure if this implies that this is the only way of checking it. What if someone connects and then polls for read, because in that specific protocol it's the server which sends the data first?

Done. You are right, I verified manually (using the test I added in this PR) that the native behavior of Linux works even on reads (POLLIN/EPOLLIN). I added a corresponding change to my test, and updated the logic in LibOS.


libos/src/sys/libos_epoll.c line 678 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Wouldn't the pal_ret_events[i] check be clearer and simpler if it was here? This is the place which checks the conditions when to re-check the state anyways, and now this checking is split between two places and you need an additional argument in check_connect_inprogress_on_poll().

Done.

Note that we still need an additional argument, because we need to distinguish between success and error. I modified the argument though to be a boolean.


libos/src/sys/libos_poll.c line 216 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


libos/src/sys/libos_socket.c line 96 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Just to not forget: if we decide to wait for more, then this will also need a change (and it's easy to miss).

Done.


libos/src/sys/libos_socket.c line 114 at r5 (raw file):
Done, you're right.

(and one change below, see my comment)

Why do you say that? That change seems to be irrelevant.


libos/src/sys/libos_socket.c line 117 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Could you explain what's the logic behind this IF? Why does what we waited for influence sock->last_error?

The man page describes this:

EINPROGRESS
The socket is nonblocking and the connection cannot be
completed immediately. (UNIX domain sockets failed with
EAGAIN instead.) It is possible to select(2) or poll(2)
for completion by selecting the socket for writing. After
select(2) indicates writability, use getsockopt(2) to read
the SO_ERROR option at level SOL_SOCKET to determine
whether connect() completed successfully (SO_ERROR is
zero) or unsuccessfully (SO_ERROR is one of the usual
error codes listed here, explaining the reason for the
failure)
.

In other words, if you are polling on an in-progress-connecting socket, and it returns POLLHUP or POLLERR (these are translated into PAL versions PAL_WAIT_HANG_UP and PAL_WAIT_ERROR), you can perform getsockopt(socket_fd, SO_ERROR) and you must get the error code (ECONNREFUSED in this case). See also the test that I added in this PR.


libos/src/sys/libos_socket.c line 121 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

To fix concurrency I think we should move it to be the last step, so the fast-path never returns while the state is only half-updated to say it's connected. Not really dangerous because we have all of this under the lock, but IMO seems like a saner/safer version.

Done.


libos/src/sys/libos_socket.c line 259 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

inprogress is initialized only on success.

Done.

I first wanted to do a more complicated assert(ret < 0 || inprogress == false), but then decide to just remove this assert, as this weak assert didn't deserve too much complexity. If you feel that the complicated assert should stay, I can change.


pal/include/pal/pal.h line 599 at r5 (raw file):

 *                             Can be NULL, to ignore the result.
 * \param[out] out_inprogress  On success, returns true in special case of an in-progress connection
 *                             on non-blocking socket.

Done.


pal/src/host/linux-sgx/enclave_ocalls.h line 66 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…
int nonblocking

Why not bool?

Done.

Initially I had int because this header file didn't use bool before my change. But I agree, there is no reason not to have a bool. And maybe we'll fix all other arguments of all other functions in this file one day.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 10 files at r6, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, "fixup! " found in commit messages' one-liners


libos/src/net/ip.c line 250 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This was implemented by @boryspoplawski : #579

I don't see any particular comments from my (or your) side on this.

I think Borys just followed the POSIX standard in this regard, see man 2 connect, search for AF_UNSPEC.

Weird, I'm pretty sure we had a discussion about this, but I don't remember the details... Anyways, it's not for this PR. But I'd change it, it's really weird.

And POSIX is not a good argument, it contains a lot of terrible APIs. But yeah, may explain the origin of this construction.


libos/src/sys/libos_socket.c line 114 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Done, you're right.

(and one change below, see my comment)

Why do you say that? That change seems to be irrelevant.

When I wrote this comment I still thought it's required, but then I noticed that it's also ok without it and forgot to update.


pal/src/host/linux-sgx/enclave_ocalls.h line 66 at r5 (raw file):

this header file didn't use bool before my change

Hmm, it did? E.g. in ocall_get_quote(). Not too many places (2?), but still :)

Copy link
Contributor

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 17 of 23 files at r5, 10 of 10 files at r6, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, "fixup! " found in commit messages' one-liners

@dimakuv dimakuv force-pushed the dimakuv/support-einprogress branch from 524a4b6 to a2e7d95 Compare November 23, 2023 19:11
kailun-qin
kailun-qin previously approved these changes Nov 24, 2023
Copy link
Contributor

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

mkow
mkow previously approved these changes Nov 24, 2023
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved


libos/src/sys/libos_socket.c line 98 at r7 (raw file):

     * the common case of an already-connected socket doing recv/send.
     */
     assert(handle->type == TYPE_SOCK);

This got indented by 5 spaces. Not blocking as the CI is very slow this week, but if you want then please fix. If not, then I'll add it to my "typo list" and fix someday in a single commit with other trivial stuff.

Previously, Gramine transformed `connect()` of non-blocking sockets into
a blocking operation and thus never returned -EINPROGRESS. This led to
the connect operation being very slow (waiting for a host timeout) if a
remote peer is unresponsive.

This commit fixes this and adds a LibOS regression test (4 variants to
test poll/epoll and responsive/unresponsive peer).

Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
@dimakuv dimakuv dismissed stale reviews from mkow and kailun-qin via 272c889 November 24, 2023 07:55
@dimakuv dimakuv force-pushed the dimakuv/support-einprogress branch from a2e7d95 to 272c889 Compare November 24, 2023 07:55
Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 26 of 27 files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin and @mkow)


libos/src/sys/libos_socket.c line 98 at r7 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This got indented by 5 spaces. Not blocking as the CI is very slow this week, but if you want then please fix. If not, then I'll add it to my "typo list" and fix someday in a single commit with other trivial stuff.

Done. It's ok, we can wait for CI.

Copy link
Contributor

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r8, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from different teams (1 more required, approved so far: Intel)

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r8, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@dimakuv dimakuv merged commit 272c889 into master Nov 24, 2023
17 of 18 checks passed
@dimakuv dimakuv deleted the dimakuv/support-einprogress branch November 24, 2023 11:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[LibOS,PAL] Gramine doesn't support EINPROGRESS (non-blocking sockets connect)
4 participants