
sockets_tls: receiving data on offloaded socket before handshake causes pollin | pollerr and failed recvfrom (SARA-R4) #35789

Closed · emillindq opened this issue May 28, 2021 · 6 comments · Fixed by #36167
Labels: area: Networking · bug (The issue is a bug, or the PR is fixing a bug) · priority: low (Low impact/importance bug)

Comments

@emillindq (Contributor)

Describe the bug
This bug is a variant of #33330, but with an offloaded socket behind the DTLS socket. The fix for that issue doesn't apply to offloaded sockets.

The scenario is as follows:

  1. A DTLS socket is created and bind() is called, with an offloaded socket behind it.
  2. A packet arrives on the offloaded socket before any other calls have been made (see the modem trace: SOCR -> bind(), UUSORF -> data waiting on the socket).
    [modem trace screenshot: SOCR followed by UUSORF]
  3. When poll() is called, the return code is 1 with revents=9 (POLLIN | POLLERR).
  4. Since POLLIN is set, recvfrom() is called, but since no data exists in mbedTLS it returns a failure (a sketch of the failing sequence follows this list).
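
For illustration, a minimal application-side sketch of the sequence that hits the problem, assuming CONFIG_NET_SOCKETS_POSIX_NAMES and a credential registered under a placeholder security tag; this is not the actual test code:

```c
/* Hypothetical reproduction sketch: DTLS client socket on top of an
 * offloaded UDP socket, polled before any handshake has been started.
 */
#include <net/socket.h>
#include <net/tls_credentials.h>

#define APP_SEC_TAG 1 /* placeholder tag, credentials registered elsewhere */

void repro(void)
{
	sec_tag_t sec_tags[] = { APP_SEC_TAG };
	struct sockaddr_in local = {
		.sin_family = AF_INET,
		.sin_port = htons(5684),
	};
	struct pollfd pfd;
	char buf[128];
	int sock;

	sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_DTLS_1_2);
	setsockopt(sock, SOL_TLS, TLS_SEC_TAG_LIST, sec_tags, sizeof(sec_tags));
	bind(sock, (struct sockaddr *)&local, sizeof(local));

	/* If the modem reports unsolicited data (UUSORF) at this point,
	 * before any sendto()/handshake...
	 */
	pfd.fd = sock;
	pfd.events = POLLIN;

	/* ...this poll() reports POLLIN | POLLERR... */
	if (poll(&pfd, 1, 1000) > 0 && (pfd.revents & POLLIN)) {
		/* ...and this recvfrom() fails, because mbedTLS holds no
		 * application data (the handshake never ran).
		 */
		recvfrom(sock, buf, sizeof(buf), 0, NULL, NULL);
	}
}
```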

The reason the mentioned fix doesn't work for offloaded sockets appears to be that native sockets implement the ioctl, while the SARA-R4 modem's offload_ioctl just returns -EXDEV on the POLL_PREPARE request, whereupon we simply restore the events and carry on as if nothing had happened (i.e. the patch is ignored). The relevant part of ztls_poll_prepare_ctx() in sockets_tls.c:

```c
	if ((pfd->events & ZSOCK_POLLIN) && (ctx->type == SOCK_DGRAM) &&
	    (ctx->options.role == MBEDTLS_SSL_IS_CLIENT) &&
	    !is_handshake_complete(ctx)) {
		(*pev)->obj = &ctx->tls_established;
		(*pev)->type = K_POLL_TYPE_SEM_AVAILABLE;
		(*pev)->mode = K_POLL_MODE_NOTIFY_ONLY;
		(*pev)->state = K_POLL_STATE_NOT_READY;
		(*pev)++;

		/* Since k_poll_event is configured by the TLS layer in this
		 * case, do not forward ZSOCK_POLLIN to the underlying socket.
		 */
		pfd->events &= ~ZSOCK_POLLIN;
	}

	obj = z_get_fd_obj_and_vtable(
		ctx->sock, (const struct fd_op_vtable **)&vtable, &lock);
	if (obj == NULL) {
		ret = -EBADF;
		goto exit;
	}

	(void)k_mutex_lock(lock, K_FOREVER);

	ret = z_fdtable_call_ioctl(vtable, obj, ZFD_IOCTL_POLL_PREPARE,
				   pfd, pev, pev_end);
	if (ret != 0) {
		goto exit;
	}

	if (pfd->events & ZSOCK_POLLIN) {
		ret = ztls_poll_prepare_pollin(ctx);
	}

exit:
	/* Restore original events. */
	pfd->events = events;

	k_mutex_unlock(lock);

	return ret;
}
```

To Reproduce
Steps to reproduce the behavior:

  1. This is hard to reproduce naturally; the easiest way is to inject a UUSORF URC just after creating the socket to simulate the condition, for example using a debugger.

Expected behavior
poll() on a DTLS socket whose handshake is not complete should always report no events.

Impact
Possible network hang in rare cases when the network is initialized and deinitialized often. I guess this is not super serious, but I've seen it happen a few times during our unit tests when we stress-test by bringing our network stack up and down to observe its behaviour. Not all special cases are handled.

Logs and console output
See above

Environment (please complete the following information):

  • STM32H747I DISCO devkit with Sparkfun SARA-R4 shield
  • zephyr 9867ac2
emillindq added the bug label on May 28, 2021
@rlubos (Contributor) commented May 31, 2021

Thank you Emil for reporting the issue.

First of all, let me briefly explain why the SARA-R4 returns -EXDEV for POLL_PREPARE. In the design of socket offloading, we assumed that the entire poll() function call would be offloaded. Therefore offloaded sockets don't use the POLL_PREPARE/POLL_UPDATE mechanism that native sockets do; instead they indicate offloading of the entire poll() function by returning -EXDEV for the POLL_PREPARE request, which is then followed by a POLL_OFFLOAD ioctl call.
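
To make that contract concrete, here is a simplified sketch of an offload driver's ioctl vtable handler; it is illustrative only (the driver_poll() helper is hypothetical), not the exact ublox-sara-r4.c code:

```c
/* Illustrative offload driver ioctl handler: POLL_PREPARE is answered with
 * -EXDEV, which tells the socket layer to hand the whole poll() call over
 * via ZFD_IOCTL_POLL_OFFLOAD instead of registering k_poll() events.
 */
#include <stdarg.h>
#include <errno.h>
#include <sys/fdtable.h>
#include <net/socket.h>

/* Hypothetical helper that polls the modem directly and fills in revents. */
extern int driver_poll(void *obj, struct zsock_pollfd *fds, int nfds, int msecs);

static int offload_ioctl(void *obj, unsigned int request, va_list args)
{
	switch (request) {
	case ZFD_IOCTL_POLL_PREPARE:
		/* No k_poll_event can be registered for the modem socket. */
		return -EXDEV;

	case ZFD_IOCTL_POLL_OFFLOAD: {
		struct zsock_pollfd *fds = va_arg(args, struct zsock_pollfd *);
		int nfds = va_arg(args, int);
		int msecs = va_arg(args, int);

		return driver_poll(obj, fds, nfds, msecs);
	}

	default:
		errno = EINVAL;
		return -1;
	}
}
```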

For that reason, the fix implemented in the POLL_PREPARE/UPDATE handlers doesn't fix the behaviour with offloaded sockets. And to be honest it seems pretty tough to fix properly, as the original solution relied on creating a separate kernel object to monitor (the handshake semaphore), which is not an option for offloaded sockets (as there is no underlying k_poll() call).

So for offloaded underlying sockets, the poll() functionality for TLS is implemented in ztls_poll_offload(). Currently, as you've reported, the function returns as soon as any data is received, regardless of the handshake state. That shouldn't be the case; we should act differently when the handshake is not complete (at least for DTLS clients, just as with native sockets).
We cannot simply ignore the data and retry the internal poll(), as that would lead to the busy loop we had with native sockets.
The most straightforward approach would be to simply drop any data that arrives before the handshake, but I'm afraid that could lead to other issues if the handshake is currently being executed by a different thread and the polling thread has a higher priority.
I see two other potential solutions here:

  1. Take a "dumb" approach and put the polling thread to sleep for some amount of time if the handshake is not complete yet. While not ideal, it should be pretty easy to implement and should serve its purpose: allow another thread to consume the data during the handshake and prevent busy-looping if the data arrives before the handshake starts. The downside is a possible delay (up to the sleep period) for the first poll after the handshake finishes, which IMO is an acceptable tradeoff. (A rough sketch follows this list.)
  2. Feed the data to the mbedtls_ssl_handshake() function from within poll(). This could work, but we'd end up with the possibility of processing the handshake from two different threads; I think the mutex protection recently added to the sockets should be enough protection, though...
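
For option 1, a rough sketch of what the check inside ztls_poll_offload() might look like; the helper name and the 50 ms period are placeholders, not a concrete proposal:

```c
/* Sketch of the "dumb" approach: if the underlying offloaded socket reports
 * data before the DTLS client handshake has completed, suppress POLLIN and
 * back off briefly so another thread can run the handshake.
 */
static void early_data_backoff(struct tls_context *ctx, struct zsock_pollfd *pfd)
{
	if ((pfd->revents & ZSOCK_POLLIN) && (ctx->type == SOCK_DGRAM) &&
	    (ctx->options.role == MBEDTLS_SSL_IS_CLIENT) &&
	    !is_handshake_complete(ctx)) {
		pfd->revents &= ~ZSOCK_POLLIN; /* data we cannot read yet */
		k_msleep(50);                  /* give the handshake a chance */
	}
}
```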

@emillindq Any thoughts, which sounds like a better solution for you?

galak added the priority: low label on Jun 1, 2021
@emillindq (Contributor, Author)

@rlubos firstly, thank you for taking a look at this!

It is a juicy problem indeed. At the core of the problem is that the data is left there and nobody feeds it anywhere. Normally, in the DTLS client case, no data should arrive until somebody has called sendto(), since that initiates the client handshake, and only then does the mbedTLS stack call the recv function.

Would it be possible to check whether the handshake has been started? If it has, never mind, because the data will be read by mbedTLS; if it hasn't, just discard all data in the poll call and continue as usual.

If that's not an option, I reckon point 2 would be the best. As long as the data is fed somewhere and discarded, it should solve the problem!

@rlubos (Contributor) commented Jun 2, 2021

> Would it be possible to check whether the handshake has been started? If it has, never mind, because the data will be read by mbedTLS; if it hasn't, just discard all data in the poll call and continue as usual.

I think this should be fairly simple to implement. We have the tls_mbedtls_handshake() function, which is actually the only entry point for the handshake process. We could set a flag, say handshake_in_progress, on the TLS context upon entry/exit of that function. Then, in the offloaded poll, either discard the data (if no connection is established and the handshake is not in progress), or ignore it if the handshake is in progress (probably with some sleep to prevent busy-looping in a high-priority thread). Otherwise (handshake complete), just feed the data to mbedTLS to verify whether there is new application data to process.
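
Roughly, and purely as a sketch (the handshake_in_progress flag and the handle_early_data() helper are hypothetical names; the actual fix in the draft PR may be structured differently):

```c
/* Hypothetical flag set around the single handshake entry point. The real
 * tls_mbedtls_handshake() also loops on MBEDTLS_ERR_SSL_WANT_READ/WRITE.
 */
static int tls_mbedtls_handshake(struct tls_context *context, bool block)
{
	int ret;

	ARG_UNUSED(block);

	context->handshake_in_progress = true;
	ret = mbedtls_ssl_handshake(&context->ssl);
	context->handshake_in_progress = false;

	return ret;
}

/* Called from the offloaded poll path when the underlying socket reports
 * data while the DTLS handshake is not complete.
 */
static void handle_early_data(struct tls_context *ctx)
{
	char discard[64];

	if (ctx->handshake_in_progress) {
		/* Another thread is running the handshake; sleep briefly so
		 * it can consume the datagram instead of us busy-looping.
		 */
		k_msleep(10);
	} else {
		/* No handshake started yet (DTLS client): this datagram can
		 * never become application data, so drop it.
		 */
		(void)zsock_recvfrom(ctx->sock, discard, sizeof(discard),
				     ZSOCK_MSG_DONTWAIT, NULL, NULL);
	}
}
```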

Actually, seeing it like this, I think this is the way to go with the offloaded sockets. Do you agree?

@emillindq (Contributor, Author) commented Jun 3, 2021

> Actually, seeing it like this, I think this is the way to go with the offloaded sockets. Do you agree?

Yes, I think this is the cleanest solution! Is it perhaps possible to implement it in a way that lets us revert #33474? It would be nice if one fix covered both offloaded and native poll.

Will you make a PR or should I take a look at it?

@rlubos (Contributor) commented Jun 7, 2021

Sorry for the delay, I'll try to propose something.

@rlubos (Contributor) commented Jun 11, 2021

@emillindq I've submitted a draft PR with the proposal, would you mind giving it some testing? We can discuss further, but for now I've implemented the alternative solution only for offloaded sockets and left the native stack as is. IMO monitoring the tls_established semaphore is a more robust solution, and I'd personally prefer to leave it as it is for the native socket implementation.
