Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: sockets: Add locking to receive callback #35466

Merged

Conversation

jukkar
Copy link
Member

@jukkar jukkar commented May 19, 2021

Fix a regression when application is waiting data but does
not notice that because socket layer is not woken up.

This could happen because application was waiting condition
variable but the signal to wake the condvar came before the
wait started. Normally if there is constant flow of incoming
data to the socket, the signal would be given later. But if
the peer is waiting that Zephyr replies, there might be a
timeout at peer.

The solution is to add locking in socket receive callback so
that we only signal the condition variable after we have made
sure that the condition variable is actually waiting the data.

Fixes #34964

Signed-off-by: Jukka Rissanen jukka.rissanen@linux.intel.com

@jukkar
Copy link
Member Author

jukkar commented May 19, 2021

@pfalcon I did testing with dumb_http_server and ab and did not see the error any more. Can you give it a run too as you seemed to have a good test setup for this.

@jukkar jukkar added this to the v2.6.0 milestone May 19, 2021
@jukkar jukkar force-pushed the bug-34964-socket-locking-regression branch from cc5a5f5 to 09ff600 Compare May 20, 2021 10:44
@zephyrbot zephyrbot added the area: Sockets Networking sockets label May 20, 2021
@zephyrbot zephyrbot requested a review from mengxianglinx May 20, 2021 12:17
jukkar added 3 commits May 21, 2021 10:53
Fix a regression when application is waiting data but does
not notice that because socket layer is not woken up.

This could happen because application was waiting condition
variable but the signal to wake the condvar came before the
wait started. Normally if there is constant flow of incoming
data to the socket, the signal would be given later. But if
the peer is waiting that Zephyr replies, there might be a
timeout at peer.

The solution is to add locking in socket receive callback so
that we only signal the condition variable after we have made
sure that the condition variable is actually waiting the data.

Fixes zephyrproject-rtos#34964

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
If we are waiting all the data i.e., the MSG_WAITALL flag is set,
then if we have not yet received all the data at the end of the
receive loop. We must use the condition variable to get the signal
when the data is ready to be received. Otherwise the receive loop
will not release the socket lock and receive_cb will not be able
to indicate that data is received.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
The k_fifo_ prefix is meant for kernel API functions, and
not to our socket helper. So remove the k_ prefix in order
to avoid confusion.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
@jukkar jukkar force-pushed the bug-34964-socket-locking-regression branch from 09ff600 to 410c799 Compare May 21, 2021 07:55
@jukkar jukkar requested a review from mniestroj May 24, 2021 08:56
@jukkar jukkar added bug The issue is a bug, or the PR is fixing a bug Regression Something, which was working, does not anymore labels May 24, 2021
@nashif nashif merged commit c53d483 into zephyrproject-rtos:main May 25, 2021
@pfalcon
Copy link
Contributor

pfalcon commented May 28, 2021

@jukkar

I did testing with dumb_http_server and ab and did not see the error any more. Can you give it a run too as you seemed to have a good test setup for this.

Sorry for the delay with response, I was on vacation. Confirming that that this patch resolved issue with dumb_http_server/qemu_x86 + ab for me. Also retested with big_http_download (as the test for "opposite" transfer direction), and it works as expected.

Thanks for investigating this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Networking area: Sockets Networking sockets bug The issue is a bug, or the PR is fixing a bug Regression Something, which was working, does not anymore
Projects
None yet
6 participants