pkg/lwip: fix race in async sock API with event #21093
Draft
+21
−0
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Contribution description
In TCP server mode, the
sock_tcp_t
sockets are managed by the network stack and can be reused if a previous connection is no longer in used. However, an event may still be posted in the event queue when the socket is reused. Wiping it will result in thenext
pointer in that event to be NULL, which will cause the event handler fetching that event to crash.This adds an
event_cancel()
at two places:The former catches issues in server mode e.g. when a connect has been closed (e.g. due to timeout) and is reused before a pending event (e.g. a timeout event) has been processed.
The letter may be an issue on client side. E.g. when
sock_tcp_t
was allocated on stack and goes out of scope aftersock_tcp_disconnect
but before the event handler was run.Testing procedure
I was able to see this bug in the wild in my work-in-progress PR that adds CoAP over TCP to nanocoap. But that code is not in a well shape now and has other issues.
A test that puts an async TCP server with lots of connections through the paces would likely be a good demonstrator.
Issues/PRs references
None