-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/lwip: fix netdev concurrency issues #18479
Conversation
@miri64 does this assumption holds? |
lwIP internally synchronizes all acceses via a global lock and all internal functions assert that the lock is taken when they are called. I guess that lock is implemented via a mutex in RIOT, but I haven dived deep enough into the implementation to say be sure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK. Code looks good and I trust your testing.
I believe this can fix a lot of sporadic and hard to reproduce issues.
@@ -65,6 +65,7 @@ static err_t _ieee802154_link_output(struct netif *netif, struct pbuf *p); | |||
static void _event_cb(netdev_t *dev, netdev_event_t event); | |||
static void *_event_loop(void *arg); | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There already is an empty above
This has been requested for backporting. I don't see the big picture of its complexity yet, but am marking it for "needs backport" if only to make sure I'll have it in the "known issues" section. |
11b7d63
to
a2bf203
Compare
amended and squashed direclty |
Contribution description
This PR fixes a concurrency issue in LWIP, as a result of calling
netdev
functions from different context.Traditionally our netdev driver model assumes implicitly that all
netdev
functions run on the same thread. As it is now, LWIP calls all send functions from the caller thread (e.gmain
), but processes ISR on a higher priority thread. This not only creates concurrency issues in existing devices, but renders other modules such as the SubMAC unusable...To (temporally) solve this, I implemented some recursive mutex lock functions to acquire the device before calling any internal operation. In the future it would be better to schedule all transmissions from a single thread.
Although this won't make the current situation worse, I'm assuming LWIP uses mutex internally to prevent concurrency issues between LWIP functions called from different context. If this doesn't hold, we will need to proceed with the single thread solution.
EDIT: I also adapted the
netdev
return calls to the newer API. Returning 0 is not considered an error anymore (which is whatnetdev_ieee802154_submac
does).Testing procedure
Test that LWIP still works as expected (e.g
tests/lwip
).Issues/PRs references
Found while porting the KW2XRF to LWIP #18465