Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/lwip: fix netdev concurrency issues #18479

Merged
merged 3 commits into from
Aug 23, 2022

Conversation

jia200x
Copy link
Member

@jia200x jia200x commented Aug 19, 2022

Contribution description

This PR fixes a concurrency issue in LWIP, as a result of calling netdev functions from different context.
Traditionally our netdev driver model assumes implicitly that all netdev functions run on the same thread. As it is now, LWIP calls all send functions from the caller thread (e.g main), but processes ISR on a higher priority thread. This not only creates concurrency issues in existing devices, but renders other modules such as the SubMAC unusable...

To (temporally) solve this, I implemented some recursive mutex lock functions to acquire the device before calling any internal operation. In the future it would be better to schedule all transmissions from a single thread.

Although this won't make the current situation worse, I'm assuming LWIP uses mutex internally to prevent concurrency issues between LWIP functions called from different context. If this doesn't hold, we will need to proceed with the single thread solution.

EDIT: I also adapted the netdev return calls to the newer API. Returning 0 is not considered an error anymore (which is what netdev_ieee802154_submac does).

Testing procedure

Test that LWIP still works as expected (e.g tests/lwip).

Issues/PRs references

Found while porting the KW2XRF to LWIP #18465

@jia200x jia200x requested a review from miri64 as a code owner August 19, 2022 17:54
@github-actions github-actions bot added Area: network Area: Networking Area: pkg Area: External package ports labels Aug 19, 2022
@jia200x jia200x requested review from benpicco and maribu August 19, 2022 17:54
@jia200x
Copy link
Member Author

jia200x commented Aug 19, 2022

Although this won't make the current situation worse, I'm assuming LWIP uses mutex internally to prevent concurrency issues between LWIP functions called from different context. If this doesn't hold, we will need to proceed with the single thread solution.

@miri64 does this assumption holds?

@maribu
Copy link
Member

maribu commented Aug 20, 2022

lwIP internally synchronizes all acceses via a global lock and all internal functions assert that the lock is taken when they are called. I guess that lock is implemented via a mutex in RIOT, but I haven dived deep enough into the implementation to say be sure.

@jia200x jia200x added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Aug 20, 2022
@chrysn chrysn added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Aug 21, 2022
@benpicco benpicco requested a review from yarrick August 22, 2022 18:09
Copy link
Member

@maribu maribu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK. Code looks good and I trust your testing.

I believe this can fix a lot of sporadic and hard to reproduce issues.

@@ -65,6 +65,7 @@ static err_t _ieee802154_link_output(struct netif *netif, struct pbuf *p);
static void _event_cb(netdev_t *dev, netdev_event_t event);
static void *_event_loop(void *arg);


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

There already is an empty above

@chrysn chrysn added the Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch label Aug 23, 2022
@chrysn
Copy link
Member

chrysn commented Aug 23, 2022

This has been requested for backporting. I don't see the big picture of its complexity yet, but am marking it for "needs backport" if only to make sure I'll have it in the "known issues" section.

@jia200x jia200x force-pushed the pr/fix_lwip_concurrency branch from 11b7d63 to a2bf203 Compare August 23, 2022 09:57
@jia200x
Copy link
Member Author

jia200x commented Aug 23, 2022

amended and squashed direclty

@maribu maribu enabled auto-merge August 23, 2022 12:07
@maribu maribu merged commit 0d555de into RIOT-OS:master Aug 23, 2022
@maribu maribu added this to the Release 2022.10 milestone Oct 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: network Area: Networking Area: pkg Area: External package ports CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants