pkg/semtech-loramac: fix deadlock when sending unconfirmed messages #11535
In unconfirmed TX mode, when no data is received from the server, there is no easy way to be notified by the MAC that the TX is complete. Previously there was a hack in the mcps_confirm event, but it triggers a deadlock when data is actually received. To work around this, we use an xtimer to send a TX done message with an offset greater than the RX windows: this ensures that normal RX data is received by the caller thread and that the caller thread doesn't remain blocked when no data is received after an unconfirmed message is sent.
Thanks for tackling this.
Why? Also, "Indication right after Confirm" is only valid for class A, so we would need a message queue anyway for class B and class C. What do you think about:
This way we could also send other information to the caller (Link Check, etc.) and the caller can't block or de-synchronize the MAC layer.
One way you could do it is to read But in fact, the real problem is to use For example a single message (even Also, I noticed in LoRaMAC-node source code that there is a case where One way to solve all of these problems is to use
Thanks for your help @jia200x @ParksProjets. Your suggestion to use only the mcps_confirm event to confirm the TX is interesting; that would indeed make this more robust. For receiving messages, I prefer the thread approach over the callback function. See #11541, which provides this change.
Closing in favor of #11541.
Contribution description
This PR is an attempt to fix #11530. Even though the problem was well explained in #11530, I had a hard time figuring out what was going on with the messages exchanged between the MAC and the caller thread in the loramac adaptation code. I came to the conclusion that the TX done message shouldn't be sent from the mcps_confirm event callback, because it ends up being received by the semtech_loramac_recv function. When data was actually received, the RX message then blocked the event loop. On the next send, calling the send function in the MAC, which uses msg_send_receive, caused the deadlock.
The proposed solution in this PR is to use a background timer to send a TX done message to the MAC thread after a delay long enough to ensure that received data are processed in semtech_loramac_recv. This message is no longer sent from the mcps_confirm event. The RX message is then correctly retrieved by the caller thread, and no deadlock occurs on subsequent sends. For me, it's working quite reliably now with this.
I tried other solutions, like adding a message queue to the caller thread, but I found that this was a workaround that didn't really fix the original issue.
Maybe there's a better fix that I didn't think of, suggestions are welcome.
@ParksProjets would you like to have a look and give some feedback?
Testing procedure
Repeat the procedure described in #11530; the deadlock no longer occurs with this PR.
Issues/PRs references
fixes #11530