pkg/semtech_loramac: deadlock with UNCONFIRMED messages #11530

Closed
ParksProjets opened this issue May 15, 2019 · 1 comment · Fixed by #11541
Labels
Area: LoRa (LoRa radio support), Type: bug (the issue reports a bug / the PR fixes a bug, including spelling errors)

Comments

Contributor

ParksProjets commented May 15, 2019

Description

If you are unlucky, the semtech_loramac package can deadlock when you send UNCONFIRMED messages from a thread that doesn't have a message queue.

The simplest way to show why is to explain how the semtech_loramac package is connected to the LoRaMAC-node library. This library has 3 interfaces: MLME, MCPS and MIB. When you send an application uplink, you use the MCPS interface. This interface is asynchronous and can trigger two events: Confirm and Indication.

[Image: LoRaMAC architecture]

When you send an uplink, you first call semtech_loramac_send and then semtech_loramac_recv to wait for the transmission to finish. Currently, semtech_loramac handles the two LoRaMAC-node events as follows:

  • when an UNCONFIRMED message is sent, semtech_loramac_recv returns on the Confirm event.

  • when a CONFIRMED message is sent, the Confirm event is ignored. As we expect an Ack from the LoRa server, semtech_loramac_recv returns on the Indication event (when this Ack is received). Waiting for the Indication event also makes it possible to retrieve data from downlinks.
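
The two rules above can be summarised as a small decision function. This is only a sketch to make the rules explicit: tx_mode_t, mcps_event_t and event_wakes_recv are hypothetical names chosen for illustration and are not part of semtech_loramac or LoRaMAC-node.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical types, for illustration only. */
typedef enum { TX_UNCONFIRMED, TX_CONFIRMED } tx_mode_t;
typedef enum { EVENT_CONFIRM, EVENT_INDICATION } mcps_event_t;

/* Returns true if, according to the two rules listed above, the given
 * MCPS event is the one that makes semtech_loramac_recv return. */
bool event_wakes_recv(tx_mode_t mode, mcps_event_t event)
{
    switch (mode) {
    case TX_UNCONFIRMED:
        /* recv returns as soon as the transmission is confirmed */
        return event == EVENT_CONFIRM;
    case TX_CONFIRMED:
        /* Confirm is ignored; the Ack arrives as an Indication */
        return event == EVENT_INDICATION;
    }
    assert(0 && "unknown transmission mode");
    return false;
}
```

Note that for an UNCONFIRMED uplink this function answers "no" for the Indication event: the caller is no longer waiting when that event arrives.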

The semtech_loramac_recv function waits on the caller thread using the message module. When a Confirm / Indication event is received, the loramac event loop sends a MSG_TYPE_LORAMAC_TX_STATUS message to the caller thread so that it can return from semtech_loramac_recv. If this thread doesn't have a message queue, sending the message blocks the event loop until the caller thread receives the message.

The problem is that when you transmit an UNCONFIRMED message, the MAC still waits for the 2 RX windows. If the LoRa server sends a downlink to the device, the Indication event is fired, but the loramac event loop has already sent MSG_TYPE_LORAMAC_TX_STATUS on the Confirm event. It then sends another MSG_TYPE_LORAMAC_TX_STATUS, which blocks the event loop forever because the caller thread has already left semtech_loramac_recv. Even worse, if the caller thread now calls semtech_loramac_send, a deadlock occurs.

With the current architecture of the semtech_loramac package, this issue is quite difficult to fix. The simplest way would be to ignore the Indication event when an UNCONFIRMED message was sent, but data from downlink messages would be lost.
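
The sequence described above can be replayed with a toy, single-threaded model. It is a sketch under simplifying assumptions, not the package's code: model_t and the three functions are hypothetical, and the only behaviour modelled is that a send to a thread without a message queue succeeds only while that thread is waiting to receive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model state, for illustration only. */
typedef struct {
    bool caller_in_recv; /* caller is waiting inside semtech_loramac_recv */
    bool loop_blocked;   /* event loop is stuck in a blocking send */
    int delivered;       /* TX_STATUS messages handed to the caller */
} model_t;

/* The caller enters semtech_loramac_recv and waits for a message. */
void caller_enter_recv(model_t *m)
{
    assert(m != NULL);
    m->caller_in_recv = true;
}

/* The event loop posts MSG_TYPE_LORAMAC_TX_STATUS to the caller.
 * Without a message queue, delivery needs a waiting receiver;
 * otherwise the sender (the event loop) blocks. */
void loop_post_tx_status(model_t *m)
{
    assert(m != NULL);
    if (m->loop_blocked) {
        return; /* a blocked loop cannot handle further events */
    }
    if (m->caller_in_recv) {
        m->delivered++;
        m->caller_in_recv = false; /* recv returns to the application */
    }
    else {
        m->loop_blocked = true;
    }
}

/* semtech_loramac_send needs the event loop to serve the request,
 * so in this model it can only complete while the loop is running. */
bool caller_send_can_complete(const model_t *m)
{
    assert(m != NULL);
    return !m->loop_blocked;
}
```

Replaying the scenario means calling caller_enter_recv, then loop_post_tx_status once for the Confirm event and once more for the Indication event. After the second call loop_blocked is set and caller_send_can_complete returns false, which corresponds to the deadlock.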

Steps to reproduce the issue

Send an UNCONFIRMED message from a thread that doesn't have a message queue to a LoRa server that has a downlink message to transmit to the end device.
If you are lucky, the LoRa server sends a downlink by itself when it has MAC parameters to transmit to the end device (for example NewChannelReq).

The image below shows what happens when a downlink is received after an UNCONFIRMED uplink.

[Image: Deadlock]

The code that was used to produce this example is the following:

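/* Send an UNCONFIRMED uplink. */
loramac.cnf = LORAMAC_TX_UNCNF;
uint8_t ret = semtech_loramac_send(&loramac, (uint8_t *)message, strlen(message));

/* Returns on the Confirm event. */
ret = semtech_loramac_recv(&loramac);
printf("Received: %d\n", ret);

/* A downlink received after recv has returned blocks the event loop. */
xtimer_sleep(10);
printf("Send second message\n");
/* Deadlock: the event loop is still blocked, so this call never returns. */
ret = semtech_loramac_send(&loramac, (uint8_t *)message, strlen(message));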

MrKevinWeiss added the Type: bug and Area: LoRa labels May 16, 2019
Member

jia200x commented May 16, 2019

Thanks for pointing this out. Reading your detailed description, this can indeed happen. Maybe this is the issue @smlng was experiencing.

One way to solve this is to make send/recv asynchronous, so the application thread only receives downlink messages and there's no expected synchronization between the application thread and the MAC layer thread. That's what I'm doing in GNRC LoRaWAN to avoid deadlocks.
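
As a rough illustration of that idea (not the GNRC LoRaWAN code, and not the fix that was eventually merged), the MAC side could post events into a bounded mailbox without ever blocking, while the application drains it whenever it wants. Everything below is hypothetical: mailbox_t, event_t, MAILBOX_SIZE and the two functions are made up for this sketch, and a real implementation would need the synchronisation that RIOT's own messaging primitives (for example a queue set up with msg_init_queue, or msg_try_send) already provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAILBOX_SIZE 4 /* hypothetical depth */

/* Hypothetical event and mailbox types, for illustration only. */
typedef struct { int type; } event_t;

typedef struct {
    event_t slots[MAILBOX_SIZE];
    size_t head;      /* index of the oldest stored event */
    size_t count;     /* number of stored events */
    unsigned dropped; /* events discarded because the mailbox was full */
} mailbox_t;

/* MAC side: never blocks. A full mailbox drops the event instead. */
bool mailbox_try_post(mailbox_t *mb, event_t ev)
{
    assert(mb != NULL);
    if (mb->count == MAILBOX_SIZE) {
        mb->dropped++;
        return false;
    }
    mb->slots[(mb->head + mb->count) % MAILBOX_SIZE] = ev;
    mb->count++;
    return true;
}

/* Application side: fetches the oldest event, if there is one. */
bool mailbox_try_fetch(mailbox_t *mb, event_t *out)
{
    assert(mb != NULL && out != NULL);
    if (mb->count == 0) {
        return false;
    }
    *out = mb->slots[mb->head];
    mb->head = (mb->head + 1) % MAILBOX_SIZE;
    mb->count--;
    return true;
}
```

With this shape, a Confirm followed by an unexpected Indication simply leaves two events in the mailbox. The MAC side keeps running whether or not the application is currently waiting, at the cost of dropping events once the mailbox is full.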

3 participants