RDM: The 802.15.4 Radio HAL #13943

jia200x · 2020-04-24T16:26:07Z

Contribution description

This RDM introduces a new Radio Hardware Abstraction Layer for 802.15.4 compliant radios.

Abstract:

This memo describes a Hardware Abstraction Layer (HAL) for radios compliant
with the IEEE802.15.4 standard. The work follows a technology-specific approach
to provide well defined hardware access that allows to implement agnostic
IEEE802.15.4 PHY and MAC layers on top. Additionally, the new HAL enables
integration of network stacks that require direct access to the radio.

Implementation

~~The proposed API, 2 PoC reference implementations (nrf802154 and at86rf23x) and a small test application were implemented in this branch.~~

~~Radio HAL API~~
~~nrf802154 Radio HAL PoC implementation~~
~~at86rf2xx Radio HAL PoC implemention~~
~~Test application~~ ~~(the test application is already added as a submodule under $(RIOTBASE)/tests)~~

See #14655

Memory efficiency

EDIT: BASED ON OLD TEST! THIS IS OUTDATED NOW

Here are some preliminary memory usage results against netdev. Note that the Radio HAL version consumes less ROM because most getters are not needed (e.g channel, tx_power, addresses are stored by the PHY/MAC, so it's not necessary to ask the device for Information Base). Also, a compliant Radio HAL requires less instructions (no unnecessary state changes, sleep/wake-up routines everywhere, etc).

samr21-xpro

`hal_test` application with netdev using base Radio HAL base commit (3e7ddcd):

Building application "tests_hal_test" for "samr21-xpro" with MCU "samd21".

   text	  data	   bss	   dec	   hex	filename
  20916	   504	  4056	 25476	  6384

`hal_test` application with Radio HAL:

Building application "tests_hal_test" for "samr21-xpro" with MCU "samd21".

   text	  data	   bss	   dec	   hex	filename
  18544	   504	  4028	 23076	  5a24

nrf802154

`hal_test` application with netdev using base Radio HAL base commit (3e7ddcd):

Building application "tests_hal_test" for "nrf52840dk" with MCU "nrf52".

   text	  data	   bss	   dec	   hex	filename
  16396	   532	  4108	 21036	  522c

`hal_test` application with Radio HAL:

Building application "tests_hal_test" for "nrf52840dk" with MCU "nrf52".

   text	  data	   bss	   dec	   hex	filename
  14888	   508	  4084	 19480	  4c18

Testing procedure

Read the RDM
Explore the HAL PoC implementations (in [POC] ieee802154_hal: tester PR for #14371 #14655)

Issues/PRs references

#13771
#14655
#14371

benpicco · 2020-04-24T18:02:28Z

doc/memos/rdm-draft-alamos-ieee802154-radio-hal.md

+     * If the radio is still transmitting, it should block until it is safe to
+     * write to the frame buffer again.


Can we please return -EBUSY here and make the upper layer block?
Blocking in the radio thread (while still servicing radio events) is ugly and error-prone.

Yes we can. Ideally a MAC layer would synchronize access to the radio, so this shouldn't happen. But it doesn't hurt to return -EBUSY here

I strongly agree with @benpicco. E.g. the at86rf2xx and the cc110x driver have special quirks to implement this behavior that is both ugly and complex. But without those quirks, a simple ping6 with a size big enough to cause layer 2 fragmentation would currently result in the driver ignoring the subsequent frames while busy sending the first fragment. And the upper layers don't retry, resulting in the whole packet being lost.

With most use cases only sending IPv6 packets fitting into a single frame, this issue is also often overlooked.

Yes, I agree that the bottleneck should not be here.
But even with an -EBUSY, the MAC layer on top should synchronize this anyway.
But I agree that this could improve a lot our current state while we don't have a full 802.15.4 MAC. So, I'm in for the change

Hmmm I remember the design decision behind this:

Most radios (if not all) allow to write packets into the framebuffer while the radio is still transmitting. IIRC, the at86rf233 allows to write again into the framebuffer 192 us after the TX_START command. I agree that under no circumstances we should block during the whole TX procedure, but this delay is really short compared to the LIFS (640 us)

I've seen some radios that simply "sleep" for 192 us before trying to load the framebuffer again.

How to proceed? I can think of:

Drop this "load before TX done" feature and simply return -EBUSY if the radio is BUSY

Try to return -EBUSY if it's not possible to load the framebuffer (T<192us)

Block for 192 us

/** Check if the radio driver is currently receiving a packet /
int ( receiving_packet)(void);

I'm strongly against this. This interface will inherently result in TOC/TOU bugs.

Even if the driver accepts frames while e.g. RX or TX without ACK request is ongoing, transmit would then just set a flag to execute the TX as the next operation.

I'm not sure if I understand this correctly. But as @jia200x already pointed out, this function should have predictable timing behavior and very little delay to allow implementing TDMA(ish) MAC schemes.

In slotted MAC layers, the state change can be forced calling ieee802154_radio_set_trx_state(dev, TX_ON) directly.

Are you sure you really want to allow explicitly setting the TX state? The internal state machines of the hardware can be rather complex and hard to map reasonably to the HAL. Wouldn't it be better to just define that prepare() in addition to loading the buffer brings the device into a state in which starting a transmission can be triggered with as little delay as possible? So if some intermediate state is needed to put the device to before being able to transmit (e.g. some transceivers turn their frequency generator of in an idle state and need some time for the phase locked loop to settle), prepare() would just do so. And transmit() then does implicitly the state change to TX and back to IDLE/RX once done.

~~So only the following three states IMHO should be settable through the HAL:~~

OFF: Device is in the lowest possible power consumption mode. Switching to IDLE or RX can require some time. Neither prepare() nor transmit() are allowed to be called in this state.

IDLE: Device is not listing for incoming packets, but prepare() and transmit() are allowed to be called. Switching to RX might be faster than from OFF and more power might be consumed.

RX: Device is listening for packets. A call to prepare() is allowed and may implicitly change the state to an internal state appropriate for fast transmission. Thus, after prepare() the device might not be listening for incoming packets anymore until either the transmission is triggered or aborted.

~~I think exposing these three states can be implemented reasonably for every network device. And I don't see a use case for having explicit control over more states.~~

Update: I added a more detailed proposal of which states an abstract state machine representation of a Radio could have

Oh I didn't consider slotted MACs.
In that case it might be possible for prepare to succeed but transmit would return -EBUSY if the radio is still receiving.

But if the driver then generates en event to the upper layer when it switches back to idle, that upper layer can decide to do the transmit immediately - or wait for the next time slot.

That way the whole TX after RX and MAC access considerations could be kept out of the radio driver.

I'm strongly against this. This interface will inherently result in TOC/TOU bugs.

I agree, just threw some ideas without deep thinking :P

I'm not sure if I understand this correctly. But as @jia200x already pointed out, this function should have predictable timing behavior and very little delay to allow implementing TDMA(ish) MAC schemes.

Yes. The radio operation should be more or less predictable.

I'm still confused about the blocking/non-blocking thing here. I strongly agree that a non-blocking mechanism is needed for sending packets, but the right layer to handle that is the MAC. In fact, the MAC MUST be non-blocking because in most cases packets are delayed (TSCH, slotted mode, sending beacons over data, etc).

In practice, an upper layer would request the MAC layer to send a packet and the MAC might refuse it or send when it's the right time. The MAC layer would read packets from a queue and send them one by one.

The question is, do we need a non-blocking radio HAL to implement a a non-blocking MAC? If the bottleneck is in the radio, the MAC is doing something wrong.

I think the documentation above needs to be updated, no that the request/confirm paradigm was settled on.

Two additional remarks:

Can you replace the term packet by frame?

It would be nice to also state how long the driver can be sure "own" the buffer passed. An peripheral driver with DMA might not need to copy the iolist into an contiguous RAM buffer. (E.g the Ethernet driver of the STM32 fetches the frame via DMA, and the DMA happily accepts a linked list of DMA descriptors with each containing only segment of the frame. This concept is not specific to the Ethernet case, so peripheral radios might provide the same feature.) So if the driver would "own" the iolist until confirm_transmit() returns success or an error, this would allow zero-copy transmission. It would come with the price of extra care being taken by the upper layer, if the upper layer truly exploits the features of the async API.

I think the documentation above needs to be updated, no that the request/confirm paradigm was settled on.

I agree.

Can you replace the term packet by frame?

Yes. It was in my bucketlist, but forgot to update it.

It would be nice to also state how long the driver can be sure "own" the buffer passed. An peripheral driver with DMA might not need to copy the iolist into an contiguous RAM buffer. (E.g the Ethernet driver of the STM32 fetches the frame via DMA, and the DMA happily accepts a linked list of DMA descriptors with each containing only segment of the frame. This concept is not specific to the Ethernet case, so peripheral radios might provide the same feature.) So if the driver would "own" the iolist until confirm_transmit() returns success or an error, this would allow zero-copy transmission. It would come with the price of extra care being taken by the upper layer, if the upper layer truly exploits the features of the async API.

Good catch, I think this must be specified. I agree with "owning" the packet until confirm_transmt is returns success or error.

benpicco · 2020-04-24T18:05:23Z

doc/memos/rdm-draft-alamos-ieee802154-radio-hal.md

+     *            NULL if this information is not needed.
+     *
+     * @return number of bytes written in @p buffer (0 if @p buf == NULL)
+     * @return -ENOBUFS if the packet doesn't fit in @p


Will the packet be discarded here?

Frame is the correct term at layer 2. We should try carefully to get those terms correct to avoid confusion. (Sadly, RIOT's doc uses the term packet regardless of the layer at many places. I should grep for all usage of packet and verify or fix the term.)

Yes, the packet is always discarded

Actually I forgot something here (thanks for pointing it out @maribu ). All these packets are "Physical Service Data Unit" (PSDU) packets. I will update that

Edit: frames... I meant, frames

I will add a comment that indicates the frame is always discarded

PSDU = frame right? Lets decide for one of both. You should add it to the "Acronyms" or "Definitions" section then.

I guess PSDU is more explicit. (Frame can have other meanings in other contexts.) Also, PSDU sounds smarter :-)

benpicco · 2020-04-24T18:13:10Z

doc/memos/rdm-draft-alamos-ieee802154-radio-hal.md

+    /**
+     * @brief the device support the IEEE802.15.4 Sub GHz band
+     */
+    IEEE802154_CAP_SUB_GHZ,


IEEE802154_PHY_MODES could be another capability.

yes, this can be used to report available PHY modes too

actually, this list can/should be expanded. There are other capabilities that are not listed here but exist anyway (e.g LBT, CSMA_BACKOFF timers, etc)

The list can and should be extended as explained here. Personally, I hope that it won't get extended as much as netopt.

maybe we could try to limit the scope here. Usually, capabilities are mapped to hardware accelerations, features and certain conditions (e.g if the radio calculates the checksum, that's a CAP).

In practice, I've seen that this list doesn't explode (see here or here)

maribu · 2020-04-24T18:18:16Z

I expect similarities between the HALs of different network devices. Maybe it makes sense to do "object oriented" programming here. E.g. having a base "Network Device HAL" as base. (And even a "base struct" as driver in the implementation.) Something like this maybe:

                         +--------------------+
                         | Network Device HAL |
                         +--------------------+
                               |      |
                   +-----------+      +-------------+
                   |                                |
    +--------------------------+      +-----------------------------+
    | Wired Network Device HAL |      | Wireless Network Device HAL |-------------+
    +--------------------------+      +-----------------------------+             |
       |           |                     |             |        |                 |
+-----------+ +---------+       +--------------+ +----------+ +------------+ +------------+
| 802.3 HAL | | CAN HAL | ...   | 802.15.4 HAL | | LoRa HAL | | nrfmin HAL | | CC110x HAL | ...
+-----------+ +---------+       +--------------+ +----------+ +------------+ +------------+

And every child would provide all the features and API of all of its parents, (or at least return -ENOTSUP). The basic Network Device HAL would only provide means to send and receive frames. The Wired Network Device HAL could additional provide the info if a cable is plugged in. The Wireless Network Device HAL could additionally provide common features like the tuned in channel / frequency, the TX power, etc.

I believe that there are many use cases where the code does not care if what network device it talks to. It would be a pity, if that could would have to be re-written for every HAL. But not having a single "on-size-fits-all" HAL, but separate HALs to allow access to technology-specific features seems to be a very compelling idea - if compatibility between the HALs (at a level that makes sense) is provided.

jia200x · 2020-04-24T19:32:06Z

I expect similarities between the HALs of different network devices. Maybe it makes sense to do "object oriented" programming here. E.g. having a base "Network Device HAL" as base. (And even a "base struct" as driver in the implementation.) Something like this maybe:

Well, one of the reasons to implement the "ieee802154_radio_xxx" wrappers (not mentioned in the RDM but available in the API implementation) was to allow these kind of optimizations. You can re-arrange the function pointers without breaking the API.

Actually, I remember there were some offline comments about implementing "device" structures, which could include "Device Management" stuff (start, irq_handler, set_sleep, etc).

NB there are some classes of devices that might not require a HAL. There are several technologies that have only one device (LoRa, CC110x). We had some offline conversations with @miri64 and @PeterKietzmann , and we thought it's important to check which technologies require a HAL in the mid term.

E.g for LoRaWAN, it should be enough to call the sx127x driver directly since it's the only radio available in the market (patent reasons). All radios (RFM95, etc) inherit from the same chip. The "LoRaWAN devices" (e.g RN2483) are not radios , but FullMAC devices.

mcr · 2020-04-25T22:55:01Z

I expect similarities between the HALs of different network devices. Maybe it makes sense to do "object oriented" programming here. E.g. having a base "Network Device HAL" as base. (And even a "base struct" as driver in the implementation.) Something like this maybe:
                         +--------------------+
                         | Network Device HAL |
                         +--------------------+
                               |      |
                   +-----------+      +-------------+
                   |                                |
    +--------------------------+      +-----------------------------+
    | Wired Network Device HAL |      | Wireless Network Device HAL |-------------+
    +--------------------------+      +-----------------------------+             |
       |           |                     |             |        |                 |
+-----------+ +---------+       +--------------+ +----------+ +------------+ +------------+
| 802.3 HAL | | CAN HAL | ...   | 802.15.4 HAL | | LoRa HAL | | nrfmin HAL | | CC110x HAL | ...
+-----------+ +---------+       +--------------+ +----------+ +------------+ +------------+
And every child would provide all the features and API of all of its parents, (or at least return -ENOTSUP). The basic Network Device HAL would only provide means to send and receive frames. The Wired Network Device HAL could additional provide the info if a cable is plugged in. The Wireless Network Device HAL could additionally provide common features like the tuned in channel / frequency, the TX power, etc.

I believe that there are many use cases where the code does not care if what network device it talks to. It would be a pity, if that could would have to be re-written for every HAL. But not having a single "on-size-fits-all" HAL, but separate HALs to allow access to technology-specific features seems to be a very compelling idea - if compatibility between the HALs (at a level that makes sense) is provided.

I like your idea, but I am concerned about boiling the ocean here.

maribu · 2020-04-26T08:35:37Z

I like your idea, but I am concerned about boiling the ocean here.

Consider the diagram above as a vision where the changes should lead to. For now, only the following part would be designed and implemented:

     +--------------------+
     | Network Device HAL |
     +--------------------+
               |
+-----------------------------+
| Wireless Network Device HAL |
+-----------------------------+
               |
       +--------------+
       | 802.15.4 HAL |
       +--------------+

But with the vision taken into account when designing the architecture, adding other HALs can be done later on one PR at a time. It will still be unlikely that the "Network Device HAL" and the "Wireless Network Device HAL" would survive adding new interfaces into this unchanged. But if the architecture is developed with this vision in mind, I'm sure it will be possible that every concrete HAL implements a parent "Wireless Network Device HAL" (or "Wired Network Device HAL"), which in turn implements the "Network Device HAL".

And I don't think it would be an issue if the APIs during major architectural changes is not stable for a couple of release. IMO end users care about the high level APIs (like the Sock API or POSIX sockets) rather than network stack internals, and "core developers" are happily accepting the pain of transition to a new API, if the benefits of the new API are compelling enough.

PeterKietzmann · 2020-04-26T12:27:37Z

@maribu thanks for your investigation. (i) Do I see it correctly that your main goal is to reduce duplication between HALs of different technologies? (ii) Where would you place a technology specific MAC layer in your proposal?

maribu

Some inline comments. I have to general concerns:

I would like to have the API non-blocking in order to allow implementing a network stack in a single thread. With blocking functions, at least one thread per network device is needed. (A shared thread would be unable to react to events of other network devices when an operation performed on another device blocks)
I think the state machine with only three states is not simplifying life, but quite the opposite:
- Operations like len() or read() are only possible to be called when the driver is not only in RX, but additionally has a frame in the receive buffer. (So at least two distinct states in RX to take into account)
- The transmit() operations requires a previous call to prepare(). (So at least two distinct states in TX)
- ==> I'm convinced that explicitly spelling out the states the HAL can have (so an abstract state machine of the HAL, not the fine-grained driver internal states) will make life easier

doc/memos/rdm-draft-alamos-ieee802154-radio-hal.md

maribu · 2020-04-26T09:05:07Z

doc/memos/rdm-draft-alamos-ieee802154-radio-hal.md

+          |              |  +-----------------------------------------------+           
+          |              |  |                                               |           
+   netdev_driver_t       |  |               802.15.4 Radio HAL              |
+          |              |  |                                               |           
+          |              |  +-----------------------------------------------+           
+          |              |                         ^                                               
+          |              |                         |                                               
+          |              |                   Device Driver API                                       
+          |              |                         |                                               
+          v              |                         v                                               
+---------------------+  |  +-----------------------------------------------+
+|                     |  |  |                                               |
+|    Device Driver    |  |  |                  Device Driver                |
+|                     |  |  |                                               |
+---------------------+  |  +-----------------------------------------------+


Is it really needed that the HAL sits on top of the driver? How about something like this:

+--------------------------------------+ | +--------------------+ | | | 802.15.4 Radio HAL | Device Driver | | +--------------------+ | +--------------------------------------+

So that the Device Driver implements the 802.15.4 Radio HAL, rather than being called by the HAL? By having common code in a (say) driver_ieee802154_common module, no code duplication is needed.

Some drivers in RIOT currently have e.g. a foo_send() function in addition to the netdev_driver_t::send(). But the netdev_driver_t::send() implementation often just calls the foo_send() without adding any features. This overhead seemingly serves no purpose. I really don't see why we need a foo_send() at all. IMO the HAL should be the only API the driver provides for features that can be access through the HAL. (Features that are not part of the HAL, e.g. because they are specific to a single device, would obviously be still exposed outside of the common HAL.)

maribu · 2020-04-26T09:10:17Z

doc/memos/rdm-draft-alamos-ieee802154-radio-hal.md

+hardware-specific device driver, by implementing the Radio HAL API.  Since
+devices drivers do not depend on the Radio HAL, it is still possible to use a
+the raw driver of a specific device, for testing purposes or accessing device
+specific features.


I'm strongly against this. What is the use case of a hardware-specific API for features that are accessible through the Radio HAL API? If the driver implements the Radio HAL API natively and as efficient as a driver specific API would be, what is the harm in using the Radio HAL API?

(Obviously: The driver should still be allowed to provide features not covered by the HAL by extending it with device specific APIs.)

Actually I'm not strongly in favor of having the Radio HAL on top of the device driver, but I tried to mimic what we have in other devices (e.g saul on top of sensor devices). Besides testing, I agree that the use cases for a "raw device driver" are rare.

If people don't care about that, I'm fine with the approach in #13943 (comment)

Also note: When new drivers for "strange" transceivers (e.g. let's say the CC1101 or the PR for adding netdev integration of the nRF24L01+) get merged, it would be very helpful to already have a good testing application in place. If the Radio HAL is the only API and a common part of the HAL exists, a shared testing application will likely find it's way into RIOT. That would ease testing for new transceivers in a "one-of-a-kind" class a lot, as those wouldn't need their own test application. (Except for features unique to them, if implemented.)

In general I'm OK with that approach. @miri64 @PeterKietzmann @benpicco @bergzand is it OK for you too?

I don't seem to understand this point correctly. So far, a technology specific HAL for 802.15.4 devices has been proposed. @maribu proposes a common HAL component that includes "strange" transceivers. @jia200x is OK with that approach, in general. Isn't that contradictory?

My idea is that the common HAL component is a subset of the 802.15.4 specific HAL. Thus, every device implementing the 802.15.4 Radio HAL would implicitly also be compliant with the common HAL.

So: The IEEE 802.15.4 HAL would be technology specific and technology aware, but it would have a generic subset. To me, this works fine and is not contradicting.

But I think what @jia200x above agreed to is allowing and encouraging device drivers to directly implement the functions of the Radio HAL, rather than writing an 802.15.4 driver with an internal API and a Radio HAL wrapper above, with the Radio HAL wrapper being the only user of the driver internal API.

(Please note: I'm not against sharing code between different IEEE 802.15.4 driver; quite the contrary. I just would like to prevent having e.g. a ieee802154_driver_t::transmit() function that contains a single line of code calling e.g. at86rf2xx_transmit(). This would add one function call of overhead without improving code structure or maintainability. So if there is indeed a reason to implement one function of the HAL as a layer on top of more low level driver function (e.g. avoiding code duplication), I'm also all in for that. But please let us avoid adding a layer for the sake of adding a layer.)

maribu · 2020-04-26T13:07:19Z

doc/memos/rdm-draft-alamos-ieee802154-radio-hal.md

+Unlike the `netdev` interface, the 802.15.4 Radio HAL requires only these three
+states to drive the radio. This is due to the fact that the radio is either in


Not true. The programmer needs to take into account that a frame can only be fetched after it has been received. So the actual number of states to take into account is larger. (See above.)

I think what I wrote here is not accurate (see #13943 (comment))

To me, this is an upper layer logic. The HAL guarantees that it's safe to access the frame buffer on RX_DONE. But the transceiver state could be TRX_OFF or TX_ON, since the FB access is independent of the state.

That's what I mean with the fact these are only "transceiver states" (PHY states). In fact, these states are mostly independent to MAC and device states (the exception is the TRX_OFF during sleep, since it's implicit that the transceiver is disabled when the device sleeps)

You're correct. Only three states are sufficient, when only looking at the transceiver. (But you might want to highlight a bit more that these states should only cover the transceiver?)

Sure, I will clarify that

doc/memos/rdm-draft-alamos-ieee802154-radio-hal.md

maribu · 2020-04-26T13:17:33Z

doc/memos/rdm-draft-alamos-ieee802154-radio-hal.md

+    /**
+     * @brief Set the sleep state of the device (sleep or awake)
+     *
+     * @param[in] dev IEEE802.15.4 device descriptor
+     * @param[in] sleep whether the device should sleep or not.
+     *
+     * @post the transceiver state is @ref IEEE802154_TRX_STATE_TRX_OFF,
+     *       regardless of the value of @p sleep
+     *
+     * @return 0 on success
+     * @return negative errno on error
+     */
+    int (*set_sleep)(ieee802154_dev_t *dev, bool sleep);


Is this actually needed? How about renaming the start() to on() and providing an off() function in addition?

To me, there is little to no different between starting a network device after init() or after it has been put into lowest power mode.

Actually I like the on/off approach better than the start. I just wanted to avoid extra functions, but I think it would give even more control

actually, what do you think of moving the "set_sleep" function to "off"?

Basically the behavior would be the same

(I think that's what you meant, am I right?)

note to self: since we are now using "on" and "off", the "on" function must initialize all values with the default "IEEE802154_XXX" params (e.g CONFIG_IEEE802154_DEFAULT_CHANNEL). Radios lose register content when turned off

hmmm maybe it's even better to not set anything. There are some changes the MAC already will set a different PHY state than the default one when waking up the radio

maribu · 2020-04-26T13:19:58Z

doc/memos/rdm-draft-alamos-ieee802154-radio-hal.md

+    /**
+     * @brief Set IEEE802.15.4 promiscuous mode
+     *
+     * @pre the device is not sleeping
+     *
+     * @note this function pointer can be NULL if the device doesn't support
+     *       hardware address filtering.
+     *
+     * @param[in] dev IEEE802.15.4 device descriptor
+     * @param[in] enable whether the promiscuous mode should be enabled or not.
+     *
+     * @return 0 on success
+     * @return negative errno on error
+     */
+    int (*set_promiscuous)(ieee802154_dev_t *dev, bool enable);


Can we have this as first element? It would be super nice if the struct of the IEEE 802.15.4 HAL implementation would overlap with a Wireless Network Device HAL. Otherwise (assuming the idea of "object orientation" is agreed upon), one might end up having to device two structs.

This feature is certainly something I would put into the generic Wireless Network Device HAL.

Yes, we can rearrange it as we want here

maribu · 2020-04-26T13:31:06Z

doc/memos/rdm-draft-alamos-ieee802154-radio-hal.md

+typedef enum {
+    /**
+     * @brief the transceiver detected a valid Start Frame Delimiter, which
+              indicates the radio started to receive a packet.
+     */
+    IEEE802154_RADIO_RX_START,
+
+    /**
+     * @brief the transceiver received a packet and there is a packet in the
+     *        internal framebuffer.
+     *
+     * The transceiver is in @ref IEEE802154_TRX_STATE_RX_ON state when
+     * this function is called, but with framebuffer protection (either
+     * dynamic framebuffer protection or disabled RX). Thus, the packet
+     * will not be overwritten before calling the @ref ieee802154_radio_read
+     * function. However, @ref ieee802154_radio_read MUST be called in
+     * order to receive new packets. If there is no interest in the
+     * packet, the function can be called with a NULL buffer to drop
+     * the packet.
+     */
+    IEEE802154_RADIO_RX_DONE,
+
+    /**
+     * @brief the transceiver finished sending a packet or the
+     *        retransmission procedure.
+     *
+     * If the radio supports frame retransmissions the
+     * @ref ieee802154_radio_get_tx_status MAY be called to retrieve useful
+     * information (number of retries, frame pending bit, etc). The
+     * transceiver is in @ref IEEE802154_TRX_STATE_TX_ON state when this function
+     * is called.
+     */
+    IEEE802154_RADIO_TX_DONE,
+} ieee802154_trx_status_t;


I think the events can be generic to all wireless network devices.

Additionally, as already said above: I would like this API to allow non-blocking use. As a result, more events are needed. E.g. in a blocking mode you can simply do:

prepare(); transmit();

But if the prepare() does not block, the transmit() might return -EAGAIN (or -EBUSY?) when the prepare() has not yet completed. (prepare() could take some significant time if the device has the frequency generator offline.)

Instead, an event would need to be triggered when the device is ready to transmit. One could add some convenience code to block until an event is triggered, e.g. it could be used like this:

prepare(); await_event(READY_FOR_TX); transmit();

The only thing that I'm concerned here: if we need to care about "blocking" stuff here at radio level, IMO a higher layer is not doing it's job well.

By design, these Wireless MAC layers work under a request/confirm architecture (MLME-SAP, MCPS-SAP) which are by nature non blocking. A user "request" the MAC layer to send data, and the MAC layer might respond with a "queue is full" or "queued".

So, the MAC is the non-blocking layer in the IEEE802.15.4 architecture.

Also, consider the IEEE802.15.4 PHY definition:

Basically the Radio HAL is the RF-SAP in the IEEE802.15.4 PHY model. The specification doesn't say anything about the blocking/non-blocking nature of the PD-SAP interface, but IMO the interface definition suggest that this is not the layer that should care about this.
As an example, this is the "confirm" primitive of the PD-DATA.request command (used to send data at PHY level):

Another example, some APIs (OpenWSN) expect the radio to simply transmit data (because the media access is already controlled by the MAC). In fact, the "transmit" function is a void function and the stack has no way to deal with a -EBUSY (if it happens for whatever reason)

The only thing that I'm concerned here: if we need to care about "blocking" stuff here at radio level, IMO a higher layer is not doing it's job well.

Well, the current situation is that e.g. GNRC happily calls send() at every given point in time no matter if the device is currently already sending a frame or not. If the driver does not block during send() and returns an -EBUSY, the frame is just lost. As a result when e.g. using 6LoWPAN and sending any packet needing layer 2 fragmentation, all but the first fragment are lost.

In fact, the "transmit" function is a void function and the stack has no way to deal with a -EBUSY (if it happens for whatever reason)

This would not be an issue, if the upper layer waits for the driver to be ready for transmission first. E.g. in the state machine above, a transition to the state "Ready for TX" would trigger an event the MAC could wait for / react to. And in that state, a transmit() never fails.

So, if the MAC layer synchronizes with the driver, an -EBUSY should never ever happen.

Well, the current situation is that e.g. GNRC happily calls send() at every given point in time no matter if the device is currently already sending a frame or not. If the driver does not block during send() and returns an -EBUSY, the frame is just lost. As a result when e.g. using 6LoWPAN and sending any packet needing layer 2 fragmentation, all but the first fragment are lost.

I strongly agree with this. But the actual problem here is the lack of MAC layers, not the implementation of the send function. In practice, GNRC would interact with the MAC layer via non blocking requests (it becomes even more important when using slotted MACs, unless we want to block until the slot is available)

So, if the MAC layer synchronizes with the driver, an -EBUSY should never ever happen.

That's my point. If the radio HAL provides resources to know when it's safe to send again (event callbacks, etc), why do we need to add an -EBUSY here?

That's my point. If the radio HAL provides resources to know when it's safe to send again (event callbacks, etc), why do we need to add an -EBUSY here?

Sorry for being unclear, the -EBUSY was not my point. I would only like the Radio HAL to define that a call to transmit() fails, unless prepare() has been called previously and the device signaled completion of the prepare(). I don't mind if the error is then reported using a return value, or an assert() blows, or a log message is generation, or the frame is silently dropped. I really just want the transmit() to fail when called in an inappropriate state.

I understand the intention here, but under what circumstance you would get to a state where you are not sure if it's safe to call "transmit"? With this API, when the "prepare" function finishes the radio is ready to transmit (FB is already loaded). The only case I could imagine is the case where we use some kind of DMA to prepare a frame. But with the current "iolist" this would be hard anyway (I'm not aware of scattering/gathering DMA APIs).

If a MAC layer decides to prepare in a previous cycle and transmit later, the API can in practice rely on the TX_DONE event.

I just did some measurements for the samr21-xpro (basic mode, direct send). For sending 127 bytes packet, it takes around 400 us to "prepare" the frame and ~4300 us from TX_START to TX_END (TOA). Considering the LIFS is 640us, calling "prepare" right after the TX_DONE event is still compliant with sending packets with IFS >= 640us.

Maybe we could add a precondition that the "prepare" function should be called while there's no ongoing transmission (e.g after TX_DONE)?

Is it allowed to call prepare() when the transceiver state is not TX? When prepare() really just copies over the buffer and does not need to wait for the PLL to settle, prepare() could block. (As said, to me blocking for an SPI transfer does count as "non-blocking" 😉) In that case, there is no need for an event to notify the HAL user that prepare() finished.

However, a simplistic user of the HAL (e.g. an ALOHA MAC) would still have to await an event for calling prepare() again after having issued transmit(). For simplicity, this could just be the TX_COMPLETED event.

Is it allowed to call prepare() when the transceiver state is not TX?

Yes! In fact, it is preferred to call prepare during TRX_OFF (if there are no timing constraints, considering switching from RX_ON to TX_ON takes 1 us but from TRX_OFF ~200 us). TRX_OFF consumes ~0.5 mA in while TX_ON consume ~6 mA (at86rf2xx).

However, a simplistic user of the HAL (e.g. an ALOHA MAC) would still have to await an event for calling prepare() again after having issued transmit(). For simplicity, this could just be the TX_COMPLETED event.

Exactly, that's the idea

maribu · 2020-04-26T14:06:08Z

(i) Do I see it correctly that your main goal is to reduce duplication between HALs of different technologies?

This would also be possible to some degree. But I mostly have code duplication between users of the HALs in mind.

E.g. primitive non-persistent CSMA implementation could likely be implemented for every wireless network device; in the most primitive form a transceiver that provides the current RSSI value would be sufficient. If the subset of the features common to all wireless network devices could be provided in a way to allow portable access to those features, that could be implemented once and used for all wireless transceivers.

(Note: The function to perform a single CCA could be part of the generic Wireless Network Device HAL and implemented fully in software by primitive transceivers; e.g. by fetching the current RSSI value and comparing it to a threshold. If a device not supporting any means to perform a CCA, it could return -ENOTSUP.)

(ii) Where would you place a technology specific MAC layer in your proposal?

On top of the most generic HAL that provides all features needed. E.g. TSCH is targeting IEEE 802.15.4, so there is no way to implement it on top of a more generic HAL. But a primitive CSMA MAC should work fine on top of the generic HAL.

Note that I see the generic "Wireless Network Device HAL" not as an extra layer, but as a subset of the function pointers in the "IEEE 802.15.4 HAL". So a driver implementing IEEE 802.15.4 HAL would not need to do anything in addition to be compatible with the "Wireless Network Device HAL" - this compatibility would come for free.

jia200x · 2020-04-26T16:57:24Z

A general request: since there are several pending technical discussions, could we maybe try to focus first in these aspects (transceiver state, blocking-non blocking stuff) before discussing about DRY/maintainability?

I'm in for everything that makes life easier when adding new HALs or maintaining code, but IMO we should first agree on the design decisions.

miri64 · 2023-02-20T13:56:52Z

doc/memos/rdm-draft-alamos-ieee802154-radio-hal.md

-Idle. The former will set the Abstract State Machine state to the transient
-state RX BUSY and then to Idle on `RF_RX_DONE` event. The latter will change
-the state to Idle only if the radio is currently not receiving a frame.
+TODO: Calls


I guess this sneaked in by accident? Having a TODO in the final document does not make it very final.

oh, good catch. I forgot that

miri64 · 2023-02-20T13:57:36Z

doc/memos/rdm-draft-alamos-ieee802154-radio-hal.md

+# 10 Revision
+
+- Rev6: Relax Abstract State Machine transitions
+- Rev5: Add Source Address Matching
+- Rev4: Add Async behavior
+- Rev3: Merge Device Driver and HAL
+- Rev2: Update sleep preconditions
+- Rev1: Address on/off functions 
+- Rev0: initial document


I think these can go, as the doc was not part of master.

Rev0: initial document can stay, as far as I understand.

jia200x · 2023-02-20T15:06:29Z

When I was working on #19213 I found it wasn’t clearly documented what events to expect in what state and how they should be handled.

I added in the last push a table that summarizes that.

jia200x · 2023-02-20T15:30:14Z

I clarified the handling of each state a bit more.

miri64 · 2023-02-20T15:38:12Z

static-tests are also complaining.

jia200x · 2023-02-20T17:46:02Z

done! I guess I have to rename the file to rfm0002 at some point too...

EDIT: Renamed!

jia200x · 2023-02-21T09:41:54Z

May I squash?

miri64

Yepp. Please also add RDM2 to the list in the README and see the following comments.

miri64 · 2023-02-21T12:11:18Z

doc/memos/rdm0002.md

+for the implementation of the Event Notification callback (e.g the `RX_DONE`
+event might post an event to an event queue in order to fetch the packet or
+`PLL_LOCK` might unlock a mutex).
+
+The full list of events and implications are defined in the Interface
+Definition section.


PLL_LOCK is not mentioned in the Interface Definition section at all. Is that intentional?

Kind of. The RDM does not prevent adding such an event, but we didn't include it in the final implementation because it's kind of a niche use case. We can still leave it there to point out that other events can be introduced if needed.

We can still leave it there to point out that other events can be introduced if needed.

Maybe rather remove the example here and add a mention of the introduction of other events still?

jia200x · 2023-02-21T13:53:45Z

squashed!

jia200x · 2023-02-22T09:20:08Z

@miri64 proposed to add the corresponding IEEE802154_RADIO_ prefix to the event references. I just amended that change directly and pushed.

miri64

RE-ACK

miri64 · 2023-02-22T10:25:07Z

bors merge

bors · 2023-02-22T10:25:12Z

🕐 Waiting for PR status (GitHub check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set.

riot-ci · 2023-02-22T12:03:02Z

Murdock results

✔️ PASSED

7ac3052 rdm/radio_hal: add radio HAL design document

Success	Failures	Total	Runtime
1	0	1	55s

Artifacts

Documentation preview

miri64 · 2023-02-22T12:33:39Z

bors merge

bors · 2023-02-22T12:59:14Z

Build succeeded:

miri64 · 2023-02-22T12:59:46Z

🎉

jia200x · 2023-02-22T13:00:45Z

it took almost three years! Thank you everyone for this!

emmanuelsearch · 2023-02-22T15:09:46Z

Thanks for pushing all this time!

benpicco · 2023-02-28T18:56:33Z

doc/memos/rdm0002.md

+                    |   IDLE  |              |   RX    |
+                    |         |<-------------|         |
+                    +---------+              +---------+
+                               SET_TRX_STATE


Getting back to this, is there not an implicit TX state hidden inside IDLE when the device is busy sending the actual frame?

just checked this. There is definitely an implicit TX state, but it's already marked by the request_transmit and confirm_transmit function (which prevents calling set_trx_state).But to keep things simple, it just assumes the radio is either listening (RX), ready to transmit (TX) or the transceiver is off (which is not equivalent to turning off the device with off).

benpicco · 2023-06-05T13:48:34Z

doc/memos/rdm0002.md

+
+The author of this memo can be contacted via email at jose.alamos@haw-hamburg.de
+
+# Appendix A: Radio HAL header file (2023-02)


What's the point of including this here?
This will just become outdated (if it isn't already)

it was just a way to keep track of the Radio HAL status at the moment of merge. A commit hash could have done the same though

jia200x requested review from haukepetersen, miri64, benpicco, maribu, kaspar030, bergzand, leandrolanzieri and PeterKietzmann April 24, 2020 16:26

jia200x requested a review from aabadie as a code owner April 24, 2020 16:26

jia200x requested review from fjmolinas, roberthartung, tcschmidt and waehlisch April 24, 2020 16:37

benpicco reviewed Apr 24, 2020

View reviewed changes

maribu reviewed Apr 26, 2020

View reviewed changes

jia200x closed this Apr 26, 2020

miri64 requested changes Feb 20, 2023

View reviewed changes

miri64 reviewed Feb 21, 2023

View reviewed changes

jia200x force-pushed the pr/rdm/radio_hal branch 2 times, most recently from 0953490 to c767296 Compare February 22, 2023 09:18

rdm/radio_hal: add radio HAL design document

7ac3052

jia200x force-pushed the pr/rdm/radio_hal branch from c767296 to 7ac3052 Compare February 22, 2023 10:16

miri64 approved these changes Feb 22, 2023

View reviewed changes

bors bot merged commit e738fdd into RIOT-OS:master Feb 22, 2023

benpicco reviewed Feb 28, 2023

View reviewed changes

MrKevinWeiss added this to the Release 2023.04 milestone Apr 25, 2023

benpicco reviewed Jun 5, 2023

View reviewed changes

benpicco mentioned this pull request Jun 5, 2023

rdm/radio_hal: remove appendix A #19709

Merged

		* If the radio is still transmitting, it should block until it is safe to
		* write to the frame buffer again.

		Unlike the `netdev` interface, the 802.15.4 Radio HAL requires only these three
		states to drive the radio. This is due to the fact that the radio is either in


		The author of this memo can be contacted via email at jose.alamos@haw-hamburg.de

		# Appendix A: Radio HAL header file (2023-02)

RDM: The 802.15.4 Radio HAL #13943

RDM: The 802.15.4 Radio HAL #13943

Conversation

jia200x commented Apr 24, 2020 • edited Loading

Contribution description

Implementation

Memory efficiency

samr21-xpro

nrf802154

Testing procedure

Issues/PRs references

benpicco Apr 24, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jia200x Apr 24, 2020 • edited Loading

Choose a reason for hiding this comment

maribu Apr 25, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jia200x Apr 24, 2020 • edited Loading

Choose a reason for hiding this comment

jia200x Apr 24, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

maribu commented Apr 24, 2020

jia200x commented Apr 24, 2020 • edited Loading

mcr commented Apr 25, 2020

maribu commented Apr 26, 2020

PeterKietzmann commented Apr 26, 2020

maribu left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jia200x Apr 26, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

maribu commented Apr 26, 2020

jia200x commented Apr 26, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jia200x commented Apr 24, 2020 •

edited

Loading

benpicco Apr 24, 2020 •

edited

Loading

jia200x Apr 24, 2020 •

edited

Loading

maribu Apr 25, 2020 •

edited

Loading

jia200x Apr 24, 2020 •

edited

Loading

jia200x Apr 24, 2020 •

edited

Loading

jia200x commented Apr 24, 2020 •

edited

Loading

jia200x Apr 26, 2020 •

edited

Loading

jia200x commented Feb 20, 2023 •

edited

Loading