Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RDM: The 802.15.4 Radio HAL #13943

Merged
merged 1 commit into from
Feb 22, 2023
Merged

RDM: The 802.15.4 Radio HAL #13943

merged 1 commit into from
Feb 22, 2023

Conversation

jia200x
Copy link
Member

@jia200x jia200x commented Apr 24, 2020

Contribution description

This RDM introduces a new Radio Hardware Abstraction Layer for 802.15.4 compliant radios.

Abstract:

This memo describes a Hardware Abstraction Layer (HAL) for radios compliant
with the IEEE802.15.4 standard. The work follows a technology-specific approach
to provide well defined hardware access that allows to implement agnostic
IEEE802.15.4 PHY and MAC layers on top. Additionally, the new HAL enables
integration of network stacks that require direct access to the radio.

Implementation

The proposed API, 2 PoC reference implementations (nrf802154 and at86rf23x) and a small test application were implemented in this branch.

See #14655

Memory efficiency

EDIT: BASED ON OLD TEST! THIS IS OUTDATED NOW

Here are some preliminary memory usage results against netdev. Note that the Radio HAL version consumes less ROM because most getters are not needed (e.g channel, tx_power, addresses are stored by the PHY/MAC, so it's not necessary to ask the device for Information Base). Also, a compliant Radio HAL requires less instructions (no unnecessary state changes, sleep/wake-up routines everywhere, etc).

samr21-xpro

`hal_test` application with netdev using base Radio HAL base commit (3e7ddcd):
Building application "tests_hal_test" for "samr21-xpro" with MCU "samd21".

   text	  data	   bss	   dec	   hex	filename
  20916	   504	  4056	 25476	  6384	
`hal_test` application with Radio HAL:
Building application "tests_hal_test" for "samr21-xpro" with MCU "samd21".

   text	  data	   bss	   dec	   hex	filename
  18544	   504	  4028	 23076	  5a24

nrf802154

`hal_test` application with netdev using base Radio HAL base commit (3e7ddcd):
Building application "tests_hal_test" for "nrf52840dk" with MCU "nrf52".

   text	  data	   bss	   dec	   hex	filename
  16396	   532	  4108	 21036	  522c
`hal_test` application with Radio HAL:
Building application "tests_hal_test" for "nrf52840dk" with MCU "nrf52".

   text	  data	   bss	   dec	   hex	filename
  14888	   508	  4084	 19480	  4c18

Testing procedure

Issues/PRs references

#13771
#14655
#14371

Comment on lines 444 to 445
* If the radio is still transmitting, it should block until it is safe to
* write to the frame buffer again.
Copy link
Contributor

@benpicco benpicco Apr 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please return -EBUSY here and make the upper layer block?
Blocking in the radio thread (while still servicing radio events) is ugly and error-prone.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can. Ideally a MAC layer would synchronize access to the radio, so this shouldn't happen. But it doesn't hurt to return -EBUSY here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly agree with @benpicco. E.g. the at86rf2xx and the cc110x driver have special quirks to implement this behavior that is both ugly and complex. But without those quirks, a simple ping6 with a size big enough to cause layer 2 fragmentation would currently result in the driver ignoring the subsequent frames while busy sending the first fragment. And the upper layers don't retry, resulting in the whole packet being lost.

With most use cases only sending IPv6 packets fitting into a single frame, this issue is also often overlooked.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree that the bottleneck should not be here.
But even with an -EBUSY, the MAC layer on top should synchronize this anyway.
But I agree that this could improve a lot our current state while we don't have a full 802.15.4 MAC. So, I'm in for the change

Copy link
Member Author

@jia200x jia200x Apr 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm I remember the design decision behind this:

Most radios (if not all) allow to write packets into the framebuffer while the radio is still transmitting. IIRC, the at86rf233 allows to write again into the framebuffer 192 us after the TX_START command. I agree that under no circumstances we should block during the whole TX procedure, but this delay is really short compared to the LIFS (640 us)

I've seen some radios that simply "sleep" for 192 us before trying to load the framebuffer again.

How to proceed? I can think of:

  • Drop this "load before TX done" feature and simply return -EBUSY if the radio is BUSY
  • Try to return -EBUSY if it's not possible to load the framebuffer (T<192us)
  • Block for 192 us

Copy link
Member

@maribu maribu Apr 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/** Check if the radio driver is currently receiving a packet /
int ( receiving_packet)(void);

I'm strongly against this. This interface will inherently result in TOC/TOU bugs.

Even if the driver accepts frames while e.g. RX or TX without ACK request is ongoing, transmit would then just set a flag to execute the TX as the next operation.

I'm not sure if I understand this correctly. But as @jia200x already pointed out, this function should have predictable timing behavior and very little delay to allow implementing TDMA(ish) MAC schemes.

In slotted MAC layers, the state change can be forced calling ieee802154_radio_set_trx_state(dev, TX_ON) directly.

Are you sure you really want to allow explicitly setting the TX state? The internal state machines of the hardware can be rather complex and hard to map reasonably to the HAL. Wouldn't it be better to just define that prepare() in addition to loading the buffer brings the device into a state in which starting a transmission can be triggered with as little delay as possible? So if some intermediate state is needed to put the device to before being able to transmit (e.g. some transceivers turn their frequency generator of in an idle state and need some time for the phase locked loop to settle), prepare() would just do so. And transmit() then does implicitly the state change to TX and back to IDLE/RX once done.

So only the following three states IMHO should be settable through the HAL:

  1. OFF: Device is in the lowest possible power consumption mode. Switching to IDLE or RX can require some time. Neither prepare() nor transmit() are allowed to be called in this state.
  2. IDLE: Device is not listing for incoming packets, but prepare() and transmit() are allowed to be called. Switching to RX might be faster than from OFF and more power might be consumed.
  3. RX: Device is listening for packets. A call to prepare() is allowed and may implicitly change the state to an internal state appropriate for fast transmission. Thus, after prepare() the device might not be listening for incoming packets anymore until either the transmission is triggered or aborted.

I think exposing these three states can be implemented reasonably for every network device. And I don't see a use case for having explicit control over more states.

Update: I added a more detailed proposal of which states an abstract state machine representation of a Radio could have

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I didn't consider slotted MACs.
In that case it might be possible for prepare to succeed but transmit would return -EBUSY if the radio is still receiving.

But if the driver then generates en event to the upper layer when it switches back to idle, that upper layer can decide to do the transmit immediately - or wait for the next time slot.

That way the whole TX after RX and MAC access considerations could be kept out of the radio driver.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm strongly against this. This interface will inherently result in TOC/TOU bugs.

I agree, just threw some ideas without deep thinking :P

I'm not sure if I understand this correctly. But as @jia200x already pointed out, this function should have predictable timing behavior and very little delay to allow implementing TDMA(ish) MAC schemes.

Yes. The radio operation should be more or less predictable.

I'm still confused about the blocking/non-blocking thing here. I strongly agree that a non-blocking mechanism is needed for sending packets, but the right layer to handle that is the MAC. In fact, the MAC MUST be non-blocking because in most cases packets are delayed (TSCH, slotted mode, sending beacons over data, etc).

In practice, an upper layer would request the MAC layer to send a packet and the MAC might refuse it or send when it's the right time. The MAC layer would read packets from a queue and send them one by one.

The question is, do we need a non-blocking radio HAL to implement a a non-blocking MAC? If the bottleneck is in the radio, the MAC is doing something wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the documentation above needs to be updated, no that the request/confirm paradigm was settled on.

Two additional remarks:

  1. Can you replace the term packet by frame?
  2. It would be nice to also state how long the driver can be sure "own" the buffer passed. An peripheral driver with DMA might not need to copy the iolist into an contiguous RAM buffer. (E.g the Ethernet driver of the STM32 fetches the frame via DMA, and the DMA happily accepts a linked list of DMA descriptors with each containing only segment of the frame. This concept is not specific to the Ethernet case, so peripheral radios might provide the same feature.) So if the driver would "own" the iolist until confirm_transmit() returns success or an error, this would allow zero-copy transmission. It would come with the price of extra care being taken by the upper layer, if the upper layer truly exploits the features of the async API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the documentation above needs to be updated, no that the request/confirm paradigm was settled on.

I agree.

  1. Can you replace the term packet by frame?

Yes. It was in my bucketlist, but forgot to update it.

  1. It would be nice to also state how long the driver can be sure "own" the buffer passed. An peripheral driver with DMA might not need to copy the iolist into an contiguous RAM buffer. (E.g the Ethernet driver of the STM32 fetches the frame via DMA, and the DMA happily accepts a linked list of DMA descriptors with each containing only segment of the frame. This concept is not specific to the Ethernet case, so peripheral radios might provide the same feature.) So if the driver would "own" the iolist until confirm_transmit() returns success or an error, this would allow zero-copy transmission. It would come with the price of extra care being taken by the upper layer, if the upper layer truly exploits the features of the async API.

Good catch, I think this must be specified. I agree with "owning" the packet until confirm_transmt is returns success or error.

* NULL if this information is not needed.
*
* @return number of bytes written in @p buffer (0 if @p buf == NULL)
* @return -ENOBUFS if the packet doesn't fit in @p
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the packet be discarded here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frame is the correct term at layer 2. We should try carefully to get those terms correct to avoid confusion. (Sadly, RIOT's doc uses the term packet regardless of the layer at many places. I should grep for all usage of packet and verify or fix the term.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the packet is always discarded

Copy link
Member Author

@jia200x jia200x Apr 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I forgot something here (thanks for pointing it out @maribu ). All these packets are "Physical Service Data Unit" (PSDU) packets. I will update that

Edit: frames... I meant, frames

Copy link
Member Author

@jia200x jia200x Apr 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a comment that indicates the frame is always discarded

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PSDU = frame right? Lets decide for one of both. You should add it to the "Acronyms" or "Definitions" section then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess PSDU is more explicit. (Frame can have other meanings in other contexts.) Also, PSDU sounds smarter :-)

/**
* @brief the device support the IEEE802.15.4 Sub GHz band
*/
IEEE802154_CAP_SUB_GHZ,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IEEE802154_PHY_MODES could be another capability.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this can be used to report available PHY modes too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, this list can/should be expanded. There are other capabilities that are not listed here but exist anyway (e.g LBT, CSMA_BACKOFF timers, etc)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list can and should be extended as explained here. Personally, I hope that it won't get extended as much as netopt.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could try to limit the scope here. Usually, capabilities are mapped to hardware accelerations, features and certain conditions (e.g if the radio calculates the checksum, that's a CAP).

In practice, I've seen that this list doesn't explode (see here or here)

@maribu
Copy link
Member

maribu commented Apr 24, 2020

I expect similarities between the HALs of different network devices. Maybe it makes sense to do "object oriented" programming here. E.g. having a base "Network Device HAL" as base. (And even a "base struct" as driver in the implementation.) Something like this maybe:

                         +--------------------+
                         | Network Device HAL |
                         +--------------------+
                               |      |
                   +-----------+      +-------------+
                   |                                |
    +--------------------------+      +-----------------------------+
    | Wired Network Device HAL |      | Wireless Network Device HAL |-------------+
    +--------------------------+      +-----------------------------+             |
       |           |                     |             |        |                 |
+-----------+ +---------+       +--------------+ +----------+ +------------+ +------------+
| 802.3 HAL | | CAN HAL | ...   | 802.15.4 HAL | | LoRa HAL | | nrfmin HAL | | CC110x HAL | ...
+-----------+ +---------+       +--------------+ +----------+ +------------+ +------------+

And every child would provide all the features and API of all of its parents, (or at least return -ENOTSUP). The basic Network Device HAL would only provide means to send and receive frames. The Wired Network Device HAL could additional provide the info if a cable is plugged in. The Wireless Network Device HAL could additionally provide common features like the tuned in channel / frequency, the TX power, etc.

I believe that there are many use cases where the code does not care if what network device it talks to. It would be a pity, if that could would have to be re-written for every HAL. But not having a single "on-size-fits-all" HAL, but separate HALs to allow access to technology-specific features seems to be a very compelling idea - if compatibility between the HALs (at a level that makes sense) is provided.

@miri64 miri64 added Area: doc Area: Documentation Area: drivers Area: Device drivers Area: RDM Area: Discussions on RIOT Developer Memos Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR Type: new feature The issue requests / The PR implemements a new feature for RIOT labels Apr 24, 2020
@jia200x
Copy link
Member Author

jia200x commented Apr 24, 2020

I expect similarities between the HALs of different network devices. Maybe it makes sense to do "object oriented" programming here. E.g. having a base "Network Device HAL" as base. (And even a "base struct" as driver in the implementation.) Something like this maybe:

Well, one of the reasons to implement the "ieee802154_radio_xxx" wrappers (not mentioned in the RDM but available in the API implementation) was to allow these kind of optimizations. You can re-arrange the function pointers without breaking the API.

Actually, I remember there were some offline comments about implementing "device" structures, which could include "Device Management" stuff (start, irq_handler, set_sleep, etc).

NB there are some classes of devices that might not require a HAL. There are several technologies that have only one device (LoRa, CC110x). We had some offline conversations with @miri64 and @PeterKietzmann , and we thought it's important to check which technologies require a HAL in the mid term.

E.g for LoRaWAN, it should be enough to call the sx127x driver directly since it's the only radio available in the market (patent reasons). All radios (RFM95, etc) inherit from the same chip. The "LoRaWAN devices" (e.g RN2483) are not radios , but FullMAC devices.

@mcr
Copy link
Contributor

mcr commented Apr 25, 2020

I expect similarities between the HALs of different network devices. Maybe it makes sense to do "object oriented" programming here. E.g. having a base "Network Device HAL" as base. (And even a "base struct" as driver in the implementation.) Something like this maybe:

                         +--------------------+
                         | Network Device HAL |
                         +--------------------+
                               |      |
                   +-----------+      +-------------+
                   |                                |
    +--------------------------+      +-----------------------------+
    | Wired Network Device HAL |      | Wireless Network Device HAL |-------------+
    +--------------------------+      +-----------------------------+             |
       |           |                     |             |        |                 |
+-----------+ +---------+       +--------------+ +----------+ +------------+ +------------+
| 802.3 HAL | | CAN HAL | ...   | 802.15.4 HAL | | LoRa HAL | | nrfmin HAL | | CC110x HAL | ...
+-----------+ +---------+       +--------------+ +----------+ +------------+ +------------+

And every child would provide all the features and API of all of its parents, (or at least return -ENOTSUP). The basic Network Device HAL would only provide means to send and receive frames. The Wired Network Device HAL could additional provide the info if a cable is plugged in. The Wireless Network Device HAL could additionally provide common features like the tuned in channel / frequency, the TX power, etc.

I believe that there are many use cases where the code does not care if what network device it talks to. It would be a pity, if that could would have to be re-written for every HAL. But not having a single "on-size-fits-all" HAL, but separate HALs to allow access to technology-specific features seems to be a very compelling idea - if compatibility between the HALs (at a level that makes sense) is provided.

I like your idea, but I am concerned about boiling the ocean here.

@maribu
Copy link
Member

maribu commented Apr 26, 2020

I like your idea, but I am concerned about boiling the ocean here.

Consider the diagram above as a vision where the changes should lead to. For now, only the following part would be designed and implemented:

     +--------------------+
     | Network Device HAL |
     +--------------------+
               |
+-----------------------------+
| Wireless Network Device HAL |
+-----------------------------+
               |
       +--------------+
       | 802.15.4 HAL |
       +--------------+

But with the vision taken into account when designing the architecture, adding other HALs can be done later on one PR at a time. It will still be unlikely that the "Network Device HAL" and the "Wireless Network Device HAL" would survive adding new interfaces into this unchanged. But if the architecture is developed with this vision in mind, I'm sure it will be possible that every concrete HAL implements a parent "Wireless Network Device HAL" (or "Wired Network Device HAL"), which in turn implements the "Network Device HAL".

And I don't think it would be an issue if the APIs during major architectural changes is not stable for a couple of release. IMO end users care about the high level APIs (like the Sock API or POSIX sockets) rather than network stack internals, and "core developers" are happily accepting the pain of transition to a new API, if the benefits of the new API are compelling enough.

@PeterKietzmann
Copy link
Member

@maribu thanks for your investigation. (i) Do I see it correctly that your main goal is to reduce duplication between HALs of different technologies? (ii) Where would you place a technology specific MAC layer in your proposal?

Copy link
Member

@maribu maribu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some inline comments. I have to general concerns:

  1. I would like to have the API non-blocking in order to allow implementing a network stack in a single thread. With blocking functions, at least one thread per network device is needed. (A shared thread would be unable to react to events of other network devices when an operation performed on another device blocks)
  2. I think the state machine with only three states is not simplifying life, but quite the opposite:
    • Operations like len() or read() are only possible to be called when the driver is not only in RX, but additionally has a frame in the receive buffer. (So at least two distinct states in RX to take into account)
    • The transmit() operations requires a previous call to prepare(). (So at least two distinct states in TX)
    • ==> I'm convinced that explicitly spelling out the states the HAL can have (so an abstract state machine of the HAL, not the fine-grained driver internal states) will make life easier

doc/memos/rdm-draft-alamos-ieee802154-radio-hal.md Outdated Show resolved Hide resolved
doc/memos/rdm-draft-alamos-ieee802154-radio-hal.md Outdated Show resolved Hide resolved
doc/memos/rdm-draft-alamos-ieee802154-radio-hal.md Outdated Show resolved Hide resolved
Comment on lines 157 to 162
| | +-----------------------------------------------+
| | | |
netdev_driver_t | | 802.15.4 Radio HAL |
| | | |
| | +-----------------------------------------------+
| | ^
| | |
| | Device Driver API
| | |
v | v
+---------------------+ | +-----------------------------------------------+
| | | | |
| Device Driver | | | Device Driver |
| | | | |
+---------------------+ | +-----------------------------------------------+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really needed that the HAL sits on top of the driver? How about something like this:

+--------------------------------------+
| +--------------------+               |
| | 802.15.4 Radio HAL | Device Driver |
| +--------------------+               |
+--------------------------------------+

So that the Device Driver implements the 802.15.4 Radio HAL, rather than being called by the HAL? By having common code in a (say) driver_ieee802154_common module, no code duplication is needed.

Some drivers in RIOT currently have e.g. a foo_send() function in addition to the netdev_driver_t::send(). But the netdev_driver_t::send() implementation often just calls the foo_send() without adding any features. This overhead seemingly serves no purpose. I really don't see why we need a foo_send() at all. IMO the HAL should be the only API the driver provides for features that can be access through the HAL. (Features that are not part of the HAL, e.g. because they are specific to a single device, would obviously be still exposed outside of the common HAL.)

Comment on lines 212 to 215
hardware-specific device driver, by implementing the Radio HAL API. Since
devices drivers do not depend on the Radio HAL, it is still possible to use a
the raw driver of a specific device, for testing purposes or accessing device
specific features.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm strongly against this. What is the use case of a hardware-specific API for features that are accessible through the Radio HAL API? If the driver implements the Radio HAL API natively and as efficient as a driver specific API would be, what is the harm in using the Radio HAL API?

(Obviously: The driver should still be allowed to provide features not covered by the HAL by extending it with device specific APIs.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm not strongly in favor of having the Radio HAL on top of the device driver, but I tried to mimic what we have in other devices (e.g saul on top of sensor devices). Besides testing, I agree that the use cases for a "raw device driver" are rare.

If people don't care about that, I'm fine with the approach in #13943 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note: When new drivers for "strange" transceivers (e.g. let's say the CC1101 or the PR for adding netdev integration of the nRF24L01+) get merged, it would be very helpful to already have a good testing application in place. If the Radio HAL is the only API and a common part of the HAL exists, a shared testing application will likely find it's way into RIOT. That would ease testing for new transceivers in a "one-of-a-kind" class a lot, as those wouldn't need their own test application. (Except for features unique to them, if implemented.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I'm OK with that approach. @miri64 @PeterKietzmann @benpicco @bergzand is it OK for you too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't seem to understand this point correctly. So far, a technology specific HAL for 802.15.4 devices has been proposed. @maribu proposes a common HAL component that includes "strange" transceivers. @jia200x is OK with that approach, in general. Isn't that contradictory?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea is that the common HAL component is a subset of the 802.15.4 specific HAL. Thus, every device implementing the 802.15.4 Radio HAL would implicitly also be compliant with the common HAL.

So: The IEEE 802.15.4 HAL would be technology specific and technology aware, but it would have a generic subset. To me, this works fine and is not contradicting.

But I think what @jia200x above agreed to is allowing and encouraging device drivers to directly implement the functions of the Radio HAL, rather than writing an 802.15.4 driver with an internal API and a Radio HAL wrapper above, with the Radio HAL wrapper being the only user of the driver internal API.

(Please note: I'm not against sharing code between different IEEE 802.15.4 driver; quite the contrary. I just would like to prevent having e.g. a ieee802154_driver_t::transmit() function that contains a single line of code calling e.g. at86rf2xx_transmit(). This would add one function call of overhead without improving code structure or maintainability. So if there is indeed a reason to implement one function of the HAL as a layer on top of more low level driver function (e.g. avoiding code duplication), I'm also all in for that. But please let us avoid adding a layer for the sake of adding a layer.)

Comment on lines 344 to 345
Unlike the `netdev` interface, the 802.15.4 Radio HAL requires only these three
states to drive the radio. This is due to the fact that the radio is either in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not true. The programmer needs to take into account that a frame can only be fetched after it has been received. So the actual number of states to take into account is larger. (See above.)

Copy link
Member Author

@jia200x jia200x Apr 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what I wrote here is not accurate (see #13943 (comment))

To me, this is an upper layer logic. The HAL guarantees that it's safe to access the frame buffer on RX_DONE. But the transceiver state could be TRX_OFF or TX_ON, since the FB access is independent of the state.

That's what I mean with the fact these are only "transceiver states" (PHY states). In fact, these states are mostly independent to MAC and device states (the exception is the TRX_OFF during sleep, since it's implicit that the transceiver is disabled when the device sleeps)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct. Only three states are sufficient, when only looking at the transceiver. (But you might want to highlight a bit more that these states should only cover the transceiver?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will clarify that

doc/memos/rdm-draft-alamos-ieee802154-radio-hal.md Outdated Show resolved Hide resolved
Comment on lines 596 to 608
/**
* @brief Set the sleep state of the device (sleep or awake)
*
* @param[in] dev IEEE802.15.4 device descriptor
* @param[in] sleep whether the device should sleep or not.
*
* @post the transceiver state is @ref IEEE802154_TRX_STATE_TRX_OFF,
* regardless of the value of @p sleep
*
* @return 0 on success
* @return negative errno on error
*/
int (*set_sleep)(ieee802154_dev_t *dev, bool sleep);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually needed? How about renaming the start() to on() and providing an off() function in addition?

To me, there is little to no different between starting a network device after init() or after it has been put into lowest power mode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I like the on/off approach better than the start. I just wanted to avoid extra functions, but I think it would give even more control

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, what do you think of moving the "set_sleep" function to "off"?

Basically the behavior would be the same

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think that's what you meant, am I right?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self: since we are now using "on" and "off", the "on" function must initialize all values with the default "IEEE802154_XXX" params (e.g CONFIG_IEEE802154_DEFAULT_CHANNEL). Radios lose register content when turned off

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm maybe it's even better to not set anything. There are some changes the MAC already will set a different PHY state than the default one when waking up the radio

Comment on lines 652 to 666
/**
* @brief Set IEEE802.15.4 promiscuous mode
*
* @pre the device is not sleeping
*
* @note this function pointer can be NULL if the device doesn't support
* hardware address filtering.
*
* @param[in] dev IEEE802.15.4 device descriptor
* @param[in] enable whether the promiscuous mode should be enabled or not.
*
* @return 0 on success
* @return negative errno on error
*/
int (*set_promiscuous)(ieee802154_dev_t *dev, bool enable);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have this as first element? It would be super nice if the struct of the IEEE 802.15.4 HAL implementation would overlap with a Wireless Network Device HAL. Otherwise (assuming the idea of "object orientation" is agreed upon), one might end up having to device two structs.

This feature is certainly something I would put into the generic Wireless Network Device HAL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can rearrange it as we want here

Comment on lines 752 to 785
typedef enum {
/**
* @brief the transceiver detected a valid Start Frame Delimiter, which
indicates the radio started to receive a packet.
*/
IEEE802154_RADIO_RX_START,

/**
* @brief the transceiver received a packet and there is a packet in the
* internal framebuffer.
*
* The transceiver is in @ref IEEE802154_TRX_STATE_RX_ON state when
* this function is called, but with framebuffer protection (either
* dynamic framebuffer protection or disabled RX). Thus, the packet
* will not be overwritten before calling the @ref ieee802154_radio_read
* function. However, @ref ieee802154_radio_read MUST be called in
* order to receive new packets. If there is no interest in the
* packet, the function can be called with a NULL buffer to drop
* the packet.
*/
IEEE802154_RADIO_RX_DONE,

/**
* @brief the transceiver finished sending a packet or the
* retransmission procedure.
*
* If the radio supports frame retransmissions the
* @ref ieee802154_radio_get_tx_status MAY be called to retrieve useful
* information (number of retries, frame pending bit, etc). The
* transceiver is in @ref IEEE802154_TRX_STATE_TX_ON state when this function
* is called.
*/
IEEE802154_RADIO_TX_DONE,
} ieee802154_trx_status_t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the events can be generic to all wireless network devices.

Additionally, as already said above: I would like this API to allow non-blocking use. As a result, more events are needed. E.g. in a blocking mode you can simply do:

prepare();
transmit();

But if the prepare() does not block, the transmit() might return -EAGAIN (or -EBUSY?) when the prepare() has not yet completed. (prepare() could take some significant time if the device has the frequency generator offline.)

Instead, an event would need to be triggered when the device is ready to transmit. One could add some convenience code to block until an event is triggered, e.g. it could be used like this:

prepare();
await_event(READY_FOR_TX);
transmit();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing that I'm concerned here: if we need to care about "blocking" stuff here at radio level, IMO a higher layer is not doing it's job well.

file

By design, these Wireless MAC layers work under a request/confirm architecture (MLME-SAP, MCPS-SAP) which are by nature non blocking. A user "request" the MAC layer to send data, and the MAC layer might respond with a "queue is full" or "queued".

So, the MAC is the non-blocking layer in the IEEE802.15.4 architecture.

Also, consider the IEEE802.15.4 PHY definition:
file

Basically the Radio HAL is the RF-SAP in the IEEE802.15.4 PHY model. The specification doesn't say anything about the blocking/non-blocking nature of the PD-SAP interface, but IMO the interface definition suggest that this is not the layer that should care about this.
As an example, this is the "confirm" primitive of the PD-DATA.request command (used to send data at PHY level):

file

Another example, some APIs (OpenWSN) expect the radio to simply transmit data (because the media access is already controlled by the MAC). In fact, the "transmit" function is a void function and the stack has no way to deal with a -EBUSY (if it happens for whatever reason)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing that I'm concerned here: if we need to care about "blocking" stuff here at radio level, IMO a higher layer is not doing it's job well.

Well, the current situation is that e.g. GNRC happily calls send() at every given point in time no matter if the device is currently already sending a frame or not. If the driver does not block during send() and returns an -EBUSY, the frame is just lost. As a result when e.g. using 6LoWPAN and sending any packet needing layer 2 fragmentation, all but the first fragment are lost.

In fact, the "transmit" function is a void function and the stack has no way to deal with a -EBUSY (if it happens for whatever reason)

This would not be an issue, if the upper layer waits for the driver to be ready for transmission first. E.g. in the state machine above, a transition to the state "Ready for TX" would trigger an event the MAC could wait for / react to. And in that state, a transmit() never fails.

So, if the MAC layer synchronizes with the driver, an -EBUSY should never ever happen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the current situation is that e.g. GNRC happily calls send() at every given point in time no matter if the device is currently already sending a frame or not. If the driver does not block during send() and returns an -EBUSY, the frame is just lost. As a result when e.g. using 6LoWPAN and sending any packet needing layer 2 fragmentation, all but the first fragment are lost.

I strongly agree with this. But the actual problem here is the lack of MAC layers, not the implementation of the send function. In practice, GNRC would interact with the MAC layer via non blocking requests (it becomes even more important when using slotted MACs, unless we want to block until the slot is available)

So, if the MAC layer synchronizes with the driver, an -EBUSY should never ever happen.

That's my point. If the radio HAL provides resources to know when it's safe to send again (event callbacks, etc), why do we need to add an -EBUSY here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's my point. If the radio HAL provides resources to know when it's safe to send again (event callbacks, etc), why do we need to add an -EBUSY here?

Sorry for being unclear, the -EBUSY was not my point. I would only like the Radio HAL to define that a call to transmit() fails, unless prepare() has been called previously and the device signaled completion of the prepare(). I don't mind if the error is then reported using a return value, or an assert() blows, or a log message is generation, or the frame is silently dropped. I really just want the transmit() to fail when called in an inappropriate state.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the intention here, but under what circumstance you would get to a state where you are not sure if it's safe to call "transmit"? With this API, when the "prepare" function finishes the radio is ready to transmit (FB is already loaded). The only case I could imagine is the case where we use some kind of DMA to prepare a frame. But with the current "iolist" this would be hard anyway (I'm not aware of scattering/gathering DMA APIs).

If a MAC layer decides to prepare in a previous cycle and transmit later, the API can in practice rely on the TX_DONE event.

I just did some measurements for the samr21-xpro (basic mode, direct send). For sending 127 bytes packet, it takes around 400 us to "prepare" the frame and ~4300 us from TX_START to TX_END (TOA). Considering the LIFS is 640us, calling "prepare" right after the TX_DONE event is still compliant with sending packets with IFS >= 640us.

Maybe we could add a precondition that the "prepare" function should be called while there's no ongoing transmission (e.g after TX_DONE)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it allowed to call prepare() when the transceiver state is not TX? When prepare() really just copies over the buffer and does not need to wait for the PLL to settle, prepare() could block. (As said, to me blocking for an SPI transfer does count as "non-blocking" 😉) In that case, there is no need for an event to notify the HAL user that prepare() finished.

However, a simplistic user of the HAL (e.g. an ALOHA MAC) would still have to await an event for calling prepare() again after having issued transmit(). For simplicity, this could just be the TX_COMPLETED event.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it allowed to call prepare() when the transceiver state is not TX?

Yes! In fact, it is preferred to call prepare during TRX_OFF (if there are no timing constraints, considering switching from RX_ON to TX_ON takes 1 us but from TRX_OFF ~200 us). TRX_OFF consumes ~0.5 mA in while TX_ON consume ~6 mA (at86rf2xx).

However, a simplistic user of the HAL (e.g. an ALOHA MAC) would still have to await an event for calling prepare() again after having issued transmit(). For simplicity, this could just be the TX_COMPLETED event.

Exactly, that's the idea

@maribu
Copy link
Member

maribu commented Apr 26, 2020

(i) Do I see it correctly that your main goal is to reduce duplication between HALs of different technologies?

This would also be possible to some degree. But I mostly have code duplication between users of the HALs in mind.

E.g. primitive non-persistent CSMA implementation could likely be implemented for every wireless network device; in the most primitive form a transceiver that provides the current RSSI value would be sufficient. If the subset of the features common to all wireless network devices could be provided in a way to allow portable access to those features, that could be implemented once and used for all wireless transceivers.

(Note: The function to perform a single CCA could be part of the generic Wireless Network Device HAL and implemented fully in software by primitive transceivers; e.g. by fetching the current RSSI value and comparing it to a threshold. If a device not supporting any means to perform a CCA, it could return -ENOTSUP.)

(ii) Where would you place a technology specific MAC layer in your proposal?

On top of the most generic HAL that provides all features needed. E.g. TSCH is targeting IEEE 802.15.4, so there is no way to implement it on top of a more generic HAL. But a primitive CSMA MAC should work fine on top of the generic HAL.

Note that I see the generic "Wireless Network Device HAL" not as an extra layer, but as a subset of the function pointers in the "IEEE 802.15.4 HAL". So a driver implementing IEEE 802.15.4 HAL would not need to do anything in addition to be compatible with the "Wireless Network Device HAL" - this compatibility would come for free.

@jia200x
Copy link
Member Author

jia200x commented Apr 26, 2020

A general request: since there are several pending technical discussions, could we maybe try to focus first in these aspects (transceiver state, blocking-non blocking stuff) before discussing about DRY/maintainability?

I'm in for everything that makes life easier when adding new HALs or maintaining code, but IMO we should first agree on the design decisions.

@jia200x jia200x closed this Apr 26, 2020
Idle. The former will set the Abstract State Machine state to the transient
state RX BUSY and then to Idle on `RF_RX_DONE` event. The latter will change
the state to Idle only if the radio is currently not receiving a frame.
TODO: Calls
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this sneaked in by accident? Having a TODO in the final document does not make it very final.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, good catch. I forgot that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

Comment on lines 569 to 577
# 10 Revision

- Rev6: Relax Abstract State Machine transitions
- Rev5: Add Source Address Matching
- Rev4: Add Async behavior
- Rev3: Merge Device Driver and HAL
- Rev2: Update sleep preconditions
- Rev1: Address on/off functions
- Rev0: initial document
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these can go, as the doc was not part of master.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rev0: initial document can stay, as far as I understand.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jia200x
Copy link
Member Author

jia200x commented Feb 20, 2023

When I was working on #19213 I found it wasn’t clearly documented what events to expect in what state and how they should be handled.

I added in the last push a table that summarizes that.

@jia200x
Copy link
Member Author

jia200x commented Feb 20, 2023

I clarified the handling of each state a bit more.

@miri64
Copy link
Member

miri64 commented Feb 20, 2023

static-tests are also complaining.

@jia200x
Copy link
Member Author

jia200x commented Feb 20, 2023

done! I guess I have to rename the file to rfm0002 at some point too...

EDIT: Renamed!

@jia200x
Copy link
Member Author

jia200x commented Feb 21, 2023

May I squash?

Copy link
Member

@miri64 miri64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yepp. Please also add RDM2 to the list in the README and see the following comments.

Comment on lines 291 to 297
for the implementation of the Event Notification callback (e.g the `RX_DONE`
event might post an event to an event queue in order to fetch the packet or
`PLL_LOCK` might unlock a mutex).

The full list of events and implications are defined in the Interface
Definition section.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PLL_LOCK is not mentioned in the Interface Definition section at all. Is that intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of. The RDM does not prevent adding such an event, but we didn't include it in the final implementation because it's kind of a niche use case. We can still leave it there to point out that other events can be introduced if needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can still leave it there to point out that other events can be introduced if needed.

Maybe rather remove the example here and add a mention of the introduction of other events still?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

@jia200x
Copy link
Member Author

jia200x commented Feb 21, 2023

squashed!

@jia200x jia200x force-pushed the pr/rdm/radio_hal branch 2 times, most recently from 0953490 to c767296 Compare February 22, 2023 09:18
@jia200x
Copy link
Member Author

jia200x commented Feb 22, 2023

@miri64 proposed to add the corresponding IEEE802154_RADIO_ prefix to the event references. I just amended that change directly and pushed.

Copy link
Member

@miri64 miri64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE-ACK

@miri64 miri64 added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR CI: skip compile test If set, CI server will run only non-compile jobs, but no compile jobs or their dependent jobs and removed State: waiting for author State: Action by the author of the PR is required labels Feb 22, 2023
@miri64
Copy link
Member

miri64 commented Feb 22, 2023

bors merge

@bors
Copy link
Contributor

bors bot commented Feb 22, 2023

🕐 Waiting for PR status (GitHub check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set.

@riot-ci
Copy link

riot-ci commented Feb 22, 2023

Murdock results

✔️ PASSED

7ac3052 rdm/radio_hal: add radio HAL design document

Success Failures Total Runtime
1 0 1 55s

Artifacts

@miri64
Copy link
Member

miri64 commented Feb 22, 2023

bors merge

@bors
Copy link
Contributor

bors bot commented Feb 22, 2023

Build succeeded:

@bors bors bot merged commit e738fdd into RIOT-OS:master Feb 22, 2023
@miri64
Copy link
Member

miri64 commented Feb 22, 2023

🎉

@jia200x
Copy link
Member Author

jia200x commented Feb 22, 2023

it took almost three years! Thank you everyone for this!

@emmanuelsearch
Copy link
Member

Thanks for pushing all this time!

| IDLE | | RX |
| |<-------------| |
+---------+ +---------+
SET_TRX_STATE
Copy link
Contributor

@benpicco benpicco Feb 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting back to this, is there not an implicit TX state hidden inside IDLE when the device is busy sending the actual frame?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just checked this. There is definitely an implicit TX state, but it's already marked by the request_transmit and confirm_transmit function (which prevents calling set_trx_state).But to keep things simple, it just assumes the radio is either listening (RX), ready to transmit (TX) or the transceiver is off (which is not equivalent to turning off the device with off).

@MrKevinWeiss MrKevinWeiss added this to the Release 2023.04 milestone Apr 25, 2023

The author of this memo can be contacted via email at jose.alamos@haw-hamburg.de

# Appendix A: Radio HAL header file (2023-02)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of including this here?
This will just become outdated (if it isn't already)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was just a way to keep track of the Radio HAL status at the moment of merge. A commit hash could have done the same though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: doc Area: Documentation Area: RDM Area: Discussions on RIOT Developer Memos CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR CI: skip compile test If set, CI server will run only non-compile jobs, but no compile jobs or their dependent jobs Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR Process: needs >1 ACK Integration Process: This PR requires more than one ACK Type: new feature The issue requests / The PR implemements a new feature for RIOT
Projects
None yet
Development

Successfully merging this pull request may close these issues.