-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RDM: The 802.15.4 Radio HAL #13943
RDM: The 802.15.4 Radio HAL #13943
Conversation
* If the radio is still transmitting, it should block until it is safe to | ||
* write to the frame buffer again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please return -EBUSY here and make the upper layer block?
Blocking in the radio thread (while still servicing radio events) is ugly and error-prone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we can. Ideally a MAC layer would synchronize access to the radio, so this shouldn't happen. But it doesn't hurt to return -EBUSY here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly agree with @benpicco. E.g. the at86rf2xx
and the cc110x
driver have special quirks to implement this behavior that is both ugly and complex. But without those quirks, a simple ping6
with a size big enough to cause layer 2 fragmentation would currently result in the driver ignoring the subsequent frames while busy sending the first fragment. And the upper layers don't retry, resulting in the whole packet being lost.
With most use cases only sending IPv6 packets fitting into a single frame, this issue is also often overlooked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree that the bottleneck should not be here.
But even with an -EBUSY, the MAC layer on top should synchronize this anyway.
But I agree that this could improve a lot our current state while we don't have a full 802.15.4 MAC. So, I'm in for the change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm I remember the design decision behind this:
Most radios (if not all) allow to write packets into the framebuffer while the radio is still transmitting. IIRC, the at86rf233
allows to write again into the framebuffer 192 us after the TX_START command. I agree that under no circumstances we should block during the whole TX procedure, but this delay is really short compared to the LIFS (640 us)
I've seen some radios that simply "sleep" for 192 us before trying to load the framebuffer again.
How to proceed? I can think of:
- Drop this "load before TX done" feature and simply return -EBUSY if the radio is BUSY
- Try to return -EBUSY if it's not possible to load the framebuffer (T<192us)
- Block for 192 us
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/** Check if the radio driver is currently receiving a packet /
int ( receiving_packet)(void);
I'm strongly against this. This interface will inherently result in TOC/TOU bugs.
Even if the driver accepts frames while e.g. RX or TX without ACK request is ongoing,
transmit
would then just set a flag to execute the TX as the next operation.
I'm not sure if I understand this correctly. But as @jia200x already pointed out, this function should have predictable timing behavior and very little delay to allow implementing TDMA(ish) MAC schemes.
In slotted MAC layers, the state change can be forced calling
ieee802154_radio_set_trx_state(dev, TX_ON)
directly.
Are you sure you really want to allow explicitly setting the TX state? The internal state machines of the hardware can be rather complex and hard to map reasonably to the HAL. Wouldn't it be better to just define that prepare()
in addition to loading the buffer brings the device into a state in which starting a transmission can be triggered with as little delay as possible? So if some intermediate state is needed to put the device to before being able to transmit (e.g. some transceivers turn their frequency generator of in an idle state and need some time for the phase locked loop to settle), prepare()
would just do so. And transmit()
then does implicitly the state change to TX and back to IDLE/RX once done.
So only the following three states IMHO should be settable through the HAL:
OFF: Device is in the lowest possible power consumption mode. Switching to IDLE or RX can require some time. Neitherprepare()
nortransmit()
are allowed to be called in this state.IDLE: Device is not listing for incoming packets, butprepare()
andtransmit()
are allowed to be called. Switching to RX might be faster than from OFF and more power might be consumed.RX: Device is listening for packets. A call toprepare()
is allowed and may implicitly change the state to an internal state appropriate for fast transmission. Thus, afterprepare()
the device might not be listening for incoming packets anymore until either the transmission is triggered or aborted.
I think exposing these three states can be implemented reasonably for every network device. And I don't see a use case for having explicit control over more states.
Update: I added a more detailed proposal of which states an abstract state machine representation of a Radio could have
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I didn't consider slotted MACs.
In that case it might be possible for prepare
to succeed but transmit
would return -EBUSY
if the radio is still receiving.
But if the driver then generates en event to the upper layer when it switches back to idle, that upper layer can decide to do the transmit
immediately - or wait for the next time slot.
That way the whole TX after RX and MAC access considerations could be kept out of the radio driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm strongly against this. This interface will inherently result in TOC/TOU bugs.
I agree, just threw some ideas without deep thinking :P
I'm not sure if I understand this correctly. But as @jia200x already pointed out, this function should have predictable timing behavior and very little delay to allow implementing TDMA(ish) MAC schemes.
Yes. The radio operation should be more or less predictable.
I'm still confused about the blocking/non-blocking thing here. I strongly agree that a non-blocking mechanism is needed for sending packets, but the right layer to handle that is the MAC. In fact, the MAC MUST be non-blocking because in most cases packets are delayed (TSCH, slotted mode, sending beacons over data, etc).
In practice, an upper layer would request the MAC layer to send a packet and the MAC might refuse it or send when it's the right time. The MAC layer would read packets from a queue and send them one by one.
The question is, do we need a non-blocking radio HAL to implement a a non-blocking MAC? If the bottleneck is in the radio, the MAC is doing something wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the documentation above needs to be updated, no that the request/confirm paradigm was settled on.
Two additional remarks:
- Can you replace the term packet by frame?
- It would be nice to also state how long the driver can be sure "own" the buffer passed. An peripheral driver with DMA might not need to copy the iolist into an contiguous RAM buffer. (E.g the Ethernet driver of the STM32 fetches the frame via DMA, and the DMA happily accepts a linked list of DMA descriptors with each containing only segment of the frame. This concept is not specific to the Ethernet case, so peripheral radios might provide the same feature.) So if the driver would "own" the iolist until
confirm_transmit()
returns success or an error, this would allow zero-copy transmission. It would come with the price of extra care being taken by the upper layer, if the upper layer truly exploits the features of the async API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the documentation above needs to be updated, no that the request/confirm paradigm was settled on.
I agree.
- Can you replace the term packet by frame?
Yes. It was in my bucketlist, but forgot to update it.
- It would be nice to also state how long the driver can be sure "own" the buffer passed. An peripheral driver with DMA might not need to copy the iolist into an contiguous RAM buffer. (E.g the Ethernet driver of the STM32 fetches the frame via DMA, and the DMA happily accepts a linked list of DMA descriptors with each containing only segment of the frame. This concept is not specific to the Ethernet case, so peripheral radios might provide the same feature.) So if the driver would "own" the iolist until confirm_transmit() returns success or an error, this would allow zero-copy transmission. It would come with the price of extra care being taken by the upper layer, if the upper layer truly exploits the features of the async API.
Good catch, I think this must be specified. I agree with "owning" the packet until confirm_transmt
is returns success or error.
* NULL if this information is not needed. | ||
* | ||
* @return number of bytes written in @p buffer (0 if @p buf == NULL) | ||
* @return -ENOBUFS if the packet doesn't fit in @p |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the packet be discarded here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frame is the correct term at layer 2. We should try carefully to get those terms correct to avoid confusion. (Sadly, RIOT's doc uses the term packet regardless of the layer at many places. I should grep
for all usage of packet
and verify or fix the term.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the packet is always discarded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I forgot something here (thanks for pointing it out @maribu ). All these packets are "Physical Service Data Unit" (PSDU) packets. I will update that
Edit: frames... I meant, frames
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add a comment that indicates the frame is always discarded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PSDU = frame right? Lets decide for one of both. You should add it to the "Acronyms" or "Definitions" section then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess PSDU is more explicit. (Frame can have other meanings in other contexts.) Also, PSDU sounds smarter :-)
/** | ||
* @brief the device support the IEEE802.15.4 Sub GHz band | ||
*/ | ||
IEEE802154_CAP_SUB_GHZ, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IEEE802154_PHY_MODES
could be another capability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this can be used to report available PHY modes too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, this list can/should be expanded. There are other capabilities that are not listed here but exist anyway (e.g LBT, CSMA_BACKOFF timers, etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The list can and should be extended as explained here. Personally, I hope that it won't get extended as much as netopt
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect similarities between the HALs of different network devices. Maybe it makes sense to do "object oriented" programming here. E.g. having a base "Network Device HAL" as base. (And even a "base struct" as driver in the implementation.) Something like this maybe:
And every child would provide all the features and API of all of its parents, (or at least return I believe that there are many use cases where the code does not care if what network device it talks to. It would be a pity, if that could would have to be re-written for every HAL. But not having a single "on-size-fits-all" HAL, but separate HALs to allow access to technology-specific features seems to be a very compelling idea - if compatibility between the HALs (at a level that makes sense) is provided. |
Well, one of the reasons to implement the "ieee802154_radio_xxx" wrappers (not mentioned in the RDM but available in the API implementation) was to allow these kind of optimizations. You can re-arrange the function pointers without breaking the API. Actually, I remember there were some offline comments about implementing "device" structures, which could include "Device Management" stuff (start, irq_handler, set_sleep, etc). NB there are some classes of devices that might not require a HAL. There are several technologies that have only one device (LoRa, CC110x). We had some offline conversations with @miri64 and @PeterKietzmann , and we thought it's important to check which technologies require a HAL in the mid term. E.g for LoRaWAN, it should be enough to call the |
I like your idea, but I am concerned about boiling the ocean here. |
Consider the diagram above as a vision where the changes should lead to. For now, only the following part would be designed and implemented:
But with the vision taken into account when designing the architecture, adding other HALs can be done later on one PR at a time. It will still be unlikely that the "Network Device HAL" and the "Wireless Network Device HAL" would survive adding new interfaces into this unchanged. But if the architecture is developed with this vision in mind, I'm sure it will be possible that every concrete HAL implements a parent "Wireless Network Device HAL" (or "Wired Network Device HAL"), which in turn implements the "Network Device HAL". And I don't think it would be an issue if the APIs during major architectural changes is not stable for a couple of release. IMO end users care about the high level APIs (like the Sock API or POSIX sockets) rather than network stack internals, and "core developers" are happily accepting the pain of transition to a new API, if the benefits of the new API are compelling enough. |
@maribu thanks for your investigation. (i) Do I see it correctly that your main goal is to reduce duplication between HALs of different technologies? (ii) Where would you place a technology specific MAC layer in your proposal? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some inline comments. I have to general concerns:
- I would like to have the API non-blocking in order to allow implementing a network stack in a single thread. With blocking functions, at least one thread per network device is needed. (A shared thread would be unable to react to events of other network devices when an operation performed on another device blocks)
- I think the state machine with only three states is not simplifying life, but quite the opposite:
- Operations like
len()
orread()
are only possible to be called when the driver is not only in RX, but additionally has a frame in the receive buffer. (So at least two distinct states in RX to take into account) - The
transmit()
operations requires a previous call toprepare()
. (So at least two distinct states in TX) - ==> I'm convinced that explicitly spelling out the states the HAL can have (so an abstract state machine of the HAL, not the fine-grained driver internal states) will make life easier
- Operations like
| | +-----------------------------------------------+ | ||
| | | | | ||
netdev_driver_t | | 802.15.4 Radio HAL | | ||
| | | | | ||
| | +-----------------------------------------------+ | ||
| | ^ | ||
| | | | ||
| | Device Driver API | ||
| | | | ||
v | v | ||
+---------------------+ | +-----------------------------------------------+ | ||
| | | | | | ||
| Device Driver | | | Device Driver | | ||
| | | | | | ||
+---------------------+ | +-----------------------------------------------+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really needed that the HAL sits on top of the driver? How about something like this:
+--------------------------------------+
| +--------------------+ |
| | 802.15.4 Radio HAL | Device Driver |
| +--------------------+ |
+--------------------------------------+
So that the Device Driver implements the 802.15.4 Radio HAL, rather than being called by the HAL? By having common code in a (say) driver_ieee802154_common
module, no code duplication is needed.
Some drivers in RIOT currently have e.g. a foo_send()
function in addition to the netdev_driver_t::send()
. But the netdev_driver_t::send()
implementation often just calls the foo_send()
without adding any features. This overhead seemingly serves no purpose. I really don't see why we need a foo_send()
at all. IMO the HAL should be the only API the driver provides for features that can be access through the HAL. (Features that are not part of the HAL, e.g. because they are specific to a single device, would obviously be still exposed outside of the common HAL.)
hardware-specific device driver, by implementing the Radio HAL API. Since | ||
devices drivers do not depend on the Radio HAL, it is still possible to use a | ||
the raw driver of a specific device, for testing purposes or accessing device | ||
specific features. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm strongly against this. What is the use case of a hardware-specific API for features that are accessible through the Radio HAL API? If the driver implements the Radio HAL API natively and as efficient as a driver specific API would be, what is the harm in using the Radio HAL API?
(Obviously: The driver should still be allowed to provide features not covered by the HAL by extending it with device specific APIs.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I'm not strongly in favor of having the Radio HAL on top of the device driver, but I tried to mimic what we have in other devices (e.g saul
on top of sensor devices). Besides testing, I agree that the use cases for a "raw device driver" are rare.
If people don't care about that, I'm fine with the approach in #13943 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note: When new drivers for "strange" transceivers (e.g. let's say the CC1101 or the PR for adding netdev integration of the nRF24L01+) get merged, it would be very helpful to already have a good testing application in place. If the Radio HAL is the only API and a common part of the HAL exists, a shared testing application will likely find it's way into RIOT. That would ease testing for new transceivers in a "one-of-a-kind" class a lot, as those wouldn't need their own test application. (Except for features unique to them, if implemented.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I'm OK with that approach. @miri64 @PeterKietzmann @benpicco @bergzand is it OK for you too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My idea is that the common HAL component is a subset of the 802.15.4 specific HAL. Thus, every device implementing the 802.15.4 Radio HAL would implicitly also be compliant with the common HAL.
So: The IEEE 802.15.4 HAL would be technology specific and technology aware, but it would have a generic subset. To me, this works fine and is not contradicting.
But I think what @jia200x above agreed to is allowing and encouraging device drivers to directly implement the functions of the Radio HAL, rather than writing an 802.15.4 driver with an internal API and a Radio HAL wrapper above, with the Radio HAL wrapper being the only user of the driver internal API.
(Please note: I'm not against sharing code between different IEEE 802.15.4 driver; quite the contrary. I just would like to prevent having e.g. a ieee802154_driver_t::transmit()
function that contains a single line of code calling e.g. at86rf2xx_transmit()
. This would add one function call of overhead without improving code structure or maintainability. So if there is indeed a reason to implement one function of the HAL as a layer on top of more low level driver function (e.g. avoiding code duplication), I'm also all in for that. But please let us avoid adding a layer for the sake of adding a layer.)
Unlike the `netdev` interface, the 802.15.4 Radio HAL requires only these three | ||
states to drive the radio. This is due to the fact that the radio is either in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not true. The programmer needs to take into account that a frame can only be fetched after it has been received. So the actual number of states to take into account is larger. (See above.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what I wrote here is not accurate (see #13943 (comment))
To me, this is an upper layer logic. The HAL guarantees that it's safe to access the frame buffer on RX_DONE. But the transceiver state could be TRX_OFF or TX_ON, since the FB access is independent of the state.
That's what I mean with the fact these are only "transceiver states" (PHY states). In fact, these states are mostly independent to MAC and device states (the exception is the TRX_OFF during sleep, since it's implicit that the transceiver is disabled when the device sleeps)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct. Only three states are sufficient, when only looking at the transceiver. (But you might want to highlight a bit more that these states should only cover the transceiver?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will clarify that
/** | ||
* @brief Set the sleep state of the device (sleep or awake) | ||
* | ||
* @param[in] dev IEEE802.15.4 device descriptor | ||
* @param[in] sleep whether the device should sleep or not. | ||
* | ||
* @post the transceiver state is @ref IEEE802154_TRX_STATE_TRX_OFF, | ||
* regardless of the value of @p sleep | ||
* | ||
* @return 0 on success | ||
* @return negative errno on error | ||
*/ | ||
int (*set_sleep)(ieee802154_dev_t *dev, bool sleep); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually needed? How about renaming the start()
to on()
and providing an off()
function in addition?
To me, there is little to no different between starting a network device after init()
or after it has been put into lowest power mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I like the on/off
approach better than the start
. I just wanted to avoid extra functions, but I think it would give even more control
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, what do you think of moving the "set_sleep" function to "off"?
Basically the behavior would be the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I think that's what you meant, am I right?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note to self: since we are now using "on" and "off", the "on" function must initialize all values with the default "IEEE802154_XXX" params (e.g CONFIG_IEEE802154_DEFAULT_CHANNEL). Radios lose register content when turned off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm maybe it's even better to not set anything. There are some changes the MAC already will set a different PHY state than the default one when waking up the radio
/** | ||
* @brief Set IEEE802.15.4 promiscuous mode | ||
* | ||
* @pre the device is not sleeping | ||
* | ||
* @note this function pointer can be NULL if the device doesn't support | ||
* hardware address filtering. | ||
* | ||
* @param[in] dev IEEE802.15.4 device descriptor | ||
* @param[in] enable whether the promiscuous mode should be enabled or not. | ||
* | ||
* @return 0 on success | ||
* @return negative errno on error | ||
*/ | ||
int (*set_promiscuous)(ieee802154_dev_t *dev, bool enable); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have this as first element? It would be super nice if the struct of the IEEE 802.15.4 HAL implementation would overlap with a Wireless Network Device HAL. Otherwise (assuming the idea of "object orientation" is agreed upon), one might end up having to device two structs.
This feature is certainly something I would put into the generic Wireless Network Device HAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can rearrange it as we want here
typedef enum { | ||
/** | ||
* @brief the transceiver detected a valid Start Frame Delimiter, which | ||
indicates the radio started to receive a packet. | ||
*/ | ||
IEEE802154_RADIO_RX_START, | ||
|
||
/** | ||
* @brief the transceiver received a packet and there is a packet in the | ||
* internal framebuffer. | ||
* | ||
* The transceiver is in @ref IEEE802154_TRX_STATE_RX_ON state when | ||
* this function is called, but with framebuffer protection (either | ||
* dynamic framebuffer protection or disabled RX). Thus, the packet | ||
* will not be overwritten before calling the @ref ieee802154_radio_read | ||
* function. However, @ref ieee802154_radio_read MUST be called in | ||
* order to receive new packets. If there is no interest in the | ||
* packet, the function can be called with a NULL buffer to drop | ||
* the packet. | ||
*/ | ||
IEEE802154_RADIO_RX_DONE, | ||
|
||
/** | ||
* @brief the transceiver finished sending a packet or the | ||
* retransmission procedure. | ||
* | ||
* If the radio supports frame retransmissions the | ||
* @ref ieee802154_radio_get_tx_status MAY be called to retrieve useful | ||
* information (number of retries, frame pending bit, etc). The | ||
* transceiver is in @ref IEEE802154_TRX_STATE_TX_ON state when this function | ||
* is called. | ||
*/ | ||
IEEE802154_RADIO_TX_DONE, | ||
} ieee802154_trx_status_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the events can be generic to all wireless network devices.
Additionally, as already said above: I would like this API to allow non-blocking use. As a result, more events are needed. E.g. in a blocking mode you can simply do:
prepare();
transmit();
But if the prepare()
does not block, the transmit()
might return -EAGAIN
(or -EBUSY
?) when the prepare()
has not yet completed. (prepare()
could take some significant time if the device has the frequency generator offline.)
Instead, an event would need to be triggered when the device is ready to transmit. One could add some convenience code to block until an event is triggered, e.g. it could be used like this:
prepare();
await_event(READY_FOR_TX);
transmit();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing that I'm concerned here: if we need to care about "blocking" stuff here at radio level, IMO a higher layer is not doing it's job well.
By design, these Wireless MAC layers work under a request/confirm architecture (MLME-SAP, MCPS-SAP) which are by nature non blocking. A user "request" the MAC layer to send data, and the MAC layer might respond with a "queue is full" or "queued".
So, the MAC is the non-blocking layer in the IEEE802.15.4 architecture.
Also, consider the IEEE802.15.4 PHY definition:
Basically the Radio HAL is the RF-SAP in the IEEE802.15.4 PHY model. The specification doesn't say anything about the blocking/non-blocking nature of the PD-SAP interface, but IMO the interface definition suggest that this is not the layer that should care about this.
As an example, this is the "confirm" primitive of the PD-DATA.request command (used to send data at PHY level):
Another example, some APIs (OpenWSN) expect the radio to simply transmit data (because the media access is already controlled by the MAC). In fact, the "transmit" function is a void
function and the stack has no way to deal with a -EBUSY (if it happens for whatever reason)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing that I'm concerned here: if we need to care about "blocking" stuff here at radio level, IMO a higher layer is not doing it's job well.
Well, the current situation is that e.g. GNRC happily calls send()
at every given point in time no matter if the device is currently already sending a frame or not. If the driver does not block during send()
and returns an -EBUSY
, the frame is just lost. As a result when e.g. using 6LoWPAN and sending any packet needing layer 2 fragmentation, all but the first fragment are lost.
In fact, the "transmit" function is a void function and the stack has no way to deal with a
-EBUSY
(if it happens for whatever reason)
This would not be an issue, if the upper layer waits for the driver to be ready for transmission first. E.g. in the state machine above, a transition to the state "Ready for TX" would trigger an event the MAC could wait for / react to. And in that state, a transmit()
never fails.
So, if the MAC layer synchronizes with the driver, an -EBUSY
should never ever happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the current situation is that e.g. GNRC happily calls send() at every given point in time no matter if the device is currently already sending a frame or not. If the driver does not block during send() and returns an -EBUSY, the frame is just lost. As a result when e.g. using 6LoWPAN and sending any packet needing layer 2 fragmentation, all but the first fragment are lost.
I strongly agree with this. But the actual problem here is the lack of MAC layers, not the implementation of the send function. In practice, GNRC would interact with the MAC layer via non blocking requests (it becomes even more important when using slotted MACs, unless we want to block until the slot is available)
So, if the MAC layer synchronizes with the driver, an -EBUSY should never ever happen.
That's my point. If the radio HAL provides resources to know when it's safe to send again (event callbacks, etc), why do we need to add an -EBUSY here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's my point. If the radio HAL provides resources to know when it's safe to send again (event callbacks, etc), why do we need to add an -EBUSY here?
Sorry for being unclear, the -EBUSY
was not my point. I would only like the Radio HAL to define that a call to transmit()
fails, unless prepare()
has been called previously and the device signaled completion of the prepare()
. I don't mind if the error is then reported using a return value, or an assert()
blows, or a log message is generation, or the frame is silently dropped. I really just want the transmit()
to fail when called in an inappropriate state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the intention here, but under what circumstance you would get to a state where you are not sure if it's safe to call "transmit"? With this API, when the "prepare" function finishes the radio is ready to transmit (FB is already loaded). The only case I could imagine is the case where we use some kind of DMA to prepare a frame. But with the current "iolist" this would be hard anyway (I'm not aware of scattering/gathering DMA APIs).
If a MAC layer decides to prepare in a previous cycle and transmit later, the API can in practice rely on the TX_DONE event.
I just did some measurements for the samr21-xpro (basic mode, direct send). For sending 127 bytes packet, it takes around 400 us to "prepare" the frame and ~4300 us from TX_START to TX_END (TOA). Considering the LIFS is 640us, calling "prepare" right after the TX_DONE event is still compliant with sending packets with IFS >= 640us.
Maybe we could add a precondition that the "prepare" function should be called while there's no ongoing transmission (e.g after TX_DONE)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it allowed to call prepare()
when the transceiver state is not TX? When prepare()
really just copies over the buffer and does not need to wait for the PLL to settle, prepare()
could block. (As said, to me blocking for an SPI transfer does count as "non-blocking" 😉) In that case, there is no need for an event to notify the HAL user that prepare()
finished.
However, a simplistic user of the HAL (e.g. an ALOHA MAC) would still have to await an event for calling prepare()
again after having issued transmit()
. For simplicity, this could just be the TX_COMPLETED event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it allowed to call prepare() when the transceiver state is not TX?
Yes! In fact, it is preferred to call prepare
during TRX_OFF (if there are no timing constraints, considering switching from RX_ON to TX_ON takes 1 us but from TRX_OFF ~200 us). TRX_OFF consumes ~0.5 mA in while TX_ON consume ~6 mA (at86rf2xx
).
However, a simplistic user of the HAL (e.g. an ALOHA MAC) would still have to await an event for calling prepare() again after having issued transmit(). For simplicity, this could just be the TX_COMPLETED event.
Exactly, that's the idea
This would also be possible to some degree. But I mostly have code duplication between users of the HALs in mind. E.g. primitive non-persistent CSMA implementation could likely be implemented for every wireless network device; in the most primitive form a transceiver that provides the current RSSI value would be sufficient. If the subset of the features common to all wireless network devices could be provided in a way to allow portable access to those features, that could be implemented once and used for all wireless transceivers. (Note: The function to perform a single CCA could be part of the generic Wireless Network Device HAL and implemented fully in software by primitive transceivers; e.g. by fetching the current RSSI value and comparing it to a threshold. If a device not supporting any means to perform a CCA, it could return
On top of the most generic HAL that provides all features needed. E.g. TSCH is targeting IEEE 802.15.4, so there is no way to implement it on top of a more generic HAL. But a primitive CSMA MAC should work fine on top of the generic HAL. Note that I see the generic "Wireless Network Device HAL" not as an extra layer, but as a subset of the function pointers in the "IEEE 802.15.4 HAL". So a driver implementing IEEE 802.15.4 HAL would not need to do anything in addition to be compatible with the "Wireless Network Device HAL" - this compatibility would come for free. |
A general request: since there are several pending technical discussions, could we maybe try to focus first in these aspects (transceiver state, blocking-non blocking stuff) before discussing about DRY/maintainability? I'm in for everything that makes life easier when adding new HALs or maintaining code, but IMO we should first agree on the design decisions. |
Idle. The former will set the Abstract State Machine state to the transient | ||
state RX BUSY and then to Idle on `RF_RX_DONE` event. The latter will change | ||
the state to Idle only if the radio is currently not receiving a frame. | ||
TODO: Calls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this sneaked in by accident? Having a TODO in the final document does not make it very final.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, good catch. I forgot that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
# 10 Revision | ||
|
||
- Rev6: Relax Abstract State Machine transitions | ||
- Rev5: Add Source Address Matching | ||
- Rev4: Add Async behavior | ||
- Rev3: Merge Device Driver and HAL | ||
- Rev2: Update sleep preconditions | ||
- Rev1: Address on/off functions | ||
- Rev0: initial document |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these can go, as the doc was not part of master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alright
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rev0: initial document
can stay, as far as I understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
I added in the last push a table that summarizes that. |
I clarified the handling of each state a bit more. |
|
done! I guess I have to rename the file to EDIT: Renamed! |
May I squash? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yepp. Please also add RDM2 to the list in the README and see the following comments.
doc/memos/rdm0002.md
Outdated
for the implementation of the Event Notification callback (e.g the `RX_DONE` | ||
event might post an event to an event queue in order to fetch the packet or | ||
`PLL_LOCK` might unlock a mutex). | ||
|
||
The full list of events and implications are defined in the Interface | ||
Definition section. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PLL_LOCK
is not mentioned in the Interface Definition section at all. Is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of. The RDM does not prevent adding such an event, but we didn't include it in the final implementation because it's kind of a niche use case. We can still leave it there to point out that other events can be introduced if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can still leave it there to point out that other events can be introduced if needed.
Maybe rather remove the example here and add a mention of the introduction of other events still?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
squashed! |
0953490
to
c767296
Compare
@miri64 proposed to add the corresponding |
c767296
to
7ac3052
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RE-ACK
bors merge |
🕐 Waiting for PR status (GitHub check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set. |
bors merge |
Build succeeded: |
🎉 |
it took almost three years! Thank you everyone for this! |
Thanks for pushing all this time! |
| IDLE | | RX | | ||
| |<-------------| | | ||
+---------+ +---------+ | ||
SET_TRX_STATE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting back to this, is there not an implicit TX state hidden inside IDLE when the device is busy sending the actual frame?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just checked this. There is definitely an implicit TX state, but it's already marked by the request_transmit
and confirm_transmit
function (which prevents calling set_trx_state
).But to keep things simple, it just assumes the radio is either listening (RX), ready to transmit (TX) or the transceiver is off (which is not equivalent to turning off the device with off
).
|
||
The author of this memo can be contacted via email at jose.alamos@haw-hamburg.de | ||
|
||
# Appendix A: Radio HAL header file (2023-02) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of including this here?
This will just become outdated (if it isn't already)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was just a way to keep track of the Radio HAL status at the moment of merge. A commit hash could have done the same though
Contribution description
This RDM introduces a new Radio Hardware Abstraction Layer for 802.15.4 compliant radios.
Abstract:
Implementation
The proposed API, 2 PoC reference implementations (nrf802154
andat86rf23x
) and a small test application were implemented in this branch.Radio HAL APInrf802154 Radio HAL PoC implementationat86rf2xx Radio HAL PoC implementionTest application(the test application is already added as a submodule under $(RIOTBASE)/tests)See #14655
Memory efficiency
EDIT: BASED ON OLD TEST! THIS IS OUTDATED NOW
Here are some preliminary memory usage results against
netdev
. Note that the Radio HAL version consumes less ROM because most getters are not needed (e.g channel, tx_power, addresses are stored by the PHY/MAC, so it's not necessary to ask the device for Information Base). Also, a compliant Radio HAL requires less instructions (no unnecessary state changes, sleep/wake-up routines everywhere, etc).samr21-xpro
`hal_test` application with netdev using base Radio HAL base commit (3e7ddcd):
`hal_test` application with Radio HAL:
nrf802154
`hal_test` application with netdev using base Radio HAL base commit (3e7ddcd):
`hal_test` application with Radio HAL:
Testing procedure
Issues/PRs references
#13771
#14655
#14371