-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sys/net/gnrc/netif: make use of confirm_send #15821
Conversation
@benpicco: Maybe you can give this a look. I have a very strange regression on the
Do you have any clue what the issue could be? |
My suspicion is that this has to do with how ACK timeouts are handled. Indeed I get
The 'proper' solution would be porting this to the new radio HAL that takes care of ACK timeouts in a generic way. But that will also require some 802.15.4g plumbing inside the radio HAL. |
Thanks for the pointer. This helped to uncover quite a monster of a bug :-) It works now as expected :-) |
We should get there soon :). There are already some traces (pages, 16bit channels, etc). |
I like the idea. Specially the fact that we use thread flags now for processing events here. I also think in the future this could be written in a network-stack independent manner, so we can reuse the same mechanism for OpenThread, LWIP or even custom applications. This:
And this:
I'm afraid these might introduce a regression in some cases. For IEEE 802.15.4 for instance, this could easily take 50~70 ms in some configurations (long frames=~4ms ToA plus IFS, CSMA-CA backoff is 20 symbols = 320 us and the backoff exponent can be 7 => up to 2^7*320us = up to 40ms, several retransmissions). This would affect the upper layers trying to fetch information for the interfaces and any existing MAC layer that requires some housekeeping (GoMACH, lwmac, gnrc_lorawan and the upcoming IEEE 802.15.4 MAC). I would suggest to simply use one loop (as it is now). We can safely rely on this:
And then we simply go back to the main loop until we get a TX done. If the concerns go in the direction of TX timestamps, I'm gathering through the SDR quite a lot of evidence that this should be handled by the driver (some even provide helpers). Also, regarding:
I really think we should aim to do that in all radios. Even SPI radios have mechanisms to do that quite fast (e.g the |
We would need an asynchronous SPI API though, But I'm sure it could be a mid-term goal |
I see. I think we have the following options here:
|
Hmmm IMO:
Is the more reasonable thing to do. But what speak against having simply one loop? (instead of a separate loop for TX) In fact, several radios using the old send API are still blocking and non-blocking at the same time (e.g the at86rf2xx only blocks if the radio is busy trying to send another frame). Having one loop IMO would help us to synchronize all cases |
BTW, there are some MAC layers that get some benefits when using such a mechanism. E.g I plan to port GNRC LoRaWAN to use an event queue for all MAC timeouts (what I did with #15038). |
Sure. But the IRQ should remain at higher priority, as e.g. some radios have an RX/TX FIFO that is smaller than the maximum length of the PDU. Hence, an ISR needs to feed into the TX FIFO while sending, or drain the RX FIFO while receiving - this has very strict time requirements. Hence, the current implementation cannot be used to also hook in MAC events. Since recently, we do have priority support in the event queue mechanism by providing one event queue per priority. But IMO we should for now just go with Generally speaking, unless all use of |
Right after this get's in, I will convert that one to the new API. This semi-blocking behavior is blowing my mind every time. |
I agree. The IRQ offloading should have a higher priority than the interface |
Because this PR also includes #15783, I refer to "pre-PR" as to pre PR:
|
... which we automatically get back when directly using the submac (instead of This goes in a good direction IMO |
Rebased to include the change that TX/RX start/end events are now generated unconditionally and again since I accidentally pushed a commit that I didn't meant to push. |
I split out the adaption of the Ethernet driver on STM32 into its own PR. Especially since this would break lwIP until the glue code between netdev and lwIP also is changed to handle confirm_send, this makes IMO sense. I will now adapt Btw: I cannot find the "semi-blocking" code in |
I noticed that the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that a release is missing, if pktq
is used and the driver has no confirm_send()
.
} | ||
#endif /* IS_USED(MODULE_GNRC_NETIF_PKTQ) */ | ||
/* remove previously held packet */ | ||
gnrc_pktbuf_release_error(pkt, -error_code); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the release be inside the
#if IS_USED(MODULE_GNRC_NETIF_PKTQ)
...
#endif /* IS_USED(MODULE_GNRC_NETIF_PKTQ) */
because the call to gnrc_pktbuf_hold()
is also inside of an
#if IS_USED(MODULE_GNRC_NETIF_PKTQ)
...
#endif /* IS_USED(MODULE_GNRC_NETIF_PKTQ) */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, once the the frame was passed over to gnrc_netif, it has to release it eventually. With the legacy API, this is done on success by the netdev adaption layer on success. But on failure it has to be freed by gnrc_netif. In case of the new API, gnrc_netif will always be in charge of releasing it, even on success. (But only after TX was signaled as completed.)
if (!netif->dev->driver->confirm_send) { | ||
/* hold in case device was busy to not having to rewrite *all* the link | ||
* layer implementations in case `gnrc_netif_pktq` is included */ | ||
gnrc_pktbuf_hold(pkt, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we increment the reference counter of pkt
here, do we not have to release it also in _tx_succeeded()
and not only in _tx_failed()
?
I am experiencing error: packet buffer full
with nrf24l01p_ng
when I try ping6 fe80::a0cc:a5ff:fe6b:aa%7 -c 255
.
But when I remove gnrc_pktbuf_hold(pkt, 1)
, the error is gone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be fixed. You tested with the upstream nrf, so no confirm_send, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I tested with the upstream nrf first. The release does fix it.
#if IS_USED(MODULE_GNRC_TX_SYNC) | ||
gnrc_pktbuf_release(tx_sync); | ||
#endif | ||
return _tx_succeeded(netif, res); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a release to drop the previously held packet?
return _tx_succeeded(netif, res); | |
#if IS_USED(MODULE_GNRC_NETIF_PKTQ) | |
if (!netif->dev->driver->confirm_send) { | |
/* remove previously held packet */ | |
gnrc_pktbuf_release(pkt); | |
} | |
#endif | |
return _tx_succeeded(netif, res); |
Or the packet is passed to, and released in _tx_succedded()
, because in _process_events_await_msg()
the packet is released right after tx_succeeded()
was called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but only when pktq is used. With the legacy API the netdev (or rather the netdev adaption layer) is in charge of releasing the packet.
dev->driver->isr(dev); | ||
} | ||
if (flags & THREAD_FLAG_RX_DONE) { | ||
gnrc_pktsnip_t *pkt = netif->ops->recv(netif); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it is reasonable to have something like this:
if (flags & THREAD_FLAG_RX_DONE) {
gnrc_pktsnip_t *pkt;
while(flags & THREAD_FLAG_RX_DONE) {
if ((pkt = netif->ops->recv(netif))) {
_process_receive_stats(netif, pkt);
_pass_on_packet(pkt);
}
if (!tx_pkt) {
tx_pkt = _send_queued_pkt(netif);
}
}
}
The nrf24l01p_ng
has an internal queue.
Inside the recv()
routine in the driver I would again call the netdev callback with RX_DONE
to reset the thread flag.
Else, I guess a send call could override frames in the internal receive queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will not work so easy. The issue is that thread_flags_wait_any(THREAD_FLAG_RX_DONE)
needs to be called in the loop. But that will block until the flag is set.
But I see no reason to extend the API to provide a non-blocking function that would to just allow that. But for now it might be the best if the driver would just return -EBUSY
on TX if the receive queue is not yet drained.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well neighbor stats are currently not used yet, so we are not losing much.
Since this PR doesn't contain any driver implementations, there is no expected change with existing drivers besides that, right?
I'll give that a try. Is this the final form of this PR yet?
(My only grudge is that this will create two classes of drivers and converting a driver to the new way seems all but trivial, but the progress is real here)
if (error_code == -EBUSY) { | ||
result = NETSTATS_NB_BUSY; | ||
} else { | ||
result = NETSTATS_NB_NOACK; | ||
netdev_t *dev = netif->dev; | ||
dev->driver->get(dev, NETOPT_TX_RETRIES_NEEDED, &retries, sizeof(retries)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no driver that uses -ECOMM
to indicate no-ACK
#if IS_USED(MODULE_GNRC_NETIF_PKTQ) | ||
if (error_code == -EBUSY) { | ||
int put_res; | ||
|
||
/* Lower layer was busy. | ||
* Since "busy" could also mean that the lower layer is currently | ||
* receiving, trying to wait for the device not being busy any more | ||
* could run into the risk of overriding the received packet on send | ||
* Rather, queue the packet within the netif now and try to send them | ||
* again after the device completed its busy state. */ | ||
if (push_back) { | ||
put_res = gnrc_netif_pktq_push_back(netif, pkt); | ||
} | ||
else { | ||
put_res = gnrc_netif_pktq_put(netif, pkt); | ||
gnrc_netif_pktq_sched_get(netif); | ||
} | ||
if (put_res == 0) { | ||
DEBUG("gnrc_netif: (re-)queued pkt %p\n", (void *)pkt); | ||
return; /* early return to not release */ | ||
} | ||
|
||
LOG_ERROR("gnrc_netif: can't queue packet for sending\n"); | ||
/* If we got here, it means the device was busy and the pkt queue | ||
* was full. The packet should be dropped here anyway */ | ||
error_code = -ENOMEM; | ||
} | ||
#endif /* IS_USED(MODULE_GNRC_NETIF_PKTQ) */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move both of those conditional blocks to static inline
functions to enhance readability?
I guess every driver that calls |
Can there for example be a |
Co-authored-by: benpicco <benpicco@googlemail.com>
This is indeed an issue that needs to be addressed first. Technically, it is however better to not handle the event right away. The call graph in the current code is like this:
The additional call depths results in more stack memory being used. In addition, it makes it more difficult to implement the network interface as an event queue handler, that just posts events in the event queue. But such an implementation could share a single thread for all network interfaces. Not only for board routers but e.g. also for the AT86RF215 there are inherently more than one netdevs - and cutting the stacks required to handle them in half would be really beneficial. Bottom line: I really would like to keep the new behavior for netdevs that implement |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions. |
So to push this forward, those drivers would need to be fixed right? |
I remember for |
Rebasing is futile, |
Contribution description
gnrc_netif.c
to make use ofnetdev_driver_t::confirm_send
, if not `NULLcore/thread_flags
is used to wait for TX completion and IRQsgnrc_netif_events
provides, but with less ROM. Consequently,gnrc_netif_events
is droppedconfirm_send()
is provided, the pkgsnip is now released bygnrc_netif.c
, rather by each adaption layer individually. This is needed, as the driver might still operate on the buffer during TX - e.g. peripheral network interfaces with DMA that supports scatter/gather, no copy of the outgoing frame is needed.gnrc_netif_ethernet.c
to only release the packet buffer ifconfirm_send
is not providedconfirm_send
for the STM32 Ethernet driverTesting procedure
confirm_send()
Issues/PRs references