gnrc/ipv6/nib: don't queue packets on 6lo neighbors and drop/flush if… #20834
Conversation
```c
gnrc_pktqueue_t *oldest = _nbr_pop_pkt(node);
assert(oldest != NULL);
gnrc_icmpv6_error_dst_unr_send(ICMPV6_ERROR_DST_UNR_ADDR, oldest->pkt);
gnrc_pktbuf_release_error(oldest->pkt, ENOBUFS);
```
Not sure if we should drop with `ENOBUFS` or silently.
Force-pushed from e4aeb78 to f840f00.
sys/include/net/gnrc/ipv6/nib/conf.h (outdated)
```c
#if CONFIG_GNRC_IPV6_NIB_QUEUE_PKT
/**
 * @brief   queue capacity for the packets waiting for address resolution,
 *          per neighbor. SHOULD always be smaller than @ref CONFIG_GNRC_IPV6_NIB_NUMOF
```
> SHOULD always be smaller than @ref CONFIG_GNRC_IPV6_NIB_NUMOF
Why should that be?
Because if it's >= `CONFIG_GNRC_IPV6_NIB_NUMOF`, then even for a single neighbor you can't ever get to drop old packets from its queue, because you can't allocate a new one in the first place. With multiple neighbors this can happen anyway, which is why I went for a SHOULD instead of a MUST.
Ah
```c
#if IS_ACTIVE(CONFIG_GNRC_IPV6_NIB_QUEUE_PKT)
static gnrc_pktqueue_t _queue_pool[CONFIG_GNRC_IPV6_NIB_NUMOF];
#endif  /* CONFIG_GNRC_IPV6_NIB_QUEUE_PKT */
```
That is not just an array of queues, but rather every packet to be queued has to get a slot here.
If `CONFIG_GNRC_IPV6_NIB_QUEUE_PKT_CAP` is the queue capacity per neighbor, I would define it as 1 by default and change the snippet above to:
```c
#if IS_ACTIVE(CONFIG_GNRC_IPV6_NIB_QUEUE_PKT)
static gnrc_pktqueue_t _queue_pool[CONFIG_GNRC_IPV6_NIB_NUMOF * CONFIG_GNRC_IPV6_NIB_QUEUE_PKT_CAP];
#endif  /* CONFIG_GNRC_IPV6_NIB_QUEUE_PKT */
```
Hm, but it would be bad if packets cannot be queued even though there are 15 available slots, just because only one slot per neighbor is allowed. The current definition nevertheless looks a bit strange to me.
What about not allowing a neighbor to have more than half of the slots?
> Sure we still need space in pktbuf to actually send out the neighbor solicitation (and receive the response), but that isn't depending on the number of queued packets but their size - or do I misunderstand something here?
I think this is a different problem. What I'm trying to solve is the scenario where a neighbor gets flooded with packets and we run out of free queue entries. In that case, the neighbors' queues will be filled with stale packets, because we can't even `_alloc_queue_entry()` in order to drop the oldest packets.
Then one neighbor could take all the slots, and at some point `_alloc_queue_entry()` will always fail.
AFAIK we should always drop the oldest, but that can't be done if we can't allocate in the first place.
The current capping solution doesn't guarantee this scenario won't happen, just makes it less probable.
Also note that the default number of free queue entries is `CONFIG_GNRC_IPV6_NIB_NUMOF == 16`, which is rather small.
You could just as well drop the oldest packet as soon as allocation fails and then try to allocate again.
It could be that 16 slots are held by one neighbor, and another neighbor for which allocation fails does not have an oldest packet to drop, so allocation fails again because one neighbor holds all the slots. With your capacity limit per neighbor you make this case less likely, but you do not prevent it.
Yes, the idea is good, but at the same time you accept that a packet is not queued even though a slot could be allocated. If I got that right, I am not sure if this is really beneficial, as long as we don't see an issue where packets cannot be queued because one neighbor is taking most of the queue slots.
Did you observe this case?
> If I got that right, I am not sure if this is really beneficial, as long as we don't see an issue where packets cannot be queued because one neighbor is taking most of the queue slots.
> Did you observe this case?
No, I only had issues with one host.
What bothers me most isn't failed allocations, but stale packets in a neighbor's queue. This goes against the first part of "be strict when sending and tolerant when receiving".
> It could be that 16 slots are held by one neighbor, and another neighbor for which allocation fails does not have an oldest packet to drop, so allocation fails again because one neighbor holds all the slots.
This is partially true. We have the static `_nib_onl_entry_t _nodes[CONFIG_GNRC_IPV6_NIB_NUMOF];`. We could iterate through that and drop either from the first neighbor (faster) or from the one with the largest queue (may be slower for large `CONFIG_GNRC_IPV6_NIB_NUMOF`). Anyway, I don't see iterating through that list as a performance problem: we already do so when searching for free packets, so adding one more pass doesn't change much.
I removed the per-neighbor queue cap. Packet allocation now never fails; it just pops a packet from the neighbor with the most packets in its queue. I made the queue entry count one more than the neighbor count; that way, when all entries are in use, there must always be a neighbor with at least two packets in its queue, so we never leave a neighbor packet-less.
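A minimal sketch of that allocation strategy, assuming the `_queue_pool` and `_nodes` arrays from earlier in the thread, the `pktqueue_len` counter from the diff below, and a `_nbr_pop_pkt()` that unlinks a neighbor's oldest entry and decrements its counter (a hedged reading of the approach, not the literal PR code):

```c
#if IS_ACTIVE(CONFIG_GNRC_IPV6_NIB_QUEUE_PKT)
/* One entry more than neighbors: if the pool is exhausted, by pigeonhole some
 * neighbor holds at least two entries, so popping from the longest queue
 * never leaves that neighbor packet-less. */
static gnrc_pktqueue_t _queue_pool[CONFIG_GNRC_IPV6_NIB_NUMOF + 1];

static gnrc_pktqueue_t *_alloc_queue_entry(gnrc_pktsnip_t *pkt)
{
    /* first try to find a free slot (pkt == NULL marks a free entry) */
    for (unsigned i = 0; i < ARRAY_SIZE(_queue_pool); i++) {
        if (_queue_pool[i].pkt == NULL) {
            _queue_pool[i].pkt = pkt;
            return &_queue_pool[i];
        }
    }
    /* pool exhausted: reuse the oldest entry of the neighbor ("hog")
     * with the most queued packets, so allocation never fails */
    _nib_onl_entry_t *hog = &_nodes[0];
    for (unsigned i = 1; i < CONFIG_GNRC_IPV6_NIB_NUMOF; i++) {
        if (_nodes[i].pktqueue_len > hog->pktqueue_len) {
            hog = &_nodes[i];
        }
    }
    gnrc_pktqueue_t *entry = _nbr_pop_pkt(hog);  /* also decrements pktqueue_len */
    gnrc_pktbuf_release(entry->pkt);             /* drop the hog's oldest packet */
    entry->pkt = pkt;
    return entry;
}
#endif  /* CONFIG_GNRC_IPV6_NIB_QUEUE_PKT */
```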
Sorry for not getting back to this earlier; this looks very sane IMHO.
How did you test the `_alloc_queue_entry()` exhaustion case?
```diff
@@ -100,6 +100,7 @@ typedef struct _nib_onl_entry {
      * @note    Only available if @ref CONFIG_GNRC_IPV6_NIB_QUEUE_PKT != 0.
      */
     gnrc_pktqueue_t *pktqueue;
+    size_t pktqueue_len;            /**< Number of queued packets */
```
We only need this in the error path, so no need to cache this information.
Let's save some memory and do:

```c
static inline size_t gnrc_pktqueue_len(const gnrc_pktqueue_t *queue)
{
    /* counts all entries; tolerates an empty (NULL) queue */
    size_t len = 0;
    while (queue) {
        queue = queue->next;
        ++len;
    }
    return len;
}
```
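For illustration, the counter-free lookup in the error path could then look like this (hypothetical `_find_hog()` helper, assuming the `_nodes` array mentioned earlier in the thread):

```c
/* Walk every neighbor's queue on demand instead of caching a length:
 * O(neighbors + queued packets) per exhausted allocation. */
static _nib_onl_entry_t *_find_hog(void)
{
    _nib_onl_entry_t *hog = NULL;
    size_t hog_len = 0;

    for (unsigned i = 0; i < CONFIG_GNRC_IPV6_NIB_NUMOF; i++) {
        size_t len = gnrc_pktqueue_len(_nodes[i].pktqueue);
        if (len > hog_len) {
            hog = &_nodes[i];
            hog_len = len;
        }
    }
    return hog;  /* NULL only if no packets are queued at all */
}
```

This is the RAM-for-CPU trade-off debated next: no per-neighbor counter, but a full scan each time the pool is exhausted.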
What about a large neighbor base (~128) and a high-bandwidth link (e.g. Ethernet)? Once the queue entries are used up (at least one neighbor is unreachable and we try to send it lots of packets), this will get executed for each packet you want to put on the wire.
Hm, ok, but it's only the sending to a neighbor that is already unreachable that would then be slow, right?
(Not sure if we can speed up the loop much by only considering neighbors in `GNRC_IPV6_NIB_NC_INFO_NUD_STATE_PROBE`.)
The alternative with the 128-neighbor NIB would be an additional 512 bytes of RAM for the counting (128 × `sizeof(size_t)` at 4 bytes each).
The whole network stack will be slow; I'm generally trying to avoid an unpredictable slowdown.

> The alternative with the 128-neighbor NIB would be an additional 512 bytes of RAM for the counting.

I agree that's a lot. I made it a `uint8_t` and moved it to the struct's end, where 1 byte is wasted anyway because of padding. The more I dig through the network stack, the more I realize it is not really made for these numbers. I would also re-introduce the queue cap, see `_nbr_push_pkt()`. I updated the PR message.
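A minimal, self-contained illustration of the padding argument; the field names here are hypothetical, not the actual `_nib_onl_entry_t` layout:

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    void *pktqueue;   /* pointer-aligned: 4 or 8 bytes */
    uint16_t flags;
    uint8_t mode;     /* struct size is rounded up -> trailing padding here */
} entry_without_counter_t;

typedef struct {
    void *pktqueue;
    uint16_t flags;
    uint8_t mode;
    uint8_t pktqueue_len;  /* occupies a former padding byte */
} entry_with_counter_t;

int main(void)
{
    /* on common ABIs both sizes are equal: the uint8_t counter is "free" */
    printf("%zu == %zu\n", sizeof(entry_without_counter_t),
           sizeof(entry_with_counter_t));
    return 0;
}
```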
```c
if (ARRAY_SIZE(_queue_pool) > CONFIG_GNRC_IPV6_NIB_NBR_QUEUE_CAP &&
    node->pktqueue_len == CONFIG_GNRC_IPV6_NIB_NBR_QUEUE_CAP) {
```
Suggested change:

```diff
-if (ARRAY_SIZE(_queue_pool) > CONFIG_GNRC_IPV6_NIB_NBR_QUEUE_CAP &&
-    node->pktqueue_len == CONFIG_GNRC_IPV6_NIB_NBR_QUEUE_CAP) {
+if (node->pktqueue_len == (ARRAY_SIZE(_queue_pool) - 1) ||
+    node->pktqueue_len == CONFIG_GNRC_IPV6_NIB_NBR_QUEUE_CAP) {
```
I thought you wanted to prevent one node from hogging the entire queue?
We're talking past each other here. Let's recap:

- We have `CONFIG_GNRC_IPV6_NIB_NUMOF + 1` queue entries in the cache, s.t. when they're all used up there's always one neighbor with at least two.
- With `CONFIG_GNRC_IPV6_NIB_NUMOF` small (~16) we really don't care how they are distributed. Once we're out of entries, find the hog and take from there. This is fast, because we have at most ~16 neighbors and at most ~16 entries. The check `ARRAY_SIZE(_queue_pool) > CONFIG_GNRC_IPV6_NIB_NBR_QUEUE_CAP` compiles the cap check away (see the sketch after this list).
- With `CONFIG_GNRC_IPV6_NIB_NUMOF` big, we want a per-neighbor cap, which defaults to 16, for the reasons stated previously:
  - A single hog cannot deplete the entry cache by itself -> fewer cases where we have to iterate over all neighbors to find a hog.
  - The RFC recommends keeping the per-neighbor queue small (whatever is considered small).
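A sketch of how the cap check is meant to compile away, shaped after the snippet quoted below in the thread; the `_nbr_push_pkt()` body here is an assumption, not the literal PR code:

```c
static void _nbr_push_pkt(_nib_onl_entry_t *node, gnrc_pktqueue_t *entry)
{
    /* Both operands of `>` are compile-time constants. If the pool is not
     * larger than the cap, the left operand of `&&` is constantly false,
     * so the compiler drops the whole cap branch as dead code. */
    if (ARRAY_SIZE(_queue_pool) > CONFIG_GNRC_IPV6_NIB_NBR_QUEUE_CAP &&
        node->pktqueue_len == CONFIG_GNRC_IPV6_NIB_NBR_QUEUE_CAP) {
        gnrc_pktqueue_t *oldest = _nbr_pop_pkt(node);  /* decrements the counter */
        gnrc_pktbuf_release(oldest->pkt);
        oldest->pkt = NULL;  /* return the pool slot */
    }
    gnrc_pktqueue_add(&node->pktqueue, entry);
    node->pktqueue_len++;
}
```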
Ah, makes perfect sense!
But since I had trouble getting it the first time, better also add a comment to the code to prevent misunderstandings for the next person reading this.
```c
if (ARRAY_SIZE(_queue_pool) > CONFIG_GNRC_IPV6_NIB_NBR_QUEUE_CAP &&
    node->pktqueue_len == CONFIG_GNRC_IPV6_NIB_NBR_QUEUE_CAP) {
    gnrc_pktqueue_t *oldest = _nbr_pop_pkt(node);
    gnrc_pktbuf_release(oldest->pkt);
```
Not `gnrc_pktbuf_release_error(oldest->pkt, ENOMEM)`?
I think not. You can always send a packet by dropping older ones from a queue, and I think this is arguably normal operation. Also, in this case we're not necessarily out of queue entries, but just reaching the per-neighbor limit.
```c
 * @attention   This MUST be leq UINT8_MAX
 */
#ifndef CONFIG_GNRC_IPV6_NIB_NBR_QUEUE_CAP
#define CONFIG_GNRC_IPV6_NIB_NBR_QUEUE_CAP  (16)
```
Any reason to have the default be so high?
This disables some code when the max neighbor count is < `CONFIG_GNRC_IPV6_NIB_NBR_QUEUE_CAP`, which is the case per default. With a large neighbor count we mainly want to prevent a neighbor from depleting the queue - for performance reasons - and in that case 16 isn't that big anymore.
This was my only rationale when picking 16.
Looks good to me, feel free to squash!
Force-pushed from fd50e13 to 3a5612e.
Contribution description
The following fix only applies for `CONFIG_GNRC_IPV6_NIB_QUEUE_PKT == 1` (default config). This PR is an alternative solution for #20781.

This PR adds the following changes:

- for UNREACHABLE neighbors: drop packets instead of queuing

Tests
Spun up this thread:

Both remote endpoints are on-link (Ethernet) but unreachable. The `sock_udp_send()` should never fail because we drop silently. By enabling debugging in `ipv6/nib/nib.c`, the following is printed:

i.e. packets are dropped from whichever neighbor has the most in its queue.
Issues/PRs references
#20781