Reset expiry of entries in the neighbor cache on packet reception #966
I'm not sure either whether this is the correct thing to do; I'll have to look into it. I don't think you need to pass the timestamp every time, you can just call
Thanks for the pointer, I'll fix!
@thvdveld Another way to solve this would be to mark the entry as "in use" by a connection, so that eviction is not based only on "the one with the shortest timeout". This would be even more robust.
From what I understand, Linux [1] does not remove the one with the shortest timeout, they just return an error and drop the packet. However, they have a neighbour cache size of 1024 neighbours. We can also increase the size by using the What Linux also does is updating the Linux keeps track of just more than reachable, etc. [3], which is based on Neighbour Unreachability Detection I think, specified in RFC 4861 Section 7.3.

In the end, we should update our implementation to follow this. I know this is for IPv6, but I think they do something similar for IPv4. I think that for now, updating the expiration timestamp is good enough. What do you think?

[1] https://github.com/torvalds/linux/blob/master/net/core/neighbour.c#L469-L524
I agree we need some kind of "refreshing" of MACs to prevent them from expiring while in active use. I'm not sure what the best way is, though. There have historically been a few nasty bugs around this, about bad cache entries in broken networks. For example, this: 6210612. I'd be comfortable with this if we made the check stricter. Update expiration if all these hold:
I think that should fix your issues while keeping the chances of brokenness low. The ideal fix would be to get "forward progress" confirmation from a higher layer (e.g. TCP ACKs coming in for new data) like Linux does, but that's a more involved refactor.
- Check for unicast destination
- Source IP matches cache
- Source hardware address matches cache
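The three conditions listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this PR: `Neighbor`, `Cache`, `refresh_on_receive`, `ENTRY_LIFETIME_MS`, and the plain `u64` millisecond timestamps are all hypothetical stand-ins, and "unicast destination" is assumed here to mean the Ethernet destination address of the received frame.

```rust
use std::net::Ipv4Addr;

// Hypothetical, simplified stand-ins; these are NOT smoltcp's real types.
type HardwareAddr = [u8; 6];
type Millis = u64;

// Assumed entry lifetime, chosen only for illustration.
const ENTRY_LIFETIME_MS: Millis = 60_000;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Neighbor {
    protocol_addr: Ipv4Addr,
    hardware_addr: HardwareAddr,
    expires_at: Millis,
}

struct Cache {
    entries: Vec<Neighbor>,
}

impl Cache {
    /// Refresh the expiry of an existing entry when a packet is received,
    /// but only if all three conditions hold. Returns true if an entry
    /// was refreshed. Never inserts a new entry.
    fn refresh_on_receive(
        &mut self,
        dst_hardware_addr: HardwareAddr,
        src_protocol_addr: Ipv4Addr,
        src_hardware_addr: HardwareAddr,
        now: Millis,
    ) -> bool {
        // 1. Unicast destination: the group bit (lowest bit of the first
        //    octet) is set for broadcast and multicast frames.
        if dst_hardware_addr[0] & 0x01 != 0 {
            return false;
        }
        for entry in self.entries.iter_mut() {
            // 2. Source IP matches an entry already in the cache.
            if entry.protocol_addr == src_protocol_addr {
                // 3. Source hardware address matches that same entry.
                if entry.hardware_addr == src_hardware_addr {
                    entry.expires_at = now + ENTRY_LIFETIME_MS;
                    return true;
                }
                // Same IP, different MAC: leave the entry untouched.
                return false;
            }
        }
        false
    }
}
```

In this sketch a frame with a mismatching source MAC neither refreshes nor overwrites the entry, so a stale or spoofed sender cannot keep a bad mapping alive.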
Thanks for the feedback and analysis! Great to know what the steps towards a complete fix would be.
@thvdveld Regarding your comment: 1024 seems like a lot though, wouldn't that eat something like 20 kB of RAM?
I've test-run this overnight on the old system that broke about every 30 s with the old
1024 is indeed a lot when used on embedded. I looked at Contiki-NG and Zephyr, and they do 16 and 8 respectively. Maybe we should make it 8 by default in smoltcp as well.
Yeah, the 4 is a bit too conservative for a good default. Do you have any preference, @Dirbaio?
I tentatively updated the default to 8. EDIT: Sorry, misread the defaults.
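To put rough numbers on the RAM question, here is a back-of-the-envelope sketch. The `Entry` layout is an assumption made up for this estimate (an address field wide enough for IPv6, a hardware address wide enough for an 8-byte link-layer address, and a 64-bit timestamp); it is not smoltcp's actual struct, so the real figure will differ.

```rust
use std::mem::size_of;

// Hypothetical entry layout, assumed only for this estimate.
#[allow(dead_code)]
struct Entry {
    protocol_addr: [u8; 16], // wide enough for an IPv6 address
    hardware_addr: [u8; 8],  // wide enough for an 8-byte link-layer address
    expires_at_ms: i64,      // expiry timestamp in milliseconds
}

/// Bytes needed for a statically sized cache with `capacity` entries.
const fn cache_bytes(capacity: usize) -> usize {
    capacity * size_of::<Entry>()
}

/// The capacities mentioned in this thread: 4, 8, 16, and Linux's 1024.
fn estimates() -> [(usize, usize); 4] {
    [
        (4, cache_bytes(4)),
        (8, cache_bytes(8)),
        (16, cache_bytes(16)),
        (1024, cache_bytes(1024)),
    ]
}
```

Under these assumptions, 1024 entries land in the tens of kilobytes, while 8 or 16 entries stay well under a kilobyte, which is consistent with the concern raised above.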
thank you!
Tentative fix for #965
This does solve the issue in my testing, but I'm not sure if it's the "right thing to do".