-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libteam]: timerfd read() could return 0 fix #3393
Conversation
read(timerfd...) returning 0 could be handled as if timer fired, because you are reading, which implies that fd is selected for read, which in turn implies that timer has fired. Something went wrong, between firing & reading, which made it return 0. |
@renukamanavalan we could also say that something went wrong when the event was fired. Also I chose this implementation because of https://github.com/facebook/folly/blob/master/folly/experimental/TimerFD.cpp#L83 |
i tend to agree with renuka, in our case, probably better to treat 0 as timer expired. if you look at the callback, https://github.com/jpirko/libteam/blob/54f137c10579bf97800c61ebb13e732aa1d843e6/teamd/teamd_runner_lacp.c#L812 it is the periodical timeout, if it is expired, we'd better send a lacp packet instead of skipping one timeout. |
Interesting! So questions are: Will timer continue to fire if we treat read 0 bytes as fired v.s. ignored? Will protocol partner ok with receiving more frequent handshakes than expected? (from warm/fast reboot, we know that they would be fine with a late one). If clock jumped forward, then the timer might have fired earlier than it should be, if clocked jumped backwards, then the timer would have fired late. In the later case, having a catch up handshake might be desirable. |
Based on past debugging, "how you treat the "return == 0", does not affect the subsequent firing of timer." |
@lguohan teamd register 2 timers for LACP (and one for eth_lw if I remember well). One LACP timer is 30 second timer, another one is 120 second timer. If the 30 sec timer expired sooner then expected it's more or less ok. We can send update sooner than 30 second. But if 120 second timer expires sooner than expected, it means drop portchannel down.
|
* Update sonic-quagga submodule * [libteam]: timerfd read() could return 0
- What I did
Fixed an issue with libteam. teamd is using timerfd for LACP timeouts. It's possible to see 0 as return from read() sometimes. I've added a fix, which will not produce an error in this case
- How I did it
As soon as read returns 0 on timerfd, just don't return the error, just skip the event.
- How to verify it
Build an image and run on your DUT
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)