[fast-reboot] ARP entries are not restored after fast-reboot #5217
Comments
Looking into this.
@tahmed-dev, can you look at this fast-reboot-related issue?
@tahmed-dev to look into this issue.
@tahmed-dev, can you please give an update on the status?
@stepanblyschak, is the issue here that the FDB entries are not restored, or that the restoration is delayed? If it is the latter, how quickly do you expect the FDB entries to be restored? I can see from the following logs that the process takes about 55 s. However, the order of restoration is not clear to me. @prsunny, do you see a race condition here?
@tahmed-dev FDB entries are restored, but ARP entries are not. ARP is populated only after the port is operationally up and the system then learns the ARP entries. Yesterday I ran another test and found that when I change neighsyncd to start once orchagent is running, the ARP entries are restored. It looks like a race between swssconfig and neighsyncd. Basically, in https://github.com/Azure/sonic-buildimage/blob/master/dockers/docker-orchagent/supervisord.conf I changed:
to
Hope that helps.
Thanks @stepanblyschak! I am a bit reluctant to accept that the above change resolves the issue, for two reasons: 1) the issue as stated is hard to reproduce, so we might not be hitting the race condition; and/or 2) I found a bug in the way the arp_update process is started (or rather, not started), and I cannot see this process running on master. I am building a fix for it in order to get
Hey @stepanblyschak, can you please try once more with PR:5391?
@stepanblyschak, please reopen if the issue is not resolved on your end.
Still reproducible on hash 13d28f9.
@stepanblyschak, do you see the arp_update process started in the syslog? Do you think PR:4165, which moved the 5-minute timer before the ping command, has an effect?
Thanks @tahmed-dev!
Description
Fast-reboot should dump the FDB and ARP entries known before the reboot and program them into hardware as soon as possible after boot.
This does not happen for ARP entries, even when I work around #5216.
In the logs, I can see that the FDB entries were set, but neighbors were created only when the ports came up and traffic started to flow. I checked the dumped arp.json and it is correct.
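To tell "never restored" apart from "learned later from traffic", it can help to diff the neighbors in the dumped arp.json against the neighbors actually present shortly after boot. The sketch below is hypothetical: it assumes arp.json is a JSON list of swssconfig-style objects with keys of the form `NEIGH_TABLE:<interface>:<ip>` (that layout is an assumption, not confirmed in this issue), and the helper names `dumped_neighbors` and `missing_neighbors` are made up for illustration. Collecting the "present" set (for example from `ip neigh show`) is left to the caller.

```python
import json
from typing import Iterable, Set, Tuple

Neighbor = Tuple[str, str]  # (interface, ip)


def dumped_neighbors(arp_json_text: str) -> Set[Neighbor]:
    """Return (interface, ip) pairs from a dumped arp.json payload.

    Assumption: the payload is a list of objects whose neighbor keys look like
    "NEIGH_TABLE:<interface>:<ip>"; other keys (e.g. "OP") are ignored.
    """
    neighbors: Set[Neighbor] = set()
    for entry in json.loads(arp_json_text):
        for key in entry:
            if key.startswith("NEIGH_TABLE:"):
                # Split at most twice so IPv6 addresses (which contain ':') stay whole.
                _, iface, ip = key.split(":", 2)
                neighbors.add((iface, ip))
    return neighbors


def missing_neighbors(dumped: Set[Neighbor], present: Iterable[Neighbor]) -> Set[Neighbor]:
    """Neighbors that were dumped before reboot but are not present after it."""
    return dumped - set(present)
```

If the diff taken right after orchagent starts is non-empty but shrinks to empty only once the ports are up, that would match the behavior described above.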
Steps to reproduce the issue:
Describe the results you received:
Neighbor entries are not restored after fast boot
Describe the results you expected:
Neighbor entries are restored as soon as possible during fast boot
Additional information you deem important (e.g. issue happens only occasionally):
This issue is not 100% reproducible. It was never seen on SONiC.201911.142-52e45e82, nor on SONiC.201911.168-309a098.
It is probably caused by 5be374c, which exposed a race between swssconfig and orchagent.
syslog.txt