[fast-reboot] Add a check for warmstart before cleaning up neigh table #1498
…s enable. This commit is to address the issue that the NEIGH_TABLE loaded by swssconfig after fast-reboot is cleared by neighsyncd. Signed-off-by: bingwang <wang.bing@microsoft.com>
-    psTable->clear();
+    if (m_warmStartInProgress)
+    {
+        psTable->clear();
+    }
Looks like this change will affect both `neighsyncd` and `natsyncd`. @lguohan is it OK to not clear the table for `natsyncd` when the DUT is warm-rebooting? If not, and to limit the change to `neighsyncd`, we can also check the table name so the guard applies only to `neighsyncd`.
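The table-name alternative raised in this comment could look roughly like the sketch below. It is not code from the PR: `shouldClearPending` is a hypothetical helper, and `"NAT_TABLE"` in the usage note is only an illustrative name for a non-neighbor table. The sketch assumes that only `NEIGH_TABLE` gets the warm-start guard while every other table keeps the old unconditional clear.

```cpp
#include <cassert>
#include <string>

// Hypothetical predicate (not part of the PR): decide whether pending
// entries of a table should be flushed before restoration.
// Assumption: only NEIGH_TABLE is guarded by the warm-start flag; all
// other tables keep the previous unconditional behaviour.
bool shouldClearPending(const std::string& tableName, bool warmStartInProgress)
{
    const std::string neighTable = "NEIGH_TABLE";
    if (tableName == neighTable)
    {
        // Guarded: flush only while a warm start is in progress.
        return warmStartInProgress;
    }
    // Unchanged behaviour for every other table.
    return true;
}
```

With this shape, `shouldClearPending("NEIGH_TABLE", false)` is false, so entries loaded after a fast-reboot would be left alone, while `shouldClearPending("NAT_TABLE", false)` stays true. The PR as merged applies the guard to every table instead.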
Wouldn't NAT hit the same issue if this protection is not there?
It is not clear whether the NAT tables need to be cleared unconditionally. This needs a bit of digging into how NAT uses this shared class. NAT uses this method for 4 tables.
This is a bug introduced by #1126, where the refactoring of the library to support multiple tables called psTable->clear unconditionally. The fix as done here should be correct. NAT tables, or any other client, should not use the library to flush the producer state table in non-warm-reboot cases.
btw: The reason we used psTable clear was to make sure the relevant table wouldn't change in some corner cases after we dumped it to memory; this is required only in the warm-reboot/restart case, before we dump the table. Also, since the daemon using the library was the producer itself, it was safe to do so. However, this assumption breaks if swssconfig loads the table at the same time, which is the case in non-warm-reboot scenarios for the ARP and NAT tables, etc. In those cases, the library incorrectly cleared the requests from swssconfig and caused the reported issues.
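The difference between the unconditional clear and the guarded clear can be illustrated with a small simulation. This is a minimal sketch, not the sonic-swss code: `MockProducerTable`, `MockRestoreHelper`, and `pendingAfterRestore` are hypothetical stand-ins, and the key and MAC values are made up. Only the `m_warmStartInProgress` check mirrors the diff under review.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a producer state table: holds pending
// entries that have not been consumed yet.
struct MockProducerTable
{
    std::map<std::string, std::string> pending;
    void set(const std::string& key, const std::string& value) { pending[key] = value; }
    void clear() { pending.clear(); }
};

// Hypothetical stand-in for the shared restoration helper.
class MockRestoreHelper
{
public:
    MockRestoreHelper(MockProducerTable& table, bool warmStartInProgress)
        : m_table(table), m_warmStartInProgress(warmStartInProgress) {}

    // Behaviour described as the bug: always flush pending entries.
    void restoreUnguarded() { m_table.clear(); }

    // Behaviour of the fix: flush only while a warm start is in progress.
    void restoreGuarded()
    {
        if (m_warmStartInProgress)
        {
            m_table.clear();
        }
    }

private:
    MockProducerTable& m_table;
    bool m_warmStartInProgress;
};

// Returns how many pending entries survive the restoration step when one
// entry was queued beforehand (as if loaded by swssconfig).
std::size_t pendingAfterRestore(bool warmStart, bool guarded)
{
    MockProducerTable table;
    table.set("Vlan1000:192.168.0.2", "00:11:22:33:44:55"); // made-up entry
    MockRestoreHelper helper(table, warmStart);
    if (guarded)
    {
        helper.restoreGuarded();
    }
    else
    {
        helper.restoreUnguarded();
    }
    return table.pending.size();
}
```

In the fast-reboot case (`warmStart == false`), the guarded variant leaves the queued entry in place, while the unguarded variant drops it, which matches the symptom reported in the linked issues.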
…s enable. (#1498)
This commit is to address the issue that the NEIGH_TABLE loaded by swssconfig after fast-reboot is cleared by neighsyncd.
**What I did**
Fix sonic-net/sonic-buildimage#5841 and sonic-net/sonic-buildimage#5580
We found that the neighbor table loaded by `swssconfig` from `arp.json` after `fast-reboot` is mistakenly cleared by `neighsyncd` at the initial stage. This PR adds a check for `WarmStart` before cleaning up, and only cleans up if `WarmStart` is enabled.
**Why I did it**
This PR fixes the issue that the ARP table is not recovered after fast-reboot.
**How I verified it**
Verified on Arista-7260, running the 201911 image.
1. Run some tests to populate ARP entries on the DUT, such as `test_fast_reboot`.
2. Issue a fast-reboot.
3. Verify that the `arp.json` backed up by `fast-reboot-dump.py` is loaded and NEIGH_TABLE is restored.
What I did
Fix sonic-net/sonic-buildimage#5841 and sonic-net/sonic-buildimage#5580
We found that the neighbor table loaded by `swssconfig` from `arp.json` after `fast-reboot` is mistakenly cleared by `neighsyncd` at the initial stage. This PR adds a check for `WarmStart` before cleaning up, and only cleans up if `WarmStart` is enabled.
Why I did it
This PR fixes the issue that the ARP table is not recovered after fast-reboot.
How I verified it
Verified on Arista-7260, running the 201911 image.
1. Run some tests to populate ARP entries on the DUT, such as `test_fast_reboot`.
2. Issue a fast-reboot.
3. Verify that the `arp.json` backed up by `fast-reboot-dump.py` is loaded and NEIGH_TABLE is restored.
Details if related
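The last verification step amounts to checking that every backed-up neighbor entry is present again after the reboot. A minimal sketch of that comparison follows, assuming both the backup and the restored table have already been read into in-memory maps of neighbor key to MAC address. `NeighTable` and `missingAfterRestore` are hypothetical names, and parsing `arp.json` or querying the database is deliberately left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory view of a neighbor table: key -> MAC address.
using NeighTable = std::map<std::string, std::string>;

// Returns the keys that are in the backup but are missing from, or carry a
// different MAC in, the restored table. An empty result means the backup
// was fully restored.
std::vector<std::string> missingAfterRestore(const NeighTable& backup,
                                             const NeighTable& restored)
{
    std::vector<std::string> missing;
    for (const auto& entry : backup)
    {
        const auto it = restored.find(entry.first);
        if (it == restored.end() || it->second != entry.second)
        {
            missing.push_back(entry.first);
        }
    }
    return missing;
}
```

Before the fix, one would expect the backed-up entries to show up as missing after a fast-reboot; with the guard in place the result should be empty.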