-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[chassis] Chassis DB cleanup when asic comes up #16213
Conversation
end | ||
return " 0 $lc $asic | ||
|
||
sleep 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the criteria to pick this value to sleep? If it is merely cleanup the redis DB entry, does it require "sleep xx" at all?
Just looking at how we can spend less time to bring up a module and only wait when it is absolutely necessary so the overall chassis/module bring up time can be reduced...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sleep time is required. At the line card/asic, we need to make sure that neighbors associated with a system interface is removed in the syncd/SAI before deleting the system interface. If this delay was not there, we observed an issue that at the orchangent system interface refcount will be zero but the syncd meta layer will stil have non-zero. Due to this out of sync issue which is in turn diue to time difference between the orchangent and syncd, syncd returns error and hence orchagent exits. There is no specific criteria used to select this value. This is thie lowest rounded time we used with which we did not see the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. Please add a comment to disclose this so that in future someone may not try to remove it or at least when try to better tunning the value will know what to verify so it will not end up breaking something...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment.
end | ||
return " 0 $lc $asic | ||
|
||
sleep 15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as above... can we eliminate unnecessary sleep if it is not really needed...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for the similar reason as explained for the previous comment but for SYSTEM_LAG_TABLE and SYSTEM_LAG_MEMBER_TABLE entries. At the line card/asic, deletion of system lag will happen only if all the members of the system lag are removed. Even though we send system lag member delete before sending the system lag delete, due to the order of processing and other dependency (like system inteface delete and neighbours delete which introduces timing differences), by the time system lag is tried to be deleted, sometimes, the a system lag still has system interface pending to be deleted. Having this delay (the lowest rounded time we tested with which we did not see the issue) helps to make sure that all system lag members are deleted before deleting system lag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. Please add a comment to disclose this so that in future someone may not try to remove it or at least when try to better tunning the value will know what to verify so it will not end up breaking something...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me.. Thanks
if [[ !($($SONIC_DB_CLI CHASSIS_APP_DB PING | grep -c True) -gt 0) ]]; then | ||
return | ||
fi | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a check to do the clean-up for VoQ Linecards only. The code as it is now will add delay of about 45 seconds on supervisor and non-voq linecards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added check to run the clean up only for voq switches.
Cleanup the entries from the following tables in chassis app db in redis_chassis server in the supervisor (1) SYSTEM_NEIGH (2) SYSTEM_INTERFACE (3) SYSTEM_LAG_MEMBER_TABLE (4) SYSTEM_LAG_TABLE As part of the clean up only those entries created by the asic that is coming up are deleted. The LAG IDs used by the asics are also de-allocated from SYSTEM_LAG_ID_TABLE and SYSTEM_LAG_ID_SET Signed-off-by: vedganes <veda.ganesan@nokia.com>
Changes to add comments describing reason for adding delay before system interface and system lag entries clean up. Signed-off-by: vedganes <veda.ganesan@nokia.com>
# but the syncd (meta) still has no-zero refcount. Because of this, orchagent gets "object still in use" | ||
# error and aborts. | ||
|
||
sleep 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also add a check to see if the number of SYSTEM_NEIGHBOR entries ( and similarly below for SYSTEM_LAG_MEMBER_TABLE) are non-zero before adding this sleep(). In case of retries when swss systemd service restarts multiple times etc we need not each time wait for 45 sec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is not with number of system neighbors or system lag members in the chassis app db. The problem is with timing in processing in other remote asics and linecards. It depends on how much time the remote asics/linecards take to remove the referenced entries in that asic/linecard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vganesan-nokia , I think the asks is for this script while executing to cleanup the various stale entries to be able to know whether anything actually did get cleaned up during the iteration. if so, do wait for the needed sleep time whether it be just one entry or 1000 entries... but if NONE is cleaned up let's say for nexthop resource, then we should not dead wait for the 30 seconds there...
This would work out great when swss keep restarting let's say after the cleanup all done it encountered some other issues that caused it to restart again. since the cleanup already done on first restart, it should have already waited during the first restart iteration so that on the second restart, the clean up steps will end up with nothing needs to be clean up any more and therefore for that case no need to sleep at all... Is this something achievable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is true, I was pointing to a few cases as below where we can skip the sleep altogether
(a) no SYSTEM_NEIGHBORS/LAG for that asic/LC ( so ideally nothing to clean locally and remote)
or in scenario below
(b) where swss restarts, we check and cleanup SYSTEM_NEIGHBORS for that asic/LC in chassis_db. Wait for 30 secs and make sure that the processing in remote asic/linecard is done.
But not for some reason swss restarts again (assume orchgent crashed). Again we will do same cleanup SYSTEM_NEIGHBORS for that asic/LC in chassis_db -- but this time there are no entries as in last round we cleaned up all .
So now there is no need to wait for 30 sec and 15 sec as there is no SYSTEM_NEIGHBOR/LAG entries relating to my LC/asic present in CHASSIS_DB
But don't block merge for this comment -- we can fine tune this in subsequent commits too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification. I'll fix this.
Follwing changes done for review comments: - Added check to run the chassis db clean up only for voq switches. Signed-off-by: vedganes <veda.ganesan@nokia.com>
fea9612
to
b3ebd82
Compare
BRM build failed... retrying the pipeline again... |
/azp run |
Commenter does not have sufficient privileges for PR 16213 in repo sonic-net/sonic-buildimage |
/Azp run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
# but the syncd (meta) still has no-zero refcount. Because of this, orchagent gets "object still in use" | ||
# error and aborts. | ||
|
||
sleep 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have some concerns about the sleep 30 here and the sleep 15 after the LAG cleanup.
Is there a way you can check for the case where there are no entries need to be removed, we can skip over the sleep statements? Basically, let's not blindly wait unless there is a need to wait... it creates unnecessary delays on swss restart.
# but the syncd (meta) still has no-zero refcount. Because of this, orchagent gets "object still in use" | ||
# error and aborts. | ||
|
||
sleep 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vganesan-nokia , I think the asks is for this script while executing to cleanup the various stale entries to be able to know whether anything actually did get cleaned up during the iteration. if so, do wait for the needed sleep time whether it be just one entry or 1000 entries... but if NONE is cleaned up let's say for nexthop resource, then we should not dead wait for the 30 seconds there...
This would work out great when swss keep restarting let's say after the cleanup all done it encountered some other issues that caused it to restart again. since the cleanup already done on first restart, it should have already waited during the first restart iteration so that on the second restart, the clean up steps will end up with nothing needs to be clean up any more and therefore for that case no need to sleep at all... Is this something achievable?
@yxieca , @StormLiangMS |
* [chassis]Chassis DB cleanup when asic comes up Cleanup the entries from the following tables in chassis app db in redis_chassis server in the supervisor (1) SYSTEM_NEIGH (2) SYSTEM_INTERFACE (3) SYSTEM_LAG_MEMBER_TABLE (4) SYSTEM_LAG_TABLE As part of the clean up only those entries created by the asic that is coming up are deleted. The LAG IDs used by the asics are also de-allocated from SYSTEM_LAG_ID_TABLE and SYSTEM_LAG_ID_SET - Added check to run the chassis db clean up only for voq switches. Signed-off-by: vedganes <veda.ganesan@nokia.com> Co-authored-by: vganesan-nokia <67648637+vganesan-nokia@users.noreply.github.com>
* [chassis]Chassis DB cleanup when asic comes up Cleanup the entries from the following tables in chassis app db in redis_chassis server in the supervisor (1) SYSTEM_NEIGH (2) SYSTEM_INTERFACE (3) SYSTEM_LAG_MEMBER_TABLE (4) SYSTEM_LAG_TABLE As part of the clean up only those entries created by the asic that is coming up are deleted. The LAG IDs used by the asics are also de-allocated from SYSTEM_LAG_ID_TABLE and SYSTEM_LAG_ID_SET - Added check to run the chassis db clean up only for voq switches. Signed-off-by: vedganes <veda.ganesan@nokia.com>
Cherry-pick PR to 202305: #16417 |
* [chassis]Chassis DB cleanup when asic comes up Cleanup the entries from the following tables in chassis app db in redis_chassis server in the supervisor (1) SYSTEM_NEIGH (2) SYSTEM_INTERFACE (3) SYSTEM_LAG_MEMBER_TABLE (4) SYSTEM_LAG_TABLE As part of the clean up only those entries created by the asic that is coming up are deleted. The LAG IDs used by the asics are also de-allocated from SYSTEM_LAG_ID_TABLE and SYSTEM_LAG_ID_SET - Added check to run the chassis db clean up only for voq switches. Signed-off-by: vedganes <veda.ganesan@nokia.com>
) The Test Case tacacs/test_ro_disk.py::test_ro_disk, in finally block a reboot is performed, currently, the condition for reboot to be successful is to check the docker is running, the enhancement done is to wait till all the critical process in the docker are up and the interfaces are also up for all the line cards in case the supervisor is rebooted The Particular improvement is required due to the changes with respect to PR sonic-net/sonic-buildimage#16213. The Parent PR on Master was merged #9893 What is the motivation for this PR? The Parent Master PR #9893 The Particular improvement is required due to the changes with respect to PR sonic-net/sonic-buildimage#16213. How did you do it? Assertion for successful reboot of supervisor card is updated to check all the process are up on critical docker and interfaces are also up on line cards. How did you verify/test it? Ran the test cases against a multi-asic line card in a T2 chassis.
Why I did it
In an operational system, after some configuration is changed (that involves line cards generating entries to chassis db), if config reload or system reboot is done without saving the new configuration changes and when system comes up, entries created by the new config are still present in the chassis db and hence the corresponding voq system entries (such as system interface, system neighbor and so on ) in all other line cards. These stale entries may affect the accuracy of the current entries and hence the intended operation of chassis. To fix this, we cleanup the chassis db when the an asic comes up. The chassis db is cleaned up for entries created by the asic that is coming up.
Work item tracking
N/A
How I did it
Cleanup the entries from the following tables in chassis app db in redis_chassis server in the supervisor
(1) SYSTEM_NEIGH
(2) SYSTEM_INTERFACE
(3) SYSTEM_LAG_MEMBER_TABLE
(4) SYSTEM_LAG_TABLE
As part of the clean up only those entries created by the asic that is coming up are deleted. The LAG IDs used by the asics are also de-allocated from SYSTEM_LAG_ID_TABLE and SYSTEM_LAG_ID_SET
How to verify it
Which release branch to backport (provide reason below if selected)
N/A
Tested branch (Please provide the tested image version)
master
Description for the changelog
Link to config_db schema for YANG module changes
None
A picture of a cute animal (not mandatory but encouraged)