[Mellanox] make sure shared storage with syncd is cleared on restarts #14547
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
@stepanblyschak does this have any impact on upgrades between different versions on the same branch, or between branches?
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
@lguohan would you like to review or assign someone to review? LGTM
Frankly, I do not like the idea of having an additional channel for communication between dockers. Dependencies between dockers pose a lot of challenges and are very hard to debug. I would really like to understand why this is needed and what data is shared between the syncd docker and other dockers.
@vaibhavhd and @yxieca for more comments.
@lguohan This is not a new communication channel. It has existed for quite some time and is used by pmon and what-just-happened to communicate with the SDK directly, as not all features are currently supported by SAI. This PR does not add a new communication channel; it makes sure the shared location is cleared on restarts.
This change has a risk, since the shared folder is used by syncd and other dockers:
@yxieca This is guaranteed by systemd. What is the risk?
@yxieca @liat-grozovik @lguohan Clarified the purpose in PR description.
I am not sure that is fully guaranteed. For example, if syncd is stopping while pmon is accessing the shared memory, syncd won't be able to remove the shared memory folder. As a result, it could see stale information after the next start. Or could the service stop/start fail?
@yxieca The pmon service is configured with After=syncd, so it starts after syncd.
That means the reverse order is used on shutdown.
If the syncd service is about to stop (whether gracefully on request or abnormally), systemd first stops the pmon service.
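The ordering argument above can be illustrated with a small model. This is a simplified sketch, not systemd's implementation: the `after` map and the helper names `start_order` and `stop_order` are hypothetical, and only the two services named in this discussion are included.

```python
from graphlib import TopologicalSorter


def start_order(after):
    """Return units so that each one comes after the units it is ordered After=."""
    # `after` maps a unit to the set of units it must start after.
    return list(TopologicalSorter(after).static_order())


def stop_order(after):
    """Model shutdown as the reverse of the start order."""
    return list(reversed(start_order(after)))


# Hypothetical ordering map for the two services discussed here.
after = {"pmon": {"syncd"}, "syncd": set()}

print(start_order(after))  # ['syncd', 'pmon']
print(stop_order(after))   # ['pmon', 'syncd']
```

In this model pmon is always stopped before syncd, which is the property the comment relies on. Whether a process inside pmon can still hold files open at that point is outside what the sketch covers.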
@lguohan could you please help to review/approve?
@yxieca, I see this note: "NOTE: No plans to use it for standard SONIC dockers and we are working on removing the SDK dependency from PMON docker". It looks like this is only used between the Mellanox syncd docker and the pmon docker, and they are working on removing that dependency, so it will be removed in the future. Is this OK?
I believe @stepanblyschak is checking the warm reboot scenario; we can merge on his signal.
@yxieca We are good to merge it.
Cherry-pick PR to 202305: #16046
Why I did it
Sharing syncd's storage with other proprietary application extensions allows them to communicate with syncd in different ways.
If a container wants to pass some information to syncd, it can use the shared storage. However, today the shared storage isn't cleaned on restarts, making it possible for syncd to read out-of-date information generated in the past.
NOTE: There are no plans to use it for standard SONiC dockers, and we are working on removing the SDK dependency from the PMON docker.
How I did it
Implemented a new service that cleans the shared storage.
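As a rough illustration of what cleaning the shared storage amounts to, here is a minimal Python sketch. It is not the code from this PR, which implements the cleanup as a service; the function name `clear_shared_storage` is hypothetical, and the directory used in the example is a throwaway temporary one rather than the real shared location.

```python
import shutil
import tempfile
from pathlib import Path


def clear_shared_storage(shared_dir):
    """Delete everything inside shared_dir but keep the directory itself.

    Returns the number of top-level entries removed.
    """
    root = Path(shared_dir)
    root.mkdir(parents=True, exist_ok=True)
    removed = 0
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # real subdirectory: remove recursively
        else:
            entry.unlink()        # regular file, socket, or symlink
        removed += 1
    return removed


# Example on a temporary directory standing in for the shared storage.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "stale.info").write_text("left over from a previous run")
    print(clear_shared_storage(tmp))  # 1
```

Keeping the directory itself and deleting only its contents is a design choice in this sketch, so that anything that mounts the directory still finds it after the cleanup.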
How to verify it
Run reboot, fast-reboot, warm-reboot, config-reload, and systemctl restart swss, and verify that /tmp/ in the syncd container is cleaned after each restart.
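The check after each restart could be scripted along these lines. This is a sketch under assumptions: it presumes the `docker` CLI is available on the host and that the container is named `syncd`, and the helper name `syncd_tmp_entries` is hypothetical. The `run` parameter exists only so the command runner can be swapped out.

```python
import subprocess


def syncd_tmp_entries(run=subprocess.run, container="syncd", path="/tmp/"):
    """List the entries under `path` inside the container via `docker exec`."""
    result = run(
        ["docker", "exec", container, "ls", "-A", path],
        capture_output=True,
        text=True,
        check=True,  # raise if the container is not running or ls fails
    )
    return [line for line in result.stdout.splitlines() if line]


if __name__ == "__main__":
    leftovers = syncd_tmp_entries()
    if leftovers:
        print("entries found in /tmp/:", leftovers)
    else:
        print("/tmp/ is empty")
```

Files that syncd or other processes create after startup would also show up in this listing, so the result needs to be compared against what was present before the restart.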