[syncd.sh,pmon.service] Prevent pmon from starting ahead of syncd #8
@stephenxs Maybe it would be a good idea to base the entire synchronization flow on systemd dependencies and remove all the other logic from syncd.sh? Stopping pmon for the warm/fast reboot flows can be done in a dedicated script. Could you please check such an option and compare it with the existing approach? If you have any concerns, we can discuss them.
The idea is to have Wants=pmon.service in syncd.service, and After=syncd.service together with Requires=syncd.service in pmon.service. This will definitely simplify the entire flow. What do you think?
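To make the proposed layout concrete, here is a small Python model of how systemd treats these three directives. It is a simplification for illustration, not systemd's real transaction logic. `Wants=` and `Requires=` pull units into the start transaction, and `After=` only orders their start jobs. An explicit stop of a unit also stops every unit that lists it in `Requires=`. The `UNITS` table and the helper names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency table mirroring the proposal (not the real unit files).
UNITS = {
    "syncd.service": {"Wants": ["pmon.service"], "Requires": [], "After": []},
    "pmon.service": {"Wants": [], "Requires": ["syncd.service"], "After": ["syncd.service"]},
}


def start_order(unit):
    """Return the order in which start jobs run when `unit` is started (simplified)."""
    # Wants= and Requires= pull additional units into the transaction.
    pulled, stack = set(), [unit]
    while stack:
        name = stack.pop()
        if name not in pulled:
            pulled.add(name)
            stack.extend(UNITS[name]["Wants"] + UNITS[name]["Requires"])
    # After= only orders jobs that are part of the same transaction.
    graph = {name: [dep for dep in UNITS[name]["After"] if dep in pulled] for name in pulled}
    return list(TopologicalSorter(graph).static_order())


def stopped_along_with(unit):
    """Units that are also stopped when `unit` is explicitly stopped (via Requires=)."""
    return sorted(name for name, deps in UNITS.items() if unit in deps["Requires"])


print(start_order("syncd.service"))         # ['syncd.service', 'pmon.service']
print(start_order("pmon.service"))          # ['syncd.service', 'pmon.service']
print(stopped_along_with("syncd.service"))  # ['pmon.service']
```

In this model, After= alone does not pull pmon in. The Wants= in syncd.service does that, which is why the proposal pairs the two directives.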
- What I did
Prevent pmon from starting ahead of syncd by adding syncd.service to the "After" dependencies of pmon.service.
This PR is one of the options for addressing the review comments on [syncd.sh] stop pmon ahead of syncd in flows except warm reboot #7
- How I did it
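During system startup, pmon is not supposed to start ahead of syncd, in order to avoid a race condition between syncd and pmon.
Currently this is done by killing pmon if it is alive when syncd is starting. However, this implementation is still risky. Consider the following flow:
1. pmon is inactive when syncd.sh checks it, but syncd.sh is somehow scheduled out just before "chipdown" is called.
2. systemd is switched in and starts the pmon service.
3. At this point, pmon and syncd are running simultaneously: the critical section is broken and a race condition forms.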
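The check-then-kill window can be illustrated with a toy Python model. It enumerates every interleaving of two syncd.sh steps with systemd starting pmon. The step names (`check_and_kill_pmon`, `chipdown`, `start_pmon`) are hypothetical labels for the actions described here, not functions in syncd.sh. The last part approximates an `After=syncd.service` ordering by allowing pmon to start only once all of syncd's steps have run.

```python
from itertools import combinations

# Hypothetical step labels for the flow described above.
SYNCD_STEPS = ["check_and_kill_pmon", "chipdown"]
SYSTEMD_STEPS = ["start_pmon"]


def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves each one's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(b)):
        ia, ib = iter(a), iter(b)
        yield tuple(next(ib) if i in slots else next(ia) for i in range(n))


def run(schedule):
    """Return 'race' if pmon is active when chipdown runs, else 'ok'."""
    pmon_active = False
    for step in schedule:
        if step == "check_and_kill_pmon":
            pmon_active = False  # kill pmon if it happens to be alive
        elif step == "start_pmon":
            pmon_active = True
        elif step == "chipdown" and pmon_active:
            return "race"
    return "ok"


def respects_after(schedule):
    """Model After=syncd.service: pmon starts only after every syncd step has run."""
    last_syncd = max(schedule.index(step) for step in SYNCD_STEPS)
    return schedule.index("start_pmon") > last_syncd


schedules = list(interleavings(SYNCD_STEPS, SYSTEMD_STEPS))
racy = [s for s in schedules if run(s) == "race"]
racy_with_after = [s for s in schedules if respects_after(s) and run(s) == "race"]

print(racy)             # [('check_and_kill_pmon', 'start_pmon', 'chipdown')]
print(racy_with_after)  # []
```

Only one of the three schedules is racy, and it is exactly the flow listed above. Ordering pmon after syncd removes that schedule entirely instead of trying to detect it at run time.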
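To prevent that issue, one solution is to add syncd as "After" in pmon.service, which ensures that whenever pmon starts, syncd has already been started.
However, doing so requires deferring the start of pmon.service until syncd.service has fully started; otherwise a deadlock forms as follows:
1. syncd.sh starts pmon before syncd itself has fully started.
2. pmon cannot start because syncd, one of its "After" dependencies, has not fully started.
3. As a result, syncd and pmon wait for each other forever.
To solve that, move starting pmon.service into "wait()" so that pmon is started after syncd has fully started, breaking the deadlock.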
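The deadlock, and why moving the call into "wait()" breaks it, can be sketched as a wait-for graph between the two start jobs. This simplified model rests on assumptions this PR does not state. It assumes syncd.service is a `Type=simple` unit whose `ExecStartPre=` runs `syncd.sh start` and whose `ExecStart=` runs `syncd.sh wait`, so the unit becomes active once `wait()` is launched. It also assumes pmon is started with a blocking `systemctl start`. The job names are hypothetical labels.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def wait_for_graph(pmon_started_in):
    """Map each start job to the jobs it must wait for (simplified model).

    pmon_started_in is the syncd.sh function that runs `systemctl start pmon`:
    "start" (before syncd.service is active) or "wait" (after it is active).
    """
    # After=syncd.service: pmon's start job waits for syncd's start job.
    graph = {"pmon.start": {"syncd.start"}, "syncd.start": set()}
    if pmon_started_in == "start":
        # A blocking `systemctl start pmon` inside syncd's own start job means
        # syncd's job cannot finish until pmon's job finishes.
        graph["syncd.start"].add("pmon.start")
    return graph


def start_jobs_in_order(pmon_started_in):
    """Return the job completion order, or None if the jobs wait on each other."""
    try:
        return list(TopologicalSorter(wait_for_graph(pmon_started_in)).static_order())
    except CycleError:
        return None  # circular wait: neither job can ever complete


print(start_jobs_in_order("start"))  # None (deadlock)
print(start_jobs_in_order("wait"))   # ['syncd.start', 'pmon.start']
```

In the first case, each job is a predecessor of the other, which is the circular wait described in the list. In the second case, syncd's job has no outstanding dependency, so pmon's "After" constraint is satisfied as soon as syncd is active.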
- How to verify it
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)