[syncd.sh] stop pmon ahead of syncd in flows except warm reboot #7

Closed
wants to merge 7 commits into from

@stephenxs stephenxs commented Sep 4, 2019

- What I did
Issue Overview
shutdown flow
In any shutdown flow, where all dockers are stopped in order, the pmon docker stops after the syncd docker has stopped. As a result, the pmon docker fails to release the sx_core resources and sx_core is left in a bad state. The related logs look like the following:

INFO syncd.sh[23597]: modprobe: FATAL: Module sx_core is in use.
INFO syncd.sh[23597]: Unloading sx_core[FAILED]
INFO syncd.sh[23597]: rmmod: ERROR: Module sx_core is in use

config reload & service swss restart
In flows like "config reload" and "service swss restart", the failure causes further consequences:

  1. sx_core initialization error with error message like "sx_core: create EMAD sdq 0 failed. err: -16"
  2. syncd fails to execute the create switch api with error message "syncd_main: Runtime error: :- processEvent: failed to execute api: create, key: SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000, status: SAI_STATUS_FAILURE"
  3. swss fails to call SAI API "SAI_SWITCH_ATTR_INIT_SWITCH", which causes orchagent to restart. This will introduce an extra 1 or 2 minutes for the system to be available, failing related test cases.

reboot, warm-reboot & fast-reboot
In the reboot flows ("reboot", "fast-reboot" and "warm-reboot") this failure has no further negative effects, since the system reboots anyway. In addition, "warm-reboot" requires the system to shut down as quickly as possible to meet the GR time restrictions of both BGP and LACP. "fast-reboot" must also meet the GR time restriction of BGP, which is longer than that of LACP. Any unnecessary step should therefore be avoided, and it is better to keep those flows untouched.

summary
To summarize, we have to come up with a way to ensure:

  1. shut down the pmon docker ahead of syncd for the "config reload" and "service swss restart" flows;
  2. don't shut down the pmon docker ahead of syncd for the "fast-reboot" and "warm-reboot" flows, in order to save time;
  3. for the "reboot" flow, either order is acceptable.

Solution
To solve the issue, pmon should be stopped ahead of syncd in all flows except warm-reboot.
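
The rule above can be sketched as a small predicate. This is only an illustration: `should_stop_pmon_first` is a hypothetical helper, not a function in syncd.sh, and the `mellanox` / `cold` comparison simply mirrors the condition used in the diff under review.

```shell
# Hypothetical helper (not part of syncd.sh): decide whether pmon should be
# stopped before syncd, given the ASIC platform and the shutdown type.
should_stop_pmon_first() {
    platform="$1"    # e.g. "mellanox"
    stop_type="$2"   # "cold" or "warm"; anything else means "do not stop"
    if [ "$platform" = "mellanox" ] && [ "$stop_type" = "cold" ]; then
        return 0
    fi
    return 1
}

# Example: print the decision for the two shutdown types.
for t in cold warm; do
    if should_stop_pmon_first mellanox "$t"; then
        echo "$t: stop pmon first"
    else
        echo "$t: leave pmon alone"
    fi
done
```

Keeping the decision in one predicate makes it easy to see that warm-reboot is the only flow that skips the extra stop.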

- How I did it

  1. Stop pmon ahead of syncd. This is done in /usr/local/bin/syncd.sh::stop() and for all shutdown sequences.
  2. Since pmon now stops ahead of syncd, there must be a way for pmon to start after syncd has started. In addition, starting pmon should be deferred so that services with graceful-restart logic in fast-reboot and warm-reboot have sufficient CPU cycles to meet their deadlines.
    This is done by adding "syncd.service" to "After" in pmon.service and starting pmon in /usr/local/bin/syncd.sh::wait(), so that pmon starts automatically after syncd has started.
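
As an illustration of the ordering dependency, the sketch below writes a systemd drop-in containing `After=syncd.service`. The PR edits pmon.service itself; the drop-in directory, the file name `after-syncd.conf` and the helper `write_pmon_ordering_dropin` are assumptions made only for this example.

```shell
# Hypothetical helper: write a systemd drop-in that orders pmon after syncd.
# The drop-in location and file name are assumptions for illustration only.
write_pmon_ordering_dropin() {
    dir="$1"    # e.g. /etc/systemd/system/pmon.service.d
    mkdir -p "$dir" || return 1
    cat > "$dir/after-syncd.conf" <<'EOF'
[Unit]
After=syncd.service
EOF
}

# Example: generate the drop-in in a scratch directory and show it.
scratch="$(mktemp -d)"
write_pmon_ordering_dropin "$scratch/pmon.service.d"
cat "$scratch/pmon.service.d/after-syncd.conf"
rm -rf "$scratch"
```

Note that `After=` only affects ordering; it does not start pmon by itself, which is why the change also starts pmon explicitly from syncd.sh.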

- How to verify it
Test the following flows and ensure that pmon and syncd start and stop in the correct sequence:

  1. config reload
  2. service swss restart
  3. regular reboot
  4. warm-reboot
  5. fast-reboot
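
One way to check the stop order is to compare where two events first appear in a log. The sketch below assumes a log with one event per line, in the style of the /service.log excerpts posted later in this conversation; `appears_before` is a hypothetical helper, and the sample lines are copied from the "config reload -y" excerpt.

```shell
# Hypothetical helper: succeed if the first line containing $1 comes before
# the first line containing $2 in the log file $3.
appears_before() {
    first="$(grep -n -F -- "$1" "$3" | head -n 1 | cut -d: -f1)"
    second="$(grep -n -F -- "$2" "$3" | head -n 1 | cut -d: -f1)"
    [ -n "$first" ] && [ -n "$second" ] && [ "$first" -lt "$second" ]
}

# Sample log lines taken from the "config reload -y" excerpt in this thread.
log="$(mktemp)"
cat > "$log" <<'EOF'
Mon Sep 16 13:04:46 UTC 2019 [syncd]: => do pmon stop
Mon Sep 16 13:04:46 UTC 2019 [pmon]: => stop service
Mon Sep 16 13:04:49 UTC 2019 [syncd]: => stop docker
EOF

if appears_before "[pmon]: => stop service" "[syncd]: => stop docker" "$log"; then
    echo "pmon stopped before the syncd docker"
else
    echo "unexpected order"
fi
rm -f "$log"
```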

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@keboliu
Collaborator

keboliu commented Sep 5, 2019

As we have discussed, I am not sure whether we should change the reload flow, since pmon will be stopped in this change.

@stephenxs
Owner Author

As we have discussed, I am not sure whether we should change the reload flow, since pmon will be stopped in this change.

Yes, we should. This part has been committed in [config/main.py] don't start/stop pmon during config reload #3

fi
/usr/bin/hw-management.sh chipdown
Collaborator

Previously, if pmon was not active, no action was taken on the pmon service. Here we change the behavior: pmon is restarted whether it is active or not. I would like to hear comments from someone else.

Owner Author

Previously, if pmon was not active, no action was taken on the pmon service. Here we change the behavior: pmon is restarted whether it is active or not. I would like to hear comments from someone else.

Originally there was no dependency between pmon and syncd, which means they could start and stop independently. Now we've introduced a dependency on syncd for pmon, so that pmon is stopped whenever syncd stops. However, there is no mechanism to ensure that pmon is started after syncd starts, especially in the case of "systemctl reload swss".
For this reason, we have to start pmon automatically after syncd has started.

/usr/bin/hw-management.sh chipdown
debug "Starting pmon service..."
/bin/systemctl restart pmon
debug "Started pmon service"
Collaborator

"pmon service started"

Owner Author

"pmon service started"

In order to be consistent with other services.

@stephenxs looks like these changes prevent pmon restart in case of config reload after warm-boot. Not sure if this is desired behaviour.

files/scripts/syncd.sh
@@ -150,6 +151,12 @@ stop() {
TYPE=cold
fi

if [[ x$sonic_asic_platform == x"mellanox" ]] && [[ x$TYPE == x"cold" ]]; then
debug "Stopping pmon service ahead of syncd..."
/bin/systemctl stop pmon
Collaborator

do we need chipdown here?

Owner Author

do we need chipdown here?

I don't think so. But this should be double-checked with @nazariig

@nazariig nazariig Sep 11, 2019

@stephenxs since pmon was removed from the config reload sequence, I suggest stopping pmon in all cases.

@nazariig @stephenxs How long does it take to stop pmon? If we do stop pmon for all cases it affects warm shutdown. In warm shutdown every second matters

@stepanblyschak a couple of seconds and yes, it might affect warm-boot flow.

@nazariig @stephenxs I must say that right now in master we are very close to the limit of 3 LACP timeouts (90 sec). This is going to be optimized, but a couple of seconds can still kill the warm-reboot flow.

Owner Author

Yes. This is why pmon isn't stopped for warm-reboot. I suggest keeping the current logic.
BTW, maybe unrelated: is it possible that other approaches, such as deferring the initialization of services that are not timing-sensitive, would contribute more to optimizing warm reboot than skipping the graceful stop of pmon?

@stephenxs This is already done (e.g. SNMP is delayed, although such a delay seems more like a hack to me).
Even with that optimization, SONiC is very slow at startup, which causes trouble in the warm-reboot scenario, so I suggest not adding extra delays at shutdown. Also, originally only syncd was meant to stop gracefully; everything else is either killed by a signal or we don't care about its graceful shutdown, in favor of a faster reboot. Even if the platform has enough time to keep LAG/BGP from flapping, it is better to save 2-3 sec in case subsequent configuration changes slow down the startup.

@@ -104,11 +104,12 @@ start() {
if [[ x"$WARM_BOOT" != x"true" ]]; then
if [[ x"$(/bin/systemctl is-active pmon)" == x"active" ]]; then

@stephenxs since you have removed pmon restart from CLI, we do not need this anymore.

@stephenxs I suggest the following flow:

if [[ x"$WARM_BOOT" != x"true" ]]; then
    /usr/bin/hw-management.sh chipdown
fi

/bin/systemctl restart pmon

if [[ x"$BOOT_TYPE" == x"fast" ]]; then
    /usr/bin/hw-management.sh chipupdis
fi

Owner Author

You're correct. The original code prevents pmon from starting in warm-reboot flow.

Owner Author

@stephenxs stephenxs Sep 11, 2019

@stephenxs since you have removed pmon restart from CLI, we do not need this anymore.

I am also considering removing it. Originally I intended it as a protection: if pmon is somehow still alive after syncd has died, we make sure that pmon is stopped to avoid a race condition. However, it seems this situation has never happened.
So if all of us think it's unnecessary, I am going to remove it.
@keboliu @nazariig @stepanblyschak what's your opinion?

Owner Author

@stephenxs since you have removed pmon restart from CLI, we do not need this anymore.

When the system is booting, pmon can start ahead of syncd. To solve this we can put a protection here.
Another way to solve this is to simply add syncd.service to "After" in pmon.service, just like what is done for syncd.service; see sonic-buildimage/files/build_templates/syncd.service.j2
@keboliu @nazariig @stepanblyschak

@stephenxs after warm-boot it looks like pmon won't be restarted at all regardless of any user actions, except an explicit service restart. This can be an issue.

Owner Author

@stephenxs after warm-boot it looks like pmon won't be restarted at all regardless of any user actions, except an explicit service restart. This can be an issue.

This has been fixed as you suggested; I just haven't uploaded the code yet.

Owner Author

@stephenxs the flow becomes very complicated and some aspects are missing comparing to original config reload. Maybe we should revisit the architecture?

I'm afraid that the original implementation may have an issue. Consider the following flow:
At the beginning pmon is not active. Then syncd.sh is somehow scheduled out just ahead of "chipdown", systemd is scheduled in and starts pmon, and pmon then runs simultaneously with chipdown. A race condition is formed and the critical section is broken.
Is this possible?

Owner Author

originally I intend to make a protection here so that if pmon is still alive after syncd died somehow we can make it sure that pmon is stopped to avoid racing condition.

@stephenxs nice catch. This also should be taken into consideration.

To address this situation, we have two options:

  1. keep the logic that stops pmon if it is active when syncd starts. As I mentioned before, this is still risky, even though the probability is very low. If we intend to address this issue as well, we have:
  2. [syncd.sh,pmon.service] Prevent pmon from starting ahead of syncd

Owner Author

@stephenxs stephenxs Sep 13, 2019

@stephenxs the flow becomes very complicated and some aspects are missing comparing to original config reload. Maybe we should revisit the architecture?

Having gone through all the flows, the differences compared to the original flow are:
For config reload and swss restart, pmon stops ahead of syncd in the updated flow and after syncd in the original one.
For warm-reboot, in the original flow pmon can start at one of the following 2 points:
1. Ahead of syncd starting. In this case, the updated flow makes no difference.
2. At any point after syncd has started. In this case, in the updated flow pmon starts immediately after "chipdown" is called, which may introduce a bit more latency for warm reboot. This can be illustrated as follows ("ORI FLOW" is the time sequence of the original flow and "UDT FLOW" that of the updated one):
ORI FLOW: --syncd starting------chip down--------syncd fully started-----pmon starting------
UDT FLOW: --syncd starting------chip down--pmon starting---syncd fully started-------------
Fortunately, in my tests it is always the first case, which means that in most cases no extra latency is introduced. However, if we decide to address the second case, we can do so as in [syncd.sh,pmon.service] Prevent pmon from starting ahead of syncd

@@ -150,6 +151,12 @@ stop() {
TYPE=cold
fi

if [[ x$sonic_asic_platform == x"mellanox" ]] && [[ x$TYPE == x"cold" ]]; then

Note that TYPE is set according to /proc/cmdline.
Consider a flow like this:

1. sudo warm-reboot
2. # wait for warmboot-finalizer to finish and disable services WR mode
3. sudo systemctl restart swss # which restarts syncd

In such a flow x$TYPE == x"fastfast", but obviously the service restart was meant to be cold.

Owner Author

@stephenxs stephenxs Sep 11, 2019

According to the code, TYPE is determined by whether WARM_BOOT is "true", and WARM_BOOT is determined by whether WARM_RESTART_ENABLE_TABLE|system and WARM_RESTART_ENABLE_TABLE|syncd contain "enable" in the redis DB.
Do you mean that even after warm boot has finished (finalized), WARM_RESTART_ENABLE_TABLE|system or WARM_RESTART_ENABLE_TABLE|syncd remains "enable"?
If so, is there any way to check whether the current flow is warm-reboot?
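
The check described here can be sketched as a pure function. Everything in it is an assumption made for illustration: `compute_warm_boot` is a hypothetical helper, its two arguments stand for the values read from WARM_RESTART_ENABLE_TABLE|system and WARM_RESTART_ENABLE_TABLE|syncd, an enabled flag is assumed to be the string "true", and either flag is assumed to be sufficient. The Redis lookup itself is deliberately left out.

```shell
# Hypothetical helper: derive WARM_BOOT from the two flag values.
# $1: value stored for WARM_RESTART_ENABLE_TABLE|system
# $2: value stored for WARM_RESTART_ENABLE_TABLE|syncd
# Assumption: an enabled flag is represented by the string "true", and
# either flag being set is enough to treat the flow as warm.
compute_warm_boot() {
    if [ "$1" = "true" ] || [ "$2" = "true" ]; then
        echo "true"
    else
        echo "false"
    fi
}

# Example: with both flags cleared, a service restart is treated as cold.
WARM_BOOT="$(compute_warm_boot "" "")"
echo "WARM_BOOT: $WARM_BOOT"
```

If the flags are cleared once warm boot is finalized, a later "systemctl restart swss" would follow the cold path under this sketch.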

@stephenxs Oh, yes. We have 3 variables here representing almost the same thing (BOOT_TYPE, WARM_BOOT, TYPE) which confused me

fi
/usr/bin/hw-management.sh chipdown
debug "Starting pmon service..."
/bin/systemctl restart pmon

Any preference for 'restart' over 'start' ?

In this case a 'Restarting pmon service...' debug message would better describe the flow.

Owner Author

No need to use "restart" here. Use 'start' instead.

Stephen Sun added 2 commits September 12, 2019 16:40
…onal branch to fix the issue that pmon not started after warm reboot
During system start, pmon isn't supposed to start ahead of syncd, in order to avoid a race condition between syncd and pmon.
Currently this is done by killing pmon if it is alive when syncd is starting. However, such an implementation is still risky. Consider the following flow:
1. pmon is inactive when syncd.sh checks it, but syncd.sh is somehow scheduled out just ahead of the "chipdown" call
2. systemd is switched in and starts the pmon service
3. at this point pmon and syncd are running simultaneously: the critical section is broken and a race condition is formed
To prevent that issue, the only solution is to add syncd as "After" in pmon.service, which ensures that whenever pmon starts syncd has already been started.
However, doing so requires deferring the start of pmon.service until syncd.service has fully started; otherwise a deadlock is formed as follows:
1. syncd.sh starts pmon before it has fully started itself, while
2. pmon is not able to start because syncd, one of its "After" units, has not fully started
3. as a result, syncd and pmon wait for each other forever
To solve that, move starting pmon.service to "wait()" so that pmon is started after syncd has fully started, breaking the deadlock.
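
A minimal sketch of the reordering described in this commit message, assuming nothing about the real script beyond what the message says. `start_pmon`, `wait_syncd` and `wait_sketch` are hypothetical names, and the hooks are stubs so that the ordering can be exercised without systemd; restricting the pmon start to the mellanox platform is also an assumption borrowed from the stop() condition in the diff.

```shell
# Hypothetical hooks; on a real system these would wrap commands such as
# "/bin/systemctl start pmon" and the blocking wait on the syncd container.
start_pmon() { echo "start pmon"; }
wait_syncd() { echo "wait for syncd"; }

# Sketch of the reordered wait(): pmon is started here instead of in start(),
# so the pmon start request is issued only after syncd's start phase is done.
wait_sketch() {
    platform="$1"
    if [ "$platform" = "mellanox" ]; then
        start_pmon || return 1
    fi
    wait_syncd
}

# Example run with the stub hooks.
wait_sketch mellanox
```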
@nazariig

nazariig commented Sep 16, 2019

Summary:

COLD

reboot

root@sonic:/home/admin# cat /service.log
Mon Sep 16 13:09:52 UTC 2019 [pmon]: => start service
Mon Sep 16 13:10:33 UTC 2019 [syncd]: => start service
Mon Sep 16 13:10:45 UTC 2019 [syncd]: => WARM_BOOT: false
Mon Sep 16 13:10:45 UTC 2019 [syncd]: => BOOT_TYPE: cold
Mon Sep 16 13:10:45 UTC 2019 [syncd]: => do pmon stop
Mon Sep 16 13:10:45 UTC 2019 [pmon]: => stop service
Mon Sep 16 13:10:50 UTC 2019 [syncd]: => hw-mgmt chipdown
Mon Sep 16 13:10:52 UTC 2019 [syncd]: => do pmon start
Mon Sep 16 13:10:52 UTC 2019 [pmon]: => start service
Mon Sep 16 13:11:22 UTC 2019 [syncd]: => start docker

config reload -y

root@sonic:/home/admin# cat /service.log
Mon Sep 16 13:04:45 UTC 2019 [syncd]: => stop service
Mon Sep 16 13:04:46 UTC 2019 [syncd]: => WARM_BOOT: false
Mon Sep 16 13:04:46 UTC 2019 [syncd]: => TYPE: cold
Mon Sep 16 13:04:46 UTC 2019 [syncd]: => do pmon stop
Mon Sep 16 13:04:46 UTC 2019 [pmon]: => stop service
Mon Sep 16 13:04:49 UTC 2019 [syncd]: => stop docker
Mon Sep 16 13:06:23 UTC 2019 [syncd]: => start service
Mon Sep 16 13:06:30 UTC 2019 [syncd]: => WARM_BOOT: false
Mon Sep 16 13:06:30 UTC 2019 [syncd]: => BOOT_TYPE: cold
Mon Sep 16 13:06:30 UTC 2019 [syncd]: => hw-mgmt chipdown
Mon Sep 16 13:06:32 UTC 2019 [syncd]: => do pmon start
Mon Sep 16 13:06:32 UTC 2019 [pmon]: => start service
Mon Sep 16 13:07:12 UTC 2019 [syncd]: => start docker

FAST

fast-reboot

root@sonic:/home/admin# cat /service.log
Mon Sep 16 12:54:14 UTC 2019 [syncd]: => stop service
Mon Sep 16 12:54:15 UTC 2019 [syncd]: => WARM_BOOT: false
Mon Sep 16 12:54:15 UTC 2019 [syncd]: => TYPE: cold
Mon Sep 16 12:54:15 UTC 2019 [syncd]: => do pmon stop
Mon Sep 16 12:54:15 UTC 2019 [pmon]: => stop service
Mon Sep 16 12:54:17 UTC 2019 [syncd]: => stop docker
Mon Sep 16 12:54:49 UTC 2019 [pmon]: => start service
Mon Sep 16 12:55:30 UTC 2019 [syncd]: => start service
Mon Sep 16 12:55:38 UTC 2019 [syncd]: => WARM_BOOT: false
Mon Sep 16 12:55:38 UTC 2019 [syncd]: => BOOT_TYPE: fast
Mon Sep 16 12:55:38 UTC 2019 [syncd]: => do pmon stop
Mon Sep 16 12:55:38 UTC 2019 [pmon]: => stop service
Mon Sep 16 12:55:44 UTC 2019 [syncd]: => hw-mgmt chipdown
Mon Sep 16 12:55:45 UTC 2019 [syncd]: => do pmon start
Mon Sep 16 12:55:45 UTC 2019 [pmon]: => start service
Mon Sep 16 12:56:06 UTC 2019 [syncd]: => hw-mgmt chipupdis
Mon Sep 16 12:56:10 UTC 2019 [syncd]: => start docker

config reload -y

root@sonic:/home/admin# cat /service.log
Mon Sep 16 12:43:35 UTC 2019 [syncd]: => stop service
Mon Sep 16 12:43:36 UTC 2019 [syncd]: => WARM_BOOT: false
Mon Sep 16 12:43:36 UTC 2019 [syncd]: => TYPE: cold
Mon Sep 16 12:43:36 UTC 2019 [syncd]: => do pmon stop
Mon Sep 16 12:43:36 UTC 2019 [pmon]: => stop service
Mon Sep 16 12:43:39 UTC 2019 [syncd]: => stop docker
Mon Sep 16 12:44:22 UTC 2019 [syncd]: => start service
Mon Sep 16 12:44:29 UTC 2019 [syncd]: => WARM_BOOT: false
Mon Sep 16 12:44:29 UTC 2019 [syncd]: => BOOT_TYPE: cold
Mon Sep 16 12:44:29 UTC 2019 [syncd]: => hw-mgmt chipdown
Mon Sep 16 12:44:30 UTC 2019 [syncd]: => do pmon start
Mon Sep 16 12:44:30 UTC 2019 [pmon]: => start service
Mon Sep 16 12:45:09 UTC 2019 [syncd]: => start docker

WARM

warm-reboot

root@sonic:/home/admin# cat /service.log
Mon Sep 16 13:13:57 UTC 2019 [syncd]: => stop service
Mon Sep 16 13:13:58 UTC 2019 [syncd]: => WARM_BOOT: true
Mon Sep 16 13:13:58 UTC 2019 [syncd]: => TYPE: warm
Mon Sep 16 13:14:00 UTC 2019 [syncd]: => stop docker
Mon Sep 16 13:14:03 UTC 2019 [pmon]: => stop service
Mon Sep 16 13:14:30 UTC 2019 [pmon]: => start service
Mon Sep 16 13:15:14 UTC 2019 [syncd]: => start service
Mon Sep 16 13:15:31 UTC 2019 [syncd]: => WARM_BOOT: true
Mon Sep 16 13:15:31 UTC 2019 [syncd]: => BOOT_TYPE: fastfast
Mon Sep 16 13:15:31 UTC 2019 [syncd]: => do pmon start
Mon Sep 16 13:15:38 UTC 2019 [syncd]: => start docker

config reload -y

root@sonic:/home/admin# cat /service.log
Mon Sep 16 13:19:26 UTC 2019 [syncd]: => stop service
Mon Sep 16 13:19:26 UTC 2019 [syncd]: => WARM_BOOT: false
Mon Sep 16 13:19:26 UTC 2019 [syncd]: => TYPE: cold
Mon Sep 16 13:19:26 UTC 2019 [syncd]: => do pmon stop
Mon Sep 16 13:19:26 UTC 2019 [pmon]: => stop service
Mon Sep 16 13:19:29 UTC 2019 [syncd]: => stop docker
Mon Sep 16 13:20:06 UTC 2019 [syncd]: => start service
Mon Sep 16 13:20:13 UTC 2019 [syncd]: => WARM_BOOT: false
Mon Sep 16 13:20:13 UTC 2019 [syncd]: => BOOT_TYPE: fastfast
Mon Sep 16 13:20:13 UTC 2019 [syncd]: => hw-mgmt chipdown
Mon Sep 16 13:20:14 UTC 2019 [syncd]: => do pmon start
Mon Sep 16 13:20:14 UTC 2019 [pmon]: => start service
Mon Sep 16 13:20:53 UTC 2019 [syncd]: => start docker

As you can see we have extra pmon start/stop on system boot for cold/fast flows.
Maybe we can use systemctl is-system-running to avoid that?

@stephenxs
Owner Author

As you can see we have extra pmon start/stop on system boot for cold/fast flows.
Maybe we can use systemctl is-system-running to avoid that?

Which pmon start/stop do you think is extra? Can you elaborate?
As far as I can see, for the cold start flow,
Mon Sep 16 13:10:45 UTC 2019 [syncd]: => do pmon stop
is followed by
Mon Sep 16 13:10:50 UTC 2019 [syncd]: => hw-mgmt chipdown
which means pmon is stopped because it started ahead of syncd.

And for the fast boot flow,
Mon Sep 16 12:54:15 UTC 2019 [syncd]: => do pmon stop
is part of the shutdown procedure of the fast reboot, since we can see it is followed by

Sep 16 12:54:32.903975 mtbc-sonic-01-2410 INFO systemd[1]: Starting LSB: Execute the kexec -e command to reboot system...
Sep 16 12:54:32.877427 mtbc-sonic-01-2410 INFO systemd[1]: Started LSB: Execute the kexec -e command to reboot system.

which is the mark of reloading the kernel.

And
Mon Sep 16 12:55:38 UTC 2019 [syncd]: => do pmon stop
is followed by
Mon Sep 16 12:55:44 UTC 2019 [syncd]: => hw-mgmt chipdown
which means pmon is stopped because it started ahead of syncd.

BTW, I'm not quite familiar with "is-system-running". Why does it help here?

@nazariig

@stephenxs For the FAST boot flow I meant:

Mon Sep 16 12:55:38 UTC 2019 [syncd]: => WARM_BOOT: false
Mon Sep 16 12:55:38 UTC 2019 [syncd]: => BOOT_TYPE: fast
Mon Sep 16 12:55:38 UTC 2019 [syncd]: => do pmon stop
Mon Sep 16 12:55:38 UTC 2019 [pmon]: => stop service

And regarding "is-system-running" - we can try something like this:

if [[ x"$(/bin/systemctl is-active pmon)" == x"active" ]]; then
    if [[ x"$(/bin/systemctl is-system-running)" != x"starting" ]]; then # to avoid stop on system boot
        /bin/systemctl stop pmon
        debug "pmon is active while syncd starting, stop it first"
    fi
fi
/usr/bin/hw-management.sh chipdown

P.S.: this is valid only if After=syncd.service is present in the [Unit] section of pmon.service

…-start

[syncd.sh,pmon.service] Prevent pmon from starting ahead of syncd
@stephenxs
Owner Author

Community PR [Mellanox] Stop pmon ahead of syncd #3505 has been created

@stephenxs stephenxs closed this Sep 25, 2019
@stephenxs stephenxs deleted the pmon-dependency-syncd branch September 25, 2019 20:54
stephenxs pushed a commit that referenced this pull request Dec 10, 2020
This update brings in the following commits.

86c1108 Enable arm architecture to build in addition to amd64 (#37)
4acb2c3 fix bugs and enhance Transformer (#35)
49e5a22 ygot related enhancements and fixes (#34)
51224de Fix ietf yang search path for cvl schema builds (#32)
3c6cdb3 CVL Changes #8: 'must' and 'when' expression evaluation (#31)
dabf231 CVL Changes #7: 'leafref' evaluation (#28)
6f9535f CVL Changes #6: Customized Xpath Engine integration (#27)
5e2466b DB-Layer fixes/enhancements (#26)
9a27302 CVL Changes #4: Implementation of new CVL APIs (#22)
dbf1093 Translib support for authorization, yang versioning and Delete flag (#21)
80f369e CVL Changes #5: YParser enhancement (#23)
904ce18 CVL Changes #3: Multi-db instance support (#20)
9d24a34 CVL Changes #2:  YValidator infra changes for evaluating xpath expression (#19)
f3fc40f CVL Changes #1: Initial CVL code reorganization and common infra changes (#18)
4922601 Bulk and RPC API support in translib (#16)
1d730df RFC7895 yang module library implementation (#15)
stephenxs pushed a commit that referenced this pull request Nov 16, 2021
1. Fix build for armhf and arm64
2. upgrade centec tsingma bsp support to 5.10 kernel
3. modify centec platform driver for linux 5.10

Co-authored-by: Shi Lei <shil@centecnetworks.com>
stephenxs pushed a commit that referenced this pull request Feb 10, 2022
* [BFN] Updated platform APIs impl

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

* Extended BFN platform SFP APIs implementation

* Update sfp.py

* [BFN] Extended SFP platform plugin implementation

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

* [BFN] Extended Fans platform plugin implementation

* [BFN] divided classes Fan and  FanDrawer into 2 files

* Signed-off-by: Vadym Yashchenko <vadymx.yashchenko@intel.com>

What I did
	Add get_model() function
	Add get_low_critical_threshold() function
	Change __get(...) function.
How I did it
	Differnece from previous implementation of __get(...) function is return real value or -9999.9 if value is not provided by thrift API

* Add get_presence() function and revised __get() function

Signed-off-by: Vadym Yashchenko <vadymx.yashchenko@intel.com>

* [BFN] Updated PSU platform APIs impl

Signed-off-by: Dmytro Lytvynenko <dmytrox.lytvynenko@intel.com>

* Added BFN PSU cache (#9)

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

* [BFN]  Fans and Fantray platform APIs update (#7)

* [BFN] Updated SFP platform APIs (#10)

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>

* [BFN] Updated platform API for thermal (#8)

* Signed-off-by: Vadym Yashchenko <vadymx.yashchenko@intel.com>

* Revert "[BFN]  Fans and Fantray platform APIs update (#7)" (#11)

This reverts commit c62a733.

* Add support health monitor system (#15)

Signed-off-by: Petro Bratash <petrox.bratash@intel.com>

* Update chassis.py

* [BFN] Updated FANs and FAN Tray platform API (#14)

* Fix fix_alignment (#17)

Signed-off-by: Petro Bratash <petrox.bratash@intel.com>

* [BFN] Improvement show environment (#16)

* Added PSU temperature skip into platform.json (#18)

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

* Do not skip psud on Newport

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

* [BFN] fix fan status from Not OK to Ok (#19)

* [BFN] Updated SFP platform plugin (#13)

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>

* [DPB] Fix typo for Ethernet0 2x200G[100G,40G] breakout mode (#21)

Signed-off-by: Mykola Gerasymenko <mykolax.gerasymenko@intel.com>

* [barefoot] Tmp fix vendor_rev (#22)

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>

* Fixed python issues in sonic_platform/fan_drawer.py

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

* Updated fan_drawer.py

* Fixing trailing white spaces in fan_drawer.py

* [BFN] Fix thrift for SFPs API

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>

* In platform.json, replaced 'false' with '0' to workaround ast.literal_eval() issue

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

* [Newport] Thermal manager  (#23)

* Signed-off-by: Vadym Yashchenko <vadymx.yashchenko@intel.com>

* Revert "In platform.json, replaced 'false' with '0' to workaround ast.literal_eval() issue"

This reverts commit 1e73127.

* Removed 'controllable' options from platform.json to fix factory default config generation

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

* Update thermal_manager.py

* Migrated SFP plugin to sonic_xcvr API (#30)

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

Co-authored-by: KostiantynYarovyiBf <kostiantynx.yarovyi@intel.com>
Co-authored-by: Vadym Yashchenko <vadymx.yashchenko@intel.com>
Co-authored-by: Dmytro Lytvynenko <dmytrox.lytvynenko@intel.com>
Co-authored-by: Volodymyr Boiko <volodymyrx.boiko@intel.com>
Co-authored-by: Petro Bratash <petrox.bratash@intel.com>
Co-authored-by: Mykola Gerasymenko <mykolax.gerasymenko@intel.com>
stephenxs pushed a commit that referenced this pull request Nov 4, 2022
#### Why I did it
Update sonic-host-services submodule to include below commits:
```
bc8698d Merge pull request #21 from abdosi/feature
557a110 Fix the issue where if dest port is not specified in ACL rule than for multi-asic where we create NAT rule to forward traffic from Namespace to host fail with exception.
6e45acc (master) Merge pull request #14 from abdosi/feature
4d6cad7 Merge remote-tracking branch 'upstream/master' into feature
bceb13e Install libyang to azure pipeline (#20)
82299f5 Merge pull request #13 from SuvarnaMeenakshi/cacl_fabricns
15d3bf4 Merge branch 'master' into cacl_fabricns
de54082 Merge pull request #16 from ZhaohuiS/feature/caclmgrd_external_client_warning_log
b4b368d Add warning log if destination port is not defined
d4bb96d Merge branch 'master' into cacl_fabricns
35c76cb Add unit-test and fix typo.
17d44c2 Made Changes to be Python 3.7 compatible
978afb5 Aligning Code
1fbf8fb Merge remote-tracking branch 'upstream/master' into feature
7b8c7d1 Added UT for the changes
91c4c42 Merge pull request #9 from ZhaohuiS/feature/caclmgrd_external_client
7c0b56a Add 4 test cases for external_client_acl, including single port and port range for ipv4 and ipv6
b71e507 Merge remote-tracking branch 'origin/master' into HEAD
d992dc0 Merge branch 'master' into feature/caclmgrd_external_client
bd7b172 DST_PORT is configuralbe in json config file for EXTERNAL_CLIENT_ACL
f9af7ae [CLI] Move hostname, mgmt interface/vrf config to hostcfgd (#2)
70ce6a3 Merge pull request #10 from sujinmkang/cold_reset
29be8d2 Added Support to render Feature Table using Device running metadata. Also added support to render 'has_asic_scope' field of Feature Table.
3437e35 [caclmgrd][chassis]: Add ip tables rules to accept internal docker traffic from fabric asic namespaces.
8720561 Fix and add hardware reboot cause determination tests
0dcc7fe remove the empty bracket if no hardware reboot cause minor
e47d831 fix the wrong expected result comparision
ef86b53 Fix startswith Attribute error
8a630bb fix mock patch
8543ddf update the reboot cause logic and update the unit test
53ad7cd fix the mock patch function
7c8003d fix the reboot-cause regix for test
1ba611f fix typo
25379d3 Add unit test case
a56133b Add hardware reboot cause as actual reboot cause for soft reboot failed
c7d3833 Support Restapi/gnmi control plane acls
f6ea036 caclmgrd: Don't block traffic to mgmt by default
a712fc4 Update test cases
adc058b caclmgrd: Don't block traffic to mgmt by default
06ff918 Merge pull request #7 from bluecmd/patch-1
e3e23bc ci: Rename sonic-buildimage repository
e83a858 Merge pull request #4 from kamelnetworks/acl-ip2me-test
f5a2e50 [caclmgrd]: Tests for IP2ME rules generation
```
stephenxs pushed a commit that referenced this pull request Jul 3, 2023
…sonic-net#15634)

#### Why I did it
src/dhcpmon
```
* 824a144 - (HEAD -> master, origin/master, origin/HEAD) replace atoi with strtol (#6) (3 hours ago) [Mai Bui]
* 32c0c3f - Fix libswsscommon package installation for non-amd64 (#7) (6 hours ago) [Saikrishna Arcot]
```
#### How I did it
#### How to verify it
#### Description for the changelog
stephenxs pushed a commit that referenced this pull request Dec 26, 2023
Why I did it
Advance dhcpmon to a3c5381 in 202305 branch.

a3c5381 - (HEAD, origin/master, origin/HEAD, master) Merge pull request #11 from jcaiMR/dev/jcai_fix_err_log (11 days ago) [StormLiangMS]
c5ef7e7 - Change common_libs dependencies from buster to bullseye (#9)
824a144 - replace atoi with strtol (#6) (10 weeks ago) [Mai Bui]
32c0c3f - Fix libswsscommon package installation for non-amd64 (#7) (10 weeks ago) [Saikrishna Arcot]
Work item tracking
Microsoft ADO (25048723):
How I did it
How to verify it
Run test_dhcp_relay.py, no failure