Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[syncd.sh] stop pmon ahead of syncd in flows except warm reboot #7

Closed
wants to merge 7 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions files/build_templates/pmon.service.j2
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,9 @@
Description=Platform monitor container
Requires=updategraph.service
After=updategraph.service
{% if sonic_asic_platform == 'mellanox' %}
After=syncd.service
{% endif %}
Before=ntp-config.service

[Service]
Expand Down
17 changes: 13 additions & 4 deletions files/scripts/syncd.sh
Original file line number Diff line number Diff line change
Expand Up @@ -104,11 +104,9 @@ start() {
if [[ x"$WARM_BOOT" != x"true" ]]; then
if [[ x"$(/bin/systemctl is-active pmon)" == x"active" ]]; then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenxs since you have removed pmon restart from CLI, we do not need this anymore.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenxs i suggest the next flow:

if [[ x"$WARM_BOOT" != x"true" ]]; then
    /usr/bin/hw-management.sh chipdown
fi

/bin/systemctl restart pmon

if [[ x"$BOOT_TYPE" == x"fast" ]]; then
    /usr/bin/hw-management.sh chipupdis
fi

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct. The original code prevents pmon from starting in warm-reboot flow.

Copy link
Owner Author

@stephenxs stephenxs Sep 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenxs since you have removed pmon restart from CLI, we do not need this anymore.

I am also considering to remove it. originally I intend to make a protection here so that if pmon is still alive after syncd died somehow we can make it sure that pmon is stopped to avoid racing condition. however it seems that this situation has never happened.
so if all of us think it's unnecessary I am going to remove it.
@keboliu @nazariig @stepanblyschak what's your opinion?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenxs since you have removed pmon restart from CLI, we do not need this anymore.

When system is booting pmon can start ahead of syncd. To solve this we can put protection here.
Another way to solve this is to only add syncd.service as "after" of pmon.service, just like what is done for syncd.service. see sonic-buildimage/files/build_templates/syncd.service.j2
@keboliu @nazariig @stepanblyschak

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenxs after warm-boot it looks like the pmon won't be restrtaed at all regardless of any user actions except explicit service restart. This can be an issue.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenxs after warm-boot it looks like the pmon won't be restrtaed at all regardless of any user actions except explicit service restart. This can be an issue.

This has been fixed. I haven't yet uploaded the code. Just as what you suggested.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenxs the flow becomes very complicated and some aspects are missing comparing to original config reload. Maybe we should revisit the architecture?

I'm afraid that the original implementation may have an issue. Consider the following flow:
at the beginning the pmon is not active, and then syncd.sh is scheduled out somehow just ahead of "chipdown", and then systemd is scheduled and starts pmon, and then pmon can run simultaneously with chipdown, the race condition formed and critical section broken.
is it possible?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

originally I intend to make a protection here so that if pmon is still alive after syncd died somehow we can make it sure that pmon is stopped to avoid racing condition.

@stephenxs nice catch. This also should be taken into consideration.

To address this situation, we have two options,

  1. to remain the logic that stops pmon if it was active ahead of syncd starting. As I mentioned before, it's also risky even the possibility is very low. If we intend to address this issue, we have:
  2. [syncd.sh,pmon.service] Prevent pmon from starting ahead of syncd

Copy link
Owner Author

@stephenxs stephenxs Sep 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenxs the flow becomes very complicated and some aspects are missing comparing to original config reload. Maybe we should revisit the architecture?

Having gone through all the flow, the difference compared to the original flow includes:
for config reload and swss restart, pmon stops ahead of syncd in the updated flow and after syncd in the original one.
for warm-reboot, in the original flow, the point when pmon starts can be one of the following 2 cases:
1. ahead of syncd started. in this case, the updated flow doesn't have any difference.
2. at any point after syncd. in this case, in the updated flow the pmon starts immediate after "chipdown" called, which means it may introduce a bit more latency for warm reboot. It can be demonstrated as the following (The "ORI FLOW" stands for the time sequence in the original flow and the "UDT FLOW" stands for that of the updated one):
ORI FLOW: --syncd starting------chip down--------syncd fully started-----pmon starting------
UDT FLOW: --syncd starting------chip down--pmon starting---syncd fully started-------------
Fortunately, per my test it is always the first case which means in most cases no extra latency introduced. However, if we decide to address the second case, we can do as [syncd.sh,pmon.service] Prevent pmon from starting ahead of syncd

/bin/systemctl stop pmon
/usr/bin/hw-management.sh chipdown
/bin/systemctl restart pmon
else
/usr/bin/hw-management.sh chipdown
debug "pmon is active while syncd starting, stop it first"
fi
/usr/bin/hw-management.sh chipdown
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously if pmon not active, there is no action to pmon service, here we changed the behavior, no matter pmon is active or not, it will be restarted, I would like to hear comments from someone else.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously if pmon not active, there is no action to pmon service, here we changed the behavior, no matter pmon is active or not, it will be restarted, I would like to hear comments from someone else.

Originally there is no dependency between pmon and syncd, which means they can start/stop independently. Now we've introduced a dependency on syncd for pmon so that pmon is stopped whenever syncd stops. However there is no mechanism to ensure that pmon will be started after syncd starts especially in case of "systemctl reload swss".
In this sense, we have to make pmon started automatically after syncd started.

fi

if [[ x"$BOOT_TYPE" == x"fast" ]]; then
Expand All @@ -134,6 +132,11 @@ start() {
}

wait() {
if [[ x"$sonic_asic_platform" == x"mellanox" ]]; then
debug "Starting pmon service..."
/bin/systemctl start pmon
debug "Started pmon service"
fi
/usr/bin/${SERVICE}.sh wait
}

Expand All @@ -150,6 +153,12 @@ stop() {
TYPE=cold
fi

if [[ x$sonic_asic_platform == x"mellanox" ]] && [[ x$TYPE == x"cold" ]]; then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that TYPE is set according to /proc/cmdline.
In the flow like this:

1. sudo warm-reboot
2. # wait for warmboot-finalaizer to finish and disable services WR mode
3. sudo systemctl restart swss # which restarts syncd

In such flow x$TYPE == x"fastfast" but obviously such service restart was meant to be cold

Copy link
Owner Author

@stephenxs stephenxs Sep 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the code, type is determined by whether WARM_BOOT being "true" and WARM_BOOT is determined by whether WARM_RESTART_ENABLE_TABLE|system and WARM_RESTART_ENABLE_TABLE|syncd contain "enable" in the redisdb.
Do you mean even warmboot is stopped (finalized), WARM_RESTART_ENABLE_TABLE|system or WARM_RESTART_ENABLE_TABLE|syncd remains "enable"?
If so, is there any way to check whether the current flow is warm-reboot?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenxs Oh, yes. We have 3 variables here representing almost the same thing (BOOT_TYPE, WARM_BOOT, TYPE) which confused me

debug "Stopping pmon service ahead of syncd..."
/bin/systemctl stop pmon
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need chipdown here?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need chipdown here?

I don't think so. But this should be double-checked with @nazariig

Copy link

@nazariig nazariig Sep 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenxs since the pmon was removed from the config reload sequence, i suggest to do pmon stop for all cases.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nazariig @stephenxs How long does it take to stop pmon? If we do stop pmon for all cases it affects warm shutdown. In warm shutdown every second matters

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stepanblyschak a couple of seconds and yes, it might affect warm-boot flow.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nazariig @stephenxs I must say that right now in master we are very close to the limit 3 lacp timeouts 90sec, which is going to be optimized, however a couple of seconds can kill the warm reboot flow.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is why pmon isn't stopped for warm-reboot. I suggest remaining the current logic.
BTW, maybe not related, is it possible that others ways, like defer the initialization of not timing-sensitive services, contributes more to optimizing warm reboot than just not to stop pmon gracefully?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenxs This is done in this way (e.g. SNMP is delayed, however such delay seems to me more like a hack)
And right now even with such optimization sonic is very slow at startup causing trouble in warm reboot scenario, so I suggest not to add additional delays at shutdown. Also originally only syncd should stop gracefully, anything else is either killed by signal or we don't care about graceful shutdown in favor of faster reboot. Even if platform has enough time to not let LAG/BGP flapping it is better to save 2-3 sec for the case when subsequent changes in configuration may slow down the startup

debug "Stopped pmon service."
stephenxs marked this conversation as resolved.
Show resolved Hide resolved
fi

if [[ x$sonic_asic_platform != x"mellanox" ]] || [[ x$TYPE != x"cold" ]]; then
debug "${TYPE} shutdown syncd process ..."
/usr/bin/docker exec -i syncd /usr/bin/syncd_request_shutdown --${TYPE}
Expand Down