[fast/warm reboot] ignore errors after shutting down critical service(s) (sonic-net#761)

Once any critical service (radv/swss/syncd) is shut down, we have to
commit to the reboot. Failing in the middle would leave the system in
a bad state.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
yxieca authored Dec 10, 2019
1 parent 0e4fc9c commit 41f5961
Showing 1 changed file with 10 additions and 7 deletions.
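
The commit message describes a point of no return: before it, a failure should abort the script; after it, a failure should be logged and the reboot should continue. A minimal sketch of that idea follows, assuming the script runs under `set -e` before the commit point. The function name `reboot_sequence` and the `true`/`false` stand-ins are hypothetical and not taken from `scripts/fast-reboot`.

```shell
#!/bin/bash
# Minimal sketch; function and step names are hypothetical.

debug() { echo "DEBUG: $*"; }

reboot_sequence() (
    # The subshell body keeps the shell options local to this function.
    set -e                       # Before the commit point, any failure aborts.
    debug "Running pre-checks ..."
    true                         # Stand-in for a check that must succeed.

    # Commit point: critical services go down from here on.
    set +e                       # Failures no longer abort the script.
    debug "Stopping services ..."
    false                        # Stand-in for a stop command that fails.
    debug "Proceeding to reboot ..."
)

reboot_sequence
```

With `set +e` in effect, the failing stand-in does not end the function, so the final message is still printed.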
17 changes: 10 additions & 7 deletions scripts/fast-reboot
@@ -144,7 +144,6 @@ function request_pre_shutdown()
debug "Requesting pre-shutdown ..."
/usr/bin/docker exec -i syncd /usr/bin/syncd_request_shutdown --pre &> /dev/null || {
error "Failed to request pre-shutdown"
- exit "${EXIT_SYNCD_SHUTDOWN}"
}
}
@@ -180,9 +179,9 @@ function wait_for_pre_shutdown_complete_or_fail()
if [[ x"${STATE}" != x"pre-shutdown-succeeded" ]]; then
debug "Syncd pre-shutdown failed: ${STATE} ..."
- exit "${EXIT_SYNCD_SHUTDOWN}"
+ else
+ debug "Pre-shutdown succeeded ..."
fi
- debug "Pre-shutdown succeeded ..."
}
function backup_database()
@@ -402,6 +401,10 @@ if [[ "$REBOOT_TYPE" = "warm-reboot" || "$REBOOT_TYPE" = "fastfast-reboot" ]]; t
fi
fi
+ # We are fully committed to the reboot from this point on because critical
+ # services will go down and we cannot recover from that.
+ set +e
# Kill radv before stopping BGP service to prevent announcing our departure.
debug "Stopping radv ..."
docker kill radv &>/dev/null || [ $? == 1 ]
@@ -474,7 +477,7 @@ if [[ "$REBOOT_TYPE" = "warm-reboot" || "$REBOOT_TYPE" = "fastfast-reboot" ]]; t
fi
debug "Stopping syncd ..."
- systemctl stop syncd
+ systemctl stop syncd || debug "Ignore stopping syncd service error $?"
debug "Stopped syncd ..."
# Kill other containers to make the reboot faster
@@ -485,20 +488,20 @@ debug "Stopping all remaining containers ..."
for CONTAINER_NAME in $(docker ps --format '{{.Names}}'); do
CONTAINER_STOP_RC=0
docker kill $CONTAINER_NAME &> /dev/null || CONTAINER_STOP_RC=$?
- systemctl stop $CONTAINER_NAME
+ systemctl stop $CONTAINER_NAME || debug "Ignore stopping $CONTAINER_NAME error $?"
if [[ CONTAINER_STOP_RC -ne 0 ]]; then
debug "Failed killing container $CONTAINER_NAME RC $CONTAINER_STOP_RC ."
fi
done
debug "Stopped all remaining containers ..."
# Stop the docker container engine. Otherwise we will have a broken docker storage
- systemctl stop docker.service
+ systemctl stop docker.service || debug "Ignore stopping docker service error $?"
# Stop kernel modules for Nephos platform
if [[ "$sonic_asic_type" = 'nephos' ]];
then
- systemctl stop nps-modules-`uname -r`.service
+ systemctl stop nps-modules-`uname -r`.service || debug "Ignore stopping nps service error $?"
fi
# Update the reboot cause file to reflect that user issued this script
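
The `systemctl stop ... || debug "... $?"` lines in this diff rely on two bash behaviors: a failing command on the left of `||` does not trigger `set -e`, and `$?` on the right still holds that command's exit status. The sketch below illustrates both. `stop_service` is a hypothetical stand-in for a failing `systemctl stop` call, and this `debug` is a simplified placeholder rather than the script's own helper.

```shell
#!/bin/bash
# Minimal sketch; stop_service and this debug() are hypothetical stand-ins.

debug() { echo "DEBUG: $*"; }

# Pretend to be a `systemctl stop <service>` call that fails with status 3.
stop_service() { return 3; }

stop_all() (
    set -e   # Even with errexit on, the `||` list below does not abort.
    stop_service || debug "Ignore stopping service error $?"
    debug "Stopped service ..."
)

stop_all
```

Because the failure is handled on the right-hand side of `||`, the pattern keeps working whether or not `set +e` was issued earlier, which makes the ignored error visible in the log without stopping the reboot.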