Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[config reload] Fix config reload failure due to sonic.target job cancellation #1814

Merged
merged 2 commits into from
Sep 13, 2021
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions config/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -686,7 +686,7 @@ def _stop_services():
pass

click.echo("Stopping SONiC target ...")
clicommon.run_command("sudo systemctl stop sonic.target")
clicommon.run_command("sudo systemctl stop sonic.target --job-mode replace-irreversibly")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add this option also for systemctl restart sonic.target case at line#709

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added !



def _get_sonic_services():
Expand All @@ -706,7 +706,7 @@ def _reset_failed_services():

def _restart_services():
click.echo("Restarting SONiC target ...")
clicommon.run_command("sudo systemctl restart sonic.target")
clicommon.run_command("sudo systemctl restart sonic.target --job-mode replace-irreversibly")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vivekreddynv please double check this change
@stepanblyschak please elaborate on the consequences

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vivekreddynv It looks like if restart is executed with --job-mode replace-irreversibly we will still have the same issue, because the start job will be placed in the in systemd's job queue as "replace-irreverisbly" and the next config reload stop job will be discarded by systemd due to "replace-irreverisbly" of the start job leading to an error that looks smth like this:

Transaction for sonic.target/stop is destructive

Could you please double check? I think we only need to stop services with the guaranty that it will be successfully executed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when the restart job (and all the other dependent jobs) are finished and is taken out of systemd's job queue, i don't think the next stop job will be cancelled.

However, simultaneous config reloads in the quick succession, can lead to the behavior you have said.

admin@sonic:~$ sudo systemctl restart sonic.target --job-mode replace-irreversibly
admin@sonic:~$ sudo systemctl stop sonic.target --job-mode replace-irreversibly (Ran Immediately)
Failed to stop sonic.target: Transaction for sonic.target/stop is destructive (ntp-config.service has 'start' job queued, but 'stop' is included in transaction).
See system logs and 'systemctl status sonic.target' for details.
(After all the dependent jobs of restart sonic.target  are done, it works)
admin@sonic:~$ sudo systemctl stop sonic.target --job-mode replace-irreversibly
admin@sonic:

The problem here is that the sonic.target start is not a blocking call unlike sonic.target stop.

Nevertheless, the idea of this PR is that config reload should not fail and thus it makes sense to remove this. I'll raise a separate PR


try:
subprocess.check_call("sudo monit status", shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
Expand Down