Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logs more VM state info in updater #523

Merged
merged 3 commits into from
Apr 10, 2020
Merged

Conversation

conorsch
Copy link
Contributor

@conorsch conorsch commented Apr 2, 2020

Status

Ready for review

Description of Changes

Fixes #521. Refs #498

Changes proposed in this pull request:

For comparison:

Before:

2020-03-27 13:01:30,410 - sdw_updater_gui.Updater:386(apply_dom0_state) INFO: Applying dom0 state
2020-03-27 13:01:30,415 - sdw_updater_gui.UpdaterApp:173(update_progress_bar) INFO: Signal: Progress 95%
2020-03-27 13:02:12,054 - sdw_updater_gui.Updater:389(apply_dom0_state) INFO: Dom0 state applied
2020-03-27 13:02:12,054 - sdw_updater_gui.Updater:410(shutdown_and_start_vms) INFO: Rebooting system fedora-based VMs
2020-03-27 13:04:00,453 - sdw_updater_gui.Updater:443(_safely_start_vm) ERROR: Error while starting sys-usb
2020-03-27 13:04:00,453 - sdw_updater_gui.Updater:444(_safely_start_vm) ERROR: Command '['qvm-start', '--skip-if-running', 'sys-usb']' returned non-zero exit status 1
2020-03-27 13:04:00,453 - sdw_updater_gui.Updater:422(shutdown_and_start_vms) INFO: Rebooting all vms for updates
2020-03-27 13:04:55,884 - sdw_updater_gui.Updater:443(_safely_start_vm) ERROR: Error while starting sd-proxy
2020-03-27 13:04:55,885 - sdw_updater_gui.Updater:444(_safely_start_vm) ERROR: Command '['qvm-start', '--skip-if-running', 'sd-proxy']' returned non-zero exit status 1
2020-03-27 13:06:05,919 - sdw_updater_gui.Updater:248(_write_updates_status_flag_to_disk) INFO: Setting update flag to 0 in sd-app
2020-03-27 13:06:06,447 - sdw_updater_gui.Updater:261(_write_updates_status_flag_to_disk) INFO: Setting update flag to 0 in dom0
2020-03-27 13:06:06,448 - sdw_updater_gui.Updater:216(_write_last_updated_flags_to_disk) INFO: Setting last updated to 2020-03-27 13:06:06 in sd-app
2020-03-27 13:06:06,727 - sdw_updater_gui.Updater:229(_write_last_updated_flags_to_disk) INFO: Setting last updated to 2020-03-27 13:06:06 in dom0

After:

2020-04-01 18:50:34,916 - sdw_updater_gui.Updater:386(apply_dom0_state) INFO: Applying dom0 state
2020-04-01 18:50:34,916 - sdw_updater_gui.UpdaterApp:173(update_progress_bar) INFO: Signal: Progress 95%
2020-04-01 18:51:16,796 - sdw_updater_gui.Updater:389(apply_dom0_state) INFO: Dom0 state applied
2020-04-01 18:51:16,796 - sdw_updater_gui.Updater:417(shutdown_and_start_vms) INFO: Shutting down SDW VMs for updates
2020-04-01 18:51:16,964 - sdw_updater_gui.Updater:444(_safely_shutdown_vm) ERROR: Failed to shut down sys-whonix
2020-04-01 18:51:16,965 - sdw_updater_gui.Updater:445(_safely_shutdown_vm) ERROR: Command '['qvm-shutdown', '--wait', 'sys-whonix']' returned non-zero exit status 1
2020-04-01 18:51:16,965 - sdw_updater_gui.Updater:446(_safely_shutdown_vm) ERROR: b'qvm-shutdown: error: Failed to shut down: sys-whonix\n'
2020-04-01 18:51:42,329 - sdw_updater_gui.Updater:422(shutdown_and_start_vms) INFO: Killing system fedora-based VMs for updates
2020-04-01 18:51:53,442 - sdw_updater_gui.Updater:431(shutdown_and_start_vms) INFO: Starting system fedora-based VMs after updates
2020-04-01 18:51:53,730 - sdw_updater_gui.Updater:455(_safely_start_vm) INFO: VMs running before start of sys-usb: b'anon-whonix\ndom0\nfpf-email\nfpf-infra\nfpf-signal\nfpf-vault\nfpf-vidchat\nfpf-vpn\nfpf-web\nsd-dev\nsd-devices\nsys-firewall-inner\nsys-whonix\n'
2020-04-01 18:52:08,822 - sdw_updater_gui.Updater:455(_safely_start_vm) INFO: VMs running before start of sys-net: b'anon-whonix\ndom0\nfpf-email\nfpf-infra\nfpf-signal\nfpf-vault\nfpf-vidchat\nfpf-vpn\nfpf-web\nsd-dev\nsd-devices\nsys-firewall-inner\nsys-usb\nsys-whonix\n'
2020-04-01 18:52:27,316 - sdw_updater_gui.Updater:455(_safely_start_vm) INFO: VMs running before start of sys-firewall: b'anon-whonix\ndom0\nfpf-email\nfpf-infra\nfpf-signal\nfpf-vault\nfpf-vidchat\nfpf-vpn\nfpf-web\nsd-dev\nsd-devices\nsys-firewall-inner\nsys-net\nsys-usb\nsys-whonix\n'
2020-04-01 18:52:37,586 - sdw_updater_gui.Updater:435(shutdown_and_start_vms) INFO: Starting SDW VMs after updates
2020-04-01 18:52:38,072 - sdw_updater_gui.Updater:455(_safely_start_vm) INFO: VMs running before start of sd-log: b'anon-whonix\ndom0\nfpf-email\nfpf-infra\nfpf-signal\nfpf-vault\nfpf-vidchat\nfpf-vpn\nfpf-web\nsd-dev\nsd-devices\nsys-firewall\nsys-firewall-inner\nsys-net\nsys-usb\nsys-whonix\n'
2020-04-01 18:53:07,760 - sdw_updater_gui.Updater:455(_safely_start_vm) INFO: VMs running before start of sd-gpg: b'anon-whonix\ndom0\nfpf-email\nfpf-infra\nfpf-signal\nfpf-vault\nfpf-vidchat\nfpf-vpn\nfpf-web\nsd-dev\nsd-devices\nsd-log\nsys-firewall\nsys-firewall-inner\nsys-net\nsys-usb\nsys-whonix\n'
2020-04-01 18:53:35,512 - sdw_updater_gui.Updater:455(_safely_start_vm) INFO: VMs running before start of sd-app: b'anon-whonix\ndom0\nfpf-email\nfpf-infra\nfpf-signal\nfpf-vault\nfpf-vidchat\nfpf-vpn\nfpf-web\nsd-dev\nsd-devices\nsd-gpg\nsd-log\nsys-firewall\nsys-firewall-inner\nsys-net\nsys-usb\nsys-whonix\n'
2020-04-01 18:54:03,973 - sdw_updater_gui.Updater:455(_safely_start_vm) INFO: VMs running before start of sd-whonix: b'anon-whonix\ndom0\nfpf-email\nfpf-infra\nfpf-signal\nfpf-vault\nfpf-vidchat\nfpf-vpn\nfpf-web\nsd-app\nsd-dev\nsd-devices\nsd-gpg\nsd-log\nsys-firewall\nsys-firewall-inner\nsys-net\nsys-usb\nsys-whonix\n'
2020-04-01 18:54:14,158 - sdw_updater_gui.Updater:455(_safely_start_vm) INFO: VMs running before start of sd-proxy: b'anon-whonix\ndom0\nfpf-email\nfpf-infra\nfpf-signal\nfpf-vault\nfpf-vidchat\nfpf-vpn\nfpf-web\nsd-app\nsd-dev\nsd-devices\nsd-gpg\nsd-log\nsd-whonix\nsys-firewall\nsys-firewall-inner\nsys-net\nsys-usb\nsys-whonix\n'
2020-04-01 18:54:43,306 - sdw_updater_gui.Updater:455(_safely_start_vm) INFO: VMs running before start of sys-whonix: b'anon-whonix\ndom0\nfpf-email\nfpf-infra\nfpf-signal\nfpf-vault\nfpf-vidchat\nfpf-vpn\nfpf-web\nsd-app\nsd-dev\nsd-devices\nsd-gpg\nsd-log\nsd-proxy\nsd-whonix\nsys-firewall\nsys-firewall-inner\nsys-net\nsys-usb\nsys-whonix\n'
2020-04-01 18:54:43,466 - sdw_updater_gui.Updater:248(_write_updates_status_flag_to_disk) INFO: Setting update flag to 0 in sd-app
2020-04-01 18:54:43,724 - sdw_updater_gui.Updater:261(_write_updates_status_flag_to_disk) INFO: Setting update flag to 0 in dom0
2020-04-01 18:54:43,725 - sdw_updater_gui.Updater:216(_write_last_updated_flags_to_disk) INFO: Setting last updated to 2020-04-01 18:54:43 in sd-app
2020-04-01 18:54:43,938 - sdw_updater_gui.Updater:229(_write_last_updated_flags_to_disk) INFO: Setting last updated to 2020-04-01 18:54:43 in dom0

Testing

make clone
make prep-dom0
tail -f ~/.securedrop_client/logs/launcher.log
# then, in separate terminal:
/opt/securedrop/launcher/sdw-launcher.py --skip-delta 0

Confirm you see the more verbose logs, particularly the list of all running VMs.

Checklist

If you have made code changes

  • Linter (make flake8) passes in the development environment (this box may
    be left unchecked, as flake8 also runs in CI)

If you have made changes to the provisioning logic

  • All tests (make test) pass in dom0 of a Qubes install

  • This PR adds/removes files, and includes required updates to the packaging
    logic in MANIFEST.in and rpm-build/SPECS/securedrop-workstation-dom0-config.spec

@conorsch conorsch requested review from eloquence and emkll April 2, 2020 02:04
@conorsch
Copy link
Contributor Author

conorsch commented Apr 2, 2020

Drat, I neglected to update the mocked calls, which I had plenty of time to do while the updater was running. Will pick this back up tomorrow.

@conorsch
Copy link
Contributor Author

conorsch commented Apr 3, 2020

Have not revisited the mocked tests yet, which need to be updated due to the subprocess.check_call -> subprocess.check_output changes in the updater code. Would appreciate a hand, particularly the eyes of @kushaldas 🙂

@eloquence
Copy link
Member

I'll apply this locally once it's rebased.

Conor Schaefer and others added 3 commits April 6, 2020 14:49
In an attempt to understand the specific failures during the various
actions the GUI updater runs against VMs:

  * qvm-start
  * qvm-shutdown
  * qvm-kill

Let's make sure to capture the stderr of the CLI action. We should
switch to using the Python API for most of these actions, at which time
we'll have to update the exception handling, but for now this'll give us
more visibility into the specific failures in the logs.
In the graphical updater, let's log the state of all running VMs
immediately prior to running `qvm-start`. That will help us
evaluate we should alter the reboot logic to avoid problems.

Really, this should be a "debug" line, not an "info" line, but since the
default logging is "info" I've left it as-is.

The original request was that we log the output of `xl list`, which has
more informative output, but it's inscrutable in log lines so I'm taking
a simpler approach that should prove as useful.
@conorsch conorsch force-pushed the 521-more-logs-from-the-updater branch from 2aad118 to 6836eb5 Compare April 6, 2020 21:52
@conorsch
Copy link
Contributor Author

conorsch commented Apr 6, 2020

Rebased. Ready for review.

running_vms = subprocess.check_output(
["qvm-ls", "--running", "--raw-list"], stderr=subprocess.PIPE
)
sdlog.info("VMs running before start of {}: {}".format(vm, running_vms))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will produce logs like:

2020-04-06 15:12:53,712 - sdw_updater_gui.Updater:463(_safely_start_vm) INFO: VMs running before start of sd-app: b'dom0\nsd-gpg\nsd-log\nsd-proxy\nsd-whonix\nsys-firewall\nsys-net\nsys-usb\nsys-whonix\nwork\n'

I would recommend converting the list of VMs to a comma-separated list for readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. New format looks like:

2020-04-06 16:13:37,826 - sdw_updater_gui.Updater:463(_safely_start_vm) INFO: VMs running before start of sd-app: ['dom0', 'fpf-infra', 'fpf-vault', 'fpf-vpn', 'fpf-web', 'sd-dev', 'sd-devices', 'sd-gpg', 'sd-kernel-builder', 'sd-log', 'sd-proxy', 'sd-whonix', 'sys-firewall', 'sys-firewall-inner', 'sys-net', 'sys-usb', 'sys-whonix']

@eloquence
Copy link
Member

Running with this patch now; so far logs look good, but am unable to reproduce the error reliably at this point.

@conorsch
Copy link
Contributor Author

conorsch commented Apr 7, 2020

The change here: 74ab8c4 inadvertently broke the updater: the removal of the decode() call means the action raises an exception inside the try block and the process fails (silently). Such silent failures were my concern when writing: 1789293

A bit cautious around chaining formatting specifically for logs when we're
already in a try block, but these changes are darn simple.

And now, if you'll allow me a moment to eat my hat, I propose we back out those formatting changes and live with the single-line bytes-formatted string for stability.

@eloquence
Copy link
Member

Heh, I'll squint a little bit every time I see one of those lines b'dom0\nsd-gpg\nsd-log\nsd-proxy\nsd-whonix\nsys-firewall\nsys-net\nsys-usb\nsys-whonix\nwork\n' but if the only choice is between unreadable and rock solid vs. readable and dangerously brittle, I agree the former takes precedence. :)

@conorsch conorsch force-pushed the 521-more-logs-from-the-updater branch from 74ab8c4 to 6836eb5 Compare April 7, 2020 20:19
Copy link
Contributor

@emkll emkll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @conorsch and @kushaldas , these changes work as expected. Small suggestion inline, but also good to merge as-is to start collecting more information.

@@ -446,19 +447,27 @@ def shutdown_and_start_vms():

def _safely_shutdown_vm(vm):
try:
subprocess.check_call(["qvm-shutdown", "--wait", vm])
subprocess.check_output(["qvm-shutdown", "--wait", vm], stderr=subprocess.PIPE)
except subprocess.CalledProcessError as e:
sdlog.error("Failed to shut down {}".format(vm))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would adding a log line about information about running VMs here make sense as well? This would help debug situations where a VM can't shutdown due to a VM depending on it being still powered on. In the current workstation case, no VM will be a netVM, but it can be a receiver of qubes-rpc calls, which will cause it to power on again

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, would rather have the logs than not. Will add momentarily!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neglected to add more logging info as suggested. Merging now so devs get the more verbose logs pronto, will continue to look for opportunities to expand logging in follow-ups.

@conorsch conorsch merged commit 2c67f92 into master Apr 10, 2020
@eloquence eloquence added this to the 0.3.0 milestone Apr 25, 2020
cfm pushed a commit that referenced this pull request Apr 1, 2024
…dater

Logs more VM state info in updater
@legoktm legoktm deleted the 521-more-logs-from-the-updater branch May 28, 2024 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve logging of updater failure cases
4 participants