
Document how to run staging env in Qubes #3624

Merged: 22 commits into develop from docs-qubes-staging-environment, Jul 12, 2018
Conversation

conorsch (Contributor) commented Jul 9, 2018

Status

Ready for review.

Description of Changes

Fixes freedomofpress/securedrop-workstation#105.

Implements a staging environment for use in Qubes, so that devs working on the workstation code do not need to run VMs on a separate laptop in order to contact an SD server for development. The work is the result of collaboration with @joshuathayer. Submitting the PR as a "docs-only" PR because it doesn't change any files used for managing production SecureDrop instances:

$ git diff develop --name-status
A       docs/development/qubes_staging.rst
M       docs/index.rst
M       docs/servers.rst
A       molecule/qubes-staging/create.yml
A       molecule/qubes-staging/destroy.yml
A       molecule/qubes-staging/molecule.yml
A       molecule/qubes-staging/playbook.yml
A       molecule/qubes-staging/qubes-vars.yml
A       molecule/qubes-staging/ssh_config.j2

There are a few unimplemented features of the staging environment:

Config tests are not wired up

We already have the full config test suite running in CI on every (non-docs) PR, which is sufficient to prevent regressions. I'm happy to add config tests for the staging environment to the work presented here, but that should wait for the Vagrant -> Molecule conversion (#3208), in which we can ideally axe the testinfra/test.py wrapper script.

Machines need a nudge to boot after a reboot has triggered

When Ansible requests the VM reboot, the VM powers off, but Qubes won't restart it automatically. You must run qvm-start sd-app sd-mon in order to boot the VMs again, and you must run that command within the "wait_for" window, which is five minutes. Technically it's possible for us to detect that we're in Qubes and make the qvm-start calls dynamically (the Qubes Admin API makes this possible), but doing so would affect Ansible logic that runs against prod instances, so I'd prefer to follow up in a discrete PR targeting that improvement at a later date, if warranted.
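For illustration, the manual nudge could be wrapped in a small dom0 helper. This is only a sketch, not part of the PR: the helper name and polling loop are assumptions, while qvm-check and qvm-start are the standard Qubes dom0 tools.

```shell
#!/bin/sh
# Hypothetical dom0 helper: wait for each staging VM to power off after
# Ansible triggers the reboot, then start it again so the playbook can
# reconnect within the five-minute "wait_for" window.
nudge_vms() {
  for vm in sd-app sd-mon; do
    # Poll until the VM is no longer running...
    while qvm-check --quiet --running "$vm" 2>/dev/null; do
      sleep 5
    done
    # ...then boot it again so Ansible's wait_for can reconnect.
    qvm-start "$vm"
  done
}
# Usage (in dom0, while the playbook is mid-reboot): nudge_vms
```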

Halting is not a first-class feature

Molecule assumes we want to destroy a VM after testing it; by default, there's no Molecule action corresponding to vagrant halt <vm>. In the future, we can certainly implement one with a bit of additional scripting, but I'd prefer to gather developer feedback before attempting a convenient solution. Developers working in Qubes will need to run qvm-shutdown sd-app sd-mon in order to preserve the VMs for later use. The Molecule logic is only required the first time (or when server-side code changes are required); thereafter, the qvm- actions will be sufficient to get the Source and Journalist Interfaces up and running.

No automatic integration of HidServAuth tokens

After provisioning the staging environment, the sd-dev VM will have the HidServAuth tokens in the usual place: install_files/ansible-base/*-aths. These will need to be copied manually as config.json for use with development on the SecureDrop Workstation. Given that the VM provisioning and token porting tasks are each one-time, I consider that an acceptable trade-off, especially because we've reduced the number of required laptops from two to one with these changes.
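As an illustration of the manual porting step, the onion address and auth cookie could be pulled out of an aths file like this. The parse_aths helper is hypothetical, and the parsing assumes the standard Tor HidServAuth line format ("HidServAuth <address>.onion <cookie> ..."); the config.json schema itself is left to the Workstation docs.

```shell
#!/bin/sh
# Hypothetical helper: extract the onion address and auth cookie from an
# *-aths file so they can be carried over into config.json by hand.
parse_aths() {
  # $1: path to an *-aths file; prints "<address>.onion <cookie>"
  awk '/^HidServAuth/ { print $2, $3 }' "$1"
}
# Usage: parse_aths install_files/ansible-base/app-journalist-aths
```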

Testing

A Qubes machine is required for testing.

  • Follow the documentation to create a "staging" VM setup in Qubes.
  • Run molecule test -s qubes-staging.
  • When the machines go down for a reboot, run qvm-start sd-app sd-mon in dom0.
  • Confirm the playbook finishes without error.
  • Access the Source Interface over Tor and confirm it loads.
  • Run qvm-prefs sys-whonix and confirm it fails with "Service call error: request refused".

If anything is unclear in the docs, get loud. If anything doesn't work with the staging provisioning process, please post error messaging to aid in debugging.

Worthy of special attention are the Qubes RPC grants, which enable VM management from within sd-dev, rather than from dom0. Using the Qubes Admin API makes for a much slicker developer workflow, but we should scrutinize any grants we provide, here or hereafter. To the best of my ability, I've restricted the permissions to grant management rights over only the staging instances.

Deployment

None, dev env only.

Checklist

If you made changes to the server application code:

  • Linting (make ci-lint) and tests (make -C securedrop test) pass in the development container

If you made changes to securedrop-admin:

  • Linting and tests (make -C admin test) pass in the admin development container

If you made changes to the system configuration:

If you made non-trivial code changes:

  • I have written a test plan and validated it for this PR

If you made changes to documentation:

  • Doc linting (make docs-lint) passed locally

conorsch (Contributor, Author) commented Jul 9, 2018

On a few recent runs of this setup, I noticed a provisioning error related to a missing AppArmor profile for the tor abstractions. Nothing in the diff here indicates that it's caused by these changes, so I'm inviting review immediately, to see if the problem occurs for anyone else.

codecov-io commented Jul 9, 2018

Codecov Report

Merging #3624 into develop will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff            @@
##           develop    #3624   +/-   ##
========================================
  Coverage    85.04%   85.04%           
========================================
  Files           37       37           
  Lines         2367     2367           
  Branches       260      260           
========================================
  Hits          2013     2013           
  Misses         290      290           
  Partials        64       64

Last update 997db39...0d02344.

emkll (Contributor) left a comment
I went through these (very clear) docs and successfully built a SecureDrop staging environment in Qubes. The Qubes API policy works as intended.
Because the playbook updates all packages, and given the incremental effort of using template-based VMs, I think standalone VMs are the right choice in this situation. I've encountered two errors on playbook runs during the course of my testing, both occurring when I use the Molecule scenarios:

  • An issue with reboot_if_first_install (more comments inline, but it does not impact the run, just fails at the end)
  • An error at the OSSEC step, where the public key is copied over to the mon server as part of the Add the OSSEC GPG public key to the OSSEC manager keyring task:

Failed to set permissions on the temporary files Ansible needs to create when becoming an unprivileged user (rc: 1, err: chown: changing ownership of '/tmp/ansible-tmp-15325777.69-4139702993444/': Operation not permitted)

The workaround (or fix) is to install the acl package on the mon server and run the converge scenario once again.

The development VM, ``sd-dev``, will be based on Debian 9. All the other VMs
will be based on Ubuntu Trusty.

Create ``sd-dev``
Comment (Contributor):

Perhaps we refer to the (now merged) Qubes dev env docs introduced in #3623 to avoid duplication

Reply (Contributor Author):

refer to the (now merged) Qubes dev env docs

Absolutely, will rebase and cross-link.

- For DNS, use Qubes's DNS servers: ``10.139.1.1`` and ``10.139.1.2``.
- Hostname: ``sd-trusty-base``
- Domain name should be left blank

Comment (Contributor):
While it is the default option, it may be useful to specify which disk to install the operating system on, perhaps by stating: configure LVM and use Virtual disk 1 (xvda, 20.0 GB Xen Virtual Block device).

Reply (Contributor Author):

Can do; @joshuathayer had originally documented the target volume explicitly, and the declaration got lost in one of the follow-up concision edits.

molecule test -s qubes-staging

.. note::
The reboot actions run against the VMs during provisioning will only shutdown
Comment (Contributor):

There also appears to be an issue where the install playbook errors out at the very end with the following error:

Unable to retrieve file contents from /home/user/securedrop/molecule/qubes-staging/tasks/reboot_if_first_install.yml

Despite the error, the install is still completely functional.

Reply (Contributor Author):

Thanks; @msheiny wrote a different reboot handler for the AWS scenario. I'll take a closer look and see if we can modify the original to work everywhere; otherwise I'll implement a scenario-specific handler.

.. code:: sh

molecule create -s qubes-staging
molecule prepare -s qubes-staging
Comment (Contributor):

Is the prepare step necessary? molecule create already calls the prepare action after creating the VMs, and warns that no prepare playbook was found (molecule/qubes-staging/prepare.yml). Running molecule prepare explicitly returns a "Skipping, instances already prepared" message.

Reply (Contributor Author):

Good catch, @emkll, the prepare step is indeed not necessary. I had the prepare action in there as a stopgap to do some of the config that we later moved to the base VM setup docs. Will revise!

joshuathayer and others added 20 commits July 11, 2018 12:09
Using the "delegated" driver for Molecule, meaning that create
and destroy actions must be handled explicitly via the playbooks
within the scenario (rather than leaning on a convenient API).

There are a few shortcomings with the current implementation:

  * become pass is hardcoded (due to template VMs; should be changed)
  * VMs only halt, not start, when rebooted
  * the create/destroy logic is task-based, rather than module-based

We should consider writing a "qubes_vm" Ansible module, so the VMs
can be managed via e.g. state=started, state=absent, etc. The catch,
however, is that the `qubesadmin` Python library is only installable
via apt/yum, meaning the libs won't be available inside the
virtualenv without a sys.path hack.
The changes here are *very* permissive, and not appropriate for a production Qubes machine.
We can refine them after more testing, to get the minimum set of perms required.
Did not link out to the Qubes Admin API docs, but we should definitely do so.
Using qvm-prefs to force the static IP on the base VMs.
Few edits throughout, focusing on docs clarity.
During the first pass on documentation for running the staging
environment in Qubes, we used a Markdown file to jot notes quickly. This
commit preserves the content of the docs, while converting the format
to ReST, so it's visible alongside all the other project documentation.
Trying to slot all the related commands into one codeblock, so it's easy
for devs to follow along.
We already have copious documentation on how to install Trusty from an
ISO, so let's link out to that to be sure. Tacked on a reference in the
Trusty docs so that we can link directly to the install part.

Removes the recommendation to edit /etc/hosts on sd-dev, since we can
trust the provisioning scripts to manage access to the VMs for us.
Removes some extraneous notes from the WIP draft of the docs, and
includes some links out to Qubes official docs for more info.
Clarifying that this is a user account for a human administrator, rather
than a machine account used for running the SecureDrop application.
We've used "sdadmin" in other contexts, as well, such as the default
vars for configuring an instance via a Tails Workstation.

It doesn't really matter what this value is, but it will be hardcoded in
other aspects of the Qubes staging environment, and so any changes down
the road will require substantial developer intervention in order to
accommodate. So "sdadmin" is the most conservative approach, and
hopefully the most future-proof, as well.
These are still more permissive than I'd like, but better than the first
draft. More docs research required on the RPC policy syntax. The
policies here *do* prohibit management of other VMs on the Qubes machine
(e.g. `sd-whonix`, which is managed via the SecureDrop Workstation
 repo), which is better than what we had before.
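A dom0 RPC policy of that shape might look roughly like the excerpt below. This is a hedged sketch only: the file path, Admin API call name, and exact rules are assumptions about the Qubes 4.0 policy format, not the PR's actual diff.

```
# /etc/qubes-rpc/policy/admin.vm.Start (dom0) -- hypothetical excerpt
# Allow sd-dev to start only the staging VMs; deny everything else.
sd-dev sd-app allow
sd-dev sd-mon allow
$anyvm $anyvm deny
```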
Also added ReST-friendly codeblocks for the new apt install tasks for
the Qubes Admin API packages.
Keep the docs DRY! Linking out to the Trusty ISO download and verify
docs, via a new section reference. Also snuck in an unrelated typo fix in
the Trusty docs.
For practicality, reusing an existing SSH key in the sd-dev VM if found.
Spruced up the language a bit in places to make it more candid.
Pointed out by @emkll during review. We recently merged simple dev env
setup docs targeting Qubes, so we don't need the duplicate content in
the staging docs any longer.
Originally included by @joshuathayer, but aggressive concision edits on
my part removed it. During review, @emkll requested the additional
clarity, which sounds well warranted, so adding back.
conorsch (Contributor, Author) commented:

Workaround (or fix) is to install the acl package on the mon server and running the converge scenario once again.

Great catch on this one. We don't need that package in prod or in other VMs, so I'm loath to install it via the prod config logic simply to satisfy the Qubes VMs. After a bit of digging, that error relates to privilege escalation via Ansible, and one of the recommended resolutions is:

Use pipelining. When pipelining is enabled, Ansible doesn’t save the module to a temporary file on the client. Instead it pipes the module to the remote python interpreter’s stdin. Pipelining does not work for python modules involving file transfer (for example: copy, fetch, template), or for non-python modules.

That's precisely what we do in our Ansible config, in the ansible.cfg file, but notably the Qubes scenario wasn't including that config file, so the pipelining option was off, triggering the error. I'll add a reference to that same config file in the scenario, so that we're testing with the same Ansible connection settings we use elsewhere. That should obviate the need to install acl as well (although that's also a recommended resolution in the docs).
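As a sketch of that fix, one can point Ansible at the repo's ansible.cfg and sanity-check that pipelining is really on. The pipelining_enabled helper and the checkout path are assumptions; pipelining is a standard ansible.cfg setting under [ssh_connection].

```shell
#!/bin/sh
# Hypothetical sanity check: succeed only if "pipelining = True" is set in
# the given ansible.cfg, i.e. the config Ansible will actually load.
pipelining_enabled() {
  # $1: path to an ansible.cfg
  grep -Eq '^[[:space:]]*pipelining[[:space:]]*=[[:space:]]*True' "$1"
}
# Usage (path is an assumption about the checkout location):
#   export ANSIBLE_CONFIG="$HOME/securedrop/install_files/ansible-base/ansible.cfg"
#   pipelining_enabled "$ANSIBLE_CONFIG" && molecule converge -s qubes-staging
```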

We want the pipelining option in particular, but in general it's sound
practice to reuse the config so we're testing what's running in prod in
as many different contexts as possible. The dynamic inventory statement
in the ansible.cfg is not used, due to precedence of the Molecule
inventory file.
By default, Molecule sets the "create" action to call the "prepare"
action once during an instance's lifecycle [0]. We don't need the
prepare action, so let's override the create sequence to avoid
deprecation warnings displaying in the console during normal, fully
working provisioning runs of the Qubes staging environment.

Also snipped out the corresponding "prepare" action in the Qubes staging
documentation, pointed out by @emkll during review.

[0] https://molecule.readthedocs.io/en/latest/configuration.html
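The override described in that commit message could be sketched as a molecule.yml fragment. This is a hedged guess at the exact keys in Molecule's scenario sequence settings; the real change lives in molecule/qubes-staging/molecule.yml.

```yaml
# Hypothetical molecule.yml excerpt for the qubes-staging scenario:
# override the create sequence so the unused "prepare" action never runs,
# silencing the deprecation warning during normal provisioning.
scenario:
  name: qubes-staging
  create_sequence:
    - create
```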
@conorsch force-pushed the docs-qubes-staging-environment branch from 7bc46c9 to 0d02344 on July 12, 2018 00:26
conorsch (Contributor, Author) commented:

@emkll Nearly all comments addressed, please re-review. The sole comment unaddressed is:

An issue with reboot_if_first_install (more comments inline, but does not impact the run, just fails at the end)

Took a look at this, and it turns out we encountered this problem in the Molecule scenario for AWS/CI as well, where we added a var to detect whether we're on AWS, then overrode the reboot logic with a custom implementation specific to that scenario. I'd prefer not to do the same for the Qubes staging setup.

The next best bet is to convert the included task to a role, preserving the conditional logic. This will resolve the failure to find the file that you reported. I've tested this locally and it works reasonably well: the additional reboot still requires developer intervention in the form of another qvm-start sd-app sd-mon, but as mentioned above, that's an acceptable shortcoming for now. However, that logic change will affect the playbooks, meaning it's inappropriate for a docs-only PR, as I've submitted.

I'm willing to PR into this PR, if that's a reasonable workflow for you during review. Otherwise, I suggest merging as-is (with the documentation fixes already in place), and I can follow up with a separate, tiny PR that will include a staging CI run to validate that the tweak to the logic doesn't break the non-Qubes config. I slightly prefer the latter option, since it would also allow me to clean up the duplicated reboot logic in the molecule/aws/ scenario, but that's a stretch goal, so I'm willing to forgo it to keep the wheels turning.

For clarity, see proposed patch on the task to role conversion described above in this gist: https://gist.github.com/conorsch/49830560f8fce0835ab4b477e8387f56

emkll (Contributor) left a comment

Thanks for the quick fixes @conorsch! I confirm that changing the ANSIBLE_CONFIG env variable fixes the issue with the key copy.
As for the reboot error, I agree we shouldn't block merge on this, given the scope of this PR (and the scenario actually works). I have opened a ticket to track the outstanding issue (#3629), to better scope QA to those changes (which will likely require careful QA for both the Qubes and the other scenarios).

@emkll emkll merged commit 68aa4ff into develop Jul 12, 2018
@emkll emkll deleted the docs-qubes-staging-environment branch July 12, 2018 16:19