Document how to run staging env in Qubes #3624
Conversation
On a few recent runs of this setup, I noticed a provisioning error related to a missing AppArmor profile for the tor abstractions. Nothing in the diff here indicates that's caused by these changes, so I'm inviting review immediately, to see if the problem occurs for anyone else.
Codecov Report
    @@           Coverage Diff            @@
    ##           develop    #3624   +/-   ##
    ========================================
      Coverage    85.04%   85.04%
    ========================================
      Files           37       37
      Lines         2367     2367
      Branches       260      260
    ========================================
      Hits          2013     2013
      Misses         290      290
      Partials        64       64

Continue to review full report at Codecov.
I went through these (very clear) docs and successfully built a SecureDrop staging environment in Qubes. The Qubes API policy works as intended.
Because the playbook updates all packages, and given the incremental effort in using template-based VMs, I think standalone VMs are the right choice in this situation. I've encountered two errors on playbook runs during the course of my testing, both occurring when I use the Molecule scenarios:

- An issue with `reboot_if_first_install` (more comments inline, but it does not impact the run, just fails at the end).
- An error at the OSSEC step where the public key is copied over to the mon server, as part of the "Add the OSSEC GPG public key to the OSSEC manager keyring" task:

  Failed to set permissions on the temporary files Ansible needs to create when becoming an unprivileged user (rc: 1, err: chown: changing ownership of '/tmp/ansible-tmp-15325777.69-4139702993444/': Operation not permitted)

  The workaround (or fix) is to install the `acl` package on the mon server and run the converge scenario once again.
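For reference, a minimal sketch of that workaround (the `mon-staging` SSH alias is illustrative, not from the PR):

    # Install the acl package on the mon server, so Ansible can set
    # permissions on its temp files when becoming an unprivileged user:
    ssh mon-staging "sudo apt-get install -y acl"

    # Then re-run the converge scenario:
    molecule converge -s qubes-staging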
docs/development/qubes_staging.rst (outdated):

    The development VM, ``sd-dev``, will be based on Debian 9. All the other VMs
    will be based on Ubuntu Trusty.

    Create ``sd-dev``
Perhaps we can refer to the (now merged) Qubes dev env docs introduced in #3623, to avoid duplication.
> refer to the (now merged) Qubes dev env docs
Absolutely, will rebase and cross-link.
    - For DNS, use Qubes's DNS servers: ``10.139.1.1`` and ``10.139.1.2``.
    - Hostname: ``sd-trusty-base``
    - Domain name should be left blank
While it is the default option, it may be useful to specify on which disk to install the operating system, perhaps by stating: configure LVM and use "Virtual disk 1 (xvda, 20.0 GB Xen Virtual Block device)".
Can do; @joshuathayer had originally explicitly documented the target volume, and the declaration got lost in one of the follow-up concision edits.
    molecule test -s qubes-staging

    .. note::
        The reboot actions run against the VMs during provisioning will only shutdown
There also appears to be an issue at the end of the install playbook run, where it errors out with the following error:
Unable to retrieve file contents from /home/user/securedrop/molecule/qubes-staging/tasks/reboot_if_first_install.yml
This happens at the very end, and the install is still completely functional.
Thanks, @msheiny wrote a different reboot handler for the AWS scenario. Will take a closer look and see if we can modify the original to work everywhere; otherwise I'll implement a scenario-specific handler.
docs/development/qubes_staging.rst (outdated):

    .. code:: sh

        molecule create -s qubes-staging
        molecule prepare -s qubes-staging
Is the prepare step necessary? The `molecule create` does call the prepare action after creating the VMs, and warns that there is no prepare playbook found (`molecule/qubes-staging/prepare.yml`). Running `molecule prepare` explicitly returns a "Skipping, instances already prepared" message.
Good catch, @emkll, the prepare step is indeed not necessary. I had the prepare action in there as a stopgap to do some of the config that we later moved to the base VM setup docs. Will revise!
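For anyone following along, the behavior described above looks like this (a sketch; the output line is paraphrased from the comment, not captured from a run):

    # "create" already runs the prepare action once per instance lifecycle:
    molecule create -s qubes-staging

    # An explicit prepare is therefore a no-op on existing instances:
    molecule prepare -s qubes-staging
    # => Skipping, instances already prepared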
Using the "delegated" driver for Molecule, meaning that create and destroy actions must be handled explicitly via the playbooks within the scenario (rather than leaning on a convenient API). There are a few shortcomings with the current implementation: * become pass is hardcoded (due to template VMs; should be changed) * VMs only halt, not start, when rebooted * the create/destroy logic is task-based, rather than module-based we should consider writing a "qubes_vm" Ansible module, so the VMs can be managed via e.g. state=started, state=absent, etc. The catch, however, is that the `qubesadmin` Python library is only installable via apt/yum, meaning that the libs won't be available inside the virtualenv without a sys.path hack.
The changes here are *very* permissive, and not appropriate for a production Qubes machine. We can refine them after more testing, to get the minimum set of perms required. Did not link out to the Qubes Admin API docs, but we should definitely do so.
Using qvm-prefs to force the static IP on the base VMs.
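For illustration, a `qvm-prefs` call of that shape (VM name and address are placeholders, not values from the diff):

    # Pin a static IP on the base VM:
    qvm-prefs sd-trusty-base ip 10.137.0.50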
Few edits throughout, focusing on docs clarity.
During the first pass on documentation for running the staging environment in Qubes, we used a Markdown file to jot notes quickly. This commit preserves the content of the docs, while converting the format to ReST, so it's visible alongside all the other project documentation.
Trying to slot all the related commands into one codeblock, so it's easy for devs to follow along.
We already have copious documentation on how to install Trusty from an ISO, so let's link out to that to be sure. Tacked on a reference in the Trusty docs so that we can link directly to the install part. Removes the recommendation to edit /etc/hosts on sd-dev, since we can trust the provisioning scripts to manage access to the VMs for us.
Removes some extraneous notes from the WIP draft of the docs, and includes some links out to Qubes official docs for more info.
Clarifying that this is a user account for a human administrator, rather than a machine account used for running the SecureDrop application. We've used "sdadmin" in other contexts as well, such as the default vars for configuring an instance via a Tails Workstation. It doesn't really matter what this value is, but it will be hardcoded in other aspects of the Qubes staging environment, so any changes down the road will require substantial developer intervention to accommodate them. "sdadmin" is therefore the most conservative approach, and hopefully the most future-proof, as well.
These are still more permissive than I'd like, but better than the first draft. More docs research required on the RPC policy syntax. The policies here *do* prohibit management of other VMs on the Qubes machine (e.g. `sd-whonix`, which is managed via the SecureDrop Workstation repo), which is better than what we had before.
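For a sense of the shape of those policies, a hedged illustration using the Qubes 4.0 RPC policy syntax (the service name and entries are my assumptions, not a copy of the diff):

    # dom0:/etc/qubes-rpc/policy/admin.vm.Start
    sd-dev   sd-app   allow
    sd-dev   sd-mon   allow
    $anyvm   $anyvm   deny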
Also added ReST-friendly codeblocks for the new apt install tasks for the Qubes Admin API packages.
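Presumably along these lines in the rendered docs (the package name is an assumption based on the upstream Qubes client tooling):

    # Inside the Debian-based sd-dev VM:
    sudo apt-get install qubes-core-admin-client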
Keep the docs DRY! Linking out to the Trusty ISO download and verify docs, via a new section reference. Also snuck in an unrelated typo fix in the Trusty docs.
For practicality, reusing an existing SSH key in the sd-dev VM if found. Spruced up the language a bit in places to make it more candid.
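A minimal sketch of that reuse-if-found logic (key path and type are illustrative):

    # Only generate a keypair in sd-dev if one doesn't already exist:
    if [ ! -f ~/.ssh/id_rsa ]; then
        ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa -N ""
    fi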
Pointed out by @emkll during review. We recently merged simple dev env setup docs targeting Qubes, so we don't need the duplicate content in the staging docs any longer.
Originally included by @joshuathayer, but aggressive concision edits on my part removed it. During review, @emkll requested the additional clarity, which sounds well warranted, so adding back.
Great catch on this one. We don't need that package in prod or other VMs, so I'm loath to install it via the prod config logic simply to satisfy the Qubes VMs. After a bit of digging, that error relates to privilege escalation via Ansible, and one of the recommended resolutions is to enable pipelining. That's precisely what we do with our Ansible config, in the repo's ansible.cfg.
We want the pipelining option in particular, but in general it's sound practice to reuse the config so we're testing what's running in prod in as many different contexts as possible. The dynamic inventory statement in the ansible.cfg is not used, due to precedence of the Molecule inventory file.
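To make that concrete, a hypothetical invocation (the config path is an assumption based on the repo layout):

    # Reuse the prod Ansible config, which enables pipelining:
    export ANSIBLE_CONFIG=~/securedrop/install_files/ansible-base/ansible.cfg
    molecule converge -s qubes-staging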
By default, Molecule sets the "create" action to call the "prepare" action once during an instance's lifecycle [0]. We don't need the prepare action, so let's override the create sequence to avoid deprecation warnings displaying in the console during normal, fully working provisioning runs of the Qubes staging environment. Also snipped out the corresponding "prepare" action in the Qubes staging documentation, pointed out by @emkll during review. [0] https://molecule.readthedocs.io/en/latest/configuration.html
Force-pushed from 7bc46c9 to 0d02344.
@emkll Nearly all comments addressed, please re-review. The sole comment unaddressed is:
Took a look at this, and it turns out we encountered this problem in the Molecule scenario for AWS/CI as well, in which we added a var to detect whether we're on AWS, then overrode the reboot logic with a custom implementation specific to that scenario. I'd prefer not to do the same for the Qubes staging setup. The next best bet is to convert the included task to a role, preserving conditional logic. This will resolve the failure to find the file that you reported. I've tested this locally and it works well-ish, since the additional reboot requires developer intervention in the form of another `qvm-start`.

Willing to PR into this PR, if that is a reasonable workflow for you during review. Otherwise, I suggest merging as-is (with the documentation fixes already in place), and I can follow up with a separate, tiny PR that will include a staging CI run to validate that the tweak to the logic doesn't break the non-Qubes config. I slightly prefer the latter option, since that would allow me to clean up the duplicated reboot logic in the AWS scenario.

For clarity, see proposed patch on the task-to-role conversion described above in this gist: https://gist.github.com/conorsch/49830560f8fce0835ab4b477e8387f56
Thanks for the quick fixes, @conorsch! I confirm that changing the ANSIBLE_CONFIG environment variable fixes the issue with the key copy.
As for the reboot error, I agree we shouldn't block merge on this, given the scope of this PR (and the scenario actually works). I have opened a ticket to track the outstanding issue (#3629), to better scope QA to those changes (which will likely require careful QA for both the Qubes and other scenarios).
Status
Ready for review.
Description of Changes
Fixes freedomofpress/securedrop-workstation#105.
Implements a staging environment for use in Qubes, so that devs working on the workstation code do not need to run VMs on a separate laptop in order to contact an SD server for development. The work is the result of collaboration with @joshuathayer. Submitting the PR as a "docs-only" PR because it doesn't change any files used for managing production SecureDrop instances.
There are a few unimplemented features of the staging environment:
Config tests are not wired up

We already have the full config test suite running in CI on every (non-docs) PR. That's sufficient for preventing regressions. Happy to add config tests for the staging environment to the work presented here, but that should wait for the Vagrant -> Molecule conversion (#3208), in which we can ideally axe the `testinfra/test.py` wrapper script.

Machines need a nudge to boot after a reboot is triggered

When Ansible requests the VM reboot, it powers off, but Qubes won't restart it. You must run `qvm-start sd-app sd-mon` in order to boot the VMs again, and you must run that command within the "wait_for" window, which is five minutes. Technically it's possible for us to detect that we're in Qubes and make the `qvm-start` calls dynamically (the Qubes Admin API makes this possible), but doing so would affect Ansible logic that runs against prod instances, so I'd prefer to follow up in a discrete PR targeting that improvement at a later date, if warranted.

Halting is not a first-class feature
Molecule assumes we want to destroy a VM after testing it. By default, there's no action in Molecule corresponding to `vagrant halt <vm>`. In the future, we can certainly implement one, with a bit of additional scripting, but I'd prefer to rely on developer feedback before attempting a convenient solution. Developers working in Qubes will need to run `qvm-shutdown sd-app sd-mon` in order to preserve the VMs for later use. The Molecule logic is only required the first time (or if server-side code changes are required); thereafter, the `qvm-*` actions will be sufficient to get the Source and Journalist Interfaces up and running, as sketched below.
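A minimal sketch of that day-two loop (VM names per the docs; runnable from `dom0`, or from `sd-dev` given the RPC grants):

    # Start the previously provisioned staging VMs:
    qvm-start sd-app sd-mon

    # ...work against the running interfaces...

    # Halt (rather than destroy) to preserve them for later:
    qvm-shutdown sd-app sd-mon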
No automatic integration of HidServAuth tokens

After provisioning the staging environment, the `sd-dev` VM will have the HidServAuth tokens in the usual place: `install_files/ansible-base/*-aths`. These will need to be copied manually as `config.json` for use with development on the SecureDrop Workstation (see the sketch below). Given that the VM provisioning and token porting tasks are each one-time, I consider that an acceptable trade-off, especially because we've reduced the number of required laptops from two to one with these changes.
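Illustration of where the tokens land (the path is from this PR; the `config.json` format is defined in the Workstation repo):

    # On sd-dev, after provisioning:
    cat install_files/ansible-base/*-aths

    # Copy the HidServAuth values by hand into the config.json
    # used by the SecureDrop Workstation.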
Testing

Qubes machine required for testing.
- Run `molecule test -s qubes-staging`.
- Run `qvm-start sd-app sd-mon` in `dom0`.
- Run `qvm-prefs sys-whonix` and confirm it fails with "Service call error: request refused".

If anything is unclear in the docs, get loud. If anything doesn't work with the staging provisioning process, please post error messaging to aid in debugging.
Worthy of special attention are the Qubes RPC grants, which enable VM management from within `sd-dev`, rather than from `dom0`. Using the Qubes Admin API makes for a much slicker developer workflow, but we should scrutinize any grants we provide, here or hereafter. To the best of my ability, I've restricted the permissions to grant management rights over only the staging instances. For further reading:

Deployment
None, dev env only.
Checklist
If you made changes to the server application code:

- Linting (`make ci-lint`) and tests (`make -C securedrop test`) pass in the development container

If you made changes to `securedrop-admin`:

- Tests (`make -C admin test`) pass in the admin development container

If you made changes to the system configuration:

If you made non-trivial code changes:

If you made changes to documentation:

- Doc linting (`make docs-lint`) passed locally