Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local automated upgrade testing #3075

Merged
merged 52 commits into from
May 10, 2018
Merged

Local automated upgrade testing #3075

merged 52 commits into from
May 10, 2018

Conversation

msheiny
Copy link
Contributor

@msheiny msheiny commented Feb 27, 2018

Status

Ready for review

Description of Changes

Changes proposed in this pull request:

Fixes first check-box in #3018

Add two scenarios:

  • vagrant_packager -- builds two vagrant boxes ready for redistribution
  • upgrade --- fires up boxes from the first scenario (pulled from s3), and chucks all locally built deps to a local apt server.

Testing

How should the reviewer test this PR?

  • Caveat - this PR testing will only work under a system with libvirt/kvm. There is another ticket to also add support for virtualbox.

Packager testing:

  • Run make vagrant-package - be prepared to enter a sudo password about 20 min in .
  • you'll find the vagrant box files in molecule/vagrant_packager/build/*.box - you can manually import those if you want to mess around with em
  • Peek at molecule/vagrant_packager/push.yml dont run that now... but glance it over :)

Upgrade testing (local deb packages):

  • make build-debs - make you sure have a version of debian packages that are higher than 0.6 if the build/ directory. If you don't, running this command should give you 0.7.0~rc1
  • Run molecule converge -s upgrade to get .0.6 SD servers up. You'll get passed an onion address at the end. Navigate to that in a web-browser and note the version at the bottom.
  • Run molecule side-effect -s upgrade - Navigate back to the onion address and confirm version has bumped.

Upgrade testing (apt-test proxy):

  1. Run molecule converge -s upgrade to get .0.6 SD servers up.
  2. Run molecule login -s upgrade --host app-staging - To get a shell onto the app server
  3. In the app server, run sudo apt-get update (confirm no ERRORS , you'll see a warning about duplicate repos but thats not an error and is a diff issue)
  4. Run apt-cache policy securedrop-config (same terminal ^^) and you should see all the packages pulled in from above
  5. Now back at the securedrop root dir run QA_APTTEST=yes molecule converge -s upgrade -- --diff -t apt
  6. Follow step 2 -> 4 again. This time you should see versions of securedrop-config pulled in from apt-test (this should have a few versions back and look like below):
vagrant@app-staging:~$ apt-cache policy securedrop-config                                                
securedrop-config:
  Installed: 0.1.0+0.6
  Candidate: 0.1.1+0.7.0~rc3
  Version table:
     0.1.1+0.7.0~rc3 0
        500 https://apt.freedom.press/ trusty/main amd64 Packages
     0.1.1+0.7.0~rc2 0
        500 https://apt.freedom.press/ trusty/main amd64 Packages
     0.1.1+0.7.0~rc1 0
        500 https://apt.freedom.press/ trusty/main amd64 Packages
 *** 0.1.0+0.6 0
        500 https://apt.freedom.press/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status
     0.1.0+0.5.2 0
        500 https://apt.freedom.press/ trusty/main amd64 Packages
     0.1.0+0.5.1 0
        500 https://apt.freedom.press/ trusty/main amd64 Packages

Deployment

Any special considerations for deployment? Consider both:

None - This only affects developer upgrade testing environment.

Checklist

If you made changes to the app code:

None

If you made changes to the system configuration:

None

If you made changes to documentation:

None

@msheiny msheiny requested a review from conorsch as a code owner February 27, 2018 22:02
@msheiny msheiny requested a review from a user February 27, 2018 22:02
@msheiny msheiny changed the title Local automated build Local automated upgrade testing Feb 27, 2018
@codecov-io
Copy link

codecov-io commented Mar 1, 2018

Codecov Report

Merging #3075 into develop will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #3075   +/-   ##
========================================
  Coverage    85.79%   85.79%           
========================================
  Files           34       34           
  Lines         2154     2154           
  Branches       238      238           
========================================
  Hits          1848     1848           
  Misses         250      250           
  Partials        56       56

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fb3772f...4800743. Read the comment docs.

@ghost
Copy link

ghost commented Mar 1, 2018

In the test instructions there should also be pip install -r securedrop/requirements/develop-requirements.txt to get python-vagrant

@ghost
Copy link

ghost commented Mar 1, 2018

@msheiny at this stage I'm not doing a proper review, just discovering the great work you did 💯. And dumping a few inconsequential remarks on the way ;-)

Both */library/molecule_vagrant.py are mostly the same and could be unified.

@msheiny
Copy link
Contributor Author

msheiny commented Mar 1, 2018

Both */library/molecule_vagrant.py are mostly the same and could be unified.

That is a very good point. I managed to get some of that upstreamed to molecule... I need to do the rest of the tweaks.

@ghost
Copy link

ghost commented Mar 1, 2018

some of that upstreamed to molecule

Nice ! URL ?

state: running
tags: always

- name: Ensure tor is running
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/tor is running/supervisor is running/

@msheiny
Copy link
Contributor Author

msheiny commented Mar 1, 2018

Nice ! URL ?

ansible/molecule#1134

@conorsch
Copy link
Contributor

conorsch commented Mar 5, 2018

The upgrade_test scenario works well for me. I've got the 0.6~rc3 debs built locally, and the scenario completes without incident. On the packaging task, however, I get a reliable failure:

    TASK [postfix : Create procmail log file.] *************************************
    fatal: [mon-staging]: FAILED! => {"changed": false, "checksum": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "gid": 0, "group": "root", "mode": "0644", "msg": "chown failed: failed to look up user ossec", "owner": "root", "path": "/var/log/procmail.log", "size": 0, "state": "file", "uid": 0}

Retrying with 0.5.2 debs, rebuilding now...

@conorsch
Copy link
Contributor

conorsch commented Mar 5, 2018

Even with 0.5.2 debs, still seeing:

TASK [postfix : Create procmail log file.] *************************************
    fatal: [mon-staging]: FAILED! => {"changed": false, "checksum": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "gid": 0, "group": "root", "mode": "0644", "msg": "chown failed: failed to look up user ossec", "owner": "root", "path": "/var/log/procmail.log", "size": 0, "state": "file", "uid": 0}

@msheiny msheiny force-pushed the LocalAutomatedBuild branch from ca952d6 to f250640 Compare March 6, 2018 22:15
@msheiny
Copy link
Contributor Author

msheiny commented Mar 6, 2018

Rebased on develop

@msheiny
Copy link
Contributor Author

msheiny commented Mar 6, 2018

Even with 0.5.2 debs, still seeing:

Trying to reproduce now... Just to clarify, this is on the make vagrant_package ? Thats installing straight from 0.5.2 out of the SD apt repo. Hrmmmmm

@msheiny
Copy link
Contributor Author

msheiny commented Mar 6, 2018

@conorsch can you get me more data on that error? I'm not able to recreate locally...

@msheiny
Copy link
Contributor Author

msheiny commented Mar 7, 2018

@kushaldas can you take a look at molecule/vagrant_packager/package.py here specifically for the mypy syntax I started using?

subprocess.check_call(sysprep_cmd.split())

def vagrant_metadata(self, img_location):
# type: (str) -> dict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually wrong. We will have to install mypy_extensions and use TypedDict for the return type of the function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know, @kushaldas. Currently the typelint check isn't running against this file. Even adding it locally to the Makefile target, I don't see any errors reported.

@ghost
Copy link

ghost commented Mar 12, 2018

Is this what I should do to get packages for both versions under test?

  • git checkout origin/master
  • make build-debs # so 0.5.2 packages are in the build directory
  • git checkout origin/develop
  • make build-debs # so 0.6~rc2 packages are in the build directory

@msheiny
Copy link
Contributor Author

msheiny commented Mar 14, 2018

Is this what I should do to get packages for both versions under test?

@dachary can you clarify your question ? I don't quite understand. I'm assuming you are referring to the upgrade_test scenario .. in that case you don't really need 0.5.2 packages present - that's not a good upgrade test since the base vagrant image is also SD 0.5.2 . But you got the gist of the testing strategy to get different packages in place.

@ghost
Copy link

ghost commented Mar 14, 2018

can you clarify your question ?

I'm confused about which version I should run make build-debs for. And also how I specify the origin version of the upgrade and the destination version of the upgrade?

@msheiny msheiny force-pushed the LocalAutomatedBuild branch from 37edb45 to 472aa4c Compare May 7, 2018 15:05
Our unique changes that were added to support suppressing NFS were
recently merged upstream in molecule 2.13:

* ansible/molecule#1235
* ansible/molecule#1233
@msheiny msheiny force-pushed the LocalAutomatedBuild branch 2 times, most recently from 0ab9386 to 8a298dd Compare May 7, 2018 21:14
@msheiny
Copy link
Contributor Author

msheiny commented May 7, 2018

heyyyyy @conorsch this PR is getting really huge and unwieldy ... everytime I rebase I'm hitting more weird conflicts that I then have to go in and fix ... discovering some really weird issues that are out of scope for this PR (see this last commit 8a298dd for example.. no idea why that was only coming up now for files I didnt touch).

Sorry to nudge ya here but can you give another go through when you have some downtime? Keep in mind this PR really only affects CI and a local dev environment edge-case for specific Linux users so I kindly ask that you keep review pegged to the the upgrade scenario and if thats good we can try to solve the issues you were seeing in your environment in another PR related to the vagrant_packager?

The sooner I can get this in I can also start working on the server-side story to try and get this running in CI ❤️

@conorsch
Copy link
Contributor

conorsch commented May 8, 2018

Taking another look, @msheiny. Might ping you for some real-time collab if I run into any snags during re-review.

@msheiny
Copy link
Contributor Author

msheiny commented May 8, 2018

ping @conorsch i slightly updated the testing steps here. There is another scenario to test proxy thru to apt-test

@conorsch
Copy link
Contributor

conorsch commented May 8, 2018

Ran through the entire workflow again today. The upgrade flow is quite sound. Built local debs from the release/0.7 branch, then provisioned the base machines for upgrade. Confirmed the VMs were running 0.6, as expected from the prod repo. After applying the side-effect, the machines were running 0.7.0~rc3, as expected. So far so good!

Still could not run the make vagrant-package target without error. First, I had to apply this diff for the scenario to run successfully:

diff --git a/molecule/vagrant_packager/molecule.yml b/molecule/vagrant_packager/molecule.yml
index 21f530716..f1ece2ec0 100644
--- a/molecule/vagrant_packager/molecule.yml
+++ b/molecule/vagrant_packager/molecule.yml
@@ -39,8 +39,8 @@ provisioner:
     name: ansible-lint
   inventory:
     links:
-      group_vars: ../../../install_files/ansible-base/group_vars
-      host_vars: ../../../install_files/ansible-base/host_vars
+      group_vars: ../../install_files/ansible-base/group_vars
+      host_vars: ../../install_files/ansible-base/host_vars
   env:
     ANSIBLE_ROLES_PATH: ".molecule/sd-orig/install_files/ansible-base/roles"
   options:

After doing so, encountered the same error, reported above during prior reviews:

TASK [postfix : Configure Postfix service.] ************************************
    changed: [mon-staging]
    
    TASK [postfix : Create procmail log file.] *************************************
    fatal: [mon-staging]: FAILED! => {"changed": false, "checksum": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "gid": 0, "group": "root", "mode": "0644", "msg": "chown failed: failed to look up user ossec", "owner": "root", "path": "/var/log/procmail.log", "size": 0, "state": "file", "uid": 0}
    
    NO MORE HOSTS LEFT *************************************************************
    
    NO MORE HOSTS LEFT *************************************************************
    
    PLAY RECAP *********************************************************************
    app-staging                : ok=64   changed=38   unreachable=0    failed=0
    mon-staging                : ok=70   changed=43   unreachable=0    failed=1 

@msheiny, let's sync on the error above tomorrow in real time to debug. The effort should be timeboxed, though: if we can't solve it to our mutual satisfaction within an hour, I propose we separate out the package logic and work on it separately. The upgrade logic is highly valuable, and should be merged soon to unblock others on the QA workflow. The package logic is less critical, given that it need only be run once per release.

@msheiny
Copy link
Contributor Author

msheiny commented May 9, 2018

I propose we separate out the package logic and work on it separately

I really don't want to do this. The package logic only affects you and me at the moment. No one else is going to be running it (not even CI). This is very similar to the condition when we initially merged the docker logic when it was undocumented and had failing tests.

Sometimes its easier to keep merging undocumented and isolated components early as possible and continue to iterate and improve. I believe this is one of those scenarios. It's been really painful to continue to rebase this PR. I keep finding unrelated bugs and having to fix them and it has really snow-balled into a huge ordeal. I'd strongly prefer to merge as is, assuming you do not find any issues with the upgrade testing component.

msheiny added 7 commits May 9, 2018 11:28
Seems that with the latest molecule bump, the relative pathing
calculation has changed in respect to the ansible host/group vars
directories. Previously it was checking from the ephemeral dir, now its
from the scenario dir.
Not sure why this wasnt needed before :| This will soon be obsolete as
soon as we are testing against post 0.7 (which includes the ssh over
local net features).
Especially want to make sure we avoid vagrant images and additional
build folders
Also added `.python3` temp folder to the ignore list
These are out of scope on this PR for me to address and for the sake of
brevity I'm going to exclude them for now since they are not recent
changes.
This applies strictly to the upgrade test scenario, if you have an
environment variable `QA_APTTEST` set to yes/true then your upgrade test
will be getting packages from apt-test instead of from local apt
packages.
In the bump of molecule we also brought up the version of testinfra
which introduced a bug in detecting open udp ports. They switched to
using `ss` instead of `netstat` when its available. I took the path of
least resistance, opened a bug report, and made a hacky work-around
using `lsof`.

Bug report -> pytest-dev/pytest-testinfra#311
@msheiny msheiny force-pushed the LocalAutomatedBuild branch from 3f7eb02 to aa396fb Compare May 9, 2018 15:29
@msheiny
Copy link
Contributor Author

msheiny commented May 9, 2018

First, I had to apply this diff for the scenario to run successfully:

Rebased and added a fix for that issue you saw ;)

msheiny added 2 commits May 9, 2018 15:23
With the bump to molecule, the ephemeral directory spot has changed. For
now, lets utilize the old directory space since its already git/docker ignored. We
just need to ensure that we intentionally create that directory.

There was also a change in how the vagrant boxes are named. The
`.molecule` prefix is gone and replaced with the scenario name.
Without this SSHd will not be listening on all interfaces upon reboot
@msheiny msheiny force-pushed the LocalAutomatedBuild branch from fd03c60 to 4800743 Compare May 9, 2018 19:25
@conorsch
Copy link
Contributor

conorsch commented May 10, 2018

Spent some time with the one-and-only @msheiny debugging the vagrant-package workflow today. The failure related to chowning to the ossec user, reported several times above, turned out to be caused by the entire ossec role being skipped. Was not able to reproduce the problem on a fresh clone of the repo, which indicates fairly clearly it's a problem with my local env. Perhaps adding a clean action as proposed in #2395 would help.

Once we clarified that issue, I was able to confirm working end-to-end scenario flows for both make vagrant-package and molecule converge -s upgrade. There's still a substantial shortcoming in that none of these additions are documented, but I'm willing to follow up on that work as we knock out the related issues tracked in #3018.

Thanks for your patient assistance on this, @msheiny. Let's get it in!

@msheiny msheiny merged commit b1ead0d into develop May 10, 2018
@msheiny msheiny deleted the LocalAutomatedBuild branch May 10, 2018 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants