Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested Virtualization of Staging Env in CI #3909

Merged
merged 22 commits into from
Dec 4, 2018
Merged

Conversation

msheiny
Copy link
Contributor

@msheiny msheiny commented Oct 30, 2018

Status

Ready for review

Description of Changes

Fixes #3702

Changes proposed in this pull request:

  • Drops AWS staging logic
  • Introduces GCE staging logic
  • Now with 💯 more GRSEC tests (previously there were none)

Testing

How should the reviewer test this PR?

To replicate CI env locally:

  • Get access to Google service account credentials (talk to @msheiny 😍 )
  • export that service account as an environment variable called GOOGLE_CREDENTIALS
  • source ci env file (. devops/gce-nested/ci-env.sh)
  • Start the environment (devops/gce-nested/gce-start.sh)
  • Start the test (devops/gce-nested/gce-runner.sh)
  • Kill the test (devops/gce-nested/gce-stop.sh)

1 GRSEC tests is failing and has been xfailed. Making a new ticket to track fix for that. The problem is reproducible locally and existed before introduction of this PR.

Deployment

Any special considerations for deployment? Consider both:

Only affects CI

@msheiny msheiny requested a review from conorsch as a code owner October 30, 2018 19:11
@msheiny msheiny force-pushed the 3702_NestedVirtCI branch 8 times, most recently from 3671c55 to ac6e936 Compare October 31, 2018 02:04
@msheiny
Copy link
Contributor Author

msheiny commented Oct 31, 2018

ARGG EYE SPOT a FLAKY TEST in https://circleci.com/gh/freedomofpress/securedrop/19160

re: #3553

@msheiny msheiny force-pushed the 3702_NestedVirtCI branch 2 times, most recently from 082e89c to 1425c0f Compare October 31, 2018 03:48
Copy link
Contributor

@conorsch conorsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a pass through and modified the shell scripts somewhat, mostly for readability. Logic is solid! Flagging that we're blocked on merge of #3913, otherwise tests will fail here. Major asks here before merge are:

After those changes are up, happy to take another look. Eager to get this one in.

@msheiny
Copy link
Contributor Author

msheiny commented Nov 3, 2018

Update those docs! https://docs.securedrop.org/en/release-0.10.0/development/testing_continuous_integration.html still talks about AWS and lack of grsec testing—no longer true!

I'll make those docs changes in a separate PR with a docs-* prefix. Iterating on the various nits that I'm sure will come up when updating documentation is really annoying to do when the feedback loop from the staging tests are so long. Or I'd be willing to start collab in something like etherpad first.

@conorsch
Copy link
Contributor

conorsch commented Nov 3, 2018

I'll make those docs changes in a separate PR

A PR into this PR would work; that way we can skip the long-running staging logic, and still make sure that the dev docs are accurate, since the CI changes and accompanying docs will land in develop together.

@conorsch
Copy link
Contributor

Rebased on top of 36cbb21 (latest develop), to re-run CI, including changes in particular from #3913.

@codecov-io
Copy link

codecov-io commented Nov 26, 2018

Codecov Report

Merging #3909 into develop will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #3909   +/-   ##
========================================
  Coverage    84.62%   84.62%           
========================================
  Files           43       43           
  Lines         2739     2739           
  Branches       296      296           
========================================
  Hits          2318     2318           
  Misses         354      354           
  Partials        67       67

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9b4924c...4edcaf4. Read the comment docs.

@@ -160,8 +143,7 @@ def test_apt_autoremove(Command):
assert "The following packages will be REMOVED" not in c.stdout


@pytest.mark.skipif(os.environ.get('FPF_GRSEC', 'true') == "false",
reason="Need to skip in environment w/o grsec")
@pytest.mark.xfail(strict=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you comment the reason for xfail-ing this instead of skipping? And if possible, can you narrow the scope of the xfail to the exception we expect to see?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it's #3916, but you're right that we should note that explicitly. While handling the docs update here (as discussed during sprint planning today), I'll clarify. Thanks for flagging, @heartsucker.

This doesn't really confirm to upstream ansible's usage of libcloud..
which has been driving me insane. There also isnt an upstream module for
this capability.. so decided to write it myself :) Hooray for
"not-invented-here" !!!
Introducing this to get around the wacky limitations of gcloud sdk so
that what local users test will perfectly match production
Utilize a gcloud sdk container to run all operations. This is really not
necessary for CI BUT in order to provide a uniform testing environment
against local development and CI this was necessary. Otherwise if a
local developer used their machine loaded gcloud sdk, there is a
possiblility of collusion of permissions and things launching with
elevated permissions. This is not necessarilly a dangerous situation
just one that doesnt adequately mimic CI enough...

So tl;dr Lets make CI and local dev CI testing as close as possible
Wanted to get this closer in proximity to all the other CI GCE related
tooling (since thats what its applicable to). Made some core changes so
that this will be the same script that runs in CI and for local ops
developers while testing.
I really didnt want to go this route since molecule adds surrounding
tooling fun.. but yeah.. here it is for the sake of striping things down
to simplicity.
msheiny and others added 6 commits November 29, 2018 14:14
Remove AWS references and detection code. Flop those over for GCE usage.
These are exactly the types of errors we want to catch in CI!! yay! This
is a legitimate failure though - I'm able to reproduce locally. Lets
flag as an XFAIL an fix in another PR.
Useful for interpretation and formatting of test results by CircleCI
shellcheck -- lots of ignores. a lot of these errors dont apply. Some of them do but
they really seem to be things like double quoting (which ive always
found problematic), printf usage, sourcing warnings.
Removed the shellcheck exemptions at the top of the files to get
warnings displayed again, then improved based on that output.
Reduced the "source" calls from 2 to 1, and in general made the scripts
a bit more readable.

Sprinkled comments liberally throughout, to bless future maintainers.
@conorsch
Copy link
Contributor

Rebased on top of 9b4924c (latest develop), will bolt on docs shortly.

@conorsch
Copy link
Contributor

conorsch commented Dec 1, 2018

Just noticing that we lost the "Don't run staging CI if branch begins with docs-" logic... if I find more issues like that, may punt to follow-up tickets in order to get this in.

Rightfully pointed out by @heartsucker during review. Cross-linked to
relevant issue. As to why we're using xfail rather than skip, the
"strict" setting on xfail will alert us if the tests unexpectedly start
passing (ain't that the dream?).
Cleaned up the docs a bit to reference the new script paths, as well as
excise outdated info such as "we don't run kernel tests in CI". We do!

The Makefile targets aren't necessary, but do provide a bit of
convenience to developers.

Updated the "ci-go" logic in particular, in order to preserve the "Don't
run staging CI if branch name starts with `docs-`" logic.
@conorsch
Copy link
Contributor

conorsch commented Dec 1, 2018

Resolved the outstanding concerns, except for:

devops/jenkins/TorNightlyPipeline will start to fail post merge if not updated; it still references the old AWS logic, as well

It's true the AWS references should not be there, but in order to resolve, we need override support on the GCE logic to provide custom vars. Probably a wrapper script is best; I'd prefer to observe the Jenkins failures (which are only shown internally, since they run on a nightly basis, and will not break CI on PRs to this repo). @msheiny feel free to chime in here, but I suggest we merge as-is and follow-up on refining the Jenkins logic on the backend.

Waiting to approve until CI passes, since I made changes to the logic as part of final review.

Make sure the environment is destroyed in a second step, and mask errors
on that second step. Required since the switch to make targets for CI,
as part of PR review on the introduction of the new GCE CI logic.
@msheiny
Copy link
Contributor Author

msheiny commented Dec 3, 2018

devops/jenkins/TorNightlyPipeline will start to fail post merge if not updated; it still references the old AWS logic, as well

Just to flag here -- that Jenkins job is no longer being run at all. So it can be deleted and purged from the branch whenever.

@conorsch
Copy link
Contributor

conorsch commented Dec 3, 2018

Thanks for confirming, @msheiny! When I dug around in the interface, looks like it's been dormant for a while, so I opted to ignore. If it's cleaner, happy to excise in this PR, if it's going to remain a dead codepath for a while.

@msheiny
Copy link
Contributor Author

msheiny commented Dec 3, 2018

If it's cleaner, happy to excise in this PR, if it's going to remain a dead codepath for a while.

I have another pending branch in infra to squash the job from jenkins (its still there but the trigger to run it is disabled so it never runs). I can pull an accompanying PR in SD to kill it but i'm fine if you want to purge it here instead.

@conorsch conorsch merged commit 6d0d246 into develop Dec 4, 2018

- run:
name: Run static security testing on source code
command: make bandit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was removing these steps (static analysis and checking for CVEs) intentional?

ok if not (I will add them again), just was attempting to make some unrelated CI changes over in xenial-pgp-journalist and noticed this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that was not intentional, @redshiftzero — my oversight during review. I'll open an issue to track the re-add, fine if it comes in via your Xenial-related CI changes.

redshiftzero added a commit that referenced this pull request Jan 15, 2019
redshiftzero added a commit that referenced this pull request Jan 15, 2019
redshiftzero added a commit that referenced this pull request Jan 15, 2019
zenmonkeykstop pushed a commit to zenmonkeykstop/securedrop that referenced this pull request Feb 9, 2019
@heartsucker heartsucker deleted the 3702_NestedVirtCI branch February 27, 2019 10:36
kushaldas pushed a commit that referenced this pull request Sep 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switch CircleCI staging environment to GCE + libvirt VMs
5 participants