Nested Virtualization of Staging Env in CI #3909
3671c55 to ac6e936
ARGG EYE SPOT a FLAKY TEST in https://circleci.com/gh/freedomofpress/securedrop/19160 re: #3553
082e89c to 1425c0f
Took a pass through and modified the shell scripts somewhat, mostly for readability. Logic is solid! Flagging that we're blocked on merge of #3913, otherwise tests will fail here. Major asks here before merge are:
- Update those docs! https://docs.securedrop.org/en/release-0.10.0/development/testing_continuous_integration.html still talks about AWS and lack of grsec testing—no longer true!
- Makefile still contains a bunch of references to the old AWS logic; should be removed
- devops/jenkins/TorNightlyPipeline will start to fail post merge if not updated; it still references the old AWS logic, as well
After those changes are up, happy to take another look. Eager to get this one in.
469c0c0 to 2ef7bb5
I'll make those docs changes in a separate PR with a |
A PR into this PR would work; that way we can skip the long-running staging logic, and still make sure that the dev docs are accurate, since the CI changes and accompanying docs will land in develop together.
2ef7bb5 to 6db41b6
Codecov Report
@@           Coverage Diff            @@
##           develop    #3909   +/-   ##
========================================
  Coverage    84.62%   84.62%
========================================
  Files           43       43
  Lines         2739     2739
  Branches       296      296
========================================
  Hits          2318     2318
  Misses         354      354
  Partials        67       67
@@ -160,8 +143,7 @@ def test_apt_autoremove(Command):
    assert "The following packages will be REMOVED" not in c.stdout


@pytest.mark.skipif(os.environ.get('FPF_GRSEC', 'true') == "false",
                    reason="Need to skip in environment w/o grsec")
@pytest.mark.xfail(strict=True)
Can you comment the reason for xfail-ing this instead of skipping? And if possible, can you narrow the scope of the xfail to the exception we expect to see?
I assume it's #3916, but you're right that we should note that explicitly. While handling the docs update here (as discussed during sprint planning today), I'll clarify. Thanks for flagging, @heartsucker.
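For reference, the narrowing asked for in this thread could look roughly like the sketch below. It reuses the skipif condition from the diff and adds the `raises` and `reason` arguments of `pytest.mark.xfail`. The test name, the test body, and the choice of `AssertionError` are hypothetical placeholders, and the issue number is only the one assumed in the reply above.

```python
import os

import pytest


@pytest.mark.skipif(os.environ.get('FPF_GRSEC', 'true') == "false",
                    reason="Need to skip in environment w/o grsec")
@pytest.mark.xfail(raises=AssertionError, strict=True,
                   reason="Known grsec failure, assumed to be #3916")
def test_grsec_placeholder():
    # Hypothetical body standing in for the real kernel check.
    observed = "unexpected"
    assert observed == "expected"
```

With `raises` set, only the named exception type counts as the expected failure; any other exception is reported as an ordinary test failure.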
This doesn't really conform to upstream ansible's usage of libcloud.. which has been driving me insane. There also isn't an upstream module for this capability.. so decided to write it myself :) Hooray for "not-invented-here" !!!
Introducing this to get around the wacky limitations of gcloud sdk so that what local users test will perfectly match production
Utilize a gcloud sdk container to run all operations. This is not really necessary for CI, BUT it was necessary in order to provide a uniform testing environment across local development and CI. Otherwise, if a local developer used the gcloud sdk installed on their machine, there is a possibility of a collision of permissions and of things launching with elevated permissions. This is not necessarily a dangerous situation, just one that doesn't mimic CI closely enough... So tl;dr: let's make CI and local dev CI testing as close as possible
Wanted to get this closer in proximity to all the other CI GCE related tooling (since that's what it's applicable to). Made some core changes so that this will be the same script that runs in CI and for local ops developers while testing.
I really didn't want to go this route since molecule adds surrounding tooling fun.. but yeah.. here it is for the sake of stripping things down to simplicity.
Remove AWS references and detection code. Flop those over for GCE usage.
These are exactly the types of errors we want to catch in CI!! yay! This is a legitimate failure though - I'm able to reproduce locally. Let's flag as an XFAIL and fix in another PR.
Useful for interpretation and formatting of test results by CircleCI
shellcheck -- lots of ignores. A lot of these errors don't apply. Some of them do, but they really seem to be things like double quoting (which I've always found problematic), printf usage, sourcing warnings.
Removed the shellcheck exemptions at the top of the files to get warnings displayed again, then improved based on that output. Reduced the "source" calls from 2 to 1, and in general made the scripts a bit more readable. Sprinkled comments liberally throughout, to bless future maintainers.
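As an illustration of the "run gcloud from a container" idea in the commit notes above, here is a minimal sketch that only builds the `docker run` command line. The function name, the in-container mount path, and the `google/cloud-sdk` image choice are assumptions for illustration, not the actual contents of the scripts in this PR, and authentication inside the container is deliberately left out.

```python
from typing import List


def gcloud_in_container(gcloud_args: List[str],
                        credentials_path: str,
                        image: str = "google/cloud-sdk") -> List[str]:
    """Build an argv that runs gcloud inside a throwaway container.

    Nothing is executed here; the caller decides how to run the command.
    """
    # Hypothetical location for the credentials inside the container.
    mounted = "/tmp/gce-credentials.json"
    return [
        "docker", "run", "--rm",
        # Mount the credentials read-only so the host gcloud config is never used.
        "--volume", f"{credentials_path}:{mounted}:ro",
        image, "gcloud", *gcloud_args,
    ]


if __name__ == "__main__":
    print(" ".join(gcloud_in_container(["compute", "instances", "list"],
                                       "/home/dev/creds.json")))
```

Because every gcloud call goes through the same image, a developer's locally installed SDK and its permissions do not leak into the test run.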
6db41b6 to 374675f
Rebased on top of 9b4924c (latest develop), will bolt on docs shortly.
Just noticing that we lost the "Don't run staging CI if branch begins with |
Rightfully pointed out by @heartsucker during review. Cross-linked to relevant issue. As to why we're using xfail rather than skip, the "strict" setting on xfail will alert us if the tests unexpectedly start passing (ain't that the dream?).
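To make the skip-versus-xfail reasoning concrete, the following is a small stand-alone model of how pytest reports a test marked xfail. It does not call pytest; the function name and outcome strings are illustrative only.

```python
from typing import Optional, Type


def xfail_outcome(raised: Optional[BaseException],
                  strict: bool,
                  raises: Optional[Type[BaseException]] = None) -> str:
    """Model the reported outcome of a test marked xfail.

    raised: the exception the test body raised, or None if it passed.
    """
    if raised is None:
        # Unexpected pass: tolerated as XPASS unless strict is set.
        return "failed" if strict else "xpassed"
    if raises is not None and not isinstance(raised, raises):
        # A failure of a different kind than expected is a real failure.
        return "failed"
    return "xfailed"
```

A skipped test never runs, so it cannot notice when the underlying problem is fixed; a strict xfail turns that unexpected pass into a failure that someone has to look at.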
0955f37 to fd2b5fb
Cleaned up the docs a bit to reference the new script paths, as well as excise outdated info such as "we don't run kernel tests in CI". We do! The Makefile targets aren't necessary, but do provide a bit of convenience to developers. Updated the "ci-go" logic in particular, in order to preserve the "Don't run staging CI if branch name starts with `docs-`" logic.
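The branch-name guard mentioned above can be sketched as follows. This is not the Makefile logic from the PR, just an illustration of the rule; the function name is hypothetical, and reading the branch from `CIRCLE_BRANCH` assumes the job runs on CircleCI.

```python
import os


def should_run_staging_ci(branch: str, skip_prefix: str = "docs-") -> bool:
    """Return False for branches whose name starts with the docs prefix."""
    return not branch.startswith(skip_prefix)


if __name__ == "__main__":
    # CircleCI exposes the current branch name in CIRCLE_BRANCH.
    branch = os.environ.get("CIRCLE_BRANCH", "")
    print("run" if should_run_staging_ci(branch) else "skip")
```

Only the prefix is checked, so a branch such as `nested-docs-fix` still runs the staging job.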
Resolved the outstanding concerns, except for:
It's true the AWS references should not be there, but in order to resolve, we need override support on the GCE logic to provide custom vars. Probably a wrapper script is best; I'd prefer to observe the Jenkins failures (which are only shown internally, since they run on a nightly basis, and will not break CI on PRs to this repo). @msheiny feel free to chime in here, but I suggest we merge as-is and follow up on refining the Jenkins logic on the backend. Waiting to approve until CI passes, since I made changes to the logic as part of final review.
Make sure the environment is destroyed in a second step, and mask errors on that second step. Required since the switch to make targets for CI, as part of PR review on the introduction of the new GCE CI logic.
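A rough sketch of that two-step shape, assuming the three script paths from the PR description are invoked directly: the first step's exit status decides the result, and the teardown step always runs with its errors ignored. The function names and the injectable `run` parameter are hypothetical and exist only to keep the example testable.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_step(argv: Sequence[str]) -> int:
    """Run one command and return its exit status."""
    return subprocess.call(list(argv))


def ci_go(run: Runner = run_step) -> int:
    """Start and test the environment, then always attempt teardown."""
    # Step one: bring the environment up and run the tests.
    status = run(["devops/gce-nested/gce-start.sh"])
    if status == 0:
        status = run(["devops/gce-nested/gce-runner.sh"])
    # Step two: destroy the environment; its errors are masked on purpose
    # so they cannot hide or replace the real test result.
    try:
        run(["devops/gce-nested/gce-stop.sh"])
    except OSError:
        pass
    return status
```

Returning the status of the first step keeps a failed teardown from turning a passing run red, and keeps a clean teardown from turning a failing run green.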
Just to flag here -- that Jenkins job is no longer being run at all. So it can be deleted and purged from the branch whenever.
Thanks for confirming, @msheiny! When I dug around in the interface, it looked like it's been dormant for a while, so I opted to ignore. If it's cleaner, happy to excise in this PR, if it's going to remain a dead codepath for a while.
I have another pending branch in infra to squash the job from jenkins (it's still there but the trigger to run it is disabled so it never runs). I can pull an accompanying PR in SD to kill it but I'm fine if you want to purge it here instead.
- run:
    name: Run static security testing on source code
    command: make bandit
was removing these steps (static analysis and checking for CVEs) intentional?
ok if not (I will add them again), just was attempting to make some unrelated CI changes over in xenial-pgp-journalist and noticed this
No, that was not intentional, @redshiftzero — my oversight during review. I'll open an issue to track the re-add, fine if it comes in via your Xenial-related CI changes.
Accidentally removed during: #3909
Status
Ready for review
Description of Changes
Fixes #3702
Changes proposed in this pull request:
Testing
How should the reviewer test this PR?
To replicate CI env locally:
- GOOGLE_CREDENTIALS
- . devops/gce-nested/ci-env.sh
- devops/gce-nested/gce-start.sh
- devops/gce-nested/gce-runner.sh
- devops/gce-nested/gce-stop.sh
1 GRSEC test is failing and has been xfailed. Making a new ticket to track the fix for that. The problem is reproducible locally and existed before the introduction of this PR.
Deployment
Any special considerations for deployment? Consider both:
Only affects CI