-
Notifications
You must be signed in to change notification settings - Fork 687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add staging-with-rebase-focal to CI and fix testinfra tests #5638
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening @kushaldas as well as #5635, #5636 and #5637 to track specific failures.
As part of the commitment for this sprint of a timebox around #5509 , I recommend we proceed as follows, since we currently have no way of reliably tracking failure count on Focal:
- Amend this PR to add a CircleCI task for
staging-with-rebase-focal
(it's OK if we merge withstaging-with-rebase-focal
at the end of our timebox) it seems best we track failures/improvements over time by having CI run it instead of a developer running these locally.
(note that the existing GCP CI base image should already have Vagrant 20.04 boxes included, so the change should be relatively straightforward) - Continue addressing the test failures in this working branch (including the issues you've opened, and whichever other testinfra failures present in Focal
4e8d788
to
5a110a0
Compare
Right now, if you login to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes @kushaldas , left some inline comments. I've also appended a commit to clean up the test logic (lsof version is different, so the UDP check would always fail on Focal).
There is, however, still an issue with the ossec service on mon
, and the service does not start due to one of the services (ossec@maild.service
) that is failing. By running systemctl list-units --all
, you can observe the units that are not started, and then check the status of them. The only failed service is ossec@maild.service
, and then by checking its status, the root cause of the failure is clear:
ossec-maild [dns]: ERROR: connect() failed.
ossec-maild: ERROR: DNS failure for smtpserver
ossec-maild: ERROR: No socket.
ossec-maild: ERROR: Error Sending email to 127.0.0.1 (smtp server)
This might be related to #5655, perhaps we should pull in the fix for that issue in this PR, what do you think?
src: ossec@.service | ||
dest: "/etc/systemd/system/ossec@.service" | ||
|
||
- name: Enable the service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you should use the ansible built-in systemd task:
- name : Enable ossec service to run
systemd:
name: "{{ item }}"
enabled: yes
masked: no
with_items: "{{ some var in defaults/main.yml }}"
service: | ||
name: ossec | ||
state: restarted | ||
command: systemctl restart ossec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe systemd
ansible built-in here as well
@emkll I think the PR is in a better state now, will know for sure after all CI jobs finish :) |
I still have the SMTP issues:
@emkll any tips about which service should I look at? |
I am rebuilding |
Wondering if ossec/ossec-hids#1891 is involved here? |
`sbin/dhclient` string is the correct path in Xenial and on Focal. In Xenial the file is at `/sbin/dhclient`. In Focal the file is at `/usr/sbin/dhclient`
This will run the Focal staging job in the CI.
94d81f7
to
253d45b
Compare
Rebased on latest |
Good news, the box update provided in #5658 has resolved the issue tracked in #5642 , one last test is failing, the service not running/listening to the port. Bad news, some ossec units are not staying up, including remoted which receives the logs. While I haven't uncovered the root cause, here are some findings:
Could there be, perhaps, some contention with the existing init scripts that are set by default in the server |
Here is the journalctl log from
|
More strange log difference between service and the systemd:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kushaldas this is working for me, and can confirm logs are flowing between app and mon on Focal, and CI is green 🎉
Opened #5660 and #5661 to track unrelated Focal follow-up issues .
I left a comment inline that should be worth addressing. it's unclear to me from your commit message if a part of the issue observed was some sort of init/systemd contention. Regardless, the change provided won't persist upgrades, if we don't make any changes.
We should ensure that changes to DNS (moving away from the default systemd-resolvd and back to resolv.conf) given that it will break unattended upgrades (as identified in #5655), that would be especially problematic. @conorsch what do you think?
|
||
- name: Remove the old style /etc/init.d/ossec file | ||
file: | ||
path: "/etc/init.d/ossec" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These files are provided by ossec-{agent,server} packages:
.
├── etc
│ ├── init.d
│ │ └── ossec
│ └── ossec-init.conf
This means that on subsequent ossec package upgrades, this init file will be re-added, unless we do specific tasks at build-time to remove it. If these steps are strictly required for functioning, we should address this in the build logic as to not provide the init.d file in focal (instead of doing it in ansible)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that on subsequent ossec package upgrades, this init file will be re-added, unless we do specific tasks at build-time to remove it. If these steps are strictly required for functioning, we should address this in the build logic as to not provide the init.d file in focal (instead of doing it in ansible)
I think we should remove it from the server package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely agree, let's remove the Ansible task and clean up that file at the packaging level. Be mindful of side-effects for Xenial.
Codecov Report
@@ Coverage Diff @@
## develop #5638 +/- ##
===========================================
- Coverage 85.40% 85.32% -0.09%
===========================================
Files 50 50
Lines 3679 3679
Branches 460 460
===========================================
- Hits 3142 3139 -3
- Misses 438 440 +2
- Partials 99 100 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a pass through with local VMs, looking good! Agreed with the packaging logic cleanup requested by @emkll. Overall the DNS changes appear reliable, although let's keep the customizations to a minimum. For instance, it appears that on Focal we don't need /etc/resolvconf/resolv.conf.d/base at all, so let's not write to it.
|
||
- name: Remove the old style /etc/init.d/ossec file | ||
file: | ||
path: "/etc/init.d/ossec" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely agree, let's remove the Ansible task and clean up that file at the packaging level. Be mindful of side-effects for Xenial.
when: ansible_distribution_release == 'focal' | ||
tags: | ||
- dns | ||
- hardening |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No objections to disabling systemd-resolved
, sticking with resolvconf which we've been using for a while will be fairly straightforward. Since we just copied the same dns_base
source a few lines above, let's use that task to write the file. Sounds like on Xenial we want the /etc/resolvconf/resolve.conf.d/ path, whereas on Focal we should write it directly to /etc/resolv.conf
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a commit to consolidate the logic here a bit: under Focal, the old /etc/resolvconf/resolve.conf.d/
path is no longer written to, and the tests now inspect the correct file based on distro.
We remove the /etc/init.d/ossec file and using the systemd service file in the ossec-server package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a couple of areas that are using service instead of systemctl, all of which appear to be still working:
-
Ossec registration logic, which is still working, as the service is actually being started and the app server is successfully registered https://github.com/freedomofpress/securedrop/blob/fix_dhclient_path_for_testinfra/install_files/ansible-base/roles/ossec/tasks/register.yml#L40
-
https://github.com/freedomofpress/securedrop/blob/fix_dhclient_path_for_testinfra/install_files/ossec-server/etc/init.d/ossec (if we keep split xenial/focal logic, this should be preserved for Xenial)
-
https://github.com/freedomofpress/securedrop/blob/fix_dhclient_path_for_testinfra/install_files/ossec-agent/etc/init.d/ossec (if we keep split xenial/focal logic, this should be preserved for Xenial)
If we are to move to systemd for Focal (which I think is a good idea, as you've suggested), we still need to preserve init.d/service support under Xenial (unless we carefully handle the service enabling/starting under Xenial as well). This means that we will need to conditionally handle the service logic here (init in Xenial and systemd in Focal).
It could perhaps be helpful to take a step back and map out these changes to better understand implications, but at at strict minimum, prior to merging this PR, we should ensure no changes to Xenial, in other words, the systemctl logic you have introduced for Focal support should not be applied to Xenial hosts as to not break existing installs during upgrades.
We verify that Xenial uses sysv script, and Focal is using the ossec.service file to start the service in the mon server.
8bbf202
to
a2aa941
Compare
Under Focal, we were writing the nameserver info to two (2) files, but only testing one of them. Using a vars-based approach now, and the test logic now looks in the correct spot for Focal.
Same as we've done for ossec-server, let's make sure that ossec-agent is also managed via systemd when running under Focal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking solid! Banged on the resolvconf settings, and the approach you took to use a flat file in /etc/resolv.conf
is definitely the best choice, among many (https://manpages.ubuntu.com/manpages/focal/man8/systemd-resolved.8.html). Overall the diff on this PR is rather small, given how many issues it closes 🙂
I've got just one remaining request prior to merge: given the sysv -> systemd transition for ossec-server, let's also make the same update for ossec-agent, so we can reuse the same bootstrapping logic and even reuse the same tests to verify state on both machines. I took the liberty of pushing a new commit which copies over the ossec-server test to ossec-agent, which should cause CI to fail again
when: ansible_distribution_release == 'focal' | ||
tags: | ||
- dns | ||
- hardening |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a commit to consolidate the logic here a bit: under Focal, the old /etc/resolvconf/resolve.conf.d/
path is no longer written to, and the tests now inspect the correct file based on distro.
I made the change to have
@conorsch @emkll Can you please have a look and see if you can find the reason? I could not figure out why it was being able to start the service before. Still looking. |
The is the service file generated and used by the
|
Found the issue: Our |
OSSEC server and agent requires two different service files. Details at https://kushaldas.in/posts/story-of-debugging-exit-0.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Testinfra passes against focal and xenial scenarios with same number of pass/skip/xfail, no fails
- OSSEC alerts flow (tested using test ossec alert in JI) in focal
- OSSEC alerts continue to flow after app and mon reboot in focal
one nit re: string for GPG output in testinfra but I don't view it as a blocker.
pub 4096R/00F4AD77 2016-10-20 [expires: 2021-06-30] | ||
Key fingerprint = 2224 5C81 E3BA EB41 38B3 6061 310F 5612 00F4 AD77 | ||
uid SecureDrop Release Signing Key""" | ||
fpf_gpg_pub_key_info = "2224 5C81 E3BA EB41 38B3 6061 310F 5612 00F4 AD77" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the full output seems more secure, especially if the testinfra tests are being used to verify a prod system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output is different in both Xenial
and Focal
, that is why I took this path.
changes made as per request
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## develop #5638 +/- ##
===========================================
- Coverage 85.40% 85.32% -0.09%
===========================================
Files 50 50
Lines 3679 3679
Branches 460 460
===========================================
- Hits 3142 3139 -3
- Misses 438 440 +2
- Partials 99 100 +1 ☔ View full report in Codecov by Sentry. |
Status
Ready for review
Description of Changes
Fixes #5636 #5635 #5637 #5655 #5510
systemd-resolved
on Focal and uses/etc/resolv.conf
file.dhclient
path for both Xenial and Focal testingsysv
script on Xenial, andossec.service
systemd style service file on Focal.ossec
service started on Xenial and Focal.https://kushaldas.in/posts/story-of-debugging-exit-0.html describes why do we need two different service files for
ossec-sever
andossec-agent
.Testing
molecule test -s libvirt-staging-focal
should not have any test failure ontest_apparmor.py
, there will 4 other test failure due to different bugs.Checklist
If you made changes to the system configuration:
If you made non-trivial code changes:
Choose one of the following: