Fix Beat E2E tests issues #3325

Merged
3 commits merged on Jun 30, 2020


@david-kow david-kow commented Jun 26, 2020

This PR fixes three issues in the Beats E2E tests:

  1. The SCC for OCP now adds the capabilities that Auditbeat, Packetbeat, and Journalbeat need to run. This differs slightly from our non-OCP approach, where we currently have a separate PSP for each. I think this is fine.
[2020-06-26T01:55:35.054Z] {"reason":"FailedCreate","message":"Error creating: pods \"test-ab-cfg-bzhr-beat-auditbeat-\" is forbidden: unable to validate against any security context constraint: [provider restricted: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used provider restricted: .spec.securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used spec.volumes[0]: Invalid value: \"hostPath\": hostPath volumes are not allowed to be used spec.volumes[1]: Invalid value: \"hostPath\": hostPath volumes are not allowed to be used spec.volumes[4]: Invalid value: \"hostPath\": hostPath volumes are not allowed to be used spec.volumes[6]: Invalid value: \"hostPath\": hostPath volumes are not allowed to be used spec.volumes[7]: Invalid value: \"hostPath\": hostPath volumes are not allowed to be used spec.volumes[8]: Invalid value: \"hostPath\": hostPath volumes are not allowed to be used spec.volumes[9]: Invalid value: \"hostPath\": hostPath volumes are not allowed to be used spec.containers[0].securityContext.securityContext.runAsUser: Invalid value: 0: must be in the ranges: [1000550000, 1000559999] capabilities.add: Invalid value: \"AUDIT_CONTROL\": capability may not be added capabilities.add: Invalid value: \"AUDIT_READ\": capability may not be added capabilities.add: Invalid value: \"AUDIT_WRITE\": capability may not be added spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[0].securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used provider beat: .spec.securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used capabilities.add: Invalid value: \"AUDIT_READ\": capability may not be added capabilities.add: Invalid value: \"AUDIT_WRITE\": capability may not be added capabilities.add: Invalid value: \"AUDIT_CONTROL\": capability may not be added 
spec.containers[0].securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used]","kind":"DaemonSet","name":"test-ab-cfg-bzhr-beat-auditbeat","namespace":"e2e-ftrer-mercury"}
  2. Add automountServiceAccountToken to all Beat configs. Older Beat versions depend on /var/run/secrets/kubernetes.io/serviceaccount/namespace existing in the Pod.
Exiting: error initializing processors: Unable to get in cluster configuration: open /var/run/secrets/kubernetes.io/serviceaccount/namespace: no such file or directory
  3. Below a certain Beat version, Beats need more permissions than just kibana_admin or kibana_role to set up the dashboards.
{"type":"response","@timestamp":"2020-06-26T07:22:01Z","tags":["api"],"pid":6,"method":"post","statusCode":403,"req":{"url":"/api/kibana/dashboards/import?exclude=index-pattern&force=true","method":"post","headers":{"host":"test-ab-cfg-wftc-kb-http.e2e-mercury.svc:5601","user-agent":"Go-http-client/1.1","content-length":"37053","accept":"application/json","content-type":"application/json","kbn-xsrf":"1","accept-encoding":"gzip"},"remoteAddress":"10.0.88.2","userAgent":"10.0.88.2"},"res":{"statusCode":403,"responseTime":46,"contentLength":9},"message":"POST /api/kibana/dashboards/import?exclude=index-pattern&force=true 403 46ms - 9.0B"}
{"type":"log","@timestamp":"2020-06-26T07:22:01Z","tags":["debug","http","server","Kibana","cookie-session-storage"],"pid":6,"message":"Error: Unauthorized"}

@david-kow david-kow added >test Related to unit/integration/e2e tests :beats labels Jun 26, 2020
 allowHostPorts: false
 allowPrivilegeEscalation: true
 allowPrivilegedContainer: true
-allowedCapabilities: []
+allowedCapabilities:
Contributor

Can we add a comment to explain why these are necessary?

Contributor

Ditto for allowHostPID: it's not clear to me why it needs that. Also, do we need to update the OpenShift docs to include these changes? (My guess is yes.)

@@ -68,12 +72,40 @@ func getBeatKibanaRoles(associated commonv1.Associated) (string, error) {
		)
	}

	if strings.Contains(beat.Spec.Type, ",") {
		return "", fmt.Errorf("beat type %s should not contain a comma", beat.Spec.Type)
Contributor

If it's a general constraint on the Beats CRD, it should probably live outside the association controller and also be enforced by the Beat controller and webhook?

Contributor Author

It's already forbidden by the Type regex requirements. The check here is just a double check. Do you think it would be worth checking it in validation too?

Contributor

If it's in the OpenAPI validation, it should never get through, so I think we can omit the check.

Contributor Author

We seem to have an issue with that. But even without it, I'd like to keep another check close to the code, just to make sure we are safe in case it gets copied somewhere that isn't behind that regex check.

Contributor

Ah, if the OpenAPI validation doesn't work, IMO we should still just check for it in the validation code we run at the beginning of the reconcile/webhook, rather than having type validation code scattered about.
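A rough sketch of what a single validation entry point for the Beat type could look like, along the lines suggested in this thread. Assumptions: `validateBeatType` and `typePattern` are hypothetical names, and the regular expression is a placeholder, not the pattern used by the CRD.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// typePattern is a placeholder for the Type regex requirement mentioned
// above; the real CRD pattern may differ.
var typePattern = regexp.MustCompile(`^[a-zA-Z0-9-]+$`)

// validateBeatType is a hypothetical helper that keeps all type checks in
// one place, so callers such as a webhook or the start of a reconcile loop
// do not need their own ad hoc checks.
func validateBeatType(beatType string) error {
	// Explicit comma check, kept as a safety net even though the
	// placeholder pattern below would also reject a comma.
	if strings.Contains(beatType, ",") {
		return fmt.Errorf("beat type %s should not contain a comma", beatType)
	}
	if !typePattern.MatchString(beatType) {
		return fmt.Errorf("beat type %q does not match %s", beatType, typePattern.String())
	}
	return nil
}

func main() {
	for _, t := range []string{"filebeat", "file,beat"} {
		fmt.Printf("%q: %v\n", t, validateBeatType(t))
	}
}
```

Code that builds role names from the type could then rely on this one function having run earlier, instead of repeating the comma check locally.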

config/e2e/filebeat.yaml (outdated, resolved)
@@ -196,6 +196,7 @@ processors:
hostPID: true # Required by auditd module
dnsPolicy: ClusterFirstWithHostNet
hostNetwork: true
automountServiceAccountToken: true
Contributor

From your description, this is only necessary on some versions, correct? Is it worth adding a comment here and updating the recipes/docs with this info as well? We call out other OpenShift-specific configs.

Contributor Author

@anyasabo, @sebgl: This is actually not OpenShift specific, only Beat version specific. I've added a comment, but I'd rather not add too much to the docs/samples, for a couple of reasons:

  • if we do it for one setting, why not for all of them? It would be a bit of work
  • our docs start becoming Beats on k8s docs instead of just ECK docs
  • we use those configs to test across all versions/environments, so we might end up with if statements in the comments :) we already turn off some things for old kind versions, because they don't work there

In general, I'd like to think of our Beats testing as proof that we can deploy them correctly and that they work to some degree. I believe high-fidelity documentation about the what and why could be a rabbit hole for us and should live in the Beat docs. WDYT?

Contributor

Ah gotcha, my reading comprehension is poor. I think it's a blurry line as to when it is useful to have Beats documentation in our repo, but I'm good with your decision here 👍

david-kow and others added 2 commits June 29, 2020 15:56
Co-authored-by: Michael Morello <michael.morello@gmail.com>
@david-kow david-kow requested review from barkbay, anyasabo and sebgl June 29, 2020 15:21
@david-kow
Contributor Author

jenkins test this please

@david-kow david-kow requested a review from pebrc June 30, 2020 05:04
Contributor

@barkbay barkbay left a comment

When running the tests, this one failed at least once (I ran the test with 7.5.2):

    --- FAIL: TestBeatKibanaRef/Verify_dashboards_installed (300.00s)
        utils.go:84: 
            	Error Trace:	utils.go:84
            	Error:      	Received unexpected error:
            	            	expected  Metricbeat dashboard [true], found dashboards [false]
            	Test:       	TestBeatKibanaRef/Verify_dashboards_installed

It happened only once and I have not been able to reproduce it, so I'm not sure what to make of it...

Is there any scenario where the dashboards may not be installed? (Also, regarding this test: there are a lot of Pods in a CrashLoopBackOff state, but I guess that's because Kibana is not ready yet?)

@david-kow david-kow merged commit a5f6392 into elastic:master Jun 30, 2020
@david-kow david-kow deleted the fix_beat_e2e_tests branch June 30, 2020 08:55
@david-kow
Contributor Author

@barkbay It shouldn't happen. Logs from the Pod would be the most helpful I think.

Crashing is due to two factors, but both should be temporary:

  • Kibana not being up
  • Beat user for Kibana not created in ES yet

Maybe create a separate issue for the flaky test?

david-kow added a commit to david-kow/cloud-on-k8s that referenced this pull request Jul 1, 2020
* Fix Beat E2E tests

* Update config/e2e/filebeat.yaml

Co-authored-by: Michael Morello <michael.morello@gmail.com>

* PR fixes

Co-authored-by: Michael Morello <michael.morello@gmail.com>
david-kow added a commit that referenced this pull request Jul 1, 2020
* Fix Beat E2E tests

* Update config/e2e/filebeat.yaml

Co-authored-by: Michael Morello <michael.morello@gmail.com>

* PR fixes

Co-authored-by: Michael Morello <michael.morello@gmail.com>

Co-authored-by: Michael Morello <michael.morello@gmail.com>