fleet and elastic agent doesn't work without a ca.crt #4790

Closed
Raboo opened this issue Aug 23, 2021 · 6 comments · Fixed by #4807
Labels
>bug Something isn't working v1.8.0

Comments

@Raboo

Raboo commented Aug 23, 2021

Bug Report

What did you do?

I have an Elasticsearch + Kibana deployment that uses a Let's Encrypt ACME certificate managed via cert-manager.

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: elasticsearch0
  namespace: elastic0
spec:
  secretName: elasticsearch0-tls
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt
  dnsNames:
  - 'elasticsearch0.k8s.mydomain.net'
--- 
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elastic0
  namespace: elastic0
spec:
...
  http:
    service:
      spec:
        type: LoadBalancer
    tls:
      certificate:
        secretName: elasticsearch0-tls

Deploying Fleet Server and Elastic Agent with a Let's Encrypt ACME certificate managed via cert-manager doesn't work, because the generated certificate Secret doesn't contain a ca.crt, which the current manifests expect. The tls.crt field in cert-manager-generated Secrets contains the entire chain, from the leaf cert to the intermediate cert (but not the root cert). See cert-manager/cert-manager#1571 (comment).
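
A quick way to confirm this situation is to look at the files a mounted TLS Secret provides. The following is only a sketch: `describe_cert_dir` is a hypothetical helper, and the temporary directory with placeholder PEM markers stands in for a real mounted Secret (it holds no real certificates).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Report which of the usual TLS Secret keys exist in a directory and how many
# PEM certificates tls.crt holds (a chain contains more than one).
describe_cert_dir() {
  local dir="$1" key
  for key in tls.crt tls.key ca.crt; do
    if [[ -f "$dir/$key" ]]; then
      echo "$key: present"
    else
      echo "$key: missing"
    fi
  done
  if [[ -f "$dir/tls.crt" ]]; then
    echo "certificates in tls.crt: $(grep -c -- '-----BEGIN CERTIFICATE-----' "$dir/tls.crt")"
  fi
}

# Stand-in for a mounted Secret: a two-certificate chain and no ca.crt.
# The PEM bodies are placeholders, not real certificates.
certs_dir="$(mktemp -d)"
printf '%s\n' \
  '-----BEGIN CERTIFICATE-----' 'leaf (placeholder)' '-----END CERTIFICATE-----' \
  '-----BEGIN CERTIFICATE-----' 'intermediate (placeholder)' '-----END CERTIFICATE-----' \
  > "$certs_dir/tls.crt"
: > "$certs_dir/tls.key"

describe_cert_dir "$certs_dir"
```

Inside a running Pod, the same function could be pointed at the association certs directory instead of the temporary one.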

What did you expect to see?
I expect Fleet Server and Elastic Agent to work without a private CA or self-signed TLS setup,
i.e. to work with Let's Encrypt ACME certificates generated with the help of cert-manager.

What did you see instead? Under which circumstances?
The pods fail to start because they are trying to copy ca.crt from a secret that doesn't have a ca.crt.
The pod manifest looks something like this

apiVersion: v1
kind: Pod
metadata:
  name: fleet-server0-agent-76cbcdd5b5-swcdm
  namespace: elastic0
spec:
  ...
  containers:
  - command:
    - /usr/bin/env
    - bash
    - -c
    - |
      #!/usr/bin/env bash
      set -e
      cp /mnt/elastic-internal/elasticsearch-association/elastic0/elastic0/certs/ca.crt /etc/pki/ca-trust/source/anchors/
      update-ca-trust
      /usr/bin/tini -- /usr/local/bin/docker-entrypoint -e
...

And startup fails with this message

cp: cannot stat '/mnt/elastic-internal/elasticsearch-association/elastic0/elastic0/certs/ca.crt': No such file or directory 

Environment

  • ECK version: 1.7.0
  • elastic stack version: 7.14.0
  • Kubernetes information:
    • On premise
    • Kubernetes distribution: Rancher v2.5.8
    • Kubernetes Version: v1.20.6
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T20:58:09Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.6", GitCommit:"8a62859e515889f07e3e3be6a1080413f17cf2c3", GitTreeState:"clean", BuildDate:"2021-04-15T03:19:55Z", GoVersion:"go1.15.10", Compiler:"gc", Platform:"linux/amd64"}
@botelastic botelastic bot added the triage label Aug 23, 2021
@pebrc pebrc added the >bug Something isn't working label Aug 23, 2021
@botelastic botelastic bot removed the triage label Aug 23, 2021
@pebrc
Collaborator

pebrc commented Aug 23, 2021

What we could do here is make the trust update script that we inject when running in Fleet mode conditional on the existence of the ca.crt file. IIUC the update-ca-trust step that we want to do for self-signed CAs is completely unnecessary for already globally trusted CAs like Let's Encrypt.

@Raboo
Author

Raboo commented Aug 24, 2021

Yes, something like this would probably fix the problem.

set +e
[[ -f /mnt/elastic-internal/elasticsearch-association/elastic0/elastic0/certs/ca.crt ]] && cp /mnt/elastic-internal/elasticsearch-association/elastic0/elastic0/certs/ca.crt /etc/pki/ca-trust/source/anchors/ \
&& update-ca-trust
set -e

@david-kow
Contributor

Hey @Raboo, thanks for the report. As you said, when the Secret that Elasticsearch (and/or Kibana) uses for its TLS configuration doesn't contain ca.crt, Agent in Fleet mode fails to start.

Bug(s)

There are two issues at play here; both are caused by not handling the scenario where the Secret used by Kibana or Elasticsearch for TLS configuration doesn't contain ca.crt with the CA to trust. The Agent controller assumes this key is present and, when it's not, the ca.crt file is missing from where the container command and elastic-agent expect it:

  1. The container command copies ca.crt to /etc/pki/ca-trust/source/anchors and uses update-ca-trust to establish trust towards that CA for the Pod. When that file is not present, the copy will fail causing:
    cp: cannot stat '/mnt/elastic-internal/elasticsearch-association/logging/elasticsearch/certs/ca.crt': No such file or directory
  2. Agent controller prepares a configuration file (fleet-setup.yml) that elastic-agent reads to configure itself. When the configuration file contains paths to CAs (for Kibana or Elasticsearch), elastic-agent fails when it tries to read them and they are not present:
    Error: 1 error: open /mnt/elastic-internal/kibana-association/logging/kibana/certs/ca.crt: no such file or directory reading <nil>

Fix

ECK should not assume that ca.crt is present and, when it is absent, should not reference it in fleet-setup.yml. The fix is targeted for ECK 1.8.0.

CA copying and update-ca-trust should be removed altogether once Kibana allows configuring a CA for the Beats under Elastic Agent management.

Temporary workaround

The issue can be avoided by copying ca.crt only if it's present and removing the "ca" parts of the fleet-setup.yml file. This can be achieved by mounting the fleet-setup-config Secret as a different file and writing a new one with the correct contents, as below. This workaround should work correctly for both Fleet Server and Elastic Agents in Fleet mode.

        containers:
        - name: agent
          volumeMounts: 
          - mountPath: /usr/share/elastic-agent/fleet-setup-base.yml
            name: fleet-setup-config # this volume is added by ECK, we only need to change the mount
            subPath: fleet-setup.yml
          command:
          - /usr/bin/env
          - bash
          - -c
          - |
            #!/usr/bin/env bash
            set -e
            grep -v "ca: /mnt/elastic-internal/.*\(elasticsearch\|kibana\).*" fleet-setup-base.yml > fleet-setup.yml
            /usr/bin/tini -- /usr/local/bin/docker-entrypoint -e
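
To see what the grep filter in this workaround does, here is a minimal sketch that runs it against a sample file. The contents of the sample fleet-setup-base.yml are an assumption made for illustration (the real file is generated by ECK and its format is not documented); only the ca.crt path and the Kibana host are taken from the error messages in this thread. `strip_ca_lines` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Drop every "ca: /mnt/elastic-internal/..." line that refers to the
# Elasticsearch or Kibana association, exactly as the workaround's grep does.
strip_ca_lines() {
  grep -v "ca: /mnt/elastic-internal/.*\(elasticsearch\|kibana\).*" "$1"
}

workdir="$(mktemp -d)"

# Hypothetical sample input; the keys and layout are assumed, not documented.
cat > "$workdir/fleet-setup-base.yml" <<'EOF'
fleet:
  enroll: true
kibana:
  fleet:
    ca: /mnt/elastic-internal/kibana-association/logging/kibana/certs/ca.crt
    host: https://kibana0-kb-http.elastic0.svc:443
EOF

strip_ca_lines "$workdir/fleet-setup-base.yml" > "$workdir/fleet-setup.yml"
cat "$workdir/fleet-setup.yml"
```

Because the filter works line by line, it relies on each CA path sitting on a single line of its own; every other line passes through unchanged.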

@Raboo
Author

Raboo commented Sep 2, 2021

@david-kow Thanks, the workaround works.

But now I've hit the next problem with using a publicly trusted CA. Perhaps I should create a new issue for this?

All the services provided by ECK default to internal hostnames.

Kibana Fleet setup failed: http POST request to https://kibana0-kb-http.elastic0.svc:443/api/fleet/setup fails: fail to execute the HTTP POST request: Post "https://kibana0-kb-http.elastic0.svc:443/api/fleet/setup": x509: certificate is valid for kibana0.k8s.deltaprojects.net, not kibana0-kb-http.elastic0.svc. Response: 

Error: http POST request to https://kibana0-kb-http.elastic0.svc:443/api/fleet/setup fails: fail to execute the HTTP POST request: Post "https://kibana0-kb-http.elastic0.svc:443/api/fleet/setup": x509: certificate is valid for kibana0.k8s.deltaprojects.net, not kibana0-kb-http.elastic0.svc. Response: 

But for all other services, like Kibana, Beats and such, I've been able to specify a different host for Elasticsearch and Kibana.
I tried this for Fleet, but it wasn't accepted:

  config:
    fleet_server:
      elasticsearch:
        host: https://elasticsearch0.k8s.mydomain.net:9200
    kibana:
      fleet:
        host: https://kibana0.k8s.mydomain.net

Also tried

  config:
    fleet-setup:
      fleet_server:
        elasticsearch:
          host: https://elasticsearch0.k8s.mydomain.net:9200
      kibana:
        fleet:
          host: https://kibana0.k8s.mydomain.net

None of these variants works and I haven't seen any syntax in the documentation.

For example, this works for Beats:

  config:
    output:
      elasticsearch:
        hosts: https://elasticsearch0.k8s.mydomain.net:9200
    setup:
      kibana:
        host: https://kibana0.k8s.mydomain.net:443

@david-kow
Contributor

Hey @Raboo. The hosts for Kibana and Elasticsearch initially come from fleet-setup.yml, which is not really friendly to modifications right now. You can use environment variables to set your own hosts, but the contents of fleet-setup.yml take precedence, so you'd need to remove the hosts from there first. The available environment variables are documented. fleet-setup.yml is not, but its entries map directly to the documented variables.

  1. Change the grep from the workaround above to grep -v "ca: /mnt/elastic-internal/.*\(elasticsearch\|kibana\).*" fleet-setup-base.yml | grep -v "host: .*" > fleet-setup.yml
  2. Add the below to the agent container to set your hosts in Fleet Server:
env:
- name: FLEET_SERVER_ELASTICSEARCH_HOST
  value: https://elasticsearch0.k8s.mydomain.net:9200
- name: KIBANA_FLEET_HOST
  value: https://kibana0.k8s.mydomain.net:443
  3. Add KIBANA_FLEET_HOST similarly to above to your Elastic Agent.
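
As a rough illustration of the two-stage filter, the sketch below applies it to a sample file and shows that neither CA paths nor hosts survive, leaving the hosts to be supplied through the environment variables. The sample file layout and the Elasticsearch service hostname are assumptions for illustration, not the documented format of the file ECK generates; `strip_ca_and_hosts` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Remove CA paths and host entries, mirroring the two chained greps, so that
# FLEET_SERVER_ELASTICSEARCH_HOST and KIBANA_FLEET_HOST can supply the hosts.
strip_ca_and_hosts() {
  grep -v "ca: /mnt/elastic-internal/.*\(elasticsearch\|kibana\).*" "$1" \
    | grep -v "host: .*"
}

demo_dir="$(mktemp -d)"

# Hypothetical sample input; keys, layout and the Elasticsearch host are assumed.
cat > "$demo_dir/fleet-setup-base.yml" <<'EOF'
fleet_server:
  enable: true
  elasticsearch:
    ca: /mnt/elastic-internal/elasticsearch-association/logging/elasticsearch/certs/ca.crt
    host: https://elasticsearch-es-http.logging.svc:9200
kibana:
  fleet:
    ca: /mnt/elastic-internal/kibana-association/logging/kibana/certs/ca.crt
    host: https://kibana0-kb-http.elastic0.svc:443
    setup: true
EOF

strip_ca_and_hosts "$demo_dir/fleet-setup-base.yml" > "$demo_dir/fleet-setup.yml"
cat "$demo_dir/fleet-setup.yml"
```

Note that the second grep removes every line containing "host: ", not only the Elasticsearch and Kibana hosts, so it is worth checking the generated file for other host entries before relying on it.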

@Raboo
Author

Raboo commented Sep 3, 2021

Awesome, that works
