Data Transport Cert Secret Size Overrun With Big Scale Out #6954

rtluckie · 2023-06-23T23:49:29Z

Bug Report

What did you do?

Attempted to scale the data replicas to 250.

What did you expect to see?

successful scale up

What did you see instead? Under which circumstances?

It appears that the ECK operator will overflow the max k8s secret size (1MB) for the transport certs if you scale the data nodes to >250.
The operator gets stuck in a scale up loop while it tries to reconcile the cert secret. Even after scaling down the operator does not seem to recover.

"Secret "elasticsearch-XXX-es-data-es-transport-certs" is invalid: data: Too long: must have at most 1048576 bytes" error

Failed remediations

issue transport cert as documents here
issue wildcard transport certs as documents here

Environment

ECK version: 2.8.0
Kubernetes information:
- Cloud: GKE v1.26.3-gke.1000
kubectl version: v1.27.2
Resource definition:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-myapp
spec:
  version: 8.6.1
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
  - config:
      action:
        auto_create_index: false
      node.roles:
      - master
    count: 3
    name: election
    podTemplate:
      metadata:
        annotations:
          linkerd.io/inject: enabled
        labels:
          ec.ai/component: elasticsearch
          ec.ai/component_group: myapp-service
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: cloud.google.com/gke-spot
                  operator: DoesNotExist
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                  - key: elasticsearch.k8s.elastic.co/cluster-name
                    operator: In
                    values:
                    - elasticsearch-myapp
                topologyKey: topology.kubernetes.io/zone
              weight: 100
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                - key: elasticsearch.k8s.elastic.co/cluster-name
                  operator: In
                  values:
                  - elasticsearch-myapp
              topologyKey: kubernetes.io/hostname
        automountServiceAccountToken: true
        containers:
        - name: elasticsearch
          resources:
            limits:
              cpu: "2"
              memory: 5Gi
            requests:
              cpu: "1"
              memory: 5Gi
        initContainers:
        - command:
          - sh
          - -c
          - sysctl -w vm.max_map_count=262144
          image: busybox:1.28
          name: sysctl
          securityContext:
            privileged: true
        - command:
          - sh
          - -c
          - bin/elasticsearch-plugin install --batch analysis-icu
          name: analysis-icu
        - command:
          - sh
          - -c
          - bin/elasticsearch-plugin install --batch repository-gcs
          name: repository-gcs
        priorityClassName: app-critical-preempting
        serviceAccount: myapp-elasticsearch
        serviceAccountName: myapp-elasticsearch
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 8Gi
        storageClassName: standard-rwo
  - config:
      action:
        auto_create_index: false
      node.roles:
      - data
    count: 200
    name: data
    podTemplate:
      metadata:
        annotations:
          linkerd.io/inject: enabled
        labels:
          ec.ai/component: elasticsearch
          ec.ai/component_group: myapp-service
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: node_pool
                  operator: In
                  values:
                  - n2d-custom-8-65536
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                  - key: elasticsearch.k8s.elastic.co/cluster-name
                    operator: In
                    values:
                    - elasticsearch-myapp
                topologyKey: topology.kubernetes.io/zone
              weight: 100
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                - key: elasticsearch.k8s.elastic.co/cluster-name
                  operator: In
                  values:
                  - elasticsearch-myapp
              topologyKey: kubernetes.io/hostname
        automountServiceAccountToken: true
        containers:
        - name: elasticsearch
          resources:
            limits:
              cpu: "7"
              memory: 56Gi
            requests:
              cpu: "7"
              memory: 56Gi
        initContainers:
        - command:
          - sh
          - -c
          - sysctl -w vm.max_map_count=262144
          image: busybox:1.28
          name: sysctl
          securityContext:
            privileged: true
        - command:
          - sh
          - -c
          - bin/elasticsearch-plugin install --batch analysis-icu
          name: analysis-icu
        - command:
          - sh
          - -c
          - bin/elasticsearch-plugin install --batch repository-gcs
          name: repository-gcs
        priorityClassName: app-high-preempting
        serviceAccount: myapp-elasticsearch
        serviceAccountName: myapp-elasticsearch
        tolerations:
        - effect: NoSchedule
          key: n2d-custom-8-65536
          operator: Equal
          value: "true"
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 500Gi
        storageClassName: standard-rwo

Logs:

Continuous loop of reconciliation failures and timeout accompanied by the following.

 Secret "elasticsearch-myapp-es-data-es-transport-certs.v1" is invalid: data: Too long: must have at most 1048576 character

The text was updated successfully, but these errors were encountered:

pebrc · 2023-06-24T13:01:52Z

One thing you can do to work around this limitation is to create multiple node sets with the data role and scale each of those up until you start running into the size limitation of k8s secrets which seems to be around 150-200 nodes. You can then keep adding node sets until reach the desired scale. See this issue for more context on the current model of one secret for transport certificates per node set.

nullren · 2024-03-20T18:27:25Z

@pebrc are there any plans to address this? it's been several years since the workaround was implemented. we run a very large deployment of many ES clusters (of which this operator has been fantastically helpful), so when adding some of our more larger clusters, i bumped into this error. quite a surprise, you can imagine.

barkbay · 2024-03-21T11:09:53Z

I'm wondering if we could stop reconciling that Secret if we use a CSI driver to manage the certificates for example? (Or give an option to the user skip the reconciliation of that Secret?)

pebrc · 2024-03-22T08:55:47Z

@barkbay I think that's a good idea.

@nullren we don't have concrete plans to address this right now. Did the workaround, using multiple node sets instead of one big one, have drawbacks for you that made you want to stick with a single node set?

nullren · 2024-04-03T23:07:13Z

@barkbay I think that's a good idea.

@nullren we don't have concrete plans to address this right now. Did the workaround, using multiple node sets instead of one big one, have drawbacks for you that made you want to stick with a single node set?

The work around did "work", but it is a whole lot of unnecessary complexity for something we don't even use (we disable security and dont use the certs at all as we use our own network framework on k8s). There's just a lot of extra tooling we have to update to ensure that node sets "data-0", "data-1", ..., "data-N" are all found and reconciled correctly. Still finding some bugs due to this.

Related to #6954 It offers users a workaround for the problem with too many certificates in the transport certificate secret. They can configure external transport cert provisioning and disable self-signed transport certificates. When using a solution eg. like cert-manager's csi-driver as [documented here ](https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-transport-settings.html#k8s-transport-third-party-tools) this should allow for larger node sets of more than 250 nodes. The large cluster scenario is certainly an an edge case but on smaller clusters the disabling of certificate provisioning might still be attractive [reducing the amount of work the operator has to do in this area.](#1841) Note the new option to disable the self-signed transport certificates below: ```yaml apiVersion: elasticsearch.k8s.elastic.co/v1 kind: Elasticsearch metadata: name: es spec: version: 8.6.2 transport: tls: certificateAuthorities: configMapName: trust selfSignedCertificates: disabled: true # <<<< new option nodeSets: - name: mixed count: 3 config: xpack.security.transport.ssl.key: /usr/share/elasticsearch/config/cert-manager-certs/tls.key xpack.security.transport.ssl.certificate: /usr/share/elasticsearch/config/cert-manager-certs/tls.crt node.store.allow_mmap: false podTemplate: spec: containers: - name: elasticsearch env: - name: PRE_STOP_ADDITIONAL_WAIT_SECONDS value: "5" volumeMounts: - name: transport-certs mountPath: /usr/share/elasticsearch/config/cert-manager-certs volumes: - name: transport-certs csi: driver: csi.cert-manager.io readOnly: true volumeAttributes: csi.cert-manager.io/issuer-name: ca-cluster-issuer csi.cert-manager.io/issuer-kind: ClusterIssuer csi.cert-manager.io/dns-names: "${POD_NAME}.${POD_NAMESPACE}.svc.cluster.local" ``` The option does not remove existing certificates from the secret so that the cluster keeps working during the transition if this option is turned on on an existing cluster. I also opted to remove the symlinking of certificates into the `emptyDir` config volume. I tried to figure out why we did this in the first place and am not sure. The reason I could think of was that we wanted to have static and predictable certificate and key file names across all nodes (`transport.tls.crt` and `transport.tls.key`) But we can just use the `POD_NAME` environment variable to link directly into the mounted certificate secret volume. The reason to change this behaviour now is again to support the transition between externally provisioned certs and self-signed certs provisioned by ECK: if a user flips the switch to disable and then re-enable the self-signed certs, but does this accidentally without also configuring the config settings for the transport layer there is an edge case where an Elasticsearch pod will crashloop and cannot recover if we use symlinking: 1. disable self-signed transport certs 2. scale the cluster up by one or more nodes 3. new nodes won't come up because certs are missing (user error) 4. user tries to recover by re-enabling self-signed certs 5. ES keeps bootlooping on the new nodes because the symlink is missing By removing the symlinking the node can recover as soon as the certificates appear in the filesystem. --------- Co-authored-by: Michael Morello <michael.morello@gmail.com> Co-authored-by: Michael Montgomery <mmontg1@gmail.com>

pebrc · 2024-07-23T09:45:04Z

We have implemented an option to turn off the ECK managed self-signed certificates in #7925 which is going to ship with the next release of ECK. This should cover the case you mentioned @nullren. This means we now have two workarounds for large clusters:

Either:

split a node set into mulitple node sets
or
disable the transport certs and provision them externally (e.g. with cert-manager)

My vote would be to close this issue unless there are additional concerns we did not address with these changes.

nullren · 2024-07-24T17:52:31Z

@pebrc that works for me. Thank you!

botelastic bot added the triage label Jun 23, 2023

pebrc added the >enhancement label Jul 7, 2023

botelastic bot removed the triage label Jul 7, 2023

naemono added the v2.14.0 label May 16, 2024

pebrc self-assigned this Jun 27, 2024

pebrc mentioned this issue Jul 1, 2024

Add option to disable self-signed transport certs #7925

Merged

pebrc closed this as completed Jul 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data Transport Cert Secret Size Overrun With Big Scale Out #6954

Data Transport Cert Secret Size Overrun With Big Scale Out #6954

rtluckie commented Jun 23, 2023 •

edited

Loading

pebrc commented Jun 24, 2023

nullren commented Mar 20, 2024

barkbay commented Mar 21, 2024

pebrc commented Mar 22, 2024

nullren commented Apr 3, 2024

pebrc commented Jul 23, 2024

nullren commented Jul 24, 2024

Data Transport Cert Secret Size Overrun With Big Scale Out #6954

Data Transport Cert Secret Size Overrun With Big Scale Out #6954

Comments

rtluckie commented Jun 23, 2023 • edited Loading

Bug Report

pebrc commented Jun 24, 2023

nullren commented Mar 20, 2024

barkbay commented Mar 21, 2024

pebrc commented Mar 22, 2024

nullren commented Apr 3, 2024

pebrc commented Jul 23, 2024

nullren commented Jul 24, 2024

rtluckie commented Jun 23, 2023 •

edited

Loading