DNS resolution of hostNetwork pods (e.g. Restic Backup Addon) #1178

Closed
toschneck opened this issue Nov 30, 2020 · 3 comments · Fixed by #1179
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@toschneck
Member

What happened:
While using KubeOne as the seed cluster provisioner (on vSphere), we applied the restic backup addon with the goal of using the in-cluster minio service minio.minio.svc.cluster.local. Unfortunately this didn't work, because the in-cluster DNS name wasn't resolved.

Backup job YAML:

apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
  namespace: kube-system
type: Opaque
data:
  AWS_ACCESS_KEY_ID: xxxxxxxxxxxxxx
  AWS_SECRET_ACCESS_KEY: xxxxxxxxxxxxxxxxxxxxxx
---
apiVersion: v1
kind: Secret
metadata:
  name: restic-config
  namespace: kube-system
type: Opaque
data:
  password: xxxxxxxxxxxxxxxxxxxxxxxx
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: etcd-s3-backup
  namespace: kube-system
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  schedule: '@every 30m'
  successfulJobsHistoryLimit: 0
  suspend: false
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true
          nodeSelector:
            node-role.kubernetes.io/master: ""
          tolerations:
          - key: node-role.kubernetes.io/master
            effect: NoSchedule
            operator: Exists
          restartPolicy: OnFailure
          volumes:
          - name: etcd-backup
            emptyDir: {}
          - name: host-pki
            hostPath:
              path: /etc/kubernetes/pki
          initContainers:
          - name: snapshoter
            image: {{ Registry "gcr.io" }}/etcd-development/etcd:v3.4.3
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - |-
              set -euf
              mkdir -p /backup/pki/kubernetes
              mkdir -p /backup/pki/etcd
              cp -a /etc/kubernetes/pki/etcd/ca.crt /backup/pki/etcd/
              cp -a /etc/kubernetes/pki/etcd/ca.key /backup/pki/etcd/
              cp -a /etc/kubernetes/pki/ca.crt /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/ca.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/front-proxy-ca.crt /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/front-proxy-ca.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/sa.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/sa.pub /backup/pki/kubernetes
              etcdctl snapshot save /backup/etcd-snapshot.db
            env:
            - name: ETCDCTL_API
              value: "3"
            - name: ETCDCTL_DIAL_TIMEOUT
              value: 3s
            - name: ETCDCTL_CACERT
              value: /etc/kubernetes/pki/etcd/ca.crt
            - name: ETCDCTL_CERT
              value: /etc/kubernetes/pki/etcd/healthcheck-client.crt
            - name: ETCDCTL_KEY
              value: /etc/kubernetes/pki/etcd/healthcheck-client.key
            - name: ETCD_HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            volumeMounts:
            - mountPath: /backup
              name: etcd-backup
            - mountPath: /etc/kubernetes/pki
              name: host-pki
              readOnly: true
          containers:
          - name: uploader
            image: {{ Registry "docker.io" }}/restic/restic:0.9.6
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - |-
              set -euf
              restic snapshots -q || restic init -q
              restic backup --tag=etcd --host=${ETCD_HOSTNAME} /backup
              restic forget --prune --keep-last 48
            env:
            - name: ETCD_HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: RESTIC_REPOSITORY
              value: "s3:http://minio.minio.svc.cluster.local:9000/kubermatic-etcd-backups"
            - name: RESTIC_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: restic-config
                  key: password
            - name: AWS_DEFAULT_REGION
              value: "<<AWS_DEFAULT_REGION>>"
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  key: AWS_ACCESS_KEY_ID
                  name: s3-credentials
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  key: AWS_SECRET_ACCESS_KEY
                  name: s3-credentials
            volumeMounts:
            - mountPath: /backup
              name: etcd-backup

While debugging I found out that the /etc/resolv.conf of the job pod didn't contain the cluster search domains. After some research, it seems that using hostNetwork: true can cause in-cluster DNS resolution to break. The combination with flannel may also be an issue. Some related upstream issues:

Backup pod, without search domains:

# cat /etc/resolv.conf 
nameserver 10.2.0.1
search localdomain

Normal pod:

cat /etc/resolv.conf 
nameserver 169.254.20.10
search default.svc.cluster.local svc.cluster.local cluster.local localdomain
options ndots:5

As I don't think this is normal behavior, we should investigate the DNS resolution issue.
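
For reference, a minimal throwaway debug pod like the following could be used to compare resolution with and without hostNetwork (the pod name and image are just examples for debugging, not part of the addon):

apiVersion: v1
kind: Pod
metadata:
  name: dns-debug
  namespace: kube-system
spec:
  hostNetwork: true                # flip to false to compare /etc/resolv.conf
  restartPolicy: Never
  nodeSelector:
    node-role.kubernetes.io/master: ""
  tolerations:
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
    operator: Exists
  containers:
  - name: debug
    image: gcr.io/kubernetes-e2e-test-images/jessie-dnsutils:1.3   # any image with nslookup works
    command: ["sleep", "3600"]

kubectl exec -n kube-system dns-debug -- cat /etc/resolv.conf then shows which nameserver and search domains the pod got, and kubectl exec -n kube-system dns-debug -- nslookup minio.minio.svc.cluster.local shows whether the in-cluster name resolves.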

What is the expected behavior:

  • The provided backup addon should work with the in-cluster minio service
  • In-cluster DNS names should also be resolved for pods with hostNetwork: true

How to reproduce the issue:

Anything else we need to know?
The issue happened in two different environments on vSphere: in my lab setup (https://github.com/kubermatic-labs/kubermatic-demo/tree/master/vsphere) and at a customer.

Information about the environment:
KubeOne version (kubeone version):

{
  "kubeone": {
    "major": "1",
    "minor": "1",
    "gitVersion": "v1.1.0",
    "gitCommit": "3e84d523a75cd178a7801a0fccf2b3195db3a376",
    "gitTreeState": "",
    "buildDate": "2020-11-17T23:13:15+01:00",
    "goVersion": "go1.15.2",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "machine_controller": {
    "major": "1",
    "minor": "19",
    "gitVersion": "v1.19.0",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}

Operating system: Ubuntu
Provider you're deploying cluster on: vSphere
Operating system you're deploying on: Ubuntu 18.04

Workaround
For the backup location itself, the ClusterIP of the minio service could be used instead (see kubectl get svc -n minio). Unfortunately, this is only stable as long as the service doesn't get redeployed.
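
E.g. read the CLUSTER-IP column from kubectl get svc -n minio and use it directly in the restic repository URL (the IP below is only a placeholder):

            - name: RESTIC_REPOSITORY
              value: "s3:http://10.107.43.195:9000/kubermatic-etcd-backups"   # placeholder, replace with the real minio ClusterIP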

@toschneck toschneck added the kind/bug Categorizes issue or PR as related to a bug. label Nov 30, 2020
@xmudrii
Member

xmudrii commented Nov 30, 2020

@toschneck Can you try setting dnsPolicy on the pod to ClusterFirstWithHostNet? The Kubernetes docs state that dnsPolicy should be set to ClusterFirstWithHostNet if hostNetwork is set to true.

If that solves the problem, we can create a PR to add this to the manifest.
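
Roughly, the change in the addon's pod template would look like this (a sketch of the suggestion, not the final PR):

  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true
          dnsPolicy: ClusterFirstWithHostNet   # hostNetwork pods then use the cluster DNS config
          # rest of the pod spec stays unchanged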

@toschneck
Member Author

Will try it and let you know.

@toschneck
Member Author

toschneck commented Dec 1, 2020

@xmudrii it seems to work:

k logs --all-containers test-backup-job-gdbp8 -f
{"level":"info","ts":1606845603.9692698,"caller":"snapshot/v3_snapshot.go:110","msg":"created temporary db file","path":"/backup/etcd-snapshot.db.part"}
{"level":"warn","ts":"2020-12-01T18:00:03.988Z","caller":"clientv3/retry_interceptor.go:116","msg":"retry stream intercept"}
{"level":"info","ts":1606845603.9892063,"caller":"snapshot/v3_snapshot.go:121","msg":"fetching snapshot","endpoint":"127.0.0.1:2379"}
{"level":"info","ts":1606845605.7280579,"caller":"snapshot/v3_snapshot.go:134","msg":"fetched snapshot","endpoint":"127.0.0.1:2379","took":1.758665093}
{"level":"info","ts":1606845605.7292426,"caller":"snapshot/v3_snapshot.go:143","msg":"saved","path":"/backup/etcd-snapshot.db"}
Snapshot saved at /backup/etcd-snapshot.db
Fatal: unable to open config file: Stat: The specified key does not exist.
Is there a repository at the following location?
s3:http://minio.minio.svc.cluster.local:9000/kubermatic-etcd-backups
created new cache in /root/.cache/restic

Files:           9 new,     0 changed,     0 unmodified
Dirs:            0 new,     0 changed,     0 unmodified
Added to the repo: 34.631 MiB

processed 9 files, 34.631 MiB in 0:03
snapshot af93cda6 saved
Applying Policy: keep the last 48 snapshots snapshots
keep 1 snapshots:
ID        Time                 Host                         Tags        Reasons        Paths
----------------------------------------------------------------------------------------------
af93cda6  2020-12-01 18:00:09  tobi-kubeone-vsphere-1-cp-1  etcd        last snapshot  /backup
----------------------------------------------------------------------------------------------
1 snapshots
