deploy/kubernetes/base/node.yaml: Fixing deployment on Bottlerocket. #247

tatodorov · 2020-09-09T19:46:20Z

Updated the path where the efs-utils.conf, efs-utils.crt and
privateKey.pem files are written to /var/amazon/efs, which is one of
the few writable file systems in Bottlerocket.

Is this a bug fix or adding new feature?
This resolves issue #246.

What is this PR about? / Why do we need it?
This allows aws-efs-csi-driver to be deployed on Bottlerocket OS as well, where /etc is read-only.

What testing is done?
aws-efs-csi-driver was successfully deployed on Amazon Linux 2 and Bottlerocket OS.

on Bottlerocket OS

NAME                              STATUS   ROLES    AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION   CONTAINER-RUNTIME
ip-192-168-102-136.ec2.internal   Ready    <none>   14m   v1.17.9   192.168.102.136   <none>        Bottlerocket OS 1.0.1   5.4.50           containerd://1.3.7+unknown
ip-192-168-102-151.ec2.internal   Ready    <none>   14m   v1.17.9   192.168.102.151   <none>        Bottlerocket OS 1.0.1   5.4.50           containerd://1.3.7+unknown
ip-192-168-102-72.ec2.internal    Ready    <none>   14m   v1.17.9   192.168.102.72    <none>        Bottlerocket OS 1.0.1   5.4.50           containerd://1.3.7+unknown

kubectl -n kube-system get pods -l app=efs-csi-node -o wide
NAME                 READY   STATUS    RESTARTS   AGE   IP                NODE                              NOMINATED NODE   READINESS GATES
efs-csi-node-bpxd9   3/3     Running   0          20m   192.168.102.136   ip-192-168-102-136.ec2.internal   <none>           <none>
efs-csi-node-mgzpk   3/3     Running   0          19m   192.168.102.151   ip-192-168-102-151.ec2.internal   <none>           <none>
efs-csi-node-mwfm7   3/3     Running   0          20m   192.168.102.72    ip-192-168-102-72.ec2.internal    <none>           <none>

Logs from efs-plugin:

kubectl -n kube-system logs efs-csi-node-bpxd9 efs-plugin
I0909 20:33:43.690761       1 mount_linux.go:163] Cannot run systemd-run, assuming non-systemd OS
I0909 20:33:43.690825       1 mount_linux.go:164] systemd-run failed with: exit status 1
I0909 20:33:43.690834       1 mount_linux.go:165] systemd-run output:
I0909 20:33:43.691006       1 driver.go:87] Starting watchdog
I0909 20:33:43.691089       1 efs_watch_dog.go:174] Copying /etc/amazon/efs/efs-utils.conf since it doesn't exist
I0909 20:33:43.691155       1 efs_watch_dog.go:174] Copying /etc/amazon/efs/efs-utils.crt since it doesn't exist
I0909 20:33:43.691443       1 driver.go:93] Staring subreaper
I0909 20:33:43.691458       1 driver.go:96] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
I0909 20:33:45.219276       1 node.go:242] NodeGetInfo: called with args
I0909 20:35:07.764987       1 node.go:226] NodeGetCapabilities: called with args
I0909 20:35:10.413111       1 node.go:226] NodeGetCapabilities: called with args
I0909 20:35:10.414248       1 node.go:51] NodePublishVolume: called with args volume_id:"fs-2f7ccfac:/default/test-efs-pod" target_path:"/var/lib/kubelet/pods/36fdc3f8-e12a-44e8-8c3b-ef75f266ccc2/volumes/kubernetes.io~csi/test-efs-pod-data/mount" volume_capability:<mount:<> access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:"encryptInTransit" value:"true" >
I0909 20:35:10.414314       1 node.go:167] NodePublishVolume: creating dir /var/lib/kubelet/pods/36fdc3f8-e12a-44e8-8c3b-ef75f266ccc2/volumes/kubernetes.io~csi/test-efs-pod-data/mount
I0909 20:35:10.414341       1 node.go:172] NodePublishVolume: mounting fs-2f7ccfac:/default/test-efs-pod at /var/lib/kubelet/pods/36fdc3f8-e12a-44e8-8c3b-ef75f266ccc2/volumes/kubernetes.io~csi/test-efs-pod-data/mount with options [tls]
I0909 20:35:10.414355       1 mount_linux.go:135] Mounting cmd (mount) with arguments ([-t efs -o tls fs-2f7ccfac:/default/test-efs-pod /var/lib/kubelet/pods/36fdc3f8-e12a-44e8-8c3b-ef75f266ccc2/volumes/kubernetes.io~csi/test-efs-pod-data/mount])

NOTE

This PR fixes the deployment of aws-efs-csi-driver on Bottlerocket OS, but there is still an issue attaching an EFS volume to a Pod when running on Bottlerocket OS.

Log from efs-plugin

Sep 10 14:56:08 ip-192-168-102-149.ec2.internal kubelet[3687]: E0910 14:56:08.234142    3687 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/efs.csi.aws.com^fs-2f7ccfac:/default/test-efs-pod podName: nodeName:}" failed. No retries permitted until 2020-09-10 14:58:10.234068793 +0000 UTC m=+2597.763747074 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"test-efs-pod-data\" (UniqueName: \"kubernetes.io/csi/efs.csi.aws.com^fs-2f7ccfac:/default/test-efs-pod\") pod \"test-efs-pod-0\" (UID: \"1fdd7f4e-46ef-48c5-a1be-f71864d51449\") : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Sep 10 14:56:26 ip-192-168-102-149.ec2.internal kubelet[3687]: E0910 14:56:26.914296    3687 kubelet.go:1681] Unable to attach or mount volumes for pod "test-efs-pod-0_default(1fdd7f4e-46ef-48c5-a1be-f71864d51449)": unmounted volumes=[test-efs-pod-data], unattached volumes=[test-efs-pod-data test-efs-pod-token-2zppv]: timed out waiting for the condition; skipping pod
Sep 10 14:56:26 ip-192-168-102-149.ec2.internal kubelet[3687]: E0910 14:56:26.915660    3687 pod_workers.go:191] Error syncing pod 1fdd7f4e-46ef-48c5-a1be-f71864d51449 ("test-efs-pod-0_default(1fdd7f4e-46ef-48c5-a1be-f71864d51449)"), skipping: unmounted volumes=[test-efs-pod-data], unattached volumes=[test-efs-pod-data test-efs-pod-token-2zppv]: timed out waiting for the condition

Log from kubelet

Sep 10 06:59:16 ip-192-168-102-157.ec2.internal kubelet[3589]: I0910 06:59:16.953435    3589 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "test-efs-pod-data" (UniqueName: "kubernetes.io/csi/efs.csi.aws.com^fs-2f7ccfac:/default/test-efs-pod") pod "test-efs-pod-0" (UID: "31bec37a-0b16-441a-b323-70be44dcd57c")
Sep 10 06:59:16 ip-192-168-102-157.ec2.internal kubelet[3589]: I0910 06:59:16.954345    3589 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "test-efs-pod-token-fjjz9" (UniqueName: "kubernetes.io/secret/31bec37a-0b16-441a-b323-70be44dcd57c-test-efs-pod-token-fjjz9") pod "test-efs-pod-0" (UID: "31bec37a-0b16-441a-b323-70be44dcd57c")
Sep 10 06:59:17 ip-192-168-102-157.ec2.internal kubelet[3589]: I0910 06:59:17.056503    3589 clientconn.go:104] parsed scheme: ""
Sep 10 06:59:17 ip-192-168-102-157.ec2.internal kubelet[3589]: I0910 06:59:17.056760    3589 clientconn.go:104] scheme "" not registered, fallback to default scheme
Sep 10 06:59:17 ip-192-168-102-157.ec2.internal kubelet[3589]: I0910 06:59:17.056874    3589 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/efs.csi.aws.com/csi.sock 0  <nil>}] <nil>}
Sep 10 06:59:17 ip-192-168-102-157.ec2.internal kubelet[3589]: I0910 06:59:17.056967    3589 clientconn.go:577] ClientConn switching balancer to "pick_first"
Sep 10 06:59:17 ip-192-168-102-157.ec2.internal kubelet[3589]: I0910 06:59:17.063981    3589 csi_attacher.go:310] kubernetes.io/csi: attacher.MountDevice STAGE_UNSTAGE_VOLUME capability not set. Skipping MountDevice...
Sep 10 06:59:17 ip-192-168-102-157.ec2.internal kubelet[3589]: I0910 06:59:17.064015    3589 operation_generator.go:587] MountVolume.MountDevice succeeded for volume "test-efs-pod-data" (UniqueName: "kubernetes.io/csi/efs.csi.aws.com^fs-2f7ccfac:/default/test-efs-pod") pod "test-efs-pod-0" (UID: "31bec37a-0b16-441a-b323-70be44dcd57c") device mount path "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/test-efs-pod-data/globalmount"

on Amazon Linux 2 (EKS optimized)

NAME                              STATUS   ROLES    AGE   VERSION              INTERNAL-IP       EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
ip-192-168-102-132.ec2.internal   Ready    <none>   53s   v1.17.9-eks-4c6976   192.168.102.132   <none>        Amazon Linux 2   4.14.193-149.317.amzn2.x86_64   docker://19.3.6
ip-192-168-102-142.ec2.internal   Ready    <none>   82s   v1.17.9-eks-4c6976   192.168.102.142   <none>        Amazon Linux 2   4.14.193-149.317.amzn2.x86_64   docker://19.3.6
ip-192-168-102-91.ec2.internal    Ready    <none>   91s   v1.17.9-eks-4c6976   192.168.102.91    <none>        Amazon Linux 2   4.14.193-149.317.amzn2.x86_64   docker://19.3.6

kubectl -n kube-system get pods -l app=efs-csi-node -o wide
NAME                 READY   STATUS    RESTARTS   AGE   IP                NODE                              NOMINATED NODE   READINESS GATES
efs-csi-node-dwfxh   3/3     Running   0          76s   192.168.102.142   ip-192-168-102-142.ec2.internal   <none>           <none>
efs-csi-node-sg2sn   3/3     Running   0          47s   192.168.102.132   ip-192-168-102-132.ec2.internal   <none>           <none>
efs-csi-node-xfbqx   3/3     Running   0          85s   192.168.102.91    ip-192-168-102-91.ec2.internal    <none>           <none>

Logs from efs-plugin:

kubectl -n kube-system logs efs-csi-node-sg2sn efs-plugin
I0909 19:44:01.135535       1 mount_linux.go:163] Cannot run systemd-run, assuming non-systemd OS
I0909 19:44:01.135652       1 mount_linux.go:164] systemd-run failed with: exit status 1
I0909 19:44:01.135664       1 mount_linux.go:165] systemd-run output: Failed to create bus connection: No such file or directory
I0909 19:44:01.135819       1 driver.go:87] Starting watchdog
I0909 19:44:01.135905       1 efs_watch_dog.go:174] Copying /etc/amazon/efs/efs-utils.conf since it doesn't exist
I0909 19:44:01.136091       1 efs_watch_dog.go:174] Copying /etc/amazon/efs/efs-utils.crt since it doesn't exist
I0909 19:44:01.137521       1 driver.go:93] Staring subreaper
I0909 19:44:01.137538       1 driver.go:96] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
I0909 19:44:03.078723       1 node.go:242] NodeGetInfo: called with args
I0909 19:44:17.031525       1 node.go:226] NodeGetCapabilities: called with args
I0909 19:44:17.632663       1 node.go:226] NodeGetCapabilities: called with args
I0909 19:44:17.638825       1 node.go:51] NodePublishVolume: called with args volume_id:"fs-2f7ccfac:/default/test-efs-pod" target_path:"/var/lib/kubelet/pods/befebc5f-ff88-40b1-bab5-8ac778cbf258/volumes/kubernetes.io~csi/test-efs-pod-data/mount" volume_capability:<mount:<> access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:"encryptInTransit" value:"true" >
I0909 19:44:17.638922       1 node.go:167] NodePublishVolume: creating dir /var/lib/kubelet/pods/befebc5f-ff88-40b1-bab5-8ac778cbf258/volumes/kubernetes.io~csi/test-efs-pod-data/mount
I0909 19:44:17.638953       1 node.go:172] NodePublishVolume: mounting fs-2f7ccfac:/default/test-efs-pod at /var/lib/kubelet/pods/befebc5f-ff88-40b1-bab5-8ac778cbf258/volumes/kubernetes.io~csi/test-efs-pod-data/mount with options [tls]
I0909 19:44:17.638970       1 mount_linux.go:135] Mounting cmd (mount) with arguments ([-t efs -o tls fs-2f7ccfac:/default/test-efs-pod /var/lib/kubelet/pods/befebc5f-ff88-40b1-bab5-8ac778cbf258/volumes/kubernetes.io~csi/test-efs-pod-data/mount])


k8s-ci-robot · 2020-09-09T19:46:23Z

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: login-issues@jira.linuxfoundation.org

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot · 2020-09-09T19:46:28Z

Welcome @tatodorov!

It looks like this is your first PR to kubernetes-sigs/aws-efs-csi-driver 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/aws-efs-csi-driver has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2020-09-09T19:46:28Z

Hi @tatodorov. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · 2020-09-09T19:46:28Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tatodorov
To complete the pull request process, please assign d-nishi
You can assign the PR to them by writing /assign @d-nishi in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

wongma7 · 2020-10-21T18:52:47Z

We rely on this path being persisted in case the driver gets restarted. I am not 100% sure about the details, but I have observed that if /etc/amazon/efs/privateKey.pem is not persisted and the driver gets restarted, then mounts can hang. The private key is used to make TLS connections with EFS via stunnel.

We do not decide the path; efs-utils does: https://github.com/aws/efs-utils/search?q=%2Fetc%2Famazon%2Fefs&type=code

jqmichael · 2020-10-21T18:58:25Z

Since the PR only changes the path in the Volume, it shouldn't affect the path inside the EFS driver container, which is where efs-utils is running.

wongma7 · 2020-10-21T18:58:26Z

@juqing pointed out that hostPath path is not the same as mountPath within the container, so the container can still write to its /etc/amazon/efs/ which is bind mounted to wherever on the host.
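The hostPath/mountPath distinction can be sketched in a few lines of Python that model the relevant parts of the node pod spec as plain dicts. This is only an illustration, not the contents of node.yaml: the volume name `efs-state-dir` and the helper `retarget_host_path` are hypothetical, and only the two directory paths and the `efs-plugin` container name come from this thread.

```python
import copy

# Hypothetical, trimmed-down model of the node pod spec. Only the two paths
# and the container name come from this PR; the volume name is an assumption.
POD_SPEC = {
    "containers": [{
        "name": "efs-plugin",
        "volumeMounts": [
            {"name": "efs-state-dir", "mountPath": "/etc/amazon/efs"},
        ],
    }],
    "volumes": [
        {"name": "efs-state-dir", "hostPath": {"path": "/etc/amazon/efs"}},
    ],
}


def retarget_host_path(pod_spec, volume_name, new_host_path):
    """Return a copy of pod_spec with one hostPath volume pointed elsewhere.

    volumeMounts are left untouched, so the path seen inside the
    container does not change.
    """
    updated = copy.deepcopy(pod_spec)
    for volume in updated["volumes"]:
        if volume["name"] == volume_name:
            volume["hostPath"]["path"] = new_host_path
    return updated


patched = retarget_host_path(POD_SPEC, "efs-state-dir", "/var/amazon/efs")
print(patched["volumes"][0]["hostPath"]["path"])
print(patched["containers"][0]["volumeMounts"][0]["mountPath"])
```

Only the host side of the bind mount moves; efs-utils inside the container keeps using /etc/amazon/efs.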

However, I am still reluctant to merge this as is because it would break updates: if a user has their privateKey stored on the host at /etc/amazon/efs, then after the update the driver would try to read it from the host at /var/amazon/efs. Thinking of a solution...

wongma7 · 2020-10-21T19:07:22Z

I think we need a bottlerocket overlay

faarshad · 2020-10-21T19:46:23Z

> I think we need a bottlerocket overlay

FYI, some folks who are migrating from regular Amazon Linux 2 (AL2) to Bottlerocket will need to run both AL2 and Bottlerocket in parallel for some time, and will therefore need to deploy the CSI driver on both. A separate overlay might fulfill Bottlerocket deployment needs but will not be generic!

wongma7 · 2020-10-21T20:08:11Z

If Bottlerocket/AL2 nodes have a well-known label distinguishing them, we could put nodeSelectors in the overlays and instruct users to deploy two DaemonSets. Or we could instruct users to label the nodes themselves and set nodeSelectors for the two DaemonSets accordingly. I want to minimize disruption for users, but I think it is unavoidable that they will have to do something special.

faarshad · 2020-10-21T21:02:04Z

I am favoring the idea of users setting specific node-labels on each of the two node types and the overlay being generic enough to use nodeSelector based on the set nodeLabels.
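As a rough sketch of the two-DaemonSet idea, the Python below derives an AL2 variant and a Bottlerocket variant from one trimmed base manifest, each pinned by a nodeSelector. Everything named here is an assumption for illustration: the label key `example.com/node-os`, its values, the volume name `efs-state-dir`, the DaemonSet name `efs-csi-node` and the helper `daemonset_variant` are hypothetical, and a real manifest needs more fields (selector, containers and so on).

```python
import copy

# Hypothetical label key; the thread does not name a well-known label, so
# users would apply something like this to their nodes themselves.
OS_LABEL_KEY = "example.com/node-os"

# Trimmed base manifest (assumed names, not the real node.yaml).
BASE = {
    "apiVersion": "apps/v1",
    "kind": "DaemonSet",
    "metadata": {"name": "efs-csi-node"},
    "spec": {"template": {"spec": {
        "volumes": [{"name": "efs-state-dir", "hostPath": {"path": ""}}],
    }}},
}


def daemonset_variant(base, name_suffix, os_label_value, host_dir):
    """Build one DaemonSet variant pinned to nodes carrying the given label."""
    variant = copy.deepcopy(base)
    variant["metadata"]["name"] += "-" + name_suffix
    pod_spec = variant["spec"]["template"]["spec"]
    pod_spec["nodeSelector"] = {OS_LABEL_KEY: os_label_value}
    pod_spec["volumes"][0]["hostPath"]["path"] = host_dir
    return variant


al2 = daemonset_variant(BASE, "al2", "al2", "/etc/amazon/efs")
bottlerocket = daemonset_variant(
    BASE, "bottlerocket", "bottlerocket", "/var/amazon/efs"
)

for ds in (al2, bottlerocket):
    spec = ds["spec"]["template"]["spec"]
    print(ds["metadata"]["name"], spec["nodeSelector"],
          spec["volumes"][0]["hostPath"]["path"])
```

With labels like these on the nodes, each variant would schedule only onto its own node type, so AL2 and Bottlerocket nodes could run side by side during a migration.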

jqmichael · 2020-12-02T23:29:20Z

We had an offline chat.

In summary, for the upgrade case where files are already present at /etc/amazon/efs on the host, the driver will keep writing to the same directory. Otherwise, the driver will start writing to /var/amazon/efs on the host. Within the container, all files will still be read from or written to /etc/amazon/efs.

The specific plan is to mount /var/amazon/efs, the new folder, into the driver container as is, and /etc/amazon/efs, the original folder, into the container as /etc/amazon/efs-x. At startup, the driver will check whether files are already present at /etc/amazon/efs-x. If so, it is the upgrade case, and the driver will add a symlink /etc/amazon/efs in the container pointing to /etc/amazon/efs-x. Otherwise, it is a fresh host/cluster, and the driver will point the same symlink /etc/amazon/efs to /var/amazon/efs.
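The startup check described here can be sketched as follows. This is a minimal Python illustration of the decision logic only, not the driver's actual Go implementation; the function name `link_config_dir` is hypothetical, and the demo uses temporary directories as stand-ins for /etc/amazon/efs-x and /var/amazon/efs.

```python
import os
import tempfile


def link_config_dir(link_path, legacy_dir, new_dir):
    """Symlink link_path to legacy_dir if it already holds files, else new_dir."""
    has_legacy_files = os.path.isdir(legacy_dir) and bool(os.listdir(legacy_dir))
    target = legacy_dir if has_legacy_files else new_dir
    os.symlink(target, link_path)
    return target


# Demo in a temporary directory standing in for the container filesystem.
with tempfile.TemporaryDirectory() as root:
    legacy = os.path.join(root, "efs-x")  # stands in for /etc/amazon/efs-x
    new = os.path.join(root, "var-efs")   # stands in for /var/amazon/efs
    os.makedirs(legacy)
    os.makedirs(new)

    # Fresh host: the legacy directory is empty, so the link goes to the new one.
    fresh_link = os.path.join(root, "efs-fresh")
    fresh_target = link_config_dir(fresh_link, legacy, new)

    # Upgrade: a private key already exists in the legacy directory.
    open(os.path.join(legacy, "privateKey.pem"), "w").close()
    upgrade_link = os.path.join(root, "efs-upgrade")
    upgrade_target = link_config_dir(upgrade_link, legacy, new)

    fresh_is_new = os.path.realpath(fresh_link) == os.path.realpath(new)
    upgrade_is_legacy = os.path.realpath(upgrade_link) == os.path.realpath(legacy)

print(fresh_is_new, upgrade_is_legacy)
```

Because the choice is made from what is already on the host, an upgraded node would keep reading its existing privateKey.pem, while a fresh node never touches the read-only /etc.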

CC: @wongma7

wongma7 · 2021-01-07T22:24:08Z

/close

superseded by #286 which I am working on getting into the next release (don't think I will make this week but shooting for next week)

Thanks all for productive discussion.

k8s-ci-robot · 2021-01-07T22:24:15Z

@tatodorov: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

deploy/kubernetes/base/node.yaml: Fixing deployment on Bottlerocket.

a9eb197

Updated the path where the efs-utils.conf, efs-utils.crt and privateKey.pem files are written to /var/amazon/efs, which is one of the few writable file systems in Bottlerocket. This resolves issue #246.

k8s-ci-robot added the "cncf-cla: no" label (Indicates the PR's author has not signed the CNCF CLA.) Sep 9, 2020

k8s-ci-robot added the "needs-ok-to-test" label (Indicates a PR that requires an org member to verify it is safe to test.) Sep 9, 2020

k8s-ci-robot requested review from d-nishi and jsafrane September 9, 2020 19:46

k8s-ci-robot added the "size/XS" (Denotes a PR that changes 0-9 lines, ignoring generated files.) and "cncf-cla: yes" (Indicates the PR's author has signed the CNCF CLA.) labels and removed the "cncf-cla: no" label (Indicates the PR's author has not signed the CNCF CLA.) Sep 9, 2020

This was referenced Nov 23, 2020

How to install additional softwares in the Bottlerocket instance ? bottlerocket-os/bottlerocket#1222

Closed

[EKS] unable to deploy the aws-efs-csi-driver bottlerocket-os/bottlerocket#1111

Closed

webern mentioned this pull request Dec 7, 2020

change config dir location #286

Merged

wongma7 closed this Jan 7, 2021

k8s-ci-robot added the "needs-rebase" label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Jan 7, 2021
