Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change config dir location #286

Merged
merged 1 commit into from
Jan 7, 2021

Conversation

webern
Copy link
Contributor

@webern webern commented Dec 7, 2020

Is this a bug fix or adding new feature?

Fixes #246
Replaces #247

Fixes bottlerocket-os/bottlerocket#1111

What is this PR about? / Why do we need it?

change config dir location

Before this change, the driver fails to work on Bottlerocket because it
tries to write files into the host directory /etc/amazon/efs. On
Bottlerocket that directory is not writable by the container.

We want to move the host mount point to /var/amazon/efs, but we need to
support backward compatibility with hosts that may already have configs
written to /etc/amazon/efs.

To solve this, we mount both host paths and check for the presence of
config files at container runtime. If we find files in the old location,
we symlink to that location, otherwise we symlink to the new location.

What testing is done?

I built my changes to a container image, pushed to a private ECR repo, and changed yaml files accordingly to use my changes. Throughout, I used a canary application from these instructions https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html which mounts an EFS volume and logs to it every five seconds.

  • First, I deployed the current production version to a cluster that had one AL2 node and one Bottlerocket node.
    • I observed that the deployment failed on Bottlerocket and succeeded on AL2.
    • I ssh'ed to the AL2 box and confirmed that files had been written to /etc/amazon/efs.
    • I confirmed that the canary application was writing logs to the EFS mount, but only on AL2, Bottlerocket was broken.
  • Next I deployed my updated version of the efs driver.
    • I observed that pods on both hosts were now healthy and writing to their EFS mounts.
    • I ssh'ed to AL2 and found that /etc/amazon/efs was still being used and /var/amazon/efs was empty.
    • I ssh'ed to Bottlerocket and found that /etc/amazon/efs was empty and `/var/amazon/efs/ was being used.
  • I rebooted both nodes and found that when the pods resumed, they were still able to write to their EFS mounts.
  • I also deployed my changed driver into a clean cluster and observed that both AL2 and Bottlerocket used /var/amazon/efs and that mounts were working.

@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Dec 7, 2020
@k8s-ci-robot
Copy link
Contributor

Welcome @webern!

It looks like this is your first PR to kubernetes-sigs/aws-efs-csi-driver 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/aws-efs-csi-driver has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @webern. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 7, 2020
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 7, 2020
@webern
Copy link
Contributor Author

webern commented Dec 8, 2020

webern force-pushed the webern:bottlerocket branch from 78d281c to d31f20d

This push fixes the author and a couple of comments. I'm still figuring out the paperwork for the CLA.

@webern
Copy link
Contributor Author

webern commented Dec 11, 2020

webern force-pushed the webern:bottlerocket branch from d31f20d to dcee1e2

Hopefully this push will get me passed the CLA?

@wongma7
Copy link
Contributor

wongma7 commented Dec 11, 2020

/check-cla
/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 11, 2020
@wongma7
Copy link
Contributor

wongma7 commented Dec 11, 2020

/check-cla

@webern
Copy link
Contributor Author

webern commented Dec 11, 2020

I signed it

@webern
Copy link
Contributor Author

webern commented Dec 11, 2020

/check-cla

@webern
Copy link
Contributor Author

webern commented Dec 11, 2020

I signed it

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Dec 11, 2020
@webern
Copy link
Contributor Author

webern commented Dec 11, 2020

/check-cla

@webern
Copy link
Contributor Author

webern commented Dec 11, 2020

/retest

@webern
Copy link
Contributor Author

webern commented Dec 12, 2020

Hmmm, OK, I may need some help understanding why the e2e tests are failing. Or maybe a walkthrough of what's actually happening. e.g. does a container with my Go changes get built and used in the test?

@webern webern force-pushed the bottlerocket branch 2 times, most recently from 7fc9f51 to e7cc126 Compare December 14, 2020 21:56
@webern
Copy link
Contributor Author

webern commented Dec 14, 2020

webern force-pushed the webern:bottlerocket branch from dcee1e2 to 7fc9f51

This push brings the helm chart inline with the other podspec.

webern force-pushed the webern:bottlerocket branch from 7fc9f51 to e7cc126

This push raises code coverage to 100% for the new function.

@webern
Copy link
Contributor Author

webern commented Dec 15, 2020

/retest

}
}

// TestInitConfigDirPreferredError asserts that a symlink an error is returned when a symlink cannot be created to the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asserts that a symlink an error is returned...

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 6, 2021
@wongma7
Copy link
Contributor

wongma7 commented Jan 6, 2021

Ok, got it, since the controller makes no reference to any config directories, the new function fails like this:

I0106 05:34:47.978762       1 config_dir.go:72] Creating symlink from '/etc/amazon/efs' to '/var/amazon/efs'
F0106 05:34:47.978839       1 main.go:60] config directory '/var/amazon/efs' does not exist: 'stat /var/amazon/efs: no such file or directory'

I guess it should be oky to remove the check and just create preferredDir /var/amazon/efs if it doesn't exist? In the node setting we can assume that it will be created by hostpath DirectoryOrCreate, in controller setting we don't care about persisting configs at all.

@kbasv since the controller should only briefly mount efs volumes to delete them is it necessary for its config and such to be persistent? I am saying here no, but could you please confirm.

@kbasv
Copy link

kbasv commented Jan 6, 2021

@wongma7 Controller doesn't require persistent configs

@webern
Copy link
Contributor Author

webern commented Jan 6, 2021

I guess it should be oky to remove the check and just create preferredDir /var/amazon/efs if it doesn't exist? In the node setting we can assume that it will be created by hostpath DirectoryOrCreate, in controller setting we don't care about persisting configs at all.

OK, If I understand correctly, we also need to run in an environment where the mounts that I was expecting are not there. In that case we probably want the contents of /etc/amazon/efs-static-files to be available at /etc/amazon/efs. I think the simplest thing to do would be to create a symlink /etc/amazon/efs -> /etc/amazon/efs-static-files when neither of the mounted directories are present.

@wongma7
Copy link
Contributor

wongma7 commented Jan 6, 2021

@webern yes that makes sense to me

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jan 7, 2021
@webern webern force-pushed the bottlerocket branch 2 times, most recently from ec2d8df to 5ccb581 Compare January 7, 2021 01:31
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 7, 2021
@webern
Copy link
Contributor Author

webern commented Jan 7, 2021

This has most of my changes. A simplification of the config dir function to account for runtimes where legacy and preferred host paths are not mounted:

webern force-pushed the webern:bottlerocket branch from 2e6f31a to 216e3f8 9 minutes ago

I missed a couple of refactor opportunities in the tests and added them in this push:

webern force-pushed the webern:bottlerocket branch from 216e3f8 to ec2d8df 2 minutes ago

I rebased and fixed conflicts in main.go:

webern force-pushed the webern:bottlerocket branch from ec2d8df to 5ccb581 2 minutes ago

@webern
Copy link
Contributor Author

webern commented Jan 7, 2021

webern force-pushed the webern:bottlerocket branch from 5ccb581 to 45acb17 20 seconds ago

I can't make os.CreateDirAll fail reliably in any environment, so I deleted that test. Unfortunately I can't give you 100% coverage as that error return cannot reasonably be exercised.

@webern
Copy link
Contributor Author

webern commented Jan 7, 2021

@webern yes that makes sense to me

Actually the right solution was to create an empty dir since the configs will be copied into it.

defer cleanup(t, dir)

// create legacy dir and a fake pre-existing conf file
// create an empty legacy dir
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit extra comment here

@wongma7
Copy link
Contributor

wongma7 commented Jan 7, 2021

/lgtm

beautiful, thank you

Next steps for me:

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 7, 2021
Before this change, the driver failed to work on Bottlerocket because it
tries to write files into the host directory /etc/amazon/efs. On
Bottlerocket that directory is not writable by the container.

We want to move the host mount point to /var/amazon/efs, but we need to
support backward compatibility with hosts that may already have configs
written to /etc/amazon/efs.

To solve this, we mount both host paths and check for the presence of
config files at container runtime. If we find files in the old location,
we symlink to that location, otherwise we symlink to the new location.
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 7, 2021
@webern
Copy link
Contributor Author

webern commented Jan 7, 2021

I fixed the incorrect comment. We needed a retest anyway. The 12 failed tests looked like timeouts, which is what I remember failed tests looking like in the past when a retest passed. Unfortunately fixing that comment removed the lgtm flag.

@wongma7
Copy link
Contributor

wongma7 commented Jan 7, 2021

/lgtm

Yes, just checked logs too and this time the containers all became ready so I don't think it has anything to do with this PR. will repro locally tomorrow if happens again

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 7, 2021
@k8s-ci-robot k8s-ci-robot merged commit 6ad12c1 into kubernetes-sigs:master Jan 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to deploy aws-efs-csi-driver on bottlerocket [EKS] unable to deploy the aws-efs-csi-driver
5 participants