Commit f7d8ba9: Address Review Comments
mbaijal committed Dec 11, 2021 (1 parent: 8840d39)
1 changed file with 12 additions and 91 deletions: examples/aws/storage-efs/README.md

# Deploying Kubeflow with AWS EFS as Persistent Storage

This guide describes how to use Amazon EFS as persistent storage with Kubeflow on AWS EKS.

## 1.0 Prerequisites

## 2.0 Install the EFS CSI Driver

We recommend installing the EFS CSI Driver v1.3.4 directly from the [aws-efs-csi-driver GitHub repository](https://github.com/kubernetes-sigs/aws-efs-csi-driver) -
```
kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=tags/v1.3.4"
```

You should see the following resources get created -
```
serviceaccount/efs-csi-controller-sa created
serviceaccount/efs-csi-node-sa created
clusterrole.rbac.authorization.k8s.io/efs-csi-external-provisioner-role created
clusterrolebinding.rbac.authorization.k8s.io/efs-csi-provisioner-binding created
deployment.apps/efs-csi-controller created
daemonset.apps/efs-csi-node created
csidriver.storage.k8s.io/efs.csi.aws.com configured
```

Additionally, you can confirm that the EFS CSI Driver was installed into the default kube-system namespace using the following command -
```
kubectl get csidriver
efs.csi.aws.com   false   false   Persistent   5d17h
```

## 3.0 Create the IAM Policy for the CSI Driver
The CSI driver's service account (created during installation) requires IAM permission to talk to Amazon EFS and manage volumes on your behalf. Here, we will annotate the service account `efs-csi-controller-sa` with an IAM role which has the required permissions.

1. Download the IAM policy document from GitHub as follows -

```
curl -o iam-policy-example.json https://raw.githubusercontent.com/kubernetes-sigs/aws-efs-csi-driver/v1.3.4/docs/iam-policy-example.json
```

2. Create the IAM policy from the downloaded policy document -
```
aws iam create-policy \
    --policy-name EFSCSIControllerIAMPolicy \
    --policy-document file://iam-policy-example.json
```

3. Create an IAM role, attach the policy to it, and annotate the `efs-csi-controller-sa` service account with the role -
```
eksctl create iamserviceaccount \
    --name efs-csi-controller-sa \
    --namespace kube-system \
    --cluster $CLUSTER_NAME \
    --attach-policy-arn arn:aws:iam::<AWS_ACCOUNT_ID>:policy/EFSCSIControllerIAMPolicy \
    --approve \
    --region $CLUSTER_REGION
```

4. You can verify by describing the specified service account to check if it has been correctly annotated -
```
kubectl describe -n kube-system serviceaccount efs-csi-controller-sa
```
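
The describe output should include an `eks.amazonaws.com/role-arn` annotation. Expressed as a manifest, the annotated service account looks roughly like the following sketch; the account ID and role name here are placeholders, not values from this guide:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: efs-csi-controller-sa
  namespace: kube-system
  annotations:
    # IRSA annotation added by eksctl; your ARN will differ
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/EFSCSIControllerIAMRole
```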

## 4.0 Create an Instance of the EFS Filesystem
This section creates a new EFS volume for your cluster. Please refer to the official [AWS documentation](https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html) for more details.

1. Retrieve the VPC ID that your cluster is in and store it in a variable for use in a later step.
```
vpc_id=$(aws eks describe-cluster \
    --name $CLUSTER_NAME \
--query "cluster.resourcesVpcConfig.vpcId" \
--output text)
```

2. Retrieve the CIDR range for your cluster's VPC and store it in a variable for use in a later step.
```
cidr_range=$(aws ec2 describe-vpcs \
--vpc-ids $vpc_id \
--query "Vpcs[].CidrBlock" \
--output text)
```

3. Create a security group with an inbound rule that allows inbound NFS traffic for your Amazon EFS mount points.

a. Create a security group.
```
security_group_id=$(aws ec2 create-security-group \
--group-name MyEfsSecurityGroup \
--description "My EFS security group" \
--vpc-id $vpc_id \
--output text)
```

b. Create an inbound rule that allows inbound NFS traffic from the CIDR for your cluster's VPC.
```
aws ec2 authorize-security-group-ingress \
--group-id $security_group_id \
--protocol tcp \
--port 2049 \
--cidr $cidr_range
```

4. Create an Amazon EFS file system for your Amazon EKS cluster.
```
file_system_id=$(aws efs create-file-system \
--region $CLUSTER_REGION \
--performance-mode generalPurpose \
--query 'FileSystemId' \
--output text)
```

## 5.0 Create Mount Targets for your cluster
1. [Optional] If you are re-using an existing EFS Volume, you will first have to delete any old mount targets. You can use the following commands for this -
```
aws efs describe-mount-targets --file-system-id $file_system_id
aws efs delete-mount-target --mount-target-id <each-id>
```

2. Determine the IDs of the subnets in your VPC and which Availability Zone the subnet is in.
```
aws ec2 describe-subnets \
--filters "Name=vpc-id,Values=$vpc_id" \
--query 'Subnets[*].{SubnetId: SubnetId,AvailabilityZone: AvailabilityZone,CidrBlock: CidrBlock}' \
--output table
```

3. Add mount targets for the subnets that your nodes are in. If there is more than one node in the cluster, run the command once for a subnet in each Availability Zone that has a node, replacing `subnet-EXAMPLEe2ba886490` with the appropriate subnet ID from the previous command -
```
aws efs create-mount-target \
--file-system-id $file_system_id \
--security-groups $security_group_id \
--subnet-id <subnet-EXAMPLEe2ba886490>
```
Please refer to the official [AWS EFS CSI Document](https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html#efs-create-filesystem) for detailed instructions on creating an EFS filesystem.


## 6.0 Using EFS Storage in Kubeflow

## 6.1 Provisioning Options
### Option 1 - Static Provisioning
[Using this sample from the official AWS docs](https://github.com/kubernetes-sigs/aws-efs-csi-driver/tree/master/examples/kubernetes/multiple_pods), we have provided the required spec files in the `sample` subdirectory, but you can create the PVC another way.

1. Retrieve your EFS filesystem ID -
```
aws efs describe-file-systems --query "FileSystems[*].FileSystemId" --output text
```

2. Edit the last line of the `sample/pv.yaml` file to set the `volumeHandle` field to point to your EFS filesystem.

3. The `PersistentVolume` and `StorageClass` are cluster-scoped resources, but the `PersistentVolumeClaim` needs to be in the namespace you will be accessing it from. Be sure to replace the `kubeflow-user-example-com` namespace specified in the `sample/pvc.yaml` file with the namespace for your Kubeflow user.

4. Now create the required persistentvolume, persistentvolumeclaim and storageclass resources as -
```
kubectl apply -f examples/aws/storage-efs/sample/pv.yaml
kubectl apply -f examples/aws/storage-efs/sample/pvc.yaml
kubectl apply -f examples/aws/storage-efs/sample/sc.yaml
```
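
For reference, here is a sketch of what the three spec files contain, following the static-provisioning sample from the aws-efs-csi-driver examples. The storage size, resource names, and the `fs-EXAMPLE12345` filesystem ID are placeholders; check the files in `sample/` for the exact values:

```yaml
# sample/sc.yaml - StorageClass backed by the EFS CSI driver
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
---
# sample/pv.yaml - PersistentVolume pointing at the EFS filesystem
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-EXAMPLE12345   # replace with your EFS filesystem ID
---
# sample/pvc.yaml - PersistentVolumeClaim in the user namespace
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
  namespace: kubeflow-user-example-com   # replace with your user namespace
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi
```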

## 6.2 Check your Setup
Port forward as needed and log in to the Kubeflow dashboard. Check the `Volumes` tab in Kubeflow; you should see that your PVC is available for use within Kubeflow. In the following two sections we will use this PVC to create a notebook server with Amazon EFS mounted as the workspace volume, download training data into this filesystem, and then deploy a TFJob to train a model using this data.

## 6.3 Using EFS volume as workspace or data volume for a notebook server

Spin up a new Kubeflow notebook server and specify the name of the PVC to be used as the workspace volume or the data volume and specify your desired mount point. For our example here, we are using the `AWS Optimized Tensorflow 2.6 CPU image` provided in the notebook configuration options. Additionally, use the existing `efs-claim` volume as the workspace volume at the default mount point `/home/jovyan`. The server might take a few minutes to come up.

If you are using the EFS volume as your workspace volume, you may need to update the permissions on the filesystem so that the notebook user can write to it. You can do this by running the following job -
```
kubectl apply -f examples/aws/storage-efs/sample/set-permission-job.yaml
```
If you use EFS for other purposes (e.g. sharing data across pipelines), you don’t need this step.
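
For orientation, a permission-setting job of this kind is typically a one-shot Kubernetes `Job` that mounts the PVC and changes the ownership or mode of the mount point. The sketch below is an assumption about the general shape, not necessarily the contents of `sample/set-permission-job.yaml`; the image, mount path, and mode are illustrative:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: set-permission
  namespace: kubeflow-user-example-com   # replace with your user namespace
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: set-permission
          image: busybox              # assumed image
          command: ["chmod", "-R", "777", "/efs"]   # assumed path and mode
          volumeMounts:
            - name: efs-volume
              mountPath: /efs
      volumes:
        - name: efs-volume
          persistentVolumeClaim:
            claimName: efs-claim
```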

## 6.4 Using EFS volume for a TrainingJob using TFJob Operator
The following section re-uses the PVC and the TensorFlow Kubeflow notebook created in the previous steps to download a dataset to the EFS volume. Then we spin up a TFJob which runs an image classification job using the data from the shared volume.
Source: https://www.tensorflow.org/tutorials/load_data/images
