Storage Options: Using Amazon EFS as Persistent Volume with Kubeflow 1.3 #32

Merged: 7 commits merged on Dec 14, 2021

@mbaijal (Contributor) commented Nov 30, 2021

Which issue is resolved by this Pull Request:
Adds the option to use Amazon EFS as Storage with Kubeflow.

Description of your changes:
This is an initial PR with the following changes:

  1. Points to the latest stable version of the Amazon EFS CSI Driver in the upstream repo: https://github.com/kubernetes-sigs/aws-efs-csi-driver/tree/release-1.3/deploy/kubernetes
  2. Adds a README with complete steps to create the required IAM policy, create an EFS file system and its mount targets, and install the driver using the manifest.
  3. The examples/aws/storage-efs/sample directory includes 4 spec files that demonstrate creating a PersistentVolume, PersistentVolumeClaim, and StorageClass. The README explains how to use them for static provisioning and from Kubeflow Notebooks.
  4. Also includes a file that changes the directory permissions as required to use the volume from Notebooks.
  5. Adds a sample TFJob that uses data from the mounted EFS volume.
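
For readers without the PR checked out, here is a minimal sketch of the three kinds of spec that item 3 describes, generated with shell heredocs. The conversation confirms only `storageClassName: efs-sc`. The object names `efs-pv` and `efs-claim`, the 5Gi size, and the target namespace are assumptions, so the actual files under `examples/aws/storage-efs/sample` may differ.

```shell
# Sketch: write EFS static-provisioning specs (sc.yaml, pv.yaml, pvc.yaml).
# Names other than efs-sc, and the 5Gi size, are illustrative assumptions.
write_efs_static_manifests() {
  local fs_id="$1"      # existing EFS file system ID (placeholder)
  local namespace="$2"  # Kubeflow profile namespace that will use the claim
  local out_dir="$3"    # directory to write the three files into
  mkdir -p "$out_dir" || return 1

  # StorageClass: names the EFS CSI driver as the provisioner.
  cat > "$out_dir/sc.yaml" <<EOF
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
EOF

  # PersistentVolume: points at the existing file system through volumeHandle.
  cat > "$out_dir/pv.yaml" <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: ${fs_id}
EOF

  # PersistentVolumeClaim: matches the PV above through the shared class name.
  cat > "$out_dir/pvc.yaml" <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
  namespace: ${namespace}
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi
EOF
}
```

For example, `write_efs_static_manifests "$FILE_SYSTEM_ID" kubeflow-user-example-com ./efs-sample` would write the files. The namespace is a placeholder for your own profile namespace. `ReadWriteMany` lets several notebook and training pods share the volume. Amazon EFS does not enforce the requested capacity, but Kubernetes still requires the field.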

Testing Done:

  1. Tested that the volume can be used as a workspace volume in a Kubeflow notebook across multiple clusters.
  2. Tested that a TFJob can run using data already downloaded to the EFS volume from that notebook.
  3. Both samples are provided in the README.
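
This conversation does not reproduce the PR's TFJob, so the following is a hypothetical sketch of how such a job could read data from the EFS-backed claim. It assumes the `kubeflow.org/v1` TFJob API, a claim such as `efs-claim`, and a mount path of `/mnt/efs`. How the training script locates its data is also an assumption.

```shell
# Hypothetical sketch: a TFJob whose worker mounts an EFS-backed claim.
# Job name, mount path and claim name are illustrative assumptions.
write_efs_tfjob() {
  local image_uri="$1"  # training image, e.g. built from training-sample
  local namespace="$2"  # Kubeflow profile namespace holding the claim
  local claim="$3"      # PVC name, e.g. efs-claim
  local out_file="$4"
  cat > "$out_file" <<EOF
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: efs-tfjob
  namespace: ${namespace}
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: tensorflow   # default container name TFJob looks for
              image: ${image_uri}
              volumeMounts:
                - name: efs-data
                  mountPath: /mnt/efs   # script assumed to read data here
          volumes:
            - name: efs-data
              persistentVolumeClaim:
                claimName: ${claim}
EOF
}
```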

TBD:

  1. Add steps for dynamic provisioning
  2. Unit Tests

Checklist:

  • Unit tests pass:
    Make sure you have installed kustomize == 3.2.1
    1. make generate-changed-only
    2. make test
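
The checklist above can be wrapped in a small guard so the make targets only run with the pinned kustomize. This is a sketch with hypothetical helper names. It looks only for a `3.2.1` version token, because the exact text printed by `kustomize version` differs between releases.

```shell
# Hypothetical helpers: confirm kustomize 3.2.1, then run the checklist targets.
is_kustomize_321() {
  # Accepts "3.2.1" or "v3.2.1" as a whole token; rejects e.g. 3.2.10 or 13.2.1.
  printf '%s\n' "$1" | grep -Eq '(^|[^0-9.])v?3\.2\.1([^0-9]|$)'
}

run_unit_test_checklist() {
  local version_output
  version_output="$(kustomize version 2>&1)" || return 1
  if ! is_kustomize_321 "$version_output"; then
    echo "kustomize 3.2.1 is required, found: $version_output" >&2
    return 1
  fi
  make generate-changed-only && make test
}
```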

@goswamig (Member) left a comment

Good work; I have a few comments.

I also noticed that previously the entire driver deploy section was getting copied here.
Is there a way we can avoid this?

@surajkota (Contributor): #26

@mbaijal changed the title from "Storage Options: Using Amazon EFS as Persistent Volume with Kubeflow" to "Storage Options: Using Amazon EFS as Persistent Volume with Kubeflow 1.3" on Dec 1, 2021

@mbaijal (Contributor Author) commented Dec 1, 2021

> I also noticed that previously the entire driver deploy section was getting copied here.
> Is there a way we can avoid this?

Not sure I follow this comment; could you elaborate a little?

@goswamig (Member) left a comment

Did you also test dynamic provisioning?

@surajkota (Contributor) left a comment

Apologies for the number of comments. I was watching a re:Invent session and took my sweet time reviewing this.

Resolved review threads (now outdated): 7 on `examples/aws/storage-efs/README.md`, 1 on `examples/aws/storage-efs/sample/pvc.yaml`.
@surajkota (Contributor) left a comment

Looks better now; a few comments:

Resolved review threads (now outdated): 10 on `examples/aws/storage-efs/README.md`.

1. Use the `$file_system_id` you recorded before, or use the following command to get the EFS file system ID:
```
aws efs describe-file-systems --query "FileSystems[*].FileSystemId" --output text
```
Contributor:

Assign this to a variable so it can be used in the next command.
Also, please add `--region` to the command.

"Recorded before"? As in the previous step?

Contributor Author:

Actually, those steps have been moved out of the README; let me reword this.

Contributor Author:

This returns multiple IDs, and there is no direct way to filter them. This step will have to be slightly manual if the `$file_system_id` variable is not already populated from section 4.
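
Below is a sketch of what the reviewer asked for: the ID is stored in a variable and the command passes `--region`. If there is more than one ID, the helper fails loudly instead of silently passing several IDs along. The optional narrowing by the file system's `Name` field works only if the file system was created with a Name tag. That filter and the helper names are assumptions, not part of the PR.

```shell
# Hypothetical helpers for populating FILE_SYSTEM_ID when it is not already set.
pick_single_fs_id() {
  # $1: IDs as printed by `--output text` (tab-, space- or newline-separated).
  set -f            # disable globbing while word-splitting the IDs
  set -- $1
  set +f
  if [ "$#" -ne 1 ]; then
    echo "expected exactly one EFS file system ID, got $#" >&2
    return 1
  fi
  printf '%s\n' "$1"
}

get_fs_id() {
  local region="$1"
  local name="${2:-}"   # optional Name tag value (assumption)
  local query='FileSystems[*].FileSystemId'
  if [ -n "$name" ]; then
    query="FileSystems[?Name=='${name}'].FileSystemId"
  fi
  pick_single_fs_id "$(aws efs describe-file-systems --region "$region" \
    --query "$query" --output text)"
}
```

Usage might look like `FILE_SYSTEM_ID="$(get_fs_id us-west-2 my-kubeflow-efs)" && export FILE_SYSTEM_ID`, where both the region and the name are placeholders.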

### 2. Build and Push the Docker image
In the `training-sample` directory, we have provided a sample training script and Dockerfile, which you can use as follows to build a Docker image:
```
export dockerImage=image-classification:no-data
```
@surajkota (Contributor), Dec 13, 2021:

Nit: exported variables are in caps in the other READMEs.

Also, add a callout to replace the Docker image URI?

Contributor Author:

Did you mean a callout to use their own image if they want to?

Contributor:

No, I mean they will have to change the URI to push to their own repository.

Contributor Author:

Makes sense. I actually removed the hardcoded defaults altogether; they didn't add any value.
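
Here is a sketch of the capitalized-variable, bring-your-own-URI flow discussed in this thread. It assumes the image goes to an Amazon ECR repository the reader owns and that the commands run from the directory containing `training-sample`. The helper names, account ID, region and repository are placeholders.

```shell
# Hypothetical helpers for pushing the training-sample image to your own ECR repository.
ecr_image_uri() {
  # Builds <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$1" "$2" "$3" "$4"
}

registry_of() {
  # Everything before the first "/" is the registry host.
  printf '%s\n' "${1%%/*}"
}

build_and_push() {
  local image_uri="$1" region="$2"
  docker build -t "$image_uri" training-sample || return 1
  aws ecr get-login-password --region "$region" |
    docker login --username AWS --password-stdin "$(registry_of "$image_uri")" || return 1
  docker push "$image_uri"
}
```

For example, `IMAGE_URI="$(ecr_image_uri 123456789012 us-west-2 image-classification no-data)"` followed by `build_and_push "$IMAGE_URI" us-west-2`. The repository must exist first, for instance via `aws ecr create-repository --repository-name image-classification`.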

## 5.0 Using EFS Storage in Kubeflow

## 5.1 Provisioning Options
### Option 1 - Static Provisioning
Member:

Where is Option 2 for dynamic provisioning?

@mbaijal (Contributor Author), Dec 14, 2021:

As mentioned in standup (and elsewhere on this PR), I was planning to add that in a separate PR, since this one has been pending for a long time and dynamic provisioning needs additional testing.

Comment on lines 90 to 92:
```
kubectl apply -f storage-efs/sample/pv.yaml
kubectl apply -f storage-efs/sample/pvc.yaml
kubectl apply -f storage-efs/sample/sc.yaml
```
Member:

This order does not make sense to me.
You are referencing `storageClassName: efs-sc` in the pv and pvc files.

I think the right order is sc, pv, and pvc.

Contributor Author:

We discussed this before on this PR: #32 (comment)

Member:

It still does not make sense to reference something now that you only apply later.

Contributor Author:

Fixed.

```
kubectl apply -f storage-efs/sample/sc.yaml
```
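
The order the reviewer asked for (StorageClass, then PV, then PVC) can be made explicit with a small wrapper. This sketch assumes the sample paths from this thread. The `KUBECTL` override is hypothetical and exists only so the loop can be exercised without a cluster.

```shell
# Apply the EFS samples in dependency order: the StorageClass first, then the
# PV and PVC that reference it through storageClassName: efs-sc.
# KUBECTL is a hypothetical override (defaults to kubectl).
KUBECTL="${KUBECTL:-kubectl}"

apply_efs_samples() {
  local dir="${1:-storage-efs/sample}"
  local manifest
  for manifest in sc.yaml pv.yaml pvc.yaml; do
    "$KUBECTL" apply -f "$dir/$manifest" || return 1   # stop at the first failure
  done
}
```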

## 5.2 Check your Setup
Member:

Sorry I missed this earlier, but you don't need the dashboard to verify that the volumes are up.

I think `kubectl get pvc` should tell you whether the PVC has been bound.

Contributor Author:

Either option works, and we log in to the dashboard anyway to create a notebook. Is there any reason to prefer one over the other?

Member:

I guess it's fine to check the dashboard too, but just as you verify the CSI driver, you can verify whether the PVC is bound. This keeps your instructions consistent.

Contributor Author:

Adding both options.
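
Here is a sketch of the command-line option: it polls `kubectl get pvc` until the claim's phase is `Bound`. It also checks for the `efs.csi.aws.com` CSIDriver object, which is one way (not necessarily the README's) to confirm the driver is installed. The claim name, namespace, retry counts and the `KUBECTL` override are assumptions.

```shell
# Command-line checks for section 5.2 (hypothetical helper names).
# KUBECTL defaults to kubectl and can be overridden, e.g. for dry runs.
KUBECTL="${KUBECTL:-kubectl}"

# The EFS CSI driver registers a cluster-scoped CSIDriver named efs.csi.aws.com.
check_efs_csi_driver() {
  "$KUBECTL" get csidriver efs.csi.aws.com
}

# Poll the claim until its phase is Bound, or give up after N attempts.
wait_for_pvc_bound() {
  local claim="$1" namespace="$2" attempts="${3:-30}" delay="${4:-5}"
  local phase="" i=1
  while [ "$i" -le "$attempts" ]; do
    phase="$("$KUBECTL" get pvc "$claim" -n "$namespace" \
      -o jsonpath='{.status.phase}' 2>/dev/null)"
    if [ "$phase" = "Bound" ]; then
      echo "pvc/$claim is Bound"
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -le "$attempts" ]; then sleep "$delay"; fi
  done
  echo "pvc/$claim is not Bound after $attempts attempts (last phase: ${phase:-unknown})" >&2
  return 1
}
```

For example, `wait_for_pvc_bound efs-claim kubeflow-user-example-com` would wait up to about two and a half minutes with the default retry settings.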

@goswamig (Member) left a comment

Take a closer look; some of these comments were made before as well.

@goswamig (Member) left a comment

Thank you for taking care of all the comments.

@mbaijal merged commit a0893bd into awslabs:v1.3-branch on Dec 14, 2021
AlexandreBrown pushed a commit to AlexandreBrown/kubeflow-manifests that referenced this pull request Jan 21, 2022
…1.3 (awslabs#32)

* efs-kubeflow-branch-13

* Add TrainingJob Sample

* Refactor the EFS CSI Driver installation

* Address Review Comments

* Directory restructure

* Minor comments addressed

* More comments
3 participants