Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AWS SSO] Use service account to access EC2 and S3 on head and worker nodes #1489

Merged
merged 55 commits into from
Dec 28, 2022

Conversation

Michaelvll
Copy link
Collaborator

@Michaelvll Michaelvll commented Dec 5, 2022

This partially fixes #1488

For aws, we now set the worker nodes to have the same IAM role as the head node, so that they can correctly access s3 buckets without the need to upload the user's credential files (TBD: using the same role will allow the worker to access EC2 as well, which may not be necessary, and we can fix it by creating a new s3-access role for that, such as skypilot-service-worker-v1).

Having the IAM:PassRole for the IAM role is required, as the head node has to pass the role skypilot-service-v1 to the worker node.

It also adds mypy check for sky/clouds/*.py

TODO:

  • Backward compatibility test, as the service account name has been changed.
  • Without uploading the credential files to the VM, the spot controller launched on other clouds will not be able to create instance on AWS. [We added back credential uploads, as the VMs on other clouds needs that to access AWS]

For an SSO account, we will show the following hints:
image

Tested:

  • sky launch -c multi-node -t t3.micro ./examples/using_file_mounts.yaml with a private s3 bucket.
  • sky launch -c multi-node-sso -t t3.micro ./examples/using_file_mounts.yaml with a private s3 bucket, using an account set up with SSO and AdministratorAccess
  • tests/run_smoke_tests.sh
  • tests/backward_comaptibility_tests.sh
  • sky launch -t t2.micro -i 0; sky status -r correctly autostopped with SSO account.
  • According to our user, the current PR works with their SSO account.

@Michaelvll Michaelvll marked this pull request as ready for review December 5, 2022 22:39
@Michaelvll Michaelvll mentioned this pull request Dec 7, 2022
@Michaelvll Michaelvll changed the title [AWS] Use service account to access EC2 and S3 on head and worker nodes [AWS SSO] Use service account to access EC2 and S3 on head and worker nodes Dec 11, 2022
@Michaelvll
Copy link
Collaborator Author

Michaelvll commented Dec 12, 2022

Important problem: Seems we have a bug with the tests/backward_comaptibility_tests.sh, that we did not capture the following backward compatibility issue:

git checkout master
sky launch -c min
git checkout ray-autoscaler-role-for-all
sky launch -c min # It stops the original min cluster and launched a new one.

Update: Fixed the compatibility issue by having a new version of the AWSNodeProvider instead of overwriting the original one, and tested with the tests/backward_compatitiblity.sh (added instruction to that test and double check the test works.)

@Michaelvll
Copy link
Collaborator Author

Michaelvll commented Dec 20, 2022

Tested (24e22c9):

  • AWS_PROFILE=admin-user sky launch -c sso -t t3.micro -i 1
  • AWS_PROFILE=admin-user sky launch -c sso-down -t t3.micro -i 1 --down
  • sky status -r
  • AWS_PROFILE=admin-user sky status -r
  • AWS_PROFILE=admin-user skt down sso
  • sky launch -c multi-node -t t3.xlarge -i 1 ./examples/using_file_mounts.yaml with a private s3 bucket.
  • AWS_PROFILE=admin-user sky launch -c multi-node-sso -t t3.xlarge ./examples/using_file_mounts.yaml with a private s3 bucket, using an account set up with SSO and AdministratorAccess
  • AWS_PROFILE=admin-user sky start multi-node-sso

Copy link
Member

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great @Michaelvll; did a pass.

sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/clouds/azure.py Outdated Show resolved Hide resolved
sky/clouds/cloud.py Show resolved Hide resolved
sky/skylet/providers/aws/node_provider.py Outdated Show resolved Hide resolved
sky/skylet/providers/aws/node_provider.py Outdated Show resolved Hide resolved
sky/clouds/aws.py Outdated Show resolved Hide resolved
# TODO(zhwu): after we publish sky to PyPI,
# change this to `pip install sky[aws]`
'\n $ pip install .[aws]'
'\n Credentials may also need to be set.' + help_str)
f'\n{prefix} $ pip install .[aws]'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make PyPI users see the correct message, since users installing from master can be assumed as more "hands-on"/advanced, so they should be able to change the command accordingly?

sky/clouds/aws.py Outdated Show resolved Hide resolved
@@ -368,9 +365,28 @@ def check_credentials(self) -> Tuple[bool, Optional[str]]:
except exceptions.CloudUserIdentityError as e:
return False, str(e)

hints = None
# `aws configure list` does not guarantee this file exists.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems removable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the credential file checking back for multiple clouds case.

sky/clouds/aws.py Outdated Show resolved Hide resolved
@Michaelvll
Copy link
Collaborator Author

Michaelvll commented Dec 27, 2022

Tested (75b3424):

  • AWS_PROFILE=admin-user sky launch -c sso -t t3.micro -i 1
  • AWS_PROFILE=admin-user sky launch -c sso-down -t t3.micro -i 1 --down
  • sky status -r
  • AWS_PROFILE=admin-user sky status -r
  • AWS_PROFILE=admin-user sky down sso
  • sky launch -c multi-node -t t3.xlarge -i 1 ./examples/using_file_mounts.yaml with a private s3 bucket.
  • AWS_PROFILE=admin-user sky launch -c multi-node-sso -t t3.xlarge ./examples/using_file_mounts.yaml with a private s3 bucket, using an account set up with SSO and AdministratorAccess
  • AWS_PROFILE=admin-user sky start multi-node-sso
  • sky launch -c azure-test --cloud azure echo hi
  • tests/run_smoke_tests.sh
  • AWS_PROFILE=admin-user sky launch -c test-failover echo hi when the identity is expired. It should failover to other clouds.
  • Delete ray/skypilot service account, and AWS_PROFILE=admin-user sky launch -c sso -t t3.micro -i 1
  • Delete ray/skypilot service account, and sky launch -c sso -t t3.micro -i 1
  • tests/backward_compatibility_tests.sh

Copy link
Member

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @Michaelvll!

sky/skylet/providers/aws/config.py Outdated Show resolved Hide resolved
sky/skylet/providers/aws/config.py Outdated Show resolved Hide resolved
sky/clouds/aws.py Outdated Show resolved Hide resolved
@Michaelvll Michaelvll merged commit 9709e71 into master Dec 28, 2022
@Michaelvll Michaelvll deleted the ray-autoscaler-role-for-all branch December 28, 2022 04:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[IAM] Utilize ray-autoscaler-v1 IAM role on the remote VM
3 participants