Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Support AWS multitenancy #2253

Merged
merged 3 commits into from
Apr 15, 2021

Conversation

sedefsavas
Copy link
Contributor

@sedefsavas sedefsavas commented Feb 16, 2021

What this PR does / why we need it:
This PR builds on top of #1919.

  1. Implements the new additions to the multitenancy proposal: 📖 Multitenancy: Restrict controller credential usage by creating a permission resource #2237
  2. Fixes nested role assumption logic.
  3. Adds e2e test

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1552

Special notes for your reviewer:

Checklist:

  • squashed commits
  • includes documentation
  • adds unit tests
  • release note
  • adds or updates e2e tests

Release note:

Adds initial cluster-scoped resources for supporting multi-tenancy. AWSClusterRolePrincipal and AWSClusterStaticRolePrincipals allow AWS admins to create cluster-scoped IAM role / credential CRDs and delegate access for these to specific namespaces. See documentation for further information.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 16, 2021
@sedefsavas sedefsavas force-pushed the multitenancy-cont branch 4 times, most recently from 935415f to 2d495dc Compare February 16, 2021 15:55
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 16, 2021
@sedefsavas sedefsavas changed the title WIP: ✨ Support AWS multitenancy ✨ Support AWS multitenancy Feb 18, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 18, 2021
@sedefsavas sedefsavas force-pushed the multitenancy-cont branch 3 times, most recently from 4042c67 to e093684 Compare February 19, 2021 22:16
@randomvariable
Copy link
Member

/kind feature
/kind api-change

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 23, 2021
@sedefsavas sedefsavas force-pushed the multitenancy-cont branch 2 times, most recently from d9e06b4 to ff50e78 Compare February 23, 2021 20:15
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 23, 2021
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Feb 24, 2021
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 11, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 12, 2021
@sedefsavas
Copy link
Contributor Author

/test pull-cluster-api-provider-aws-e2e

@sedefsavas
Copy link
Contributor Author

One test timed out but passing locally, rerunning.

/test pull-cluster-api-provider-aws-e2e

@invidian
Copy link
Member

One test timed out but passing locally, rerunning.

Hopefully they are stable enough that once merged, it won't cause all other PRs to fail all the time, because this is super annoying and also a waste of resources 😞

@sedefsavas
Copy link
Contributor Author

Hopefully they are stable enough that once merged, it won't cause all other PRs to fail all the time, because this is super annoying and also a waste of resources 😞

e2e tests are not run on PRs.
For the specific test that failed: it was a single control plane test and timeout for it was 15 min, I don't think that failure was related to the test, so at this point no need to increase the timeout.

@sedefsavas
Copy link
Contributor Author

/test pull-cluster-api-provider-aws-e2e

@randomvariable
Copy link
Member

randomvariable commented Apr 14, 2021

It's worth going through the logs in the artifacts. For the assume role cluster, I see the following

E0414 02:14:27.427666       1 controller.go:257] controller-runtime/controller "msg"="Reconciler error" "error"="failed to create scope: failed to create aws session: Failed to retrieve identity credentials: AccessDenied: User: arn:aws:iam::583069479333:user/bootstrapper.cluster-api-provider-aws.sigs.k8s.io is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::583069479333:role/multi-tenancy-role-cluster-zy9du7\n\tstatus code: 403, request id: bfdf6127-b503-4df4-877d-2fa87994dcdf\nsigs.k8s.io/cluster-api-provider-aws/pkg/cloud/scope.NewClusterScope\n\t/workspace/pkg/cloud/scope/cluster.go:71\nsigs.k8s.io/cluster-api-provider-aws/controllers.(*AWSClusterReconciler).Reconcile\n\t/workspace/controllers/awscluster_controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.14/pkg/internal/controller/controller.go:255\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.14/pkg/internal/controller/controller.go:231\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.14/pkg/internal/controller/controller.go:210\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.17.9/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.17.9/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.17.9/pkg/util/wait/wait.go:88\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357" "controller"="awscluster" "name"="cluster-zy9du7" "namespace"="functional-tests-98mdar"
E0414 02:14:27.482871       1 controller.go:257] controller-runtime/controller "msg"="Reconciler error" "error"="AWS.SimpleQueueService.NonExistentQueue: The specified queue does not exist for this wsdl version.\n\tstatus code: 400, request id: 684521c4-7905-5548-87e6-79342b46754d" "controller"="awscluster" "name"="cluster-zy9du7" "namespace"="functional-tests-98mdar"
I0414 02:14:27.508194       1 awscontrolleridentity_controller.go:93] controllers/AWSControllerIdentity "msg"="Cluster does not use AWSClusterControllerIdentity as identityRef, skipping new instance creation" "AWSClusterControllerIdentity"="cluster-zy9du7" "cluster"="cluster-zy9du7" "namespace"={"Namespace":"functional-tests-98mdar","Name":"cluster-zy9du7"} 
E0414 02:14:27.523789       1 controller.go:257] controller-runtime/controller "msg"="Reconciler error" "error"="failed to create scope: failed to create aws session: Failed to retrieve identity credentials: AccessDenied: User: arn:aws:iam::583069479333:user/bootstrapper.cluster-api-provider-aws.sigs.k8s.io is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::583069479333:role/multi-tenancy-role-cluster-zy9du7\n\tstatus code: 403, request id: fbd0dc53-b5e4-4830-a7ba-2a154521bc63\nsigs.k8s.io/cluster-api-provider-aws/pkg/cloud/scope.NewClusterScope\n\t/workspace/pkg/cloud/scope/cluster.go:71\nsigs.k8s.io/cluster-api-provider-aws/controllers.(*AWSClusterReconciler).Reconcile\n\t/workspace/controllers/awscluster_controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.14/pkg/internal/controller/controller.go:255\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.14/pkg/internal/controller/controller.go:231\nsigs.k8s.io/controller-runtime/pkg/internal/controller.

which matches the test

INFO: Creating the workload cluster with name "cluster-zy9du7" using the "multitenancy" template (Kubernetes v1.20.4, 824659049816 control-plane machines, 824659049840 worker machines)
INFO: Getting the cluster template yaml
INFO: clusterctl config cluster cluster-zy9du7 --infrastructure (default) --kubernetes-version v1.20.4 --control-plane-machine-count 1 --worker-machine-count 0 --flavor multitenancy
INFO: Applying the cluster template yaml to the cluster
cluster.cluster.x-k8s.io/cluster-zy9du7 created

@randomvariable
Copy link
Member

I'm going to remove "should fail creating cluster if assumer is not in trusted entity policy" test. All we're doing here is checking AWS is working, don't think it makes sense as an e2e test. It'd be worth mocking as a controller test in a follow up PR.

@randomvariable
Copy link
Member

I don't see any allowing of the bootstrapuser to call sts:AssumeRole that is being used in the tests. I'll have a go at refactoring.

@randomvariable
Copy link
Member

I think the main problem here is that env vars are being mutated on the fly. It's safer to duplicate the template and use unique env vars.

@randomvariable
Copy link
Member

Have this refactored locally to use CloudFormation to create all the roles and deleted the sleeps. Running locally before pushing.

Implements the multi-tenancy proposal in
https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/master/docs/proposal/20200506-single-controller-multitenancy.md
and adds the following cluster scoped resources:

AWSClusterStaticIdentity - Static crendentials using a Access Key ID and Secret Key
AWSClusterControllerIdentity - A singleton resource that states a cluster can use inherited crendentials
AWSClusterRoleIdentity - An IAM role definition

These cluster scoped resources support namespace selectors in line with the Services v1 API.

For migration of current cluster resources to this model, there is an experimental controller
`AutoControllerIdentityCreator` that will apply AWSClusterControllerIdentity to all existing resources.
This will be enabled by default until v1alpha4.

Co-authored-by: Andrew Myhre <andrew.myhre@gmail.com>
Co-authored-by: Sedef Savas <ssavas@vmware.com>
Co-authored-by: Naadir Jeewa <jeewan@vmware.com>
@randomvariable
Copy link
Member

Pushed up the changes. Now creating all of the roles at once during CloudFormation instantiation and setting the env vars at the beginning of each Ginkgo node (i.e. process).

There's a few other places we should remove setting env vars in each test, in the EKS suite for example, but should do as follow up.

/test pull-cluster-api-provider-aws-e2e

@sedefsavas
Copy link
Contributor Author

I'm going to remove "should fail creating cluster if assumer is not in trusted entity policy" test. All we're doing here is checking AWS is working, don't think it makes sense as an e2e test. It'd be worth mocking as a controller test in a follow up PR.

We have this test with mocking in identity_test.go.

@njuettner
Copy link
Member

Just a question from my side, will this will merged in v1alpha3 or v1alpha4?

@sedefsavas
Copy link
Contributor Author

@njuettner This is going to be in the upcoming v0.6.5 release (v1alpha3).
We will do the release by this Friday.

@randomvariable
Copy link
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 15, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: randomvariable

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 15, 2021
@k8s-ci-robot k8s-ci-robot merged commit 2bf3644 into kubernetes-sigs:master Apr 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AWS account multitenancy using a single instance of the controller
8 participants