Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

setup separate gcs buckets for different sets of terraform resources #1952

Merged
merged 1 commit into from
May 5, 2021

Conversation

spiffxp
Copy link
Member

@spiffxp spiffxp commented Apr 20, 2021

Allow groups less privileged than k8s-infra-gcp-org-admins to use terraform to manage resources over which they have ownership.

Terraform state can include potentially include sensitive values. Since we have terraform setup to store state in GCS, we need to ensure visibility and access to state matches ownership of (privileges to modify) the resources it describes.

We're using uniform bucket-level access on our GCS buckets to avoid the complexity introduced by per-object ACLs. This means if we want different groups with different privilege levels using terraform to manage different sets of resources, we need to provision a GCS bucket for each group.

The new bucket schema is k8s-infra-tf-{folder}[-{suffix}] where:

  • {folder} is the intended GCP folder for GCP projects managed by this group, access level should be ~owners of folder
  • {suffix} is subset of resources contained somewhere underneath folder, access level should ~editors of those resources
  • k8s-infra-tf-: is 14 chars, leaving 49 before max length for bucket name

So for example I was tempted to create gs://k8s-infra-tf-public and move aaa cluster state there. But we may want to use terraform to manage a whole bunch of other resources beside clusters in kubernetes-public, and we might not want to grant cluster-admins the privileges to acccidentally delete everything in kubernetes-public (e.g. DNS)

The GCP folders don't actually exist yet, but the plan is:

  • public: kubernetes-public (potentially release related projects too)
  • prow: prow-build clusters and e2e projects
  • aws: if there are gcp projects being used to manage aws resources
  • sandbox: temporary projects

The buckets being added are:

  • k8s-infra-tf-aws: to manage aws resources
  • k8s-infra-tf-prow-clusters: to manage prow-build, prow-build-trusted
  • k8s-infra-tf-public-clusters: to manage aaa
  • k8s-infra-tf-sandbox-ii: for the ii team to manage things in sandbox

Organization admins are given storage.admin privileges to all k8s-infra-tf- buckets for break-glass purposes.

Terraform modules were migrated by running terraform init and terraform refresh for each module

/cc @ameukam @thockin

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. wg/k8s-infra approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 20, 2021
@spiffxp spiffxp force-pushed the tf-buckets branch 2 times, most recently from b967a22 to 4523de0 Compare April 24, 2021 20:20
@spiffxp spiffxp force-pushed the tf-buckets branch 2 times, most recently from db38ea4 to 22538e3 Compare May 4, 2021 14:15
# one-off script to migrate terraforms state files to the appropriate buckets
set -x

function tfstate_cp() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function has a similar logic in Terraform. Once you change the bucket in the provider field, TF will copy the state to this new destination. An example: https://www.terraform.io/docs/cloud/migrate/index.html#step-6-run-terraform-init-to-migrate-the-workspace.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove this script from the PR and update terraform then, once a few in-flight PRs have landed

Still using this as a test run to ensure we org admins can view the different buckets. Still having problems copying aws resources over atm due to use of hmac keys

@spiffxp
Copy link
Member Author

spiffxp commented May 5, 2021

In running terraform apply to ensure bucket changes took, I discovered/applied some undeployed changes from #1737 for k8s-infra-prow-build/prow-build

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  - destroy
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # module.project.google_project_iam_custom_role.service_account_lister will be destroyed
  - resource "google_project_iam_custom_role" "service_account_lister" {
      - deleted     = false -> null
      - description = "Can list ServiceAccounts." -> null
      - id          = "projects/k8s-infra-prow-build/roles/ServiceAccountLister" -> null
      - name        = "projects/k8s-infra-prow-build/roles/ServiceAccountLister" -> null
      - permissions = [
          - "iam.serviceAccounts.list",
        ] -> null
      - project     = "k8s-infra-prow-build" -> null
      - role_id     = "ServiceAccountLister" -> null
      - stage       = "GA" -> null
      - title       = "Service Account Lister" -> null
    }

  # module.project.google_project_iam_member.cluster_admins_as_service_account_lister must be replaced
-/+ resource "google_project_iam_member" "cluster_admins_as_service_account_lister" {
      ~ etag    = "BwWsUJQxxEw=" -> (known after apply)
      ~ id      = "k8s-infra-prow-build/projects/k8s-infra-prow-build/roles/ServiceAccountLister/group:k8s-infra-cluster-admins@kubernetes.io" -> (known after apply)
        member  = "group:k8s-infra-cluster-admins@kubernetes.io"
        project = "k8s-infra-prow-build"
      ~ role    = "projects/k8s-infra-prow-build/roles/ServiceAccountLister" -> "organizations/758905017065/roles/iam.serviceAccountLister" # forces replacement
    }

Plan: 1 to add, 0 to change, 2 to destroy.

@spiffxp
Copy link
Member Author

spiffxp commented May 5, 2021

Bumped into this while trying to terraform plan for k8s-infra-prow-build-trusted/prow-build-trusted

Error: Provider configuration not present

To work with
module.project.google_project_iam_custom_role.service_account_lister its
original provider configuration at provider["registry.terraform.io/-/google"]
is required, but it has been removed. This occurs when a provider
configuration is removed while objects created by that provider still exist in
the state. Re-add the provider configuration to destroy
module.project.google_project_iam_custom_role.service_account_lister, after
which you can remove the provider configuration again.

Modified infra/gcp/clusters/modules/gke-project/versions.tf to comment out the original sources

required_providers {
    google = {
      // source  = "hashicorp/google"
      source = "-/google"
      version = "~> 3.46.0"
    }
    google-beta = {
      // source  = "hashicorp/google-beta"
      source = "-/google-beta"
      version = "~> 3.46.0"
    }
  }

Removed infra/gcp/clusters/k8s-infra-prow-build-trusted/prow-build-trusted/.terraform, re-ran terraform init and terraform plan. No errors this time, so proceeded to terraform apply, deploying same changes from #1737 that were deployed for prow-build above.

Then, went back and undid the source change, and re-did the above dance of removing .terraform, terraform init. The -/google provider returned, as evidenced by terraform providers:

Providers required by configuration:
.
├── provider[registry.terraform.io/hashicorp/google] ~> 3.46.0
├── provider[registry.terraform.io/hashicorp/google-beta] ~> 3.46.0
├── module.project
│   ├── provider[registry.terraform.io/hashicorp/google] ~> 3.46.0
│   └── provider[registry.terraform.io/hashicorp/google-beta] ~> 3.46.0
├── module.prow_build_cluster
│   ├── provider[registry.terraform.io/hashicorp/google] ~> 3.46.0
│   └── provider[registry.terraform.io/hashicorp/google-beta] ~> 3.46.0
└── module.prow_build_nodepool
    ├── provider[registry.terraform.io/hashicorp/google] ~> 3.46.0
    └── provider[registry.terraform.io/hashicorp/google-beta] ~> 3.46.0

Providers required by state:

    provider[registry.terraform.io/hashicorp/google]

    provider[registry.terraform.io/hashicorp/google-beta]

    provider[registry.terraform.io/-/google]

Took a guess and ran terraform refresh to refresh the state file. Did the same dance with terraform init, and no longer see -/google as a provider required by state.

@spiffxp
Copy link
Member Author

spiffxp commented May 5, 2021

Bumped into #2000 while trying to terraform init for kubernetes-public

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create
  ~ update in-place

Terraform will perform the following actions:

  # google_project_iam_binding.readonlymonitoringbinding will be updated in-place
  ~ resource "google_project_iam_binding" "readonlymonitoringbinding" {
        etag    = "BwXAZra+ioU="
        id      = "kubernetes-public/roles/monitoring.viewer"
      ~ members = [
            "group:gke-security-groups@kubernetes.io",
          - "serviceAccount:k8s-infra-monitoring-viewer@kubernetes-public.iam.gserviceaccount.com",
        ]
        project = "kubernetes-public"
        role    = "roles/monitoring.viewer"
    }

  # google_project_iam_member.cluster_node_sa_monitoring_viewer will be created
  + resource "google_project_iam_member" "cluster_node_sa_monitoring_viewer" {
      + etag    = (known after apply)
      + id      = (known after apply)
      + member  = "serviceAccount:gke-nodes-aaa@kubernetes-public.iam.gserviceaccount.com"
      + project = "kubernetes-public"
      + role    = "roles/monitoring.viewer"
    }

Plan: 1 to add, 1 to change, 0 to destroy

Went ahead and ran terraform apply to deploy these changes, and bumped into

Error: Provider produced inconsistent result after apply

When applying changes to
google_project_iam_member.cluster_node_sa_monitoring_viewer, provider
"registry.terraform.io/hashicorp/google" produced an unexpected new value:
Root resource was present, but now absent.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

@ameukam
Copy link
Member

ameukam commented May 5, 2021

@spiffxp It's possible we may no longer need this resource. GKE Monitoring changed a lot since aaa creation. Also a bump of the terraform provider could fix this. I plan to do it this week.

Allow groups less privileged than k8s-infra-gcp-org-admins to use
terraform to manage resources over which they have ownership.

Terraform state can include potentially include sensitive values.
Since we have terraform setup to store state in GCS, we need to ensure
visibility and access to state matches ownership of (privileges to modify)
the resources it describes.

We're using uniform bucket-level access on our GCS buckets to avoid the
complexity introduced by per-object ACLs. This means if we want different
groups with different privilege levels using terraform to manage different
sets of resources, we need to provision a GCS bucket for each group.

The new bucket schema is "k8s-infra-tf-{folder}[-{suffix}]" where:
- {folder} is the intended GCP folder for GCP projects managed by this
  group, access level should be ~owners of folder
- {suffix} is subset of resources contained somewhere underneath folder,
  access level should  ~editors of those resources

The GCP folders don't actually exist yet, but the plan is:
- public: kubernetes-public (potentially release related projects too)
- prow: prow-build clusters and e2e projects
- aws: if there are gcp projects being used to manage aws resources
- sandbox: temporary projects

The buckets being added are:
- k8s-infra-tf-aws: to manage aws resources
- k8s-infra-tf-prow-clusters: to manage prow-build, prow-build-trusted
- k8s-infra-tf-public-clusters: to manage aaa
- k8s-infra-tf-sandbox-ii: for the ii team to manage things in sandbox

Organization admins are given storage.admin privileges to all buckets
for break-glass purposes.

Terraform modules were migrated by running `terraform init` and
`terraform refresh` for each module
@k8s-ci-robot k8s-ci-robot added area/prow Setting up or working with prow in general, prow.k8s.io, prow build clusters sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels May 5, 2021
@spiffxp spiffxp changed the title [wip] setup separate gcs buckets for different sets of terraform resources setup separate gcs buckets for different sets of terraform resources May 5, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 5, 2021
@dims
Copy link
Member

dims commented May 5, 2021

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 5, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, spiffxp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit a801699 into kubernetes:main May 5, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone May 5, 2021
@spiffxp spiffxp deleted the tf-buckets branch May 5, 2021 16:41
@spiffxp
Copy link
Member Author

spiffxp commented Jun 11, 2021

Related to refactoring infra/gcp, ref: #516

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/prow Setting up or working with prow in general, prow.k8s.io, prow build clusters cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants