
infra/gcp: clean up main project #1974

Merged: 9 commits merged into kubernetes:main from the infra-gcp-main-refactor branch on May 5, 2021

Conversation

@spiffxp (Member) commented Apr 26, 2021

Briefly:

  • remove projectEditor:kubernetes-public bindings from k8s-infra-tf-* buckets
    • this required redoing ensure_removed_gcs_role_binding to use gsutil iam set instead of gsutil iam ch
  • fix color output in logs
  • remove service accounts from groups and bind them directly to their roles, since we discovered this didn't work for the prow serviceaccounts
  • refactor ensure-main-project.sh and ensure-organization.sh into functions for better testing and separation of code/data (it is not a far leap from bash arrays to yaml specs)
    • added new empower_gke_for_serviceaccount functions to lib
    • handled a dependency cycle between ensure-main-project and ensure-organization for the k8s-infra-gcp-auditor serviceaccount
  • prune services specified for kubernetes-public down to what we directly use, and comment why we use them
    • we use container which uses compute which uses oslogin; drop the latter two since we only need to specify the former
    • bigquery-json is not a service, use bigquery instead
  • fix ensure_serviceaccount_role_binding, which either never worked or hit a recent API change

See individual commits for details

Fixes #2000

@k8s-ci-robot added the do-not-merge/work-in-progress and cncf-cla: yes labels on Apr 26, 2021
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: spiffxp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the wg/k8s-infra, approved, and size/XL labels on Apr 26, 2021
@spiffxp force-pushed the infra-gcp-main-refactor branch 2 times, most recently from 877bbd1 to 6140ead on May 4, 2021
@k8s-ci-robot added the area/prow and sig/testing labels on May 5, 2021

`gsutil iam ch` specifically doesn't support convenience bindings (e.g.
a role binding with "projectEditor:project-foo" as a member that will
expand to all members with "roles/editor" on project-foo), as it goes against
the principle of least privilege.

Annoyingly, this is the case not just for new bindings, but also for
attempts to remove existing bindings. Instead, we must follow a
read-modify-write pattern with `gsutil iam get` and `gsutil iam set`.

This was really only needed for ensure_removed_gcs_role_binding, but for
parity I redid ensure_gcs_role_binding to follow the same pattern. If this
turns out to be wildly slower than before, we should revert for the
common case of adding new bindings.
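A minimal sketch of that read-modify-write pattern, assuming `jq` for the local edit; the bucket, member, and role below are illustrative, and the actual ensure_removed_gcs_role_binding helper may be implemented differently:

```bash
# Sketch only: drop one member from one role binding on a bucket.
bucket="gs://k8s-infra-tf-example"          # illustrative bucket
member="projectEditor:kubernetes-public"    # convenience member to remove
role="roles/storage.legacyBucketOwner"      # binding to remove it from

tmp="$(mktemp)"

# 1. Read the current policy.
gsutil iam get "${bucket}" > "${tmp}"

# 2. Modify it locally: remove the member from the matching binding, and
#    drop any binding left with no members.
jq --arg role "${role}" --arg member "${member}" '
  .bindings |= (map(
    if .role == $role then .members -= [$member] else . end
  ) | map(select(.members | length > 0)))
' "${tmp}" > "${tmp}.new"

# 3. Write the whole policy back.
gsutil iam set "${tmp}.new" "${bucket}"

rm -f "${tmp}" "${tmp}.new"
```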
roles/storage.legacyBucketOwner contains storage.buckets.setIamPolicy,
but not storage.objects.get. This means someone with roles/editor on the
project could grant themselves access to read bucket contents that they
aren't supposed to be able to read.

Given that roles/editor has no *.setIamPolicy permissions for other
service resources, this seems like a security gap that should be closed.

Ideally we would do this in _ensure_gcs_bucket. However, removing this
role means removing other (possibly needed) permissions that may be used
by GCP service agent service accounts (e.g. App Engine, GCR, GCE):
- storage.buckets.get
- storage.multipartUploads.(abort|create|list|listParts)
- storage.objects.(create|delete|list)

So until we have time to research which buckets those service agents
specifically need access to, we'll leave them alone and constrain this
policy to the "private" GCS buckets that are currently only used by humans
to store terraform state containing potentially sensitive info.
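As a hypothetical illustration of that constraint (the bucket names and the helper's signature are assumptions, not code from this PR), the cleanup is scoped to the terraform-state buckets only:

```bash
# Hypothetical: only the private terraform-state buckets lose the
# projectEditor convenience binding; other buckets keep their defaults
# until the service-agent usage has been researched.
for bucket in "gs://k8s-infra-tf-example-a" "gs://k8s-infra-tf-example-b"; do
    # "legacyBucketOwner" gets the "roles/storage." prefix applied by the
    # helper (see the next commit message); signature assumed to be
    # <bucket> <member> <role suffix>.
    ensure_removed_gcs_role_binding "${bucket}" \
        "projectEditor:kubernetes-public" "legacyBucketOwner"
done
```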
ensure_gcs_role_binding prepends "roles/storage." to roles given to it,
just like `gsutil iam ch` effectively does when its results are viewed
through `gsutil iam get`.

So pass "admin" to end up binding "roles/storage.admin".
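For example, assuming a `<bucket> <member> <role>` signature and placeholder names, a call like this ends up binding the full role:

```bash
# "admin" is expanded by the helper to roles/storage.admin on the bucket.
ensure_gcs_role_binding "gs://k8s-infra-tf-example" \
    "group:k8s-infra-example@kubernetes.io" "admin"
```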
Basically full circle back to linusa's original fix to color.

The "$*" syntax was hiding args from echo, e.g. color 6 -n "foo" came
out as `echo "-n foo"`, which was messing up some of the output from
ensure-main-project.sh

The original "${@}$(_nocolor)" was failing shellcheck SC2145, which
cautions that "${@}$x" can lead to unspecified behavior. Since the
intent here is to ensure "$(_nocolor)" is the last thing echo prints,
pass it explicitly as another argument: "${@}" "$(_nocolor)"
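A minimal sketch of the resulting helper, assuming `_color`/`_nocolor` wrap tput; the real function in this repo's bash libs may differ in detail:

```bash
function _color() { tput setaf "$1" 2>/dev/null || true; }
function _nocolor() { tput sgr0 2>/dev/null || true; }

function color() {
  local code="$1"; shift
  # Emit the color escape without a newline, then hand the remaining
  # arguments to echo unmodified. Because "${@}" keeps them separate,
  # a leading -n is still treated by echo as a flag rather than printed
  # literally, and "$(_nocolor)" is passed explicitly as the last argument.
  echo -n "$(_color "${code}")"
  echo "${@}" "$(_nocolor)"
}

# color 6 -n "some text"   # cyan, no trailing newline
```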
Either ensure_serviceaccount_role_binding never worked, or something has
changed recently.

For whatever reason, when running without `--project foo`, it is
impossible to use a consistent resource identifier for subcommands of
`gcloud iam service-accounts`:

- add-iam-policy-binding: expects 'projects/p/serviceAccounts/email'
- get-iam-policy: rejects the above, expects 'numeric id' or 'email'

Apparently, for add-iam-policy-binding, the project hosting the
service-account is _not_ parsed out of its email. Instead, the project
will come from:

- the fully specified name `projects/foo/serviceAccounts/email`
- `--project foo` if given
- `gcloud config get-value core/project` otherwise

ref: https://cloud.google.com/sdk/gcloud/reference/iam/service-accounts/add-iam-policy-binding#SERVICE_ACCOUNT

So to allow use of a consistent identifier (email) while reading or
updating service-account iam-policy, update _ensure_resource_role_binding
to optionally accept a project argument. Then parse out and pass the
project for service-accounts only (since they are thus far the only
resource to require this).

Ditto all of the above for remove-iam-policy-binding,
_ensure_removed_resource_role_binding.

I ran into this because I intentionally leave `core/project` set to a personal
project, to ensure I have to be explicit about making changes to shared
projects. When I ran ensure-main-project.sh it couldn't find the
kubernetes-public service accounts in my personal project when trying to
modify workload identity bindings on those service accounts.
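A sketch of the resulting call pattern; the service account email, namespace, and KSA name are placeholders, while the gcloud subcommands and flags shown are standard:

```bash
# Placeholder email; the real accounts live in kubernetes-public.
sa_email="k8s-infra-example@kubernetes-public.iam.gserviceaccount.com"
# Derive the hosting project from the email so the commands never depend
# on whatever core/project happens to be configured locally.
sa_project="${sa_email#*@}"
sa_project="${sa_project%%.*}"

# get-iam-policy accepts the bare email...
gcloud iam service-accounts get-iam-policy "${sa_email}" \
    --project="${sa_project}" --format=json

# ...and with --project set, add-iam-policy-binding can use the same
# identifier instead of projects/<project>/serviceAccounts/<email>.
gcloud iam service-accounts add-iam-policy-binding "${sa_email}" \
    --project="${sa_project}" \
    --member="serviceAccount:kubernetes-public.svc.id.goog[example-ns/example-ksa]" \
    --role="roles/iam.workloadIdentityUser"
```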
The new empower_gke_for_serviceaccount functions are an alternative to
empower_ksa_to_svcacct that reduce boilerplate on the caller.

These functions may better belong in lib_iam, and perhaps a better name
is empower_serviceaccount_for_gke, but this is good enough for now. Can
revisit whenever we feel like replacing uses of empower_ksa_to_svcacct.
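Purely as an illustration of the reduced boilerplate, such a wrapper could look roughly like this; the function name comes from the PR description, but the arguments, body, and the roles/iam.workloadIdentityUser binding are assumptions:

```bash
# Hypothetical sketch: let a Kubernetes service account in a GKE cluster
# (via workload identity) act as a GCP service account.
function empower_gke_for_serviceaccount() {
    local gke_project="${1}"   # project whose GKE clusters run the workload
    local namespace="${2}"     # Kubernetes namespace of the KSA
    local ksa="${3}"           # Kubernetes service account name
    local gcp_sa="${4}"        # GCP service account email to impersonate

    local sa_project="${gcp_sa#*@}"
    sa_project="${sa_project%%.*}"

    gcloud iam service-accounts add-iam-policy-binding "${gcp_sa}" \
        --project="${sa_project}" \
        --member="serviceAccount:${gke_project}.svc.id.goog[${namespace}/${ksa}]" \
        --role="roles/iam.workloadIdentityUser"
}
```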
This was motivated by ensuring that serviceaccounts are bound to their
needed roles directly instead of via group membership:

- refactor blocks into functions
- use array of principal:role bindings
- add support for binding serviceaccounts to roles, but skipping if the
  serviceaccount doesn't exist (to avoid dependency cycles)
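A rough sketch of that pattern; the binding entries below are placeholders rather than the real contents of ensure-main-project.sh:

```bash
# Hypothetical principal:role pairs; the real list is data inside
# ensure-main-project.sh (and could later move to a yaml spec).
main_project_iam_bindings=(
    "group:k8s-infra-example@kubernetes.io:roles/viewer"
    "serviceAccount:k8s-infra-example@kubernetes-public.iam.gserviceaccount.com:roles/storage.objectViewer"
)

for binding in "${main_project_iam_bindings[@]}"; do
    principal="${binding%:*}"   # everything before the last ':'
    role="${binding##*:}"       # everything after the last ':'

    # Skip service accounts that don't exist yet, to avoid a dependency
    # cycle with the script that creates them.
    if [[ "${principal}" == serviceAccount:* ]]; then
        email="${principal#serviceAccount:}"
        if ! gcloud iam service-accounts describe "${email}" \
                --project="kubernetes-public" >/dev/null 2>&1; then
            echo "skipping ${principal}: service account does not exist yet"
            continue
        fi
    fi

    gcloud projects add-iam-policy-binding "kubernetes-public" \
        --member="${principal}" --role="${role}" >/dev/null
done
```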
@spiffxp changed the title from "[wip] infra/gcp: clean up main project" to "infra/gcp: clean up main project" on May 5, 2021
@k8s-ci-robot removed the do-not-merge/work-in-progress label on May 5, 2021
@spiffxp (Member, Author) commented May 5, 2021

I ran ./ensure-main-project.sh and ./ensure-organization.sh as of d288449.

I've run these scripts a few times before while working on this PR, so some changes were already picked up in #2001.

Specifically:

- refactored by:
  - grouping blocks of code into functions
  - parameterizing those functions on PROJECT
  - adding ensure_workload_identity_serviceaccount to handle the sundry
    single-role service accounts intended for use by GKE clusters
- prune main_project_services down to just what we directly use (see the sketch after this list)
  - drop compute: direct dep of container
  - drop oslogin: direct dep of compute
  - swap bigquery-json for bigquery: the former doesn't exist
  - add comments explaining each of the services we do use
- fix a bunch of shellcheck nits
- add some TODO(spiffxp) lines for role or binding removal as followup
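The sketch referenced above: a hypothetical shape for the pruned service list. Only the reasoning in the comments mirrors this PR; the names, comments, and enablement logic in ensure-main-project.sh may differ.

```bash
# Hypothetical shape of the pruned list; enabling container also enables
# its dependencies (compute, which in turn brings oslogin).
main_project_services=(
    # GKE clusters (the reason compute and oslogin no longer need listing)
    container.googleapis.com
    # BigQuery (the old bigquery-json entry was not a real service)
    bigquery.googleapis.com
)

for service in "${main_project_services[@]}"; do
    gcloud services enable "${service}" --project="kubernetes-public"
done
```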
`terraform apply` and `./infra/gcp/ensure-main-project.sh` were getting
in a fight over authoritative IAM bindings for the kubernetes-public
project

IMO terraform shouldn't be setting authoritative bindings for a project
unless it's also managing the project. In this case, the module
in question just manages a GKE cluster within the project. The
infra/gcp/ensure-main-project.sh script is the source of truth for the
configuration of the kubernetes-public project.

I opted to move the bindings over to ensure-main-project.sh instead of
updating the terraform module to non-authoritatively add IAM members,
since they were related to a group that is managed outside of terraform
vs. service accounts that were created by terraform.

As part of this commit I preemptively ran `terraform state rm` for the
two resources that were removed, so that terraform won't try to destroy
them.
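For illustration, the state cleanup amounts to something like the following; the two resource addresses here are placeholders, not the actual resources that were removed:

```bash
# Remove the (now config-deleted) IAM binding resources from terraform's
# state so a future `terraform apply` won't try to destroy the live
# bindings that ensure-main-project.sh now owns.
terraform state rm 'module.example_cluster.google_project_iam_binding.example'
terraform state rm 'module.example_cluster.google_project_iam_member.example'
```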
@spiffxp (Member, Author) commented May 5, 2021

The most recent commit should address #2000

@spiffxp (Member, Author) commented May 5, 2021

If you want to see the outcomes of these changes, please take a look at my comments on the latest audit PR, #2001.

@spiffxp (Member, Author) commented May 5, 2021

/cc @ameukam

@k8s-ci-robot requested a review from ameukam on May 5, 2021
@dims (Member) commented May 5, 2021

/lgtm

( since "I've run these scripts a few times before while working on this PR" :) )

@k8s-ci-robot added the lgtm label on May 5, 2021
@k8s-ci-robot merged commit a9ad321 into kubernetes:main on May 5, 2021
@k8s-ci-robot added this to the v1.22 milestone on May 5, 2021

Review thread on the new ensure_terraform_state_buckets function (diff excerpt):

```bash
# permissions because:
# * The full list is large and has stuff that is inherited listed in it
# * All of our other IAM binding logic calls are additive
function ensure_terraform_state_buckets() {
```
A reviewer (Member) commented:

Non-blocking comment: IMHO we should empower the owners of each bucket with roles/viewer so they can easily run `terraform plan`.

@spiffxp (Member, Author) replied:

Thank you once again for the good catch. Apparently roles/viewer is necessary to get storage.buckets.list.

I didn't catch this when testing with my less-privileged account, because it has audit.viewer at the org level and thus roles/viewer on everything implicitly.

The only other off-the-shelf roles that have this are:

  • roles/storage.admin
  • roles/viewer
  • roles/editor
  • roles/owner

I'm tempted to suggest storage.admin, let me look real quick

@spiffxp (Member, Author) replied:

So it looks like we can't set any of the primitive roles just on the bucket:

```
$ gsutil iam ch "user:spiffxp@google.com:roles/viewer" gs://k8s-infra-clusters-terraform
BadRequestException: 400 Role (roles/viewer) does not exist in the resource's hierarchy.
```

Which leaves us with roles/viewer on the project, or roles/storage.admin on the bucket, as the quick fix here. I'll add it to the project for now.
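Roughly (the group member here is a placeholder):

```bash
# Quick fix: grant roles/viewer at the project level, since primitive roles
# can't be bound directly on a bucket; this provides storage.buckets.list.
gcloud projects add-iam-policy-binding kubernetes-public \
    --member="group:k8s-infra-example@kubernetes.io" \
    --role="roles/viewer"
```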

@spiffxp (Member, Author) replied:

Opened #2005 to address this.

@spiffxp deleted the infra-gcp-main-refactor branch on May 5, 2021
ameukam added a commit to ameukam/k8s.io that referenced this pull request on May 6, 2021:

Upgrade terraform provider google to 3.1.0 from 2.14.

Upgrade terraform version to 0.13.6. We accidentally converted the
terraform state to a 0.13.x version when kubernetes#1974
was deployed.

Signed-off-by: Arnaud Meukam <ameukam@gmail.com>
@ameukam (Member) commented Jun 4, 2021

Related to #516

Successfully merging this pull request may close: Terraform state is not fully applied.

4 participants