Ensure disks removal after removing cluster in GKE #1163

Merged
artemnikitin merged 6 commits into elastic:master from 28-eck-snapshot-build on Jul 8, 2019

Conversation

artemnikitin
Member

Thanks to @barkbay, it was found that after deleting clusters in GKE, the disks from those instances are not removed automatically. This change introduces a step for all CI jobs running in GKE to remove those disks properly.

close #1162
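For context, unattached disks can be spotted with a gcloud listing like the one below. This is a generic sketch of the detection step, not necessarily the exact command used by this change:

# List persistent disks that are not attached to any instance
gcloud compute disks list --filter="-users:*"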

@artemnikitin artemnikitin requested a review from barkbay June 28, 2019 04:22
@artemnikitin artemnikitin changed the title from "28 eck snapshot build" to "Ensure disks removal after removing cluster in GKE" on Jun 28, 2019
@artemnikitin artemnikitin requested a review from sebgl July 1, 2019 06:23
@artemnikitin artemnikitin requested a review from sebgl July 1, 2019 10:09
@artemnikitin
Member Author

Looks like infra is doing something similar to this. I will contact them to sync our efforts; maybe they can do that on their side.

@artemnikitin
Member Author

The infra solution only covers their side.

@@ -144,3 +144,15 @@ ci-e2e-delete-cluster: vault-gke-creds
-e "GKE_SERVICE_ACCOUNT_KEY_FILE=$(GO_MOUNT_PATH)/build/ci/$(GKE_CREDS_FILE)" \
cloud-on-k8s-ci-e2e \
bash -c "make -C operators set-context-gke delete-gke"
Contributor

IIUC, every time we delete a GKE cluster, the underlying disks are not deleted. This applies to e2e tests but also to our own clusters, right?
Should we do this every time, as part of the delete-gke target?

Member Author

This fix applies to all disks, not only those from e2e test clusters. There is no need to explicitly run delete-gke.

Contributor

OK. But isn't there a bug in the delete-gke target (that we use ourselves) if the disks are not deleted?

Member Author

Yeah, sure, if it runs and some unused disks are left, then it's a bug. I only ran it before submitting the original PR, and it cleaned up everything at that moment.

@artemnikitin
Member Author

jenkins test this please

@thbkrkr
Contributor

thbkrkr commented Jul 4, 2019

I would like to understand the cause of this issue.
I just deleted a GKE cluster while watching the disks at the same time, and the disks were automatically deleted. Their status immediately changes to DELETING.

> gcloud compute disks list  | grep thb
gke-thb-e2e-cluster-default-pool-5b7e16fd-ppr0                   europe-west1-d  zone            30       pd-ssd       DELETING
gke-thb-e2e-cluster-default-pool-f3c9866f-rw75                   europe-west1-b  zone            30       pd-ssd       DELETING
gke-thb-e2e-cluster-default-pool-106593c7-3r5n                   europe-west1-c  zone            30       pd-ssd       DELETING

What is "killing clusters in GKE" compared to "deleting clusters"?

@artemnikitin
Member Author

@thbkrkr yeah, you're right. Right now the disks start being deleted right after the instances are deleted, but that wasn't the case in the past. Either something changed on the GCP side, or it's the infra job for removing disks (mentioned in https://github.com/elastic/infra/pull/12703). I will try to figure it out.

What is "killing clusters in GKE" compared to "deleting clusters"?

It's the same 😄

@thbkrkr
Contributor

thbkrkr commented Jul 4, 2019

I would like to understand the cause of this issue.

I think the reason is that PVs were not deleted before the cluster was deleted.
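If that is the cause, deleting the PVCs before tearing down the cluster would let the dynamically provisioned disks be released and removed by GCP. A minimal sketch, assuming the PVs use the default Delete reclaim policy:

# Remove all PVCs in every namespace before deleting the GKE cluster,
# so the bound PVs and their backing disks are cleaned up by the provisioner.
for ns in $(kubectl get ns -o name | cut -d/ -f2); do
  kubectl delete pvc --all -n "${ns}"
done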

@artemnikitin
Member Author

jenkins test this please

@artemnikitin artemnikitin merged commit 899ee31 into elastic:master Jul 8, 2019
@artemnikitin artemnikitin deleted the 28-eck-snapshot-build branch July 8, 2019 09:02
sebgl added a commit that referenced this pull request Jul 12, 2019
* Support for APM server configuration (#1181)

* Add a config section to the APM server configuration

* APM: Add support for keystore

* Factorize ElasticsearchAuthSettings

* Update dev setup doc + fix GKE bootstrap script (#1203)

* Update dev setup doc + fix GKE bootstrap script

* Update wording of container registry authentication

* Ensure disks removal after removing cluster in GKE (#1163)

* Update gke-cluster.sh

* Implement cleanup for unused disks in GCP

* Update Makefile

* Update CI jobs to do proper cleanup

* Normalize the raw config when creating canonical configs (#1208)

This aims at counteracting the difference between JSON-centric serialization and the use of YAML as the serialization format in canonical configs. Without normalization, numeric values like 1 will differ when comparing configs, because JSON deserializes integer numbers to float64 while YAML deserializes them to uint64.

* Homogenize logs (#1168)

* Don't run tests if only docs are changed (#1216)

* Update Jenkinsfile

* Simplify notOnlyDocs()

* Update Jenkinsfile

* Push snapshot ECK release on successful PR build (#1184)

* Update makefile's to support snapshots

* Add snapshot releases to Jenkins pipelines

* Cleanup

* Rename RELEASE to USE_ELASTIC_DOCKER_REGISTRY

* Update Jenkinsfile

* Add a note on EKS inbound traffic & validating webhook (#1211)

EKS users must explicitly enable communication from the k8s control
plane to the nodes on port 443 in order for the control plane to reach the
validating webhook.

Should help with #896.

* Update PodSpec with Hostname from PVC when re-using (#1204)

* Bind the Debug HTTP server to localhost by default (#1220)

* Run e2e tests against custom Docker image (#1135)

* Add implementation

* Update makefile's

* Update Makefile

* Rename Jenkinsfile

* Fix review comments

* Update e2e-custom.yml

* Update e2e-custom.yml

* Return deploy-all-in-one to normal

* Delete GKE cluster only if changes not in docs (#1223)

* Add operator version to resources (#1224)

* Warn if unsupported distribution (#1228)

The operator only works with the official ES distributions to enable the security features
available with the basic (free), gold and platinum licenses, in order to ensure that
all launched clusters are secured by default.

A check is done in the prepare-fs script by looking for the existence of the
Elastic License. If it is not present, the script exits with a custom exit code.

Then the ES reconciliation loop sends an event of type Warning if it detects that
a prepare-fs init container terminated with this exit code.

* Document Elasticsearch update strategy change budget & groups (#1210)

Add documentation for the `updateStrategy` section of the Elasticsearch
spec.

It documents how (and why) `changeBudget` and `groups` are used by ECK,
and how both settings can be specified by the user.
Linked issue that merging this pull request may close: Proper removal of disks after deletion of cluster in GKE (#1162)