Ensure disks removal after removing cluster in GKE #1163
Conversation
@@ -144,3 +144,15 @@ ci-e2e-delete-cluster: vault-gke-creds
    -e "GKE_SERVICE_ACCOUNT_KEY_FILE=$(GO_MOUNT_PATH)/build/ci/$(GKE_CREDS_FILE)" \
    cloud-on-k8s-ci-e2e \
    bash -c "make -C operators set-context-gke delete-gke"
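The added lines of this hunk are not shown above. As a hedged illustration of the kind of cleanup the PR introduces, the sketch below lists GKE-named persistent disks that are no longer attached to any instance and deletes them; the `^gke-` name pattern and the loop are assumptions, not the exact script merged in this PR.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: find persistent disks that follow the GKE naming
# convention and have no attached users, then delete them.
# The "^gke-" name filter is an assumption about how the disks are named.
set -euo pipefail

gcloud compute disks list \
  --filter="name~^gke- AND -users:*" \
  --format="value(name,zone.basename())" |
while read -r name zone; do
  echo "Deleting unused disk ${name} (zone: ${zone})"
  gcloud compute disks delete "${name}" --zone "${zone}" --quiet
done
```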
IIUC, every time we delete a GKE cluster, the underlying disks are not deleted. This applies to e2e tests but also to our own clusters, right?
Should we do this every time, as part of the `delete-gke` target?
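For illustration only, folding the cleanup into the regular teardown could look like the sketch below. The `GKE_CLUSTER_*` variables mirror the `GKE_*` naming used in the Makefile, and the helper script path is hypothetical; this is not necessarily how the PR wires it.

```bash
# Hypothetical shape of what a `delete-gke` run could execute every time:
# delete the cluster first, then sweep any persistent disks it left behind.
gcloud container clusters delete "${GKE_CLUSTER_NAME}" \
  --zone "${GKE_CLUSTER_ZONE}" --quiet

# Hypothetical helper implementing the disk sweep sketched earlier.
./build/ci/delete_unused_disks.sh
```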
This fix applies to all disks, not only those from e2e test clusters. There is no need to explicitly run `delete-gke`.
OK. But isn't there a bug in the `delete-gke` target (that we use ourselves) if the disks are not deleted?
Yeah, sure, if it runs and some unused disks are left behind, then it's a bug. I only ran it before submitting the original PR, and at that moment it cleaned up everything.
… 28-eck-snapshot-build
jenkins test this please
I would like to understand what the cause of this issue is.

> gcloud compute disks list | grep thb
gke-thb-e2e-cluster-default-pool-5b7e16fd-ppr0  europe-west1-d  zone  30  pd-ssd  DELETING
gke-thb-e2e-cluster-default-pool-f3c9866f-rw75  europe-west1-b  zone  30  pd-ssd  DELETING
gke-thb-e2e-cluster-default-pool-106593c7-3r5n  europe-west1-c  zone  30  pd-ssd  DELETING

What is "killing clusters in GKE" compared to "deleting clusters"?
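As an illustrative aside (not part of the original thread): disks shown as `DELETING` are already being reclaimed; the ones that indicate a leak are disks that stay `READY` with no attached users after the cluster is gone. A filter such as the one below can surface them; the `thb` name fragment is only the example prefix from the listing above.

```bash
# Show disks matching the example prefix that are unattached and not being
# deleted, i.e. the ones actually leaked after a cluster teardown.
gcloud compute disks list \
  --filter="name~thb AND -users:* AND status=READY" \
  --format="table(name,zone.basename(),sizeGb,status)"
```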
@thbkrkr yeah, you're right. Right now it starts deleting the disks right after the instances are deleted, but that wasn't the case in the past. Either something changed on the GCP side or it's…
It's the same 😄
I think the reason is that PVs were not deleted before the cluster was deleted.
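If that is indeed the cause, one way to avoid the leak is to delete the PVCs before tearing the cluster down, so the in-cluster provisioner releases the backing GCE disks itself. A rough sketch under that assumption (the namespace loop and variable names are illustrative, not part of this PR):

```bash
# Release dynamically provisioned volumes first, so their backing disks are
# deleted by the provisioner, then remove the cluster.
for ns in $(kubectl get namespaces -o name | cut -d/ -f2); do
  kubectl delete pvc --all -n "${ns}" --wait=true
done

gcloud container clusters delete "${GKE_CLUSTER_NAME}" \
  --zone "${GKE_CLUSTER_ZONE}" --quiet
```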
jenkins test this please
* Support for APM server configuration (#1181)
  * Add a config section to the APM server configuration
  * APM: Add support for keystore
  * Factorize ElasticsearchAuthSettings
* Update dev setup doc + fix GKE bootstrap script (#1203)
  * Update dev setup doc + fix GKE bootstrap script
  * Update wording of container registry authentication
* Ensure disks removal after removing cluster in GKE (#1163)
  * Update gke-cluster.sh
  * Implement cleanup for unused disks in GCP
  * Update Makefile
  * Update CI jobs to do proper cleanup
* Normalize the raw config when creating canonical configs (#1208)
  This aims at counteracting the difference between JSON-centric serialization and the use of YAML as the serialization format in the canonical config. Without normalization, numeric values like 1 will differ when comparing configs, as JSON deserializes integer numbers to float64 and YAML to uint64.
* Homogenize logs (#1168)
* Don't run tests if only docs are changed (#1216)
  * Update Jenkinsfile
  * Simplify notOnlyDocs()
  * Update Jenkinsfile
* Push snapshot ECK release on successful PR build (#1184)
  * Update makefiles to support snapshots
  * Add snapshot releases to Jenkins pipelines
  * Cleanup
  * Rename RELEASE to USE_ELASTIC_DOCKER_REGISTRY
  * Update Jenkinsfile
* Add a note on EKS inbound traffic & validating webhook (#1211)
  EKS users must explicitly enable communication from the k8s control plane to the nodes' port 443 in order for the control plane to reach the validating webhook. Should help with #896.
* Update PodSpec with Hostname from PVC when re-using (#1204)
* Bind the Debug HTTP server to localhost by default (#1220)
* Run e2e tests against custom Docker image (#1135)
  * Add implementation
  * Update makefiles
  * Update Makefile
  * Rename Jenkinsfile
  * Fix review comments
  * Update e2e-custom.yml
  * Update e2e-custom.yml
  * Return deploy-all-in-one to normal
* Delete GKE cluster only if changes not in docs (#1223)
* Add operator version to resources (#1224)
* Warn if unsupported distribution (#1228)
  The operator only works with the official ES distributions, to enable the security available with the basic (free), gold and platinum licenses and to ensure that all clusters launched are secured by default. A check is done in the prepare-fs script by looking for the existence of the Elastic License. If it is not present, the script exits with a custom exit code. The ES reconciliation loop then sends an event of type warning if it detects that a prepare-fs init container terminated with this exit code.
* Document Elasticsearch update strategy change budget & groups (#1210)
  Add documentation for the `updateStrategy` section of the Elasticsearch spec. It documents how (and why) `changeBudget` and `groups` are used by ECK, and how both settings can be specified by the user.
Thanks to @barkbay, it was found that after deleting clusters in GKE, the disks backing those instances are not removed automatically. This change introduces a step for all CI jobs running in GKE to remove those disks properly.
close #1162
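For context, a CI job could invoke such a cleanup inside the existing CI image, mirroring the docker-run pattern of the `ci-e2e-delete-cluster` target shown in the diff above. This is only a sketch: the `delete-unused-disks` make target name and the volume mount are assumptions, not the exact wiring of this PR.

```bash
# Hypothetical CI step following the same pattern as ci-e2e-delete-cluster;
# the "delete-unused-disks" target name and the -v mount are assumptions.
docker run --rm -t \
  -v "$(pwd):${GO_MOUNT_PATH}" \
  -e "GKE_SERVICE_ACCOUNT_KEY_FILE=${GO_MOUNT_PATH}/build/ci/${GKE_CREDS_FILE}" \
  cloud-on-k8s-ci-e2e \
  bash -c "make -C operators set-context-gke delete-unused-disks"
```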