openstack: Don't Delete LB in Case of Security Group Reconciliation Errors #82264
Hi @multi-io. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. Once the patch is verified, the new status will be reflected on the PR. Instructions for interacting with me using PR comments are available. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
/assign @dims
@dims Is there anything blocking the merge of this PR? The fix has already been merged in the out-of-tree OpenStack cloud provider: kubernetes/cloud-provider-openstack#743
@multi-io Ack! /approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dims, multi-io.
…ete_lb_on_errors openstack: Don't Delete LB in Case of Security Group Reconciliation Errors
…2264-upstream-release-1.16 Automated cherry pick of #82264: openstack: do not delete LB in case of security group
/kind bug
…2264-upstream-release-1.15 Automated cherry pick of #82264: openstack: do not delete LB in case of security group
What type of PR is this?
/kind bug
What this PR does / why we need it:
This fixes the legacy OpenStack cloud provider's LBaaS control loop so that EnsureLoadBalancer() no longer deletes the LB if something goes wrong while reconciling the LB's security groups. With the current master, suppose an LB service and its associated LB are already up and working fine. If, during a reconcile loop (which shouldn't change anything), the OpenStack API is temporarily down at the wrong moment (i.e. still up during the LB and listener reconciliation, but down during the SG reconciliation), the whole LB is deleted. We saw exactly this happen in a real-world customer application, which went offline because of it (the LB is recreated shortly after, but likely with a different floating IP).
Deleting the LB in case of errors in a "reconcile" (rather than "create") function seems just wrong, and none of the other parts of EnsureLoadBalancer() do it either. E.g. if a transient error occurs when creating a listener, we just return it and leave the LB in a half-created state (see kubernetes/staging/src/k8s.io/legacy-cloud-providers/openstack/openstack_loadbalancer.go, lines 785 to 788 in c7c89f8, and kubernetes/pkg/controller/service/service_controller.go, lines 255 to 256 in 3fe7a57).
This PR just fixes the SG reconciliation to follow the same pattern. It seems to me that the current "delete the LB in case of an error" approach was originally written for a "create" function, where it would have made more sense, rather than for a "reconcile" function.
The same bug is present in the new out-of-tree openstack cloud provider; I've submitted a corresponding PR there (kubernetes/cloud-provider-openstack#743). We'd still like to fix this error in-tree as well and also have the fix backported to 1.15 and 1.14 (please?) because our migration to cloud controller manager is still in the early planning stages and will take more time.
Which issue(s) this PR fixes:
Fixes #35056
Release note: