-
Notifications
You must be signed in to change notification settings - Fork 303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fire warning event instead of hard failing if TLS certificate is not present #388
Conversation
pkg/controller/controller.go
Outdated
@@ -445,7 +445,7 @@ func (lbc *LoadBalancerController) toRuntimeInfo(ing *extensions.Ingress) (*load | |||
if annotations.UseNamedTLS() == "" { | |||
tls, err = lbc.tlsLoader.Load(ing) | |||
if err != nil { | |||
return nil, fmt.Errorf("cannot get certs for Ingress %v/%v: %v", ing.Namespace, ing.Name, err) | |||
lbc.ctx.Recorder(ing.Namespace).Eventf(ing, apiv1.EventTypeWarning, "Sync", fmt.Sprintf("Could not get certificates for ingress: %v", err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick self-review/thought: perhaps we should change this message to "Could not get certificate for ingress - serving insecurely" or something?
Additionally, we could gate this behaviour on IsNotFound as well (i.e. if the err is not a 404, then we return the error so the ingress can be retried/re-queued)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make the error message more explicit and distinguish the error?
import "k8s.io/apimachinery/pkg/api/errors"
...
if err != nil {
if errors.IsNotFound(err) {
// TODO: this path should be removed when external certificate managers migrate to a better solution.
const msg = "Could not find TLS certificates. Continuing setup for the load balancer to serve HTTP. Note: this behavior is deprecated and will be removed in a future version of ingress-gce"
lbc.ctx.Recorder(ing.Namespace).Eventf(ing, apiv1.EventTypeWarning, "Sync", msg)
} else {
glog.Errorf(Cannot get certs for Ingress %s/%s: %v", ing.Namespace, ing.Name, err)
return nil, err
}
}
/cc @nicksardo as you reviewed/merged #112 😄 |
Any update here? |
Ping - if there are any changes that need to be made here, i.e. making this somehow configurable via an annotation (something like Not sure what upstream's opinion is, and (as shown by feedback on the PR through reactions!) this is definitely something users are keen to see fixed/altered 😄 |
@munnerz Copying comment from bowei@ in #112 "Would be possible for the cert-manager to detect this and generate a self-signed cert to unblock creation of the Ingress and then do the update to final cert from let's encrypt. I think this is the behavior of ingress-nginx (self-signed cert) It feels wrong to proceed with the LB configuration in a situation when the configuration is not valid." |
Thanks for copying the comment @rramkumar1
@bowei this is not as simple as it first sounds, especially not with the current releases of cert-manager (future releases will hopefully make this a bit easier, but it is still a fairly complex issue requiring some fairly large changes to our codebase).
Whilst I understand this sentiment, I think it is worth considering that this was a breaking behavioural change on ingress-gce's part. The Ingress spec is ill-defined, which leads to situations such as these, however the situation is not made any better by silent breaking changes being made to ingress controllers. ingress-nginx for example, will generate a self signed certificate itself if the referenced secret does not exist (and prior to the change in this repo, the GCLB would have been configured to serve on port 80 and not 443). Both of these behaviours are compatible. In short, it will take considerable work on our side to support this new operating mode, and given the change in ingress-gce was made without a corresponding major version bump, this is extremely painful. This PR aims to 'meet in the middle' - fulfilling the initial goal of #112 ( I'd really hope you'll reconsider this PR - there's fairly significant user interest (as you can see from the reactions!), and I've even had other Googlers reach out to me to express their desire to get this problem fixed (as cert-manager is used in other Google products!). |
Good points, we can keep the existing behavior for now. I would like to be on a path to be able to remove this path in at some point soon. The more states that the ingress can be in, the easier it is to introduce bugs. I will take a look. |
@bowei agreed - it'd be great to resolve this properly. In future we will definitely consider this case however, and attempt to build some patch-work into cert-manager to help alleviate the problem. Is there a timeline on getting this PR merged, and the subsequent fix being rolled out to GKE clusters? I'd hope this would qualify to go into a patch release of GKE, given it was a regression. I'd be happy to backport the fix if that needs to happen too (I imagine it'd be easy enough given its size!) |
Probably in the next couple of weeks. We have some fixes as well. |
Howdy folks - I'm working on a new project and I'm trying to determine if I'll see this in GKE clusters in the next few weeks? Thanks! |
Ping on this - does this PR not need to be merged to make it out into GKE? I notice there was a patch release of GKE cut yesterday, which seems to not include this fix 😟 |
9dab49f
to
4b5faae
Compare
@bowei Updated with requested changes 😄 |
/lgtm @rramkumar1 -- let's pick this into the next patch release. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bowei, munnerz The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Cherrypick #388 on release 1.3 branch
This is a follow up/in relation to my comment made here: #112 (comment)
This switches ingress-gce back to the previous behaviour of 'soft failing' if the specified TLS secret exists. Without this change, we've had many users complain that auto-TLS using cert-manager (and also kube-lego) is broken.
Please see the discussion in #112 and cert-manager/cert-manager#606 for more detail.
The one notable difference, is instead of simply logging a warning using
glog
(which on GKE is not visible to end users), we now fire a Warning event, but importantly still continue afterwards. This is essential for something like cert-manager, as without it users would need to create some kind of self-signed placeholder certificate.For reference, ingress-nginx will actually generate its own self signed certificate, and if the specified Secret cannot be found, it will serve using this. This would also be acceptable for ingress-gce, however is a more considerable change.
I really hope we can get this patch accepted in some form, as users are currently unable to automatically obtain TLS certificates from Let's Encrypt without extra workarounds (i.e. manually specifying Certificate resources).
Most importantly however, #112 was a breaking change that was not communicated, and as a result has caused issues downstream for end-users.
Thanks for taking a look!