Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fire warning event instead of hard failing if TLS certificate is not present #388

Merged
merged 1 commit into from
Sep 12, 2018

Conversation

munnerz
Copy link
Member

@munnerz munnerz commented Jul 4, 2018

This is a follow up/in relation to my comment made here: #112 (comment)

This switches ingress-gce back to the previous behaviour of 'soft failing' if the specified TLS secret exists. Without this change, we've had many users complain that auto-TLS using cert-manager (and also kube-lego) is broken.

Please see the discussion in #112 and cert-manager/cert-manager#606 for more detail.

The one notable difference, is instead of simply logging a warning using glog (which on GKE is not visible to end users), we now fire a Warning event, but importantly still continue afterwards. This is essential for something like cert-manager, as without it users would need to create some kind of self-signed placeholder certificate.

For reference, ingress-nginx will actually generate its own self signed certificate, and if the specified Secret cannot be found, it will serve using this. This would also be acceptable for ingress-gce, however is a more considerable change.

I really hope we can get this patch accepted in some form, as users are currently unable to automatically obtain TLS certificates from Let's Encrypt without extra workarounds (i.e. manually specifying Certificate resources).

Most importantly however, #112 was a breaking change that was not communicated, and as a result has caused issues downstream for end-users.

Thanks for taking a look!

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 4, 2018
@@ -445,7 +445,7 @@ func (lbc *LoadBalancerController) toRuntimeInfo(ing *extensions.Ingress) (*load
if annotations.UseNamedTLS() == "" {
tls, err = lbc.tlsLoader.Load(ing)
if err != nil {
return nil, fmt.Errorf("cannot get certs for Ingress %v/%v: %v", ing.Namespace, ing.Name, err)
lbc.ctx.Recorder(ing.Namespace).Eventf(ing, apiv1.EventTypeWarning, "Sync", fmt.Sprintf("Could not get certificates for ingress: %v", err))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick self-review/thought: perhaps we should change this message to "Could not get certificate for ingress - serving insecurely" or something?

Additionally, we could gate this behaviour on IsNotFound as well (i.e. if the err is not a 404, then we return the error so the ingress can be retried/re-queued)

Copy link
Member

@bowei bowei Sep 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the error message more explicit and distinguish the error?

import "k8s.io/apimachinery/pkg/api/errors"
...
if err != nil {
  if errors.IsNotFound(err) {
   // TODO: this path should be removed when external certificate managers migrate to a better solution.
  const msg = "Could not find TLS certificates. Continuing setup for the load balancer to serve HTTP. Note: this behavior is deprecated and will be removed in a future version of ingress-gce"
   lbc.ctx.Recorder(ing.Namespace).Eventf(ing, apiv1.EventTypeWarning, "Sync", msg)
  } else {
    glog.Errorf(Cannot get certs for Ingress %s/%s: %v", ing.Namespace, ing.Name, err)
    return nil, err
  }
}

@munnerz
Copy link
Member Author

munnerz commented Jul 4, 2018

/cc @bowei @MrHohn

cc'ing you as you both opened the initial issue/PR, so would like to be sure that this patch still fulfils your initial issue/feature request (#41)

@k8s-ci-robot k8s-ci-robot requested review from bowei and MrHohn July 4, 2018 11:46
@munnerz
Copy link
Member Author

munnerz commented Jul 4, 2018

/cc @nicksardo

as you reviewed/merged #112 😄

@thebigredgeek
Copy link

Any update here?

@munnerz
Copy link
Member Author

munnerz commented Jul 20, 2018

Ping - if there are any changes that need to be made here, i.e. making this somehow configurable via an annotation (something like gce.ingress.kubernetes.io/hard-fail-on-missing-secret) then I am happy to make the changes required 😄.

Not sure what upstream's opinion is, and (as shown by feedback on the PR through reactions!) this is definitely something users are keen to see fixed/altered 😄

@rramkumar1
Copy link
Contributor

@munnerz Copying comment from bowei@ in #112

"Would be possible for the cert-manager to detect this and generate a self-signed cert to unblock creation of the Ingress and then do the update to final cert from let's encrypt. I think this is the behavior of ingress-nginx (self-signed cert)

It feels wrong to proceed with the LB configuration in a situation when the configuration is not valid."

@munnerz
Copy link
Member Author

munnerz commented Aug 7, 2018

Thanks for copying the comment @rramkumar1

"Would be possible for the cert-manager to detect this and generate a self-signed cert to unblock creation of the Ingress and then do the update to final cert from let's encrypt. I think this is the behavior of ingress-nginx (self-signed cert)

@bowei this is not as simple as it first sounds, especially not with the current releases of cert-manager (future releases will hopefully make this a bit easier, but it is still a fairly complex issue requiring some fairly large changes to our codebase).

It feels wrong to proceed with the LB configuration in a situation when the configuration is not valid.

Whilst I understand this sentiment, I think it is worth considering that this was a breaking behavioural change on ingress-gce's part. The Ingress spec is ill-defined, which leads to situations such as these, however the situation is not made any better by silent breaking changes being made to ingress controllers.

ingress-nginx for example, will generate a self signed certificate itself if the referenced secret does not exist (and prior to the change in this repo, the GCLB would have been configured to serve on port 80 and not 443). Both of these behaviours are compatible.

In short, it will take considerable work on our side to support this new operating mode, and given the change in ingress-gce was made without a corresponding major version bump, this is extremely painful.

This PR aims to 'meet in the middle' - fulfilling the initial goal of #112 (It might be helpful to make this visible to users.) by firing an Event instead of a log message (which is more visible to users).

I'd really hope you'll reconsider this PR - there's fairly significant user interest (as you can see from the reactions!), and I've even had other Googlers reach out to me to express their desire to get this problem fixed (as cert-manager is used in other Google products!).

@bowei
Copy link
Member

bowei commented Aug 7, 2018

Good points, we can keep the existing behavior for now.

I would like to be on a path to be able to remove this path in at some point soon. The more states that the ingress can be in, the easier it is to introduce bugs.

I will take a look.

@munnerz
Copy link
Member Author

munnerz commented Aug 9, 2018

@bowei agreed - it'd be great to resolve this properly. In future we will definitely consider this case however, and attempt to build some patch-work into cert-manager to help alleviate the problem.

Is there a timeline on getting this PR merged, and the subsequent fix being rolled out to GKE clusters? I'd hope this would qualify to go into a patch release of GKE, given it was a regression.

I'd be happy to backport the fix if that needs to happen too (I imagine it'd be easy enough given its size!)

@bowei
Copy link
Member

bowei commented Aug 9, 2018

Probably in the next couple of weeks. We have some fixes as well.

@brettcurtis
Copy link

Howdy folks - I'm working on a new project and I'm trying to determine if I'll see this in GKE clusters in the next few weeks?

Thanks!

@munnerz
Copy link
Member Author

munnerz commented Sep 6, 2018

Ping on this - does this PR not need to be merged to make it out into GKE?

I notice there was a patch release of GKE cut yesterday, which seems to not include this fix 😟

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 10, 2018
@munnerz
Copy link
Member Author

munnerz commented Sep 10, 2018

@bowei Updated with requested changes 😄

@bowei
Copy link
Member

bowei commented Sep 12, 2018

/lgtm
/approve

@rramkumar1 -- let's pick this into the next patch release.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 12, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bowei, munnerz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 12, 2018
@k8s-ci-robot k8s-ci-robot merged commit ca3a07b into kubernetes:master Sep 12, 2018
k8s-ci-robot added a commit that referenced this pull request Sep 13, 2018
@munnerz munnerz deleted the soft-fail-tls branch January 14, 2020 10:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants