sync the secret if the certificate isn't present #1201
Closed
Problem
When the controller starts, there is a period during which various ingress resources do not have their TLS certificate loaded. During that window NGINX serves the "default certificate" until the controller syncs the secret.
If you have many ingresses (say, more than 200), it can take minutes for everything to converge. The logs show this error:
"ssl certificate \"some-cert\" does not exist in local store"
Reproduce
while true ; do curl https://someplace.with.valid.cer.com/ -m 1 ; done
Fix
The queue is rate limited, and certificate syncing only happens when the specific ingress from the event is processed. This leaves a big window of invalid certs, because NGINX needs to know the entire state of the system (and indeed it creates sites for those ingresses, just without certificates).
When the cert isn't found, I added a syncSecret call so that it can be loaded right away. The result is what I would expect: instead of invalid cert errors we get timeouts until the site is loaded, which is much preferable to serving the invalid cert. It also converges in about 20s instead of minutes.
Other ways to fix
I believe it works this way because for the GCE controller one ingress == one load balancer, whereas for NGINX many ingresses -> one load balancer instance.