readinessProbe w/http-admin does not create valid health check on GKE load balancer #113
The problem here is a little more complex than that: GCE / GKE ingress has many limitations and, among them, there is the problem that it doesn't correctly pick up the `readinessProbe`. Now, the issue is the following:

Switching the readiness probe to the public port, or merging the two ingresses together into a single ingress, doesn't fix the issue. A solution to this issue might be health-check configuration via `BackendConfig`. Another solution would be to make two separate deployments, one for the public endpoint and one for the admin endpoint.
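To make the failure mode concrete, here is the kind of probe GKE picks up (a hypothetical sketch, not the chart's literal manifest; the path is assumed):

```yaml
# The Deployment's readiness probe targets the admin port; the GCE ingress
# controller derives the load balancer's health check from this probe, even
# for the Service that exposes the public port 4444.
readinessProbe:
  httpGet:
    path: /health/ready  # assumed readiness path
    port: 4445           # http-admin - this is what ends up in the LB health check
```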
Wow, thank you for the detailed write-up! That sounds really frustrating and should definitely be fixed. I think we can deploy two instances of Hydra to resolve this issue on GKE. However, this would not work with the in-memory database, which is what some deployments are currently using. We're however thinking about removing in-memory in favor of SQLite (which also supports in-memory but would use a mount in Helm). Is there any other way we can work around this for GKE? Personally, I have to say that I had so many issues with the GKE ingress (from being very slow to update to not supporting basic features like path rewrites) that we ended up using the NGINX ingress on GKE. While this doesn't support some features like Global Forwarding Rules (I think that's the name?), it doesn't actually cause 20-minute downtimes when the GCE ingress is updating :D
Hello @aeneasr, I'm glad my insights about this issue were useful! I agree that GCE ingress isn't where it should be; it's an obsolete piece of software and its development is moving forward at a very slow pace. On the other hand, as you also mentioned, it is the default ingress on GCP and it supports some Google-specific features that NGINX and other ingresses do not. I am looking forward to the SQLite solution, and I think it's a step in the right direction for this specific GKE-related issue.
There is this issue kubernetes/ingress-gce#647 that describes a problem similar to this one. Maybe quickly going through the ticket might give some ideas on how to deal with it. A quick workaround could be what was described in kubernetes/ingress-gce#674, which is to return a 200 HTTP status on the root path `/`. As an additional note, kubernetes/ingress-gce#42 might also be interesting for this issue.
Yeah, I considered the status change on `/` too, but it felt wrong. I'm thinking two Hydra instances might be the simplest approach. Thanks for the insights on GCE load balancers! I didn't really know that was a thing (but I recall other issues 🤷‍♀). Thanks! PS: Hydra is awesome!
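For the record, the two-instance idea boils down to giving each endpoint its own Deployment, so each Service's probe (and therefore the derived GKE health check) matches the port it actually serves. A rough sketch, with hypothetical names and assumed health paths:

```yaml
# Deployment "hydra-public" (hypothetical): probes the port it serves
readinessProbe:
  httpGet:
    path: /health/ready
    port: 4444  # http-public
---
# Deployment "hydra-admin" (hypothetical): probes the admin port
readinessProbe:
  httpGet:
    path: /health/ready
    port: 4445  # http-admin
```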
A working solution is now available. /ping @NoelJames @aeneasr

### GKE supported versions

NOTE: This solution works only from GKE version … onwards.

### VPC-native and Network Endpoint Group (NEG)

If you enabled VPC-native networking with NEGs, note that an additional firewall rule has to be configured manually. According to Google, this is a short-term workaround.
### Solution

The following resource has to be manually created before the deployment of the Service / Ingress:

```yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: http-public
  namespace: hydra
spec:
  healthCheck:
    checkIntervalSec: 5
    timeoutSec: 3
    healthyThreshold: 1
    unhealthyThreshold: 3
    type: HTTP
    requestPath: /health/ready
    port: 4444
```

The following annotations have to be added to the Service:

```yaml
cloud.google.com/neg: '{"ingress": true}'
cloud.google.com/backend-config: '{"default": "http-public"}'
```
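For illustration, the annotations end up on the public Service, which might look like this (a sketch; the name, labels, and selector are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: hydra-public
  namespace: hydra
  annotations:
    cloud.google.com/neg: '{"ingress": true}'                     # container-native LB via NEGs
    cloud.google.com/backend-config: '{"default": "http-public"}' # binds the BackendConfig above
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: hydra  # hypothetical selector
  ports:
    - name: http-public
      port: 4444
      targetPort: 4444
```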
Awesome, thank you for the update! Does that mean that we need to change something in the chart?
I would like to work on a PR for this specific issue, but in case I'm not able to work on it in the short term, I'd like to lay down some recommendations for whoever would like to propose a PR before I do:

### Limitations

There should be a check on the Kubernetes version: as mentioned above, this feature was introduced only starting from a specific GKE version.

### Annotations

The following annotations should be configured automatically IMO, as they are not trivial and it takes quite some time to find them in the GKE documentation (they are buried in some exotic pages about Load Balancers and Ingresses):

```yaml
cloud.google.com/neg: '{"ingress": true}'
cloud.google.com/backend-config: '{"default": "http-public"}'
```

### BackendConfig

The following resource should be added to the resources of the chart and should be toggled by the typical `enabled` flag:
```yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: http-public
  namespace: hydra
spec:
  healthCheck:
    checkIntervalSec: 5
    timeoutSec: 3
    healthyThreshold: 1
    unhealthyThreshold: 3
    type: HTTP
    requestPath: /health/ready
    port: 4444
```

### Template

The `values.yaml` should include the following configuration:
```yaml
# The following configuration enables a custom BackendConfig and HealthCheck on GKE.
# This configuration *must* be enabled if you want to use an Ingress on the "public" endpoint on GKE.
# If you want to enable TLS on this port, please change the protocol to "HTTPS"; additionally, you will
# need to add the annotation "cloud.google.com/app-protocols: '{"4444": "HTTPS"}'" to the Service "public".
# If you are running a VPC-native cluster, please check the issue https://github.com/ory/k8s/issues/113 for current limitations.
backendConfig:
  enabled: false
  path: /health/ready
  port: 4444
  protocol: HTTP
  interval: 60
  timeout: 60
  healthyThreshold: 1
  unhealthyThreshold: 10
```
The corresponding template:

```yaml
{{- if .Values.backendConfig.enabled }}
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: {{ include "hydra.fullname" . }}
spec:
  healthCheck:
    checkIntervalSec: {{ .Values.backendConfig.interval }}
    timeoutSec: {{ .Values.backendConfig.timeout }}
    healthyThreshold: {{ .Values.backendConfig.healthyThreshold }}
    unhealthyThreshold: {{ .Values.backendConfig.unhealthyThreshold }}
    type: {{ .Values.backendConfig.protocol }}
    requestPath: {{ .Values.backendConfig.path }}
    port: {{ .Values.backendConfig.port }}
{{- end }}
```
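If the chart wires things up as sketched above, enabling the feature from the user's side might then look like this (a hypothetical override file; nothing here exists in the chart yet):

```yaml
# values.gke.yaml - hypothetical user override enabling the proposed BackendConfig
backendConfig:
  enabled: true
  path: /health/ready
  port: 4444
  protocol: HTTP
```

applied with something like `helm upgrade --install hydra ory/hydra -f values.gke.yaml`.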
### Additional information

As mentioned before, this CRD does not work properly for VPC-native clusters; therefore I find it appropriate to point out this issue, Google's documentation, or a warning that mentions the necessary workaround, i.e. an additional firewall rule that has to be configured manually.
Finally, if you want to enable TLS, you have to follow these steps (they mirror the comment in the proposed `values.yaml` above):

1. Change the `protocol` to `HTTPS`.
2. Add the annotation `cloud.google.com/app-protocols: '{"4444": "HTTPS"}'` to the Service "public".

The second point has cost me an entire day of GKE documentation and tests.
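A minimal sketch of where that second step lands (Service name hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: hydra-public
  annotations:
    # Tells the GCE load balancer to talk HTTPS to port 4444 on the backends
    cloud.google.com/app-protocols: '{"4444": "HTTPS"}'
```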
Awesome, thank you for the great write-up! This will certainly help with implementation. I'll also not be able to work on this in the near future, so if anyone wants to pick this up, please do :) My only suggestion would be to probably make …
Thank you @christian-roggia for the detailed answer and follow-ups! I can tell it saved me a bunch of time :)
I am closing this issue as it has not received any engagement from the community or maintainers in a long time. That does not imply that the issue has no merit. If you feel strongly about this issue, …
We are cleaning up issues every now and then, primarily to keep the 4000+ issues in our backlog in check and to prevent maintainer burnout. Burnout in open source maintainership is a widespread and serious issue. It can lead to severe personal and health issues as well as enabling catastrophic attack vectors. Thank you to anyone who participated in the issue! 🙏✌️
### Describe the bug

On GKE, the load balancer fails because the correct health check for the public service is not found.

### To Reproduce

My fan-out ingress contains multiple hosts; the Hydra section looks like this: …

### Expected behavior

A valid health check would be found.

### Environment

…

### Additional context

Changing the readiness probe in `deployment.yaml` like so fixes the issue for me: …
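A hypothetical sketch of what such a probe change might look like, based on the issue title (not the reporter's exact snippet; the path is assumed):

```yaml
# Hypothetical reconstruction - probing the public port instead of the admin
# port means GKE derives a health check the public LB backend can actually pass.
readinessProbe:
  httpGet:
    path: /health/ready  # assumed path
    port: 4444           # http-public instead of http-admin (4445)
```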