Add healthz endpoint #107

xinbinhuang · 2022-09-09T02:25:56Z

With the current /metrics endpoint used for the readiness probe, the log is flooded with errors like these:

{"level":"error","ts":1662690783.7314105,"logger":"http.handlers.metrics","msg":"error encoding and sending metric family:write tcp 10.42.55.171:9765->192.168.0.217:55420: write: connection reset by peer"}
{"level":"error","ts":1662690783.7314105,"logger":"http.handlers.metrics","msg":"error encoding and sending metric family:write tcp 10.42.55.171:9765->192.168.0.217:55420: write: connection reset by peer"}
 {"level":"error","ts":1662690813.7306309,"logger":"http.handlers.metrics","msg":"error encoding and sending metric family:write tcp 10.42.55.171:9765->192.168.0.217:56800: write: broken pipe"}

This is likely due to the large size of the metrics endpoint's response.

This PR tries to fix that:

Add a lightweight healthz endpoint and use it for the readinessProbe instead of the metrics endpoint.
Also, fix the dev workflow so that it actually works.

xinbinhuang · 2022-09-09T02:37:13Z

@nilathedragon @Embraser01 Would you like to take a look?

nilathedragon · 2022-09-09T09:55:03Z

I'm not a maintainer so I can't make the call on this, but I'd welcome the separate healthz endpoint.

Embraser01

Thanks for this! I haven't had time to look into this problem yet.

Embraser01 · 2022-09-13T08:34:10Z

charts/caddy-ingress-controller/templates/deployment.yaml

+              port: 80
+              path: /healthz


I don't like the idea of having the health check on port 80 (which serves all requests). It would be better to add it either to the metrics endpoint or to a new HTTP server.

If you have it on the metrics endpoint, you wouldn't be able to turn off metrics without breaking the health check. So a separate port might be better.

IIRC the metrics server is always on, but the metrics handler is enabled only when needed. It should be fine to replace the "static_response" handler with a healthz handler.

https://github.com/caddyserver/ingress/blob/master/internal/caddy/global/metrics.go#L28 This line would currently disable the whole server if metrics are turned off. I think I encountered this when I tried disabling metrics and my readiness checks would keep failing.
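The arrangement discussed above (a server that is always up, with the metrics handler attached only when metrics are enabled) can be sketched with a plain Go `http.ServeMux`. This is an illustration under assumptions, not the controller's code: `newMonitoringMux` and `status` are hypothetical names, and the real project expresses this through Caddy configuration. The sketch shows that /healthz keeps answering even when /metrics is switched off.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newMonitoringMux is a hypothetical constructor: /healthz is always
// registered, /metrics only when metricsEnabled is true.
func newMonitoringMux(metricsEnabled bool) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	if metricsEnabled {
		mux.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
			// Placeholder body; a real metrics handler would write far more.
			io.WriteString(w, "# placeholder metrics output\n")
		})
	}
	return mux
}

// status performs an in-memory GET and returns only the status code.
func status(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	for _, enabled := range []bool{true, false} {
		mux := newMonitoringMux(enabled)
		// Prints "true 200 200" and then "false 200 404".
		fmt.Println(enabled, status(mux, "/healthz"), status(mux, "/metrics"))
	}
}
```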

Let me try to port it to the metrics endpoint and see if disabling the metrics server still works. Will report back

I briefly skimmed the code. It's definitely doable. I will update the PR to accommodate the change.

I have a question though. Is it possible for the ingress_server to be down while the metrics_server is still active? In that case the controller would show as healthy but couldn't actually route any traffic.

@Embraser01 I've moved /healthz to the metrics server.

I also added a test that validates the change by comparing the generated Caddy JSON config. Currently, it only has one base test case, and it doesn't seem straightforward to add more.
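For readers unfamiliar with this style of test, here is a rough sketch of comparing a generated config against an expected JSON document in Go. Everything in it is an assumption made for illustration: `metricsServerConfig`, `configJSON`, `sameJSON` and the `expected` fixture are hypothetical, the config is a simplified approximation of a Caddy server block, and the listen port 9765 is taken from the log lines at the top of this PR. The actual test in the repository may be structured differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

type obj = map[string]interface{}
type list = []interface{}

// metricsServerConfig builds a simplified, hypothetical server config:
// a /healthz route that is always present and a /metrics route that is
// added only when metricsEnabled is true.
func metricsServerConfig(metricsEnabled bool) obj {
	routes := list{obj{
		"match":  list{obj{"path": list{"/healthz"}}},
		"handle": list{obj{"handler": "static_response", "status_code": 200}},
	}}
	if metricsEnabled {
		routes = append(routes, obj{
			"match":  list{obj{"path": list{"/metrics"}}},
			"handle": list{obj{"handler": "metrics"}},
		})
	}
	return obj{"listen": list{":9765"}, "routes": routes}
}

// configJSON serializes the generated config.
func configJSON(metricsEnabled bool) ([]byte, error) {
	return json.Marshal(metricsServerConfig(metricsEnabled))
}

// sameJSON reports whether two JSON documents are semantically equal,
// ignoring key order and whitespace.
func sameJSON(a, b []byte) (bool, error) {
	var x, y interface{}
	if err := json.Unmarshal(a, &x); err != nil {
		return false, err
	}
	if err := json.Unmarshal(b, &y); err != nil {
		return false, err
	}
	return reflect.DeepEqual(x, y), nil
}

// expected is a hand-written fixture for the metrics-disabled case.
const expected = `{
  "listen": [":9765"],
  "routes": [
    {"match": [{"path": ["/healthz"]}],
     "handle": [{"handler": "static_response", "status_code": 200}]}
  ]
}`

func main() {
	got, err := configJSON(false)
	if err != nil {
		panic(err)
	}
	ok, err := sameJSON(got, []byte(expected))
	if err != nil {
		panic(err)
	}
	fmt.Println(ok) // prints "true"
}
```

Comparing the decoded values rather than the raw bytes keeps the check independent of key order and indentation, which makes it easier to add further fixture cases later.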

I have a question though. Is it possible for the ingress_server to be down while the metrics_server is still active? In that case the controller would show as healthy but couldn't actually route any traffic.

I don't think it can happen. At least, I think that on config reload Caddy works in an "all or nothing" way: if the ingress_server is not yet started, the metrics_server will not serve either. I'm not sure, though (@mholt should know more about this).

xinbinhuang · 2022-10-26T15:44:00Z

@Embraser01 Can you help approve the workflow again? I fixed the build error for goreleaser.

xinbinhuang · 2022-11-24T01:05:10Z

@Embraser01 CI passed ✔️ Should we merge? Or let me know if you have extra feedback.

Embraser01 · 2022-12-06T10:46:21Z

Yes, really sorry for the long delay

xinbinhuang force-pushed the add-healthz branch from db3a266 to 84539d3 September 9, 2022 02:26

xinbinhuang changed the title ~~add healthz~~ Add healthz endpoint Sep 9, 2022

Embraser01 reviewed Sep 13, 2022

xinbinhuang force-pushed the add-healthz branch from 123ee43 to bb7ac2f September 14, 2022 00:40

xinbinhuang requested a review from Embraser01 September 15, 2022 00:23

Embraser01 previously approved these changes Oct 12, 2022

xinbinhuang dismissed Embraser01’s stale review via 550ed09 October 26, 2022 15:42

xinbinhuang force-pushed the add-healthz branch from bb7ac2f to 550ed09 October 26, 2022 15:42

xinbinhuang added 2 commits October 26, 2022 11:43

Add healthz handler

83b5bdf

Move healthz to metrics server & add tests

ae14620

xinbinhuang force-pushed the add-healthz branch from 550ed09 to ae14620 October 26, 2022 15:43

xinbinhuang requested a review from Embraser01 October 26, 2022 15:44

Embraser01 approved these changes Nov 21, 2022

Embraser01 merged commit 6e28cb2 into caddyserver:master Dec 6, 2022


Embraser01 left a comment

Embraser01 Sep 13, 2022

nilathedragon Sep 13, 2022

Embraser01 Sep 13, 2022

nilathedragon Sep 13, 2022

xinbinhuang Sep 13, 2022

xinbinhuang Sep 13, 2022

xinbinhuang Sep 14, 2022 (edited)

Embraser01 Nov 21, 2022
