-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vtorc
: require topo for Healthy: true
in /debug/health
#17129
vtorc
: require topo for Healthy: true
in /debug/health
#17129
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #17129 +/- ##
==========================================
+ Coverage 67.41% 67.43% +0.01%
==========================================
Files 1574 1574
Lines 253294 253299 +5
==========================================
+ Hits 170760 170806 +46
+ Misses 82534 82493 -41 ☔ View full report in Codecov by Sentry. |
Added backport labels because I think it's unsafe for VTOrc to report it is healthy when it cannot see the topo. This could lead to outages if a vtorc config is invalid and deployed for some time |
a8d6422
to
81e8f81
Compare
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
8c738f9
to
621e2a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change makes sense to me. I do have one question inline.
Re: backports, deferring to @deepthi
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes LGTM. I'll defer to Deepthi as well as far as backports are concerned.
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, lgtm.
My understanding is that tests have been moved around but we haven't lost any coverage.
Should be fine. |
Description
This PR addresses #17121 by requiring that we're able to reach the topology at least once (even if it's just empty) before returning
Healthy: true
in/debug/health
. Today a VTOrc is considered healthy purely if it can write to it's own databaseThis is to prevent a
vtorc
deployment with a broken topo config to be seen as healthy, which in Kube can cause a bad config deploy to rollout to all nodes when it could fail on the first that it breaksAlso some error logging was consolidated so we don't log the topo failure multiple times for the same call (see below)
Example with this change:
And in another shell:
(previous to this PR
"Healthy": true
was returned)Backport reason
I think this should be backported to prevent users from the scenario described above. This issue could lead to a user believing a VTOrc deployment is healthy when it's actually unable to load anything from the topo
Related Issue(s)
Resolves #17121
Checklist
Deployment Notes