-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't drop traffic when upgrading a deployment fails #14795
Conversation
8e0535d
to
a18e776
Compare
d2f5d97
to
1687537
Compare
1687537
to
6ac8cfa
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #14795 +/- ##
==========================================
- Coverage 86.02% 85.78% -0.24%
==========================================
Files 197 198 +1
Lines 14950 15126 +176
==========================================
+ Hits 12860 12976 +116
- Misses 1778 1827 +49
- Partials 312 323 +11 ☔ View full report in Codecov by Sentry. |
When transforming the deployment status to the revision we want to bubble up the more severe condition to Ready. Since Replica failures will include a more actionable error message this condition is preferred
This isn't accurate when the Revision has failed to rollout an update to it's deployment
6ac8cfa
to
eaa966b
Compare
1. PA Reachability now depends on the status of the Deployment If we have available replicas we don't mark the revision as unreachable. This allows ongoing requests to be handled 2. Always propagate the K8s Deployment Status to the Revision. We don't need to gate this depending on whether the Revision required activation. Since the only two conditions we propagate from the Deployment is Progressing and ReplicaSetFailure=False 3. Mark Revision as Deploying if the PA's service name isn't set
eaa966b
to
23247c1
Compare
/test-all |
cf53416
to
b8b019b
Compare
I'm testing the upgrade test to the 1.12 branch to ensure it fails when the fix isn't present. - #14840 |
b8b019b
to
87c1f10
Compare
3882852
to
42dc27d
Compare
591db65
to
42dc27d
Compare
/hold So the upgrade test is failing because the bug exists in 1.12 and 1.13 release branches. When we upgrade from those older release branches the older controller observes the config map changes and then marks reachability=false incorrectly. Thus to ensure this upgrade test works we should cherry pick to 1.12 - perform a release, then cherry pick to 1.13 then do a release. |
/retest v1.13.1 is out that should unblock the upgrade test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dprotaso, ReToCode The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
Fixes: #14660
Depends on 1.13 point release with the following fix #14846