Add timeout to SDK k8s client #3070
Seems like a good change no matter what 👍🏻
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: markmandel, zmerlynn
New changes are detected. LGTM label has been removed.
Build Succeeded 👏 Build Id: 8a612f44-eae4-49b7-ad09-35feb7880ab3
The SDK client only ever accesses small amounts of data (a single-object list or event updates), so latency of more than a couple of seconds is excessive. We also need to keep a relatively tight timeout during initialization to give the informer a chance to retry: the SDK won't reply to /healthz checks until the informer has synced once, and our liveness configuration only allows 9s before a positive /healthz.
The problem addressed by googleforgames#3070 is that, intermittently, we are seeing containers start without networking fully available. Once networking seems to work, it works fine. However, the fix in googleforgames#3070 introduced a downside: heavy watch traffic, because I didn't realize that the client-wide timeout would also cut off the hanging GET of the watch. See googleforgames#3106. Instead of timing out the whole client, let's use an initial-probe approach: block on a successful GET (with a reasonable timeout) before we try to start informers. Fixes googleforgames#3106
* Revert #3070, wait on networking a different way

The problem addressed by #3070 is that, intermittently, we are seeing containers start without networking fully available. Once networking seems to work, it works fine. However, the fix in #3070 introduced a downside: heavy watch traffic, because I didn't realize that the client-wide timeout would also cut off the hanging GET of the watch. See #3106.

Instead of timing out the whole client, let's use an initial-probe approach: block on a successful GET (with a reasonable timeout) before we try to start informers.

Along the way: fix nil pointer deref when TestPingHTTP fails.

Fixes #3106
This seems to help with (many of?) the flakes we're seeing in CI by forcing the informer to retry lists, rather than the SDK dying after 30s of hanging.