auth token invalid after watch reconnects #11954
Comments
It seems a Calico issue should be logged so that client v3.4.9 or higher is used for Calico. |
Upgrading the client in Calico won't eradicate the issue: the watch would still be terminated, and I don't think Calico has any re-watch mechanism. |
mark |
The following best practices can quickly avoid this problem:
|
/cc @mitake |
Sorry for the late reply; I was on sick leave for the last few days. In my first attempt I was going to implement the refresh on the gRPC mechanism, PerRPCCredentials.GetRequestMetadata, until I ran into a gRPC issue, grpc/grpc-go#3749, which was fixed in grpc/grpc-go#3677 just two months ago. I didn't think etcd could upgrade its gRPC dependency in such a short time, so I decided to push ahead with a detour: a stream interceptor. To my own taste, I don't like my PR either; it's smelly, too hard-coded, and too inflexible. But we have to fix this, or it's not reliable in production. I noted that PR #12165 won't eradicate the issue, though @cfc4n mentioned that they will raise another PR for it:
from #12157 (comment). So I was wondering what the status of this issue is: is @cfc4n in charge of it (thanks~), and should I then close my PR? |
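For readers following the PerRPCCredentials idea mentioned above, here is a minimal sketch of a credentials object that lazily re-fetches its token. It is not the code from any of the PRs. The type `refreshingCreds`, its `fetch` callback, and `Invalidate` are hypothetical names; the type only mirrors the method set of grpc-go's `credentials.PerRPCCredentials` without importing gRPC, and the `"token"` metadata key is an assumption about what the server expects.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// refreshingCreds mirrors the method set of grpc-go's PerRPCCredentials
// (GetRequestMetadata, RequireTransportSecurity) without importing gRPC.
type refreshingCreds struct {
	mu    sync.Mutex
	token string
	// fetch is a hypothetical callback that performs an Authenticate call.
	fetch func(ctx context.Context) (string, error)
}

// GetRequestMetadata returns the cached token, fetching one if none is cached.
func (c *refreshingCreds) GetRequestMetadata(ctx context.Context, uri ...string) (map[string]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.token == "" {
		t, err := c.fetch(ctx)
		if err != nil {
			return nil, err
		}
		c.token = t
	}
	// Assumption: the server reads the token from the "token" metadata key.
	return map[string]string{"token": c.token}, nil
}

// RequireTransportSecurity reports false only to keep the sketch simple.
func (c *refreshingCreds) RequireTransportSecurity() bool { return false }

// Invalidate drops the cached token, e.g. after the server rejected it,
// so that the next RPC fetches a fresh one.
func (c *refreshingCreds) Invalidate() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.token = ""
}

// demoRefresh shows that a token is re-fetched after invalidation.
func demoRefresh() (before, after string, fetches int) {
	n := 0
	c := &refreshingCreds{fetch: func(context.Context) (string, error) {
		n++
		return fmt.Sprintf("TOKEN-%d", n), nil
	}}
	ctx := context.Background()
	md, _ := c.GetRequestMetadata(ctx)
	before = md["token"]
	c.Invalidate()
	md, _ = c.GetRequestMetadata(ctx)
	after = md["token"]
	return before, after, n
}

func main() {
	before, after, fetches := demoRefresh()
	fmt.Println(before, after, fetches)
}
```

The point of the design is that the token is resolved per RPC rather than fixed at dial time, so a reconnecting stream can pick up a new token once the old one has been invalidated.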
@jschwinger23 Can you try PR #12264 , thanks. |
Will try it first thing in the morning, thank you.
|
working now, thanks. |
This version fixes etcd error "auth token invalid after watch reconnects" etcd-io/etcd#11954. Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 7b0037c ] This version fixes etcd error "auth token invalid after watch reconnects" etcd-io/etcd#11954. Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 7b0037c ] [ Backporter's notes: This is a complete re-vendoring of the relevant library, so includes a wider set of changes than the original commit. ] This version fixes etcd error "auth token invalid after watch reconnects" etcd-io/etcd#11954. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 7b0037c ] This version fixes etcd error "auth token invalid after watch reconnects" etcd-io/etcd#11954. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io>
Reproduce Procedure
Although I don't think the configuration matters for this issue, let me present one of the config files,
and the etcd processes are started plainly:
no environment variables, no command-line arguments.
We can put a key to confirm that the watch is currently working.
Wait 5 minutes until the token has been deleted by simpleTokenKeeper, then kill the etcd processes one by one, restarting each immediately after killing it. Note: do NOT kill the next process until the cluster has recovered to a healthy state.
You will then see that the watch is down, with the output:
permission deny
Analysis
The issue is caused by simpleTokenKeeper. Here is the timeline:
1. The client obtains TOKEN-A and watches / --prefix as expected.
2. simpleTokenKeeper deletes TOKEN-A.
3. The watch keeps working even though TOKEN-A has been deleted, because the token is only checked upon gRPC invocation.
4. After the etcd processes are restarted, the client reconnects with TOKEN-A.
5. authStore.AuthInfoFromCtx returns ErrInvalidAuthToken because TOKEN-A no longer exists.
Impact
The experiment was conducted with v3.4.9. The good part in this version is that the client raises the error permission deny and terminates the watch.
However, in our live cluster the etcd server is v3.4.3 and the etcd v3 client is v3.3.8, and there is no error, no log, no output, and no termination. Everything looks fine, but the watch has failed silently, which is bad.
Sometimes we can barely control the client version. For example, calico-felix v3.4 is bound to clientv3 v3.3.8, and upgrading Calico in a live environment is delicate.
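To make the timeline from the Analysis section concrete, here is a toy model of it. It is only a sketch under assumptions, not etcd's code: `tokenKeeper`, `issue`, `sweep`, `openStream`, and `replay` are hypothetical names, the local `ErrInvalidAuthToken` is a stand-in for the real error, and the 5-minute TTL is taken from the reproduction steps.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrInvalidAuthToken is a local stand-in for the error named in the analysis.
var ErrInvalidAuthToken = errors.New("invalid auth token")

// tokenKeeper is a toy stand-in for simpleTokenKeeper: token -> expiry time.
type tokenKeeper struct {
	ttl    time.Duration
	expiry map[string]time.Time
}

func newTokenKeeper(ttl time.Duration) *tokenKeeper {
	return &tokenKeeper{ttl: ttl, expiry: make(map[string]time.Time)}
}

// issue registers a token whose TTL starts at now.
func (k *tokenKeeper) issue(token string, now time.Time) {
	k.expiry[token] = now.Add(k.ttl)
}

// sweep deletes every token that has expired at time now.
func (k *tokenKeeper) sweep(now time.Time) {
	for token, exp := range k.expiry {
		if !now.Before(exp) {
			delete(k.expiry, token)
		}
	}
}

// openStream models a new gRPC invocation, the only point at which
// the token is checked in this model.
func (k *tokenKeeper) openStream(token string) error {
	if _, ok := k.expiry[token]; !ok {
		return ErrInvalidAuthToken
	}
	return nil
}

// replay walks through the timeline and returns the result of the
// initial watch and of the reconnect attempt.
func replay() (initial, reconnect error) {
	start := time.Unix(0, 0)
	k := newTokenKeeper(5 * time.Minute)

	k.issue("TOKEN-A", start)
	initial = k.openStream("TOKEN-A") // the watch is established

	// The token expires and is deleted. The established stream is never
	// re-checked, so nothing fails at this point.
	k.sweep(start.Add(6 * time.Minute))

	// After the servers restart, the client reconnects with the old token.
	reconnect = k.openStream("TOKEN-A")
	return initial, reconnect
}

func main() {
	initial, reconnect := replay()
	fmt.Println("initial watch:", initial)
	fmt.Println("reconnect:", reconnect)
}
```

The failure only surfaces at the reconnect, which is why a long-lived watch can look healthy for an arbitrary time after its token has been deleted.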
Improvement
In my opinion there are 2 ways to improve:
1. On a keepalive response or something similar, we can invoke simpleTokenKeeper.resetSimpleToken to renew the TTL.
2. The watch is delivered through a <-chan WatchResponse; if we could encapsulate re-fetching the token and re-dialing the gRPC connection after receiving a "transport is closing" WatchResponse, the Watcher client, outside the interface, would not be affected.
Related issues
I presume the following issues describe exactly the same problem:
#11121
#11381
Looking forward to your kind feedback.
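The client-side idea from the Improvement section (hide re-authentication and re-watching behind the channel) could look roughly like the sketch below. Everything here is hypothetical and is not clientv3's API: the `backend` interface, the `event` type, `resilientWatch`, and `fakeBackend` exist only for illustration, and a real implementation would also need backoff, context cancellation, and resuming from the last seen revision.

```go
package main

import (
	"errors"
	"fmt"
)

var errInvalidToken = errors.New("invalid auth token")

// event is a hypothetical stand-in for a watch response.
type event struct {
	Key string
	Err error // non-nil when the underlying stream has failed
}

// backend is a hypothetical abstraction over authenticating and watching.
type backend interface {
	Authenticate() (token string, err error)
	Watch(token, prefix string) (<-chan event, error)
}

// resilientWatch returns one channel that survives stream failures: when a
// stream breaks, it re-authenticates and opens a new watch. It gives up after
// more than maxRetries consecutive failures and then closes the channel.
func resilientWatch(b backend, prefix string, maxRetries int) <-chan event {
	out := make(chan event)
	go func() {
		defer close(out)
		failures := 0
		for failures <= maxRetries {
			token, err := b.Authenticate() // always start from a fresh token
			if err != nil {
				failures++
				continue
			}
			stream, err := b.Watch(token, prefix)
			if err != nil {
				failures++
				continue
			}
			for ev := range stream {
				if ev.Err != nil {
					break // broken stream: fall through and retry
				}
				failures = 0
				out <- ev
			}
			failures++
		}
	}()
	return out
}

// fakeBackend replays canned streams so the sketch runs without a server.
type fakeBackend struct {
	auths   int
	streams [][]event
}

func (f *fakeBackend) Authenticate() (string, error) {
	f.auths++
	return fmt.Sprintf("TOKEN-%d", f.auths), nil
}

func (f *fakeBackend) Watch(token, prefix string) (<-chan event, error) {
	if len(f.streams) == 0 {
		return nil, errors.New("no more streams")
	}
	evs := f.streams[0]
	f.streams = f.streams[1:]
	ch := make(chan event, len(evs))
	for _, ev := range evs {
		ch <- ev
	}
	close(ch)
	return ch, nil
}

// collectKeys drains the channel and returns the keys in arrival order.
func collectKeys(ch <-chan event) []string {
	var keys []string
	for ev := range ch {
		keys = append(keys, ev.Key)
	}
	return keys
}

func main() {
	f := &fakeBackend{streams: [][]event{
		{{Key: "a"}, {Err: errInvalidToken}}, // first stream breaks after "a"
		{{Key: "b"}},                         // second stream delivers "b"
	}}
	fmt.Println(collectKeys(resilientWatch(f, "/", 2)))
}
```

The consumer only ever sees the output channel, so the token failure on the first stream never reaches it; this is the sense in which the client "outside the interface" would not be affected.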