"Timeout: Too large resource version" when running kubernetes 1.17 #4087

Closed
hugoShaka opened this issue Jan 5, 2021 · 0 comments · Fixed by #4097
Labels
>bug Something isn't working


hugoShaka commented Jan 5, 2021

Bug Report

What did you do?

We updated kubernetes from 1.16 to 1.17.13 a couple of weeks ago.

What did you expect to see?

No error, same behaviour as with 1.16.

What did you see instead? Under which circumstances?

On some clusters we got paged for high apiserver latency. It was caused by the ECK operator performing LIST operations in a loop for unused resources (enterprisesearches).

Environment

  • ECK version:

    1.3.0

  • Kubernetes information:

    GKE 1.17.13-gke.2600

  • Resource definition: the error happens on unused CRDs, here enterprisesearches.

  • Logs:

I0105 21:08:43.745862       1 trace.go:205] Trace[639961347]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.18.6/tools/cache/reflector.go:125 (05-Jan-2021 21:08:04.717) (total time: 39028ms):
Trace[639961347]: [39.028591734s] [39.028591734s] END
E0105 21:08:43.745893       1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.6/tools/cache/reflector.go:125: Failed to list *v1beta1.EnterpriseSearch: Timeout: Too large resource version: 74171625, current: 68292564
{"log.level":"info","@timestamp":"2021-01-05T21:10:07.430Z","log.logger":"generic-reconciler","message":"Updating resource","service.version":"1.3.0+6db1914b","service.type":"eck","ecs.version":"1.4.0","kind":"ConfigMap","namespace":"fleet-system","name":"elastic-licensing"}
I0105 21:10:21.317149       1 trace.go:205] Trace[476061902]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.18.6/tools/cache/reflector.go:125 (05-Jan-2021 21:09:42.287) (total time: 39029ms):
Trace[476061902]: [39.029302474s] [39.029302474s] END
E0105 21:10:21.317175       1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.6/tools/cache/reflector.go:125: Failed to list *v1beta1.EnterpriseSearch: Timeout: Too large resource version: 74171625, current: 68292564
I0105 21:11:53.584646       1 trace.go:205] Trace[1947998032]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.18.6/tools/cache/reflector.go:125 (05-Jan-2021 21:11:14.555) (total time: 39028ms):
Trace[1947998032]: [39.028814884s] [39.028814884s] END
E0105 21:11:53.584676       1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.6/tools/cache/reflector.go:125: Failed to list *v1beta1.EnterpriseSearch: Timeout: Too large resource version: 74171625, current: 68292564
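
To see how often the operator hits this error and how far ahead its cached resource version is, the reflector lines above can be summarized with a small script. This is a minimal sketch: the regular expression is derived only from the log lines in this report, and the names `TOO_LARGE_RV` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Pattern for the reflector error seen in the logs above. The group names
# (kind, requested, current) are illustrative, not client-go terminology.
TOO_LARGE_RV = re.compile(
    r"Failed to list \*(?P<kind>[\w.]+): Timeout: Too large resource version: "
    r"(?P<requested>\d+), current: (?P<current>\d+)"
)


def summarize(lines):
    """Count failed LIST attempts per kind and record the resource version gap."""
    counts = Counter()
    gaps = {}
    for line in lines:
        match = TOO_LARGE_RV.search(line)
        if match is None:
            continue  # not a "Too large resource version" line
        kind = match.group("kind")
        counts[kind] += 1
        # Gap between the version the client asked for and the server's current one.
        gaps[kind] = int(match.group("requested")) - int(match.group("current"))
    return counts, gaps


# Two of the error lines from this report, shortened to the relevant part.
sample = [
    "E0105 21:08:43.745893       1 reflector.go:178] Failed to list "
    "*v1beta1.EnterpriseSearch: Timeout: Too large resource version: "
    "74171625, current: 68292564",
    "E0105 21:10:21.317175       1 reflector.go:178] Failed to list "
    "*v1beta1.EnterpriseSearch: Timeout: Too large resource version: "
    "74171625, current: 68292564",
]

counts, gaps = summarize(sample)
print(counts, gaps)
```

A count that keeps growing while the gap stays constant suggests the client is retrying with the same stale resource version rather than recovering.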

Cause

This is likely due to an incompatibility between the 1.17.x apiserver and Kubernetes clients prior to 1.18.9 (ECK 1.3.0 and 1.3.1 use 1.18.6, which is client-go v0.18.6 in the logs above). See kubernetes/kubernetes#94315.

Updating client-go to 1.18.9 or higher should fix the bug. In the meantime, running ECK 1.3.x against a 1.17.x apiserver can intermittently send the operator into a LIST retry loop that puts pressure on the apiserver whenever it fails to handle a Too large resource version error. The only way out is to restart the operator.
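
The retry loop and one plausible way out of it can be illustrated with a toy model. The sketch below is not client-go code. It assumes that a client recovers by dropping its stale resource version and listing again without one, and every name in it (`TooLargeResourceVersion`, `make_fake_server`, `list_with_fallback`) is hypothetical.

```python
class TooLargeResourceVersion(Exception):
    """Hypothetical stand-in for the apiserver's 'Too large resource version' error."""


def make_fake_server(current_version, items):
    """Build a fake LIST endpoint that rejects versions newer than its own."""
    calls = []

    def list_fn(resource_version):
        calls.append(resource_version)
        # An empty string means "no version constraint" in this sketch.
        if resource_version and int(resource_version) > current_version:
            raise TooLargeResourceVersion(
                f"Too large resource version: {resource_version}, "
                f"current: {current_version}"
            )
        return items

    return list_fn, calls


def list_with_fallback(list_fn, last_resource_version):
    """LIST at the cached version; on 'too large', relist without a version."""
    try:
        return list_fn(last_resource_version)
    except TooLargeResourceVersion:
        # Retrying with the same version would fail on every attempt, which
        # is the loop described above. Drop the stale version instead.
        return list_fn("")


# Versions taken from the error message in this report.
list_fn, calls = make_fake_server(68292564, ["enterprisesearch-a"])
result = list_with_fallback(list_fn, "74171625")
print(result, calls)
```

In this model the second call succeeds because it no longer carries the version that the server considers to be in the future. A client without such a fallback would repeat the first call indefinitely.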

@botelastic botelastic bot added the triage label Jan 5, 2021
@pebrc pebrc added >bug Something isn't working and removed triage labels Jan 6, 2021