"Timeout: Too large resource version" when running kubernetes 1.17 #4087

hugoShaka · 2021-01-05T22:33:18Z

Bug Report

What did you do?

We updated kubernetes from 1.16 to 1.17.13 a couple of weeks ago.

What did you expect to see?

No error, same behaviour as with 1.16.

What did you see instead? Under which circumstances?

On some clusters we got paged for high apiserver latency. It was caused by the ECK operator performing LIST operations in a loop for unused resources (enterprisesearches).

Environment

ECK version: 1.3.0

Kubernetes information: GKE 1.17.13-gke.2600

Resource definition: the error happens on unused CRDs, here enterprisesearches.

Logs:

I0105 21:08:43.745862       1 trace.go:205] Trace[639961347]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.18.6/tools/cache/reflector.go:125 (05-Jan-2021 21:08:04.717) (total time: 39028ms):
Trace[639961347]: [39.028591734s] [39.028591734s] END
E0105 21:08:43.745893       1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.6/tools/cache/reflector.go:125: Failed to list *v1beta1.EnterpriseSearch: Timeout: Too large resource version: 74171625, current: 68292564
{"log.level":"info","@timestamp":"2021-01-05T21:10:07.430Z","log.logger":"generic-reconciler","message":"Updating resource","service.version":"1.3.0+6db1914b","service.type":"eck","ecs.version":"1.4.0","kind":"ConfigMap","namespace":"fleet-system","name":"elastic-licensing"}
I0105 21:10:21.317149       1 trace.go:205] Trace[476061902]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.18.6/tools/cache/reflector.go:125 (05-Jan-2021 21:09:42.287) (total time: 39029ms):
Trace[476061902]: [39.029302474s] [39.029302474s] END
E0105 21:10:21.317175       1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.6/tools/cache/reflector.go:125: Failed to list *v1beta1.EnterpriseSearch: Timeout: Too large resource version: 74171625, current: 68292564
I0105 21:11:53.584646       1 trace.go:205] Trace[1947998032]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.18.6/tools/cache/reflector.go:125 (05-Jan-2021 21:11:14.555) (total time: 39028ms):
Trace[1947998032]: [39.028814884s] [39.028814884s] END
E0105 21:11:53.584676       1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.6/tools/cache/reflector.go:125: Failed to list *v1beta1.EnterpriseSearch: Timeout: Too large resource version: 74171625, current: 68292564
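
To make the pattern in these logs easier to spot, here is a minimal Go sketch that pulls the two numbers out of a "Too large resource version" message and prints the gap between them. `parseTooLargeRV` is a hypothetical helper written for this issue, not part of client-go. Kubernetes documents resource versions as opaque strings, so treating them as integers is an assumption made only for reading logs like the ones above.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// tooLargeRV matches the message text seen in the operator logs above.
var tooLargeRV = regexp.MustCompile(`Too large resource version: (\d+), current: (\d+)`)

// parseTooLargeRV is a hypothetical diagnostic helper (not a client-go API).
// It returns the resource version the client asked for and the one the
// apiserver reported as current. ok is false if the line does not match.
func parseTooLargeRV(line string) (requested, current int64, ok bool) {
	m := tooLargeRV.FindStringSubmatch(line)
	if m == nil {
		return 0, 0, false
	}
	var err error
	if requested, err = strconv.ParseInt(m[1], 10, 64); err != nil {
		return 0, 0, false
	}
	if current, err = strconv.ParseInt(m[2], 10, 64); err != nil {
		return 0, 0, false
	}
	return requested, current, true
}

func main() {
	line := "Failed to list *v1beta1.EnterpriseSearch: Timeout: Too large resource version: 74171625, current: 68292564"
	if req, cur, ok := parseTooLargeRV(line); ok {
		// Prints: requested=74171625 current=68292564 gap=5879061
		fmt.Printf("requested=%d current=%d gap=%d\n", req, cur, req-cur)
	}
}
```

All three errors above carry the same requested value (74171625), which suggests the reflector keeps retrying the LIST with an unchanged resource version rather than recovering.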

Cause

This is likely caused by a bug in the interaction between the 1.17.x apiserver and Kubernetes clients prior to 1.18.9 (ECK 1.3.0 and 1.3.1 use 1.18.6, i.e. client-go v0.18.6 as shown in the logs). See kubernetes/kubernetes#94315.

Updating client-go to 1.18.9 or higher would fix the bug. In the meantime, running ECK 1.3.x against a 1.17.x apiserver might cause the operator to loop at random when it fails to handle a Too large resource version error, putting pressure on the apiserver. The only way out is to restart the operator.
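
As a rough way to check whether a given operator build is in the affected range, the sketch below compares a client-go module version against the fixed version named above. client-go v0.18.x is the module version corresponding to Kubernetes 1.18.x, so the threshold is written as v0.18.9. `parseVersion` and `olderThan` are hypothetical helpers, and the comparison is plain numeric MAJOR.MINOR.PATCH ordering; it is only meant for the 0.18.x line discussed here and says nothing about which patch releases of other minor versions carry the fix.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits "v0.18.6" or "0.18.6" into three integers.
// Hypothetical helper; pre-release or build suffixes are not handled.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("expected MAJOR.MINOR.PATCH, got %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("bad component %q in %q: %w", p, v, err)
		}
		out[i] = n
	}
	return out, nil
}

// olderThan reports whether version v sorts strictly before fixed,
// comparing components numerically so that 0.18.14 is newer than 0.18.9.
func olderThan(v, fixed string) (bool, error) {
	a, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	b, err := parseVersion(fixed)
	if err != nil {
		return false, err
	}
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i], nil
		}
	}
	return false, nil
}

func main() {
	// v0.18.6 is the version visible in the logs; v0.18.9 is the
	// threshold stated in this issue.
	for _, v := range []string{"v0.18.6", "v0.18.9", "v0.18.14"} {
		older, err := olderThan(v, "v0.18.9")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("client-go %s older than v0.18.9: %t\n", v, older)
	}
}
```

The comparison is numeric rather than lexical on purpose: a string comparison would wrongly rank v0.18.14 below v0.18.9.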

botelastic bot added the triage label Jan 5, 2021

pebrc added the >bug label and removed the triage label Jan 6, 2021

pebrc mentioned this issue Jan 7, 2021

Update client-go to 0.18.14 #4097

Merged

pebrc closed this as completed in #4097 Jan 11, 2021
