high k8s etcd/apiserver load due to pod listings #753
Comments
Something along the lines of this should be more efficient (not yet tested):

```diff
--- a/kubespawner/reflector.py
+++ b/kubespawner/reflector.py
@@ -208,7 +208,7 @@ class ResourceReflector(LoggingConfigurable):
         self.watch_task = None
 
-    async def _list_and_update(self):
+    async def _list_and_update(self, resource_version=None):
         """
         Update current list of resources by doing a full fetch.
 
@@ -218,9 +218,12 @@ class ResourceReflector(LoggingConfigurable):
         kwargs = dict(
             label_selector=self.label_selector,
             field_selector=self.field_selector,
+            resource_version=resource_version,
             _request_timeout=self.request_timeout,
             _preload_content=False,
         )
+        if resource_version:
+            kwargs["resource_version_match"] = "NotOlderThan"
         if not self.omit_namespace:
             kwargs["namespace"] = self.namespace
 
@@ -264,6 +267,8 @@ class ResourceReflector(LoggingConfigurable):
             selectors.append("field selector=%r" % self.field_selector)
         log_selector = ', '.join(selectors)
 
+        resource_version = ""
+
         cur_delay = 0.1
 
         if self.omit_namespace:
@@ -282,7 +287,7 @@ class ResourceReflector(LoggingConfigurable):
             start = time.monotonic()
             w = watch.Watch()
             try:
-                resource_version = await self._list_and_update()
+                resource_version = await self._list_and_update(resource_version)
                 watch_args = {
                     "label_selector": self.label_selector,
                     "field_selector": self.field_selector,
@@ -325,6 +330,7 @@ class ResourceReflector(LoggingConfigurable):
                         else:
                             # This is an atomic operation on the dictionary!
                             self.resources[ref_key] = resource
+                            resource_version = resource["metadata"]["resourceVersion"]
                         if self._stopping:
                             self.log.info("%s watcher stopped: inner", self.kind)
                             break
```
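For context, a self-contained sketch of the pattern the diff implements, written against the synchronous official `kubernetes` Python client rather than kubespawner's asynchronous reflector. The label selector and timeout here are placeholder values, not kubespawner's actual configuration, and `resource_version_match` assumes a client version recent enough to support it:

```python
from kubernetes import client, config, watch

config.load_kube_config()  # or config.load_incluster_config() inside a pod
v1 = client.CoreV1Api()

# Example selector only; kubespawner builds its own selectors and kwargs.
LABEL_SELECTOR = "component=singleuser-server"

# Empty on the first pass: the initial list is a consistent read from etcd.
resource_version = ""

while True:
    kwargs = {"label_selector": LABEL_SELECTOR}
    if resource_version:
        # On relists, ask only for data at least as new as what we already
        # saw; the apiserver can then answer from its watch cache, not etcd.
        kwargs["resource_version"] = resource_version
        kwargs["resource_version_match"] = "NotOlderThan"

    pods = v1.list_pod_for_all_namespaces(**kwargs)
    resource_version = pods.metadata.resource_version

    # Watch from the listed version; when the short timeout expires we loop
    # around and relist, carrying the last seen resourceVersion with us.
    for event in watch.Watch().stream(
        v1.list_pod_for_all_namespaces,
        label_selector=LABEL_SELECTOR,
        resource_version=resource_version,
        timeout_seconds=10,
    ):
        resource_version = event["object"].metadata.resource_version
```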
Thanks for opening this, @juliantaylor! I think we already do something like this when handling just a single namespace (kubespawner/kubespawner/reflector.py, line 239 at commit 8acc375).
Bug description
In clusters with many pods, kubespawner creates significant load on etcd/the apiserver due to its frequent pod listings when using `enable_user_namespaces`, which lists all pods in the whole cluster. Full pod listings are very expensive even with a label selector, because pods in etcd are only indexed by namespace; label filtering is only performed in the apiserver caches.
When `resource_version` is not set (equivalent to setting it to `""`), the apiserver always retrieves all data from etcd instead of from its caches, so the listing becomes very expensive [0].
Additionally, your watches have a default timeout of 10s, which causes the pods to be relisted frequently. In a cluster with 10k pods a listing can take several seconds in etcd.
To improve the situation, I would suggest setting `resource_version=0` in the initial list in `_list_and_update`, and on watch timeouts passing the last received resource version to it on subsequent calls. The request can then be fulfilled from the apiserver caches without hitting etcd where possible (which makes the label selector effective). This also means you do not get the strong consistency guarantee of etcd on the listing.
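A rough sketch of the difference, calling the official `kubernetes` Python client directly rather than going through kubespawner's reflector (the label selector is only an example):

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config()
v1 = client.CoreV1Api()
selector = "component=singleuser-server"  # example label selector only

# No resourceVersion: a strongly consistent read; all pods are read from
# etcd and only then label-filtered in the apiserver, expensive at 10k pods.
consistent = v1.list_pod_for_all_namespaces(label_selector=selector)

# resourceVersion="0": the apiserver may answer from its own cache without
# touching etcd, trading strong consistency for a much cheaper listing.
cached = v1.list_pod_for_all_namespaces(
    label_selector=selector,
    resource_version="0",
)
```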
If this is inappropriate, I would suggest adding a warning to the `enable_user_namespaces` documentation noting that the watch timeout should be increased from the default 10s in large clusters.

[0] https://kubernetes.io/docs/reference/using-api/api-concepts/#resource-versions