Windows kubelet stats timeout updates #87730
/sig windows
/test pull-kubernetes-e2e-aks-engine-azure-windows
/retest pull-kubernetes-integration
/milestone v1.18
I'm testing a custom build with these changes plus #86101 since it also depends on metrics :)
/assign @yliaog
/assign @benmoss
/lgtm
test/e2e/windows/kubelet_stats.go (outdated)

	}

	if !foundNode {
		framework.Skipf("Could not find any ready and schedulable Windows nodes")
	}
This is failing the batch merge (see https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/batch/pull-kubernetes-bazel-build/1225465355036528642).
Looking
Looks like `Skipf` got moved from `framework` to `framework/skipper` in commit 641321c. I'll rebase and push an update.
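The review snippet skips the test when no ready, schedulable Windows node is found. The selection logic behind it can be sketched as follows. This is a minimal, self-contained sketch: the `node` type, `findWindowsNode`, and `skipf` are hypothetical stand-ins (the real test works with Kubernetes API node objects and the `Skipf` helper that now lives in `framework/skipper`, which aborts the running spec rather than printing).

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a Kubernetes Node object.
type node struct {
	Name          string
	OS            string // value of the node's OS label, e.g. "windows"
	Ready         bool
	Unschedulable bool
}

// findWindowsNode returns the first ready, schedulable Windows node and
// reports whether one was found.
func findWindowsNode(nodes []node) (node, bool) {
	for _, n := range nodes {
		if n.OS == "windows" && n.Ready && !n.Unschedulable {
			return n, true
		}
	}
	return node{}, false
}

// skipf is a hypothetical stand-in for the e2e skipper's Skipf: here it only
// prints the reason, whereas the real helper stops the running test.
func skipf(format string, args ...interface{}) {
	fmt.Printf("SKIP: "+format+"\n", args...)
}

func main() {
	nodes := []node{
		{Name: "linux-0", OS: "linux", Ready: true},
		{Name: "win-0", OS: "windows", Ready: false},
		{Name: "win-1", OS: "windows", Ready: true},
	}
	n, found := findWindowsNode(nodes)
	if !found {
		skipf("Could not find any ready and schedulable Windows nodes")
		return
	}
	fmt.Println("using node", n.Name) // prints: using node win-1
}
```

Returning a `(node, bool)` pair keeps the "found" decision next to the search, so the caller's skip check reads as a plain `if !found`.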
/test pull-kubernetes-bazel-build
…let stats for windows nodes
…atly reduce latency
Branch updated from e54fd36 to 999fdfa.
@marosset: The following test failed, say `/retest` to rerun all failed tests.
/retest
/lgtm
…30-upstream-release-1.15 Automated cherry pick of #87730 upstream release 1.15
…30-upstream-release-1.17 Automated cherry pick of #87730 upstream release 1.17
…30-upstream-release-1.16 Automated cherry pick of #87730 upstream release 1.16
…s are present on the node. Following the changes in kubernetes#87730, the Kubelet calls hcsshim directly to gather stats. However, unlike the `docker stats` API used previously, hcsshim does not keep information about exited containers. When the Kubelet lists containers (`docker_container.go:ListContainers()`), it sets `All: true`, which also retrieves non-running containers. When `docker stats` is called with such a container ID, it returns valid JSON with all values set to 0, and the non-running containers are filtered out later in the process. When hcsshim is called with such a container ID, however, it returns an error, which stops stats retrieval for all containers.
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR addresses an issue where kubelet metrics calls take a very long time on Windows nodes when more than a handful of containers are running.
Which issue(s) this PR fixes:
Fixes #74991 (Stats performance is slow on Windows)
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: