kube-controller-manager and kube-scheduler restarting in a clean installation #88111
Comments
/sig node |
looks like you logged a separate issue after our discussion here: this seems like an isolated case related to your setup (or just something rare). here you mentioned that you have tried a number of different CNIs, but if that is true why are the CNI pods missing in your
tagging as api-machinery because they own the etcd implementation. /remove-sig node |
Yes, I created this new issue
Just in case someone is interested, the script:

```bash
#!/bin/bash
NOW=$(date +"%Y%m%d_%H%M%S")
LOG_FILE="logs-${NOW}"
CMD='kubectl logs -n kube-system'
PODS=$(kubectl get pods -A -o custom-columns=:metadata.name --no-headers=true)
LINE=$(printf "%0.s-" {1..20})

# Dump the logs of every pod into one file, separated by headers.
for p in ${PODS}; do
    echo "${LINE}" >> "${LOG_FILE}"
    echo "POD: ${p}" >> "${LOG_FILE}"
    echo "${LINE}" >> "${LOG_FILE}"
    ${CMD} "${p}" >> "${LOG_FILE}"
done

# Earliest HH:MM:SS timestamp at which any pod logged "leaderelection lost".
LSL=$(awk -F'[ :.]' '/ leaderelection lost/{printf("%s:%s:%s\n", $2, $3, $4)}' "${LOG_FILE}" | sort -u | head -1)
if [[ -z ${LSL} ]]; then
    echo "No leader election lost"
    exit 0
fi
echo "Leader election lost: ${LSL}"

# Show all log lines from one minute before the loss up to the loss itself.
LSL_SEC=$(date +%s -d "${LSL}")
LSL_HM=$(date +"%H:%M" -d "@${LSL_SEC}")
MIN_BEFORE=$(date +"%H:%M" -d "@$((LSL_SEC - 60))")
echo "Logs from ${MIN_BEFORE} to ${LSL_HM}"
egrep -e '^---|^POD' -e " ${MIN_BEFORE}" -e " ${LSL_HM}" "${LOG_FILE}"
```
|
Regarding the metrics:

```bash
curl -s localhost:2381/metrics | awk -F' ' '/fsync.*sum/{t=$2; printf("%s: ", $1); getline; if ($2 != 0) {print t/$2*1000 " ms"} else {print "0 ms"}}'
```

I have these values:
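The one-liner above divides each fsync histogram's `_sum` by its `_count` to get an average latency in milliseconds (for reference, etcd's tuning guidance suggests WAL fsync latency should stay below roughly 10 ms). A self-contained sketch of the same arithmetic against captured metrics text; the metric values below are invented for illustration:

```shell
# Average fsync latency = histogram _sum / _count, converted to ms.
# The sample mimics the etcd /metrics exposition format; values are made up.
metrics='etcd_disk_wal_fsync_duration_seconds_sum 2.5
etcd_disk_wal_fsync_duration_seconds_count 100'

echo "${metrics}" | awk '
    /fsync.*sum/   { sum = $2 }
    /fsync.*count/ { count = $2 }
    END { if (count > 0) printf("%.1f ms\n", sum / count * 1000) }'
# prints "25.0 ms"
```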
|
/assign @jingyih |
@fedebongio: GitHub didn't allow me to assign the following users: jingyih. Note that only kubernetes members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Tested with CentOS 8 and got the same problem |
The disk performance (fio test) looks good to me. Question:
|
Hi,
Thank you |
Hi, finally I found the problem :-) 2 of the 3 machines have the failure. Guess which 2 machines I was doing the tests on? Regards |
Thanks for reporting back! Happy to hear that you figured it out:) |
great! thanks for the update. i'm closing the issue and will point users to it if needed. |
@neolit123: Closing this issue. In response to this:
|
What happened:
Just after the installation, the `kube-controller-manager` and `kube-scheduler` pods begin to restart.

What you expected to happen:
A stable Kubernetes installation without restarts.

How to reproduce it (as minimally and precisely as possible):
I only see this problem in our "production" environment. Testing the same procedure with VirtualBox, even on a laptop, I cannot reproduce it.

Anything else we need to know?:
etcd logs a lot of "took too long" messages with values over 15 seconds (e.g. `took too long (18.015145443s) to execute`).

Environment:
- Kubernetes version (use `kubectl version`):
- OS (e.g: `cat /etc/os-release`):
- Kernel (e.g. `uname -a`):
- Install tools: installed with `kubeadm`
- Network: there are two interfaces, `eno1` (for external access) and `enp129s0f0` (Kubernetes network)

I have used exactly the same `kubeadm init` command on the "production" master as on the VirtualBox master, but I got a different IP in the `get pods` output:
- "Production" (restarting) IP --> 172.16.1.1 (Kubernetes network)
- VirtualBox (stable) IP --> 10.0.2.15 (external IP)

Disk performance:
I have followed "Using Fio to Tell Whether Your Storage is Fast Enough for Etcd".
I have another k8s cluster running on a similar machine with Ubuntu 16.04.4 and k8s 1.14.2 without problems, but I need to upgrade to Ubuntu 19.10.
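The article referenced above benchmarks the disk the same way etcd's WAL uses it: small sequential writes, each followed by an `fdatasync`. A sketch of that check along the article's lines (the directory and job name are illustrative, and `fio` must be installed; in the fio report, look at the fsync/fdatasync percentiles, where a 99th percentile well above ~10 ms suggests the disk is too slow for etcd):

```shell
# Write 22 MiB in 2300-byte blocks, syncing after each write, to mimic
# etcd's WAL commit pattern. test-data/ should live on the disk under test.
mkdir -p test-data
if command -v fio >/dev/null 2>&1; then
    fio --rw=write --ioengine=sync --fdatasync=1 \
        --directory=test-data --size=22m --bs=2300 --name=etcd-disk-check
else
    # fio may not be present on a minimal host; install it first.
    echo "fio not installed; install it (e.g. apt-get install fio) and re-run"
fi
```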