
Fix evented pleg mirror pod & use IsEventedPLEGInUse instead of FG status check #122778

Open
wants to merge 5 commits into master

Conversation

@pacoxu (Member) commented Jan 15, 2024

This PR is based on #122763, and we use IsEventedPLEGInUse() instead of utilfeature.DefaultFeatureGate.Enabled(features.EventedPLEG).

Fixes #123087

Context can be found in #122763.

kubelet: fix a static pod startup bug with Evented PLEG
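
Illustratively, the substitution described above looks roughly like the sketch below. The package paths and the call-site shape are assumptions based on the usual kubelet layout, not this PR's diff; only the two expressions named in the description are taken from it.

package example

import (
	utilfeature "k8s.io/apiserver/pkg/util/feature"

	"k8s.io/kubernetes/pkg/features"
	"k8s.io/kubernetes/pkg/kubelet/pleg"
)

// eventedPLEGActive is a hypothetical call site showing the changed check.
func eventedPLEGActive() bool {
	// Before: only checks whether the feature gate is enabled. The gate can
	// be on even when the kubelet has fallen back to the generic PLEG.
	gateEnabled := utilfeature.DefaultFeatureGate.Enabled(features.EventedPLEG)
	_ = gateEnabled

	// After: ask whether the evented PLEG is actually in use (assumed to be
	// exported from the kubelet's pleg package, per the PR title).
	return pleg.IsEventedPLEGInUse()
}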

@k8s-ci-robot (Contributor)

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 15, 2024
@pacoxu (Member, Author) commented Jan 15, 2024

/test pull-kubernetes-e2e-kind-alpha-beta-features
/test pull-kubernetes-e2e-kind-alpha-features
/test pull-kubernetes-e2e-kind-beta-features

@k8s-ci-robot (Contributor)

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 15, 2024
@pacoxu (Member, Author) commented Jan 15, 2024

/test pull-kubernetes-e2e-kind-alpha-beta-features
/test pull-kubernetes-e2e-kind-alpha-features

@pacoxu (Member, Author) commented Jan 15, 2024

/test pull-kubernetes-unit
/test pull-kubernetes-integration

@pacoxu (Member, Author) commented Jan 15, 2024

/test pull-kubernetes-e2e-kind-alpha-beta-features
/test pull-kubernetes-e2e-kind-alpha-features


@pacoxu pacoxu marked this pull request as ready for review January 15, 2024 08:41
@pacoxu (Member, Author) commented Jan 15, 2024

/test pull-kubernetes-e2e-kind-alpha-beta-features
/test pull-kubernetes-e2e-kind-alpha-features


@wxx213 commented Sep 2, 2024

The dual running issue is not caused by an RPC timeout, but by an immediate restart (recreate) of a new container for a static pod; this doesn't happen for pods that are not static pods in the kubernetes manifest. This can be reproduced in the e2e CI using kind: the apiserver/controller-manager will fail to start because a duplicate container is created for the same pod, and the second container fails to start because its port is already bound. More details in #123087.

@pacoxu I am temporarily unable to access Kubernetes Slack for your discussion context, and I'm still confused about how the duplicate container was created. As we know from the code, only one podWorkerLoop can run per pod, and that is where the pod's containers are created and started. Could you share your assumption about what happened here?

@pacoxu (Member, Author) commented Sep 2, 2024

> The dual running issue is not caused by an RPC timeout, but by an immediate restart (recreate) of a new container for a static pod; this doesn't happen for pods that are not static pods in the kubernetes manifest. This can be reproduced in the e2e CI using kind: the apiserver/controller-manager will fail to start because a duplicate container is created for the same pod, and the second container fails to start because its port is already bound. More details in #123087.
>
> @pacoxu I am temporarily unable to access Kubernetes Slack for your discussion context, and I'm still confused about how the duplicate container was created. As we know from the code, only one podWorkerLoop can run per pod, and that is where the pod's containers are created and started. Could you share your assumption about what happened here?

@wxx213 This only happens to static pods. In the CI test failure, we found that the apiserver pod could not be started correctly. The bug does not occur for pods that are not static pods, which is why some users run EventedPLEG in production without hitting it.

This bug should be triggered when the two things below happen at the same time (see the sketch below this list).

  • for the change here, a CREATED event from the container runtime triggers a restart here.
    • with the generic PLEG instead of EventedPLEG, a static pod stays in the CREATED state only for a very short period, so the state is ignored.
    • if it is not a static pod, the logic is as you described: the pod is handled by a single podWorkerLoop.
  • meanwhile, the old CREATED container is started (probably startContainer is called at the same time the restart above happens).

BTW, I tried to reproduce the failure in #127063 to show the problem.
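
To make that race concrete, here is a minimal, self-contained sketch of the restart decision the first bullet refers to. The types and names are simplified stand-ins, not this PR's actual diff; in the kubelet the decision lives in computePodActions and keys off kubecontainer.ContainerStateCreated, as the init-container snippet quoted later in this thread shows.

package example

// containerState is a simplified stand-in for the kubelet's container state.
type containerState string

const stateCreated containerState = "created"

// podActions is a simplified stand-in for the kubelet's podActions struct.
type podActions struct {
	// Indexes of containers that should be (re)started.
	ContainersToStart []int
}

// decideRestart schedules a (re)start for every container observed in the
// "created" state. With the generic PLEG this state is visible only briefly,
// so nothing happens; with the evented PLEG a CREATED event for a static pod
// can arrive while startContainer is still running, and scheduling a restart
// here is what produces the duplicate container.
func decideRestart(states []containerState) podActions {
	var changes podActions
	for idx, s := range states {
		if s == stateCreated {
			changes.ContainersToStart = append(changes.ContainersToStart, idx)
		}
	}
	return changes
}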

@wxx213 commented Sep 2, 2024

> https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/127063/pull-kubernetes-e2e-kind-alpha-features/1830544265197916160
>
> @wxx213 You can see the dup container in https://gcsweb.k8s.io/gcs/kubernetes-jenkins/pr-logs/pull/127063/pull-kubernetes-e2e-kind-alpha-features/1830544265197916160/artifacts/kind-control-plane/pods/kube-system_kube-apiserver-kind-control-plane_6c276a318e8e1532ab2e7dd8dbcfebba/kube-apiserver/
>
> The kubelet log can be found in https://gcsweb.k8s.io/gcs/kubernetes-jenkins/pr-logs/pull/127063/pull-kubernetes-e2e-kind-alpha-features/1830544265197916160/artifacts/kind-control-plane/

Thanks, I checked the kubelet log for pod kube-system/kube-apiserver-kind-control-plane, and it seems this issue was caused by the started event of the app container not being synced to the kubelet by the evented PLEG in time. It's a bit like the RPC timeout situation we are suffering from, where the pod is sometimes trapped in a "dual running" state.

I summarized the timeline from the log for pod kube-system/kube-apiserver-kind-control-plane:

I0902 10:03:22.220403  kubelet.go:1758] "SyncPod enter": the first sync loop for pod kube-system/kube-apiserver-kind-control-plane started.

I0902 10:03:22.710694  evented.go:295] "Received Container Started Event": the evented PLEG got a container started event for
    container id 5dd8a71e5d80f85e0a7c672b7c9a865807c595e403aefa8a33d7266cf5d92811, which is the sandbox container.

I0902 10:03:22.710935  pod_workers.go:963] "Notifying pod of pending update": this then triggered a pending pod work for pod
    kube-system/kube-apiserver-kind-control-plane.

I0902 10:03:23.634078  evented.go:292] "Received Container Created Event": the evented PLEG got a container created event for
    container id eacee74973c186af7e16228a0fb22bb624a25ade9c499e5355910c8659a53968, which is the app container.

I0902 10:03:23.839023  kubelet.go:1764] "SyncPod exit": the first sync loop for pod kube-system/kube-apiserver-kind-control-plane finished.

I0902 10:03:23.839162  kubelet.go:1758] "SyncPod enter": the second sync loop for pod kube-system/kube-apiserver-kind-control-plane started; this
    work only saw the created state of the app container eacee74973c186af7e16228a0fb22bb624a25ade9c499e5355910c8659a53968 from the evented PLEG.

I0902 10:03:23.842087  evented.go:295] "Received Container Started Event", container id: eacee74973c186af7e16228a0fb22bb624a25ade9c499e5355910c8659a53968:
    the evented PLEG got the started event of the app container only now, after the second sync loop had already started.

I0902 10:03:23.845966  kuberuntime_manager.go:968] "Container of pod is not in the desired state and shall be started" containerName="kube-apiserver"
    pod="kube-system/kube-apiserver-kind-control-plane": the second sync loop therefore thought the app container had not been started, tried to create a
    new one, and the new container failed to start because the first one was already running.

@pacoxu (Member, Author) commented Sep 3, 2024

> the started event of the app container wasn't synced to the kubelet by the evented PLEG in time

Does this mean that the kubelet got the started event first and then the created event? Is this expected behavior?

@wxx213 commented Sep 3, 2024

> the started event of the app container wasn't synced to the kubelet by the evented PLEG in time
>
> Does this mean that the kubelet got the started event first and then the created event? Is this expected behavior?

The created event always arrives at the kubelet before the started event for the same container.

In this case, the started event arrived at the kubelet after the second pod sync started, as the timeline describes, so the second sync could only get the created state from the pod cache when its work started (the code: podWorkerLoop). The second sync then decided in computePodActions that the app container should be restarted (recreated).

I'm not sure why the app container's started event hadn't arrived at the kubelet by the time the first pod sync finished, given that the start function had already been called for the app container; maybe the started event is sent asynchronously by the runtime, e.g. containerd.

In any case, the kubelet should handle this information gap with the runtime correctly.
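
That information gap can be stated as a small, self-contained sketch (simplified stand-ins; the real kubelet reads a status snapshot from the pod cache at the start of podWorkerLoop): a sync only acts on the snapshot it read when it started, so a started event recorded after that point cannot change the current sync's decision.

package example

import "time"

// podStatusSnapshot is a simplified stand-in for the pod status a sync reads
// from the kubelet's pod cache when it starts.
type podStatusSnapshot struct {
	appContainerState string // "created", "running", ...
	recordedAt        time.Time
}

// willScheduleRestart mirrors the second sync in the timeline: the snapshot
// still reports "created" because the started event had not been recorded
// when the snapshot was taken, so a restart gets scheduled even though the
// container is already running in the runtime.
func willScheduleRestart(snapshot podStatusSnapshot) bool {
	return snapshot.appContainerState == "created"
}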

@pacoxu (Member, Author) commented Sep 3, 2024

Returning to the current PR, the solution in f99b04e should handle this scenario correctly. Do we have a better approach? I'm afraid this is the simplest fix for the bug.

If a pod hangs for more than 1 minute, we then trigger the restart.

IMO, if the current approach is the intended fix, the key point of discussion is whether 1 minute is an appropriate threshold for triggering the restart.
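
For reference, a hedged sketch of that threshold idea (the names and the one-minute constant are illustrative only, not copied from f99b04e): a container observed in the "created" state is treated as stuck, and restarted, only once it has stayed in that state longer than a grace period, which gives a late started event time to arrive.

package example

import "time"

// createdStateGracePeriod is an illustrative threshold; whether one minute is
// the right value is exactly the open question above.
const createdStateGracePeriod = 1 * time.Minute

// shouldRestartCreated reports whether a container that is still in the
// "created" state should be restarted, given when it entered that state.
func shouldRestartCreated(createdAt, now time.Time) bool {
	return now.Sub(createdAt) > createdStateGracePeriod
}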

@hshiina (Contributor) commented Sep 10, 2024

#126543 has been recently merged. A similar fix would be necessary for init containers:

case kubecontainer.ContainerStateCreated:
// The main sync loop should have created and started the container
// in one step. If the init container is in the 'created' state,
// it is likely that the container runtime failed to start it. To
// prevent the container from getting stuck in the 'created' state,
// restart it.
changes.InitContainersToStart = append(changes.InitContainersToStart, i)

I reproduced the same issue with an init container on recent master:

$ cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: testpod
  name: testpod
spec:
  containers:
  - command:
    - sh
    - -c
    - trap exit TERM; while true; do sleep 1; done
    image: busybox
    name: testcontainer
    resources: {}
  initContainers:
  - command:
    - mkdir
    - /myvolume/test
    image: busybox
    name: testinitcontainer
    volumeMounts:
    - mountPath: /myvolume
      name: myvolume
  volumes:
  - name: myvolume
    emptyDir: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

$ kubectl create -f pod.yaml
pod/testpod created

$ kubectl get pod testpod -w
NAME      READY   STATUS                  RESTARTS      AGE
testpod   0/1     Init:CrashLoopBackOff   2 (3s ago)    7s
testpod   0/1     Init:0/1                3 (14s ago)   18s
testpod   0/1     Init:Error              3 (14s ago)   18s
testpod   0/1     Init:CrashLoopBackOff   3 (0s ago)    18s
testpod   0/1     Init:0/1                4 (27s ago)   45s
testpod   0/1     Init:Error              4 (27s ago)   45s
testpod   0/1     Init:CrashLoopBackOff   4 (0s ago)    45s

This message implies that another init container was created:

$ kubectl logs testpod -c testinitcontainer
mkdir: can't create directory '/myvolume/test': File exists

@pacoxu (Member, Author) commented Sep 12, 2024

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e

@pacoxu (Member, Author) commented Sep 12, 2024

> #126543 has been recently merged. A similar fix would be necessary for init containers:
>
> I reproduced the same issue with an init container on recent master:

@hshiina I rebased with the latest code.

Do you know if we have an e2e test that covers the init container restart problem? If not, we should probably add one.

@pacoxu (Member, Author) commented Sep 12, 2024

/test pull-kubernetes-e2e-kind-beta-features
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-e2e-kind-evented-pleg
/test pull-kubernetes-node-kubelet-serial-containerd
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features
/test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features

@hshiina (Contributor) commented Sep 12, 2024

> #126543 has been recently merged. A similar fix would be necessary for init containers:
> I reproduced the same issue with an init container on recent master:
>
> @hshiina I rebased with the latest code.
>
> Do you know if we have an e2e test that covers the init container restart problem? If not, we should probably add one.

There doesn't seem to be a test that addresses this problem explicitly. However, I guess a test that runs a web server in an init container, like the following one, would fail if we hit this issue: init containers created later would fail to start with an error like "address already in use", and the restart count would increase:

ginkgo.It("with readiness probe that fails should never be ready and never restart", func(ctx context.Context) {
p := podClient.Create(ctx, testWebServerSidecarPodSpec(probe.withFailing().build(), nil, "test-webserver", 80))

Please note that #124297 made this issue less likely to occur.

@pacoxu (Member, Author) commented Sep 25, 2024

After rebasing on master to include some related changes:
/test pull-kubernetes-e2e-kind-beta-features
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-e2e-kind-evented-pleg
/test pull-kubernetes-node-kubelet-serial-containerd
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features
/test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features

@pacoxu (Member, Author) commented Oct 11, 2024

/test pull-kubernetes-node-e2e-containerd-alpha-features

@k8s-ci-robot (Contributor)

@pacoxu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                           Commit   Details  Required  Rerun command
pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e   e404a28  link     false     /test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
pull-kubernetes-node-e2e-containerd-alpha-features  e404a28  link     false     /test pull-kubernetes-node-e2e-containerd-alpha-features

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@SergeyKanzhelev (Member)

Should it also be handled here: #126543? I remember that PR introduced something similar for sidecars.

This is one of those PRs that are hard to reason about: it fixes an EventedPLEG bug but introduces changes all over, potentially affecting the generic PLEG as well. Please make sure all related places are well documented as applying only to EventedPLEG.

Finally, does the test change cover this fix so that we will catch regressions in the future? Or do we need to do something more than we already have?

@pacoxu (Member, Author) commented Oct 12, 2024

> Should it also be handled here: #126543? I remember that PR introduced something similar for sidecars.
>
> This is one of those PRs that are hard to reason about: it fixes an EventedPLEG bug but introduces changes all over, potentially affecting the generic PLEG as well. Please make sure all related places are well documented as applying only to EventedPLEG.

Yes. This PR originally only fixed the static pod problem when EventedPLEG is enabled.

#126543 mentions that there is a similar issue for sidecar/init containers.

> Finally, does the test change cover this fix so that we will catch regressions in the future? Or do we need to do something more than we already have?

This can be mocked now that #127495 has been merged. I can write a new test to cover those cases.

Labels
area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
Status: PRs Waiting on Author
Status: Waiting on Author
Status: Release Blocker
Development

Successfully merging this pull request may close these issues.

kubelet started multi-containers for static pods when EventedPLEG is enabled