Flaky error checking the state of pods when running snc #920

adrianriobo · 2024-06-18T11:32:39Z

We are hitting a flaky error when checking the state of the operators and pods after the cluster has been started and the patches have been applied:

Retry 9/10 exited 1, retrying in 256 seconds...
+ sleep 256
+ all_pods_are_running_completed none
+ local ignoreNamespace=none
+ ./openshift-clients/linux/oc get pod --no-headers --all-namespaces '--field-selector=metadata.namespace!=none'
+ grep -v Running
+ grep -v Completed
openshift-kube-apiserver                           installer-11-crc                                         0/1   ContainerStatusUnknown   1                19m
+ exit=1
+ wait=512
+ count=10
+ '[' 10 -lt 10 ']'
+ echo 'Retry 10/10 exited 1, no more retries left.'
Retry 10/10 exited 1, no more retries left.
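
For readers unfamiliar with the retry behaviour visible in this trace, here is a minimal sketch of a retry helper with exponential backoff that would produce the same messages (`wait=256`, then `wait=512`, `count=10`). It is reconstructed from the trace only; the function name `retry`, the cap of 10 attempts, and the exact structure are assumptions, not necessarily what the snc scripts contain.

```shell
# Hypothetical reconstruction from the trace above, not the actual snc code.
# Runs the given command until it succeeds, doubling the wait each time,
# and gives up after a fixed number of attempts.
retry() {
    local retries=10   # assumed cap, taken from the "10/10" in the log
    local count=0
    local wait exit
    until "$@"; do
        exit=$?
        wait=$((2 ** count))       # 1, 2, 4, ... seconds
        count=$((count + 1))
        if [ "$count" -lt "$retries" ]; then
            echo "Retry $count/$retries exited $exit, retrying in $wait seconds..." >&2
            sleep "$wait"
        else
            echo "Retry $count/$retries exited $exit, no more retries left." >&2
            return "$exit"
        fi
    done
    return 0
}
```

Under these assumptions the helper sleeps for 1 + 2 + ... + 256 = 511 seconds in total before giving up, so a pod that never leaves `ContainerStatusUnknown` exhausts every retry.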


Sometimes a pod goes into the `ContainerStatusUnknown` state, where it is not able to send its status to the kubelet, and it stays there until it is manually deleted. Because of this, our snc script fails. In this PR we delete the pods that are in the failed state (which includes the `ContainerStatusUnknown` ones) and then check the pods' availability.

```
+ sleep 256
+ all_pods_are_running_completed none
+ local ignoreNamespace=none
+ ./openshift-clients/linux/oc get pod --no-headers --all-namespaces '--field-selector=metadata.namespace!=none'
+ grep -v Running
+ grep -v Completed
openshift-kube-apiserver installer-11-crc 0/1 ContainerStatusUnknown 1 19m
+ exit=1
+ wait=512
+ count=10
+ '[' 10 -lt 10 ']'
+ echo 'Retry 10/10 exited 1, no more retries left.'
Retry 10/10 exited 1, no more retries left.
```

fixes: crc-org#920
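
The approach described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: the `OC` variable and the `delete_failed_pods` name are hypothetical, and the body of `all_pods_are_running_completed` is inferred from the trace. The `--field-selector` and `--all-namespaces` flags are standard `oc`/`kubectl` options.

```shell
# Hypothetical sketch; OC and delete_failed_pods are illustrative names.
OC="${OC:-oc}"

# Delete every pod whose phase is Failed, in all namespaces.
# Per the description above, ContainerStatusUnknown pods fall in this group.
delete_failed_pods() {
    "${OC}" delete pods --field-selector=status.phase=Failed --all-namespaces
}

# Succeed only if no pod outside the ignored namespace is listed with
# a status other than Running or Completed (inferred from the trace).
all_pods_are_running_completed() {
    local ignoreNamespace=$1
    ! "${OC}" get pod --no-headers --all-namespaces \
        --field-selector="metadata.namespace!=${ignoreNamespace}" \
        | grep -v Running | grep -v Completed
}
```

A caller would run `delete_failed_pods` first and then poll `all_pods_are_running_completed none`, so that a leftover installer pod no longer blocks the check.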

praveenkumar mentioned this issue Jun 19, 2024

snc-library: Delete the failed pods before check for available one #921

Open
