Flaky error checking the state of pods when running snc #920

Open
adrianriobo opened this issue Jun 18, 2024 · 0 comments

We are hitting a flaky error when checking the state of the operators and pods after the cluster has been started and the patches have been applied:

```
Retry 9/10 exited 1, retrying in 256 seconds...
+ sleep 256
+ all_pods_are_running_completed none
+ local ignoreNamespace=none
+ ./openshift-clients/linux/oc get pod --no-headers --all-namespaces '--field-selector=metadata.namespace!=none'
+ grep -v Running
+ grep -v Completed
openshift-kube-apiserver                           installer-11-crc                                         0/1   ContainerStatusUnknown   1                19m
+ exit=1
+ wait=512
+ count=10
+ '[' 10 -lt 10 ']'
+ echo 'Retry 10/10 exited 1, no more retries left.'
Retry 10/10 exited 1, no more retries left.
```
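
For readers unfamiliar with the trace above, here is a rough sketch of a retry helper with exponential backoff that would produce this kind of output. It is inferred only from the logged values (the wait doubles on each attempt and the loop gives up at 10/10). It is not necessarily the exact function in snc, and `RETRIES` and `SLEEP_CMD` are hypothetical knobs added for illustration.

```shell
#!/bin/bash
# Sketch only: reconstructed from the trace, not copied from snc.
# RETRIES and SLEEP_CMD are hypothetical, added so the helper is easy to try out.
RETRIES="${RETRIES:-10}"
SLEEP_CMD="${SLEEP_CMD:-sleep}"

# Run the given command until it succeeds or the attempts are exhausted.
retry() {
    local count=0 wait rc
    until "$@"; do
        rc=$?
        # Backoff doubles each time: 1, 2, 4, ... seconds.
        wait=$((2 ** count))
        count=$((count + 1))
        if [ "$count" -lt "$RETRIES" ]; then
            echo "Retry $count/$RETRIES exited $rc, retrying in $wait seconds..." >&2
            "$SLEEP_CMD" "$wait"
        else
            echo "Retry $count/$RETRIES exited $rc, no more retries left." >&2
            return "$rc"
        fi
    done
    return 0
}
```

With a helper like this, a pod that never leaves `ContainerStatusUnknown` makes every attempt fail, so the script waits through the whole backoff schedule before giving up.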
praveenkumar added a commit to praveenkumar/snc that referenced this issue Jun 19, 2024
Sometimes a pod goes into the `ContainerStatusUnknown` state, where it is
not able to send its status to the kubelet. It stays there until it is
manually deleted, and because of that our snc script fails. This PR deletes
the pods that are in the failed state (which also covers the
`ContainerStatusUnknown` ones) and then checks pod availability.

```
+ sleep 256
+ all_pods_are_running_completed none
+ local ignoreNamespace=none
+ ./openshift-clients/linux/oc get pod --no-headers --all-namespaces '--field-selector=metadata.namespace!=none'
+ grep -v Running
+ grep -v Completed
openshift-kube-apiserver                           installer-11-crc                                         0/1   ContainerStatusUnknown   1                19m
+ exit=1
+ wait=512
+ count=10
+ '[' 10 -lt 10 ']'
+ echo 'Retry 10/10 exited 1, no more retries left.'
Retry 10/10 exited 1, no more retries left.
```

fixes: crc-org#920
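
As an illustration of the approach described in the commit message, a minimal sketch could look like the following. It assumes, as the message states, that `ContainerStatusUnknown` pods are reported in the `Failed` phase. The helper names `filter_unhealthy_pods` and `delete_failed_pods` and the `OC` variable are hypothetical, so the actual change in the PR may differ.

```shell
#!/bin/bash
# Sketch only: helper names and the OC variable are assumptions, not the PR's code.
OC="${OC:-oc}"

# Read `oc get pod --no-headers --all-namespaces` output on stdin and print
# the rows whose STATUS column (4th field) is neither Running nor Completed.
filter_unhealthy_pods() {
    awk '$4 != "Running" && $4 != "Completed"'
}

# Delete every pod whose phase is Failed, across all namespaces.
delete_failed_pods() {
    "$OC" delete pods --field-selector=status.phase=Failed --all-namespaces
}

# Succeed only if no pod outside the ignored namespace is unhealthy.
all_pods_are_running_completed() {
    local ignoreNamespace=$1
    local unhealthy
    delete_failed_pods
    unhealthy=$("$OC" get pod --no-headers --all-namespaces \
        --field-selector="metadata.namespace!=${ignoreNamespace}" | filter_unhealthy_pods)
    if [ -n "$unhealthy" ]; then
        echo "$unhealthy"
        return 1
    fi
    return 0
}
```

Deleting the failed pods before listing means a stale `installer-*` pod no longer blocks the check, while pods that are still pending or crashing are reported as before.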
praveenkumar added a commit to praveenkumar/snc that referenced this issue Jul 3, 2024
Sometimes a pod goes into the `ContainerStatusUnknown` state, where it is
not able to send its status to the kubelet. It stays there until it is
manually deleted, and because of that our snc script fails. This PR deletes
the pods that are in the failed state (which also covers the
`ContainerStatusUnknown` ones) and then checks pod availability.

fixes: crc-org#920