image-registry Available: The deployment does not exist.. | Unable to apply 4.16.15: the cluster operator image-registry is not available #29212

Open
n00bsi opened this issue Oct 21, 2024 · 1 comment

n00bsi commented Oct 21, 2024

[provide a description of the issue]

Version

[provide output of the openshift version or oc version command]

$ oc version
Client Version: 4.15.11
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.16.14
Kubernetes Version: v1.29.8+f10c92d

Steps To Reproduce
  1. Start the update from 4.16.14 to 4.16.15.
  2. The update hangs at 88%.
Current Result

The update hangs at 88%.

Available: The deployment does not exist
NodeCADaemonAvailable: The daemon set node-ca has available replicas
ImagePrunerAvailable: Pruner CronJob has been created


$ oc describe pod -n openshift-image-registry node-ca-5c6gg | grep Node
Node-Selectors:              kubernetes.io/os=linux
  Warning  NodeNotReady  100m (x3 over 5h21m)  node-controller  Node is not ready



$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.14   True        True          6d      Unable to apply 4.16.15: the cluster operator image-registry is not available

$ oc get clusteroperator image-registry
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
image-registry             False       True          True       26m     Available: The deployment does not exist...
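
To script this check rather than read the table, a minimal sketch follows. It assumes a logged-in `oc` session; the function name `operator_available` is hypothetical and not part of `oc`. The jsonpath expression reads the `Available` condition from the ClusterOperator status.

```shell
# Hypothetical helper: exit 0 when the named cluster operator reports
# Available=True, non-zero otherwise. Assumes `oc` is logged in to the cluster.
operator_available() {
  avail=$(oc get clusteroperator "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')
  [ "$avail" = "True" ]
}
```

For example, `operator_available image-registry || echo "image-registry is not available"` would print the message while the operator is in the state shown above.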


$ oc get pvc
...
...
ocs4registry Bound pvc-38960e2f-4c6b-450d-a5fe-c1a26714e496 1Gi RWX longhorn 162
...

How can I fix this?


$ oc get pods -n openshift-image-registry
NAME                                               READY   STATUS    RESTARTS   AGE
cluster-image-registry-operator-7c87776c4c-csz22   1/1     Running   0          38m
node-ca-5c6gg                                      1/1     Running   0          38m
node-ca-c492l                                      1/1     Running   0          38m
node-ca-crzlc                                      1/1     Running   0          156m
node-ca-dskf6                                      1/1     Running   0          38m
node-ca-mpwjb                                      1/1     Running   0          38m
node-ca-xmjbp                                      1/1     Running   0          38m






Output of: `oc edit configs.imageregistry.operator.openshift.io -o yaml`

See the attached file:
[image_reg.yaml.log](https://github.com/user-attachments/files/17459954/image_reg.yaml.log)


##### Expected Result

The update runs to completion.

##### Additional Information
[try to run `$ oc adm diagnostics` (or `oadm diagnostics`) command if possible]
[if you are reporting issue related to builds, provide build logs with `BUILD_LOGLEVEL=5`]
[consider attaching output of the `$ oc get all -o json -n <namespace>` command to the issue]
[visit https://docs.openshift.org/latest/welcome/index.html]

n00bsi commented Oct 22, 2024

Found some things:

oc describe clusteroperator/machine-config

oc delete pod node-ca-8566c -n openshift-image-registry
(repeated for all other node-ca-* pods)
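
The per-pod deletion above could be done in one pass. This is only a sketch under assumptions: the function name `delete_node_ca_pods` is hypothetical, a logged-in `oc` session is required, and the pods are matched purely by the `node-ca-` name prefix seen in the listing earlier in this issue.

```shell
# Hypothetical helper: delete every node-ca pod so the daemon set recreates them.
# Assumes `oc` is logged in and the pods live in openshift-image-registry.
delete_node_ca_pods() {
  oc get pods -n openshift-image-registry -o name |
    grep '^pod/node-ca-' |
    while read -r pod; do
      # </dev/null keeps the delete call from consuming the pod list on stdin.
      oc delete -n openshift-image-registry "$pod" </dev/null
    done
}
```

Matching on the name prefix avoids guessing at pod labels. The operator pod in the same namespace is left untouched.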

oc get pods -n openshift-machine-config-operator 
oc logs -f -n openshift-machine-config-operator machine-config-controller-645db999c6-xjsqs -c machine-config-controller

oc adm drain node1.domain.tld --ignore-daemonsets --force --delete-emptydir-data

https://www.neteye-blog.com/2023/08/debug-and-workarounds-for-a-stuck-update-on-openshift-4-13-6/

https://access.redhat.com/solutions/5317441

https://access.redhat.com/solutions/5598401

Now all nodes are at the same OS level:

Red Hat Enterprise Linux CoreOS 416.94.202409191851-0
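
One way to confirm that every node reports the same OS image is sketched below. The function name `node_os_images` is hypothetical, and a logged-in `oc` session is assumed. The jsonpath expression reads `status.nodeInfo.osImage` from each node.

```shell
# Hypothetical helper: print each distinct OS image reported by the nodes.
# Assumes `oc` is logged in; a single line of output means all nodes match.
node_os_images() {
  oc get nodes \
    -o jsonpath='{range .items[*]}{.status.nodeInfo.osImage}{"\n"}{end}' |
    sort -u
}
```

If `node_os_images` prints more than one line, at least one node is still on a different image.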

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.15   True        False         112m    Cluster version is 4.16.15

