pods failing with ImagePullBackOff #312

Closed
pdfruth opened this issue Dec 30, 2021 · 6 comments

pdfruth commented Dec 30, 2021

Bug Report

What did you do?

  1. Install the Open Liberty Operator v0.8.0 from OperatorHub
  2. Create a new OpenShift project
oc new-project openliberty-demo
  3. Build, package, and push an Open Liberty container image to the OpenShift internal image registry as image stream app-modernization:v1.0.0
    Note: I have a simple demo app, with instructions in the Git repo here -> https://github.com/OpenShift-Z/openliberty-operator-ocpz#build-and-push-the-container-image

  4. Create an OpenLibertyApplication CR (copy the example YAML below to a file named olapp.yaml)

apiVersion: apps.openliberty.io/v1beta2
kind: OpenLibertyApplication
metadata:
  name: appmod
spec:
  applicationImage: image-registry.openshift-image-registry.svc:5000/openliberty-demo/app-modernization:v1.0.0
  pullPolicy: Always
  expose: true
  route:
    host: 'modresort.apps.ocp10.internal.net'
    path: '/resorts'
  5. Create the OpenLibertyApplication with
oc -n openliberty-demo create -f olapp.yaml
  6. After a few minutes, notice the resulting pod is in ImagePullBackOff status
oc -n openliberty-demo get all
NAME                         READY   STATUS             RESTARTS   AGE
pod/appmod-9c768c58c-89nf2   0/1     ImagePullBackOff   0          4s

NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/appmod   ClusterIP   172.30.165.64   <none>        9080/TCP   48s

NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/appmod   0/1     1            0           47s

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/appmod-9c768c58c   1         1         0       47s

NAME                                               IMAGE REPOSITORY                                                                                    TAGS     UPDATED
imagestream.image.openshift.io/app-modernization   default-route-openshift-image-registry.apps.ocp10.internal.net/openliberty-demo/app-modernization   v1.0.0   About a minute ago

NAME                              HOST/PORT                           PATH       SERVICES   PORT       TERMINATION   WILDCARD
route.route.openshift.io/appmod   modresort.apps.ocp10.internal.net   /resorts   appmod     9080-tcp                 None

  7. Review the pod events, which indicate that pulling the image failed due to an authorization failure
oc describe pod appmod-9c768c58c-89nf2

...

Events:
  Type     Reason          Age                     From               Message
  ----     ------          ----                    ----               -------
  Normal   Scheduled       162m                    default-scheduler  Successfully assigned openliberty-demo/appmod-9c768c58c-89nf2 to worker3.ocp10.internal.net
  Normal   AddedInterface  162m                    multus             Add eth0 [192.128.3.190/23] from openshift-sdn
  Warning  Failed          161m (x6 over 162m)     kubelet            Error: ImagePullBackOff
  Normal   Pulling         161m (x4 over 162m)     kubelet            Pulling image "image-registry.openshift-image-registry.svc:5000/openliberty-demo/app-modernization@sha256:57db258d9db75734654890c3ecb4ddc15539cf69e7c3879fea815b3cfdea58a2"
  Warning  Failed          161m (x4 over 162m)     kubelet            Failed to pull image "image-registry.openshift-image-registry.svc:5000/openliberty-demo2/app-modernization@sha256:57db258d9db75734654890c3ecb4ddc15539cf69e7c3879fea815b3cfdea58a2": rpc error: code = Unknown desc = reading manifest sha256:57db258d9db75734654890c3ecb4ddc15539cf69e7c3879fea815b3cfdea58a2 in image-registry.openshift-image-registry.svc:5000/openliberty-demo2/app-modernization: unauthorized: authentication required
  Warning  Failed          161m (x4 over 162m)     kubelet            Error: ErrImagePull
  Normal   BackOff         2m23s (x704 over 162m)  kubelet            Back-off pulling image "image-registry.openshift-image-registry.svc:5000/openliberty-demo/app-modernization@sha256:57db258d9db75734654890c3ecb4ddc15539cf69e7c3879fea815b3cfdea58a2"
  8. Delete the pod
oc delete pod appmod-9c768c58c-89nf2
  9. Notice that the recreated pod starts successfully, as expected
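
The delete-and-recreate workaround from the last two steps can be sketched as a small shell helper. This is only an illustration under stated assumptions, not something the operator provides: the function names `stuck_pods` and `restart_stuck_pods` are hypothetical, and the sketch assumes the stuck pods are owned by a Deployment that recreates them after deletion.

```shell
# Sketch only: helper names are hypothetical, not part of oc or the operator.

# Print the names of pods in namespace $1 that have a container waiting
# with reason ImagePullBackOff or ErrImagePull.
stuck_pods() {
  local ns="$1"
  oc -n "$ns" get pods \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}' \
    | awk -F'\t' '$2 ~ /ImagePullBackOff|ErrImagePull/ {print $1}'
}

# Delete every stuck pod so that its owning Deployment creates a new one.
restart_stuck_pods() {
  local ns="$1" pod
  for pod in $(stuck_pods "$ns"); do
    oc -n "$ns" delete pod "$pod"
  done
}

# Example (requires a logged-in oc session):
#   restart_stuck_pods openliberty-demo
```

Filtering on the waiting reason keeps healthy pods untouched, which matters if the namespace contains more than the one application.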

What did you expect to see?

The pod should start normally without having to be deleted first.

What did you see instead?

The pod starts before the objects needed to pull the container image from the internal image registry (e.g., secret, service account, role, and role binding) are in place.

Environment

  • OpenShift version information (if applicable):
oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.11    True        False         5h27m   Cluster version is 4.9.11

Possible solution

I believe the Open Liberty Operator may be creating the deployment/pod resource before the requisite service account, secret, and associated Role and RoleBinding are created. Thus, the necessary authorization isn't established yet when the pod starts.
Merely deleting the pod and letting the deployment recreate it seems to work, which suggests this may be a timing/synchronization issue.
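
One way to probe the timing theory is to poll the application's service account until an image pull secret is attached to it. The sketch below is an assumption-laden illustration, not the operator's logic: the function names and the `POLL_INTERVAL` variable are hypothetical, and the service account name must be supplied by the caller (for this report it would presumably match the CR name, appmod, but the thread does not confirm that).

```shell
# Sketch only: helper names and POLL_INTERVAL are hypothetical.

# Succeed if service account $2 in namespace $1 lists at least one
# entry under imagePullSecrets.
sa_has_pull_secret() {
  local ns="$1" sa="$2" secrets
  secrets="$(oc -n "$ns" get serviceaccount "$sa" \
    -o jsonpath='{.imagePullSecrets[*].name}' 2>/dev/null)"
  [ -n "$secrets" ]
}

# Poll up to $3 times (default 30), sleeping POLL_INTERVAL seconds
# (default 2) between attempts. Returns 1 if no pull secret appears.
wait_for_pull_secret() {
  local ns="$1" sa="$2" tries="${3:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    if sa_has_pull_secret "$ns" "$sa"; then
      return 0
    fi
    i=$((i + 1))
    sleep "${POLL_INTERVAL:-2}"
  done
  return 1
}

# Example (requires a logged-in oc session):
#   wait_for_pull_secret openliberty-demo appmod && echo "pull secret attached"
```

If the pull secret only shows up after the pod's first pull attempt, that would be consistent with the ordering problem described above.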

leochr commented Jan 28, 2022

This seems similar to the issue from #236. We are investigating.

m-reza-rahman commented

We have observed this issue as well as part of our work to enable Liberty on ARO: https://docs.microsoft.com/en-us/azure/developer/java/ee/websphere-family#open-liberty-and-websphere-liberty-on-aro. A fix would be highly appreciated.

gcharters commented

I'm hitting this on a regular basis so would really appreciate it being fixed. Thank you.

idlewis commented Mar 22, 2022

We've just merged a PR into main which should fix this issue.

m-reza-rahman commented

Great news!

leochr commented Mar 28, 2022

Open Liberty Operator v0.8.1 is now released with the fix for this issue. Release information is documented here.

fyi @pdfruth @m-reza-rahman @gcharters
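
To confirm that a cluster is running a release that contains the fix, the installed operator version can be compared against 0.8.1. The following is a rough sketch under assumptions: the function names are hypothetical, the operator's ClusterServiceVersion name is assumed to contain "open-liberty", the namespace is whatever the operator was installed into, and `sort -V` requires GNU coreutils.

```shell
# Sketch only: helper names are hypothetical; the CSV name pattern is an assumption.

# Succeed when version $1 is greater than or equal to version $2.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print the spec.version of the first ClusterServiceVersion in namespace $1
# whose name contains "open-liberty".
installed_operator_version() {
  local ns="$1"
  oc -n "$ns" get csv \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.version}{"\n"}{end}' \
    | awk -F'\t' 'tolower($1) ~ /open-liberty/ {print $2; exit}'
}

# Example (requires a logged-in oc session; namespace is an assumption):
#   v="$(installed_operator_version openshift-operators)"
#   version_at_least "$v" 0.8.1 && echo "fix included" || echo "upgrade needed"
```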

leochr closed this as completed on Mar 28, 2022