pods failing with ImagePullBackOff #312

pdfruth · 2021-12-30T04:35:02Z

Bug Report

What did you do?

Install the Open Liberty Operator v0.8.0 from OperatorHub
Create a new OpenShift project

oc new-project openliberty-demo

Build, package, and push an Open Liberty container image to the OpenShift internal image registry as image stream app-modernization:v1.0.0
Note: I have a simple demo app, with instructions in the Git repo here: https://github.com/OpenShift-Z/openliberty-operator-ocpz#build-and-push-the-container-image
Create an OpenLibertyApplication CR (copy the example YAML to a file named olapp.yaml)

apiVersion: apps.openliberty.io/v1beta2
kind: OpenLibertyApplication
metadata:
  name: appmod
spec:
  applicationImage: image-registry.openshift-image-registry.svc:5000/openliberty-demo/app-modernization:v1.0.0
  pullPolicy: Always
  expose: true
  route:
    host: 'modresort.apps.ocp10.internal.net'
    path: '/resorts'

Create the OLA with

oc -n openliberty-demo create -f olapp.yaml

After a few minutes, notice the resulting pod is in ImagePullBackOff status

oc -n openliberty-demo get all
NAME                         READY   STATUS             RESTARTS   AGE
pod/appmod-9c768c58c-89nf2   0/1     ImagePullBackOff   0          4s

NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/appmod   ClusterIP   172.30.165.64   <none>        9080/TCP   48s

NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/appmod   0/1     1            0           47s

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/appmod-9c768c58c   1         1         0       47s

NAME                                               IMAGE REPOSITORY                                                                                    TAGS     UPDATED
imagestream.image.openshift.io/app-modernization   default-route-openshift-image-registry.apps.ocp10.internal.net/openliberty-demo/app-modernization   v1.0.0   About a minute ago

NAME                              HOST/PORT                           PATH       SERVICES   PORT       TERMINATION   WILDCARD
route.route.openshift.io/appmod   modresort.apps.ocp10.internal.net   /resorts   appmod     9080-tcp                 None

The pod events show that the image pull failed with an authorization error

oc describe pod appmod-9c768c58c-89nf2

...

Events:
  Type     Reason          Age                     From               Message
  ----     ------          ----                    ----               -------
  Normal   Scheduled       162m                    default-scheduler  Successfully assigned openliberty-demo/appmod-9c768c58c-89nf2 to worker3.ocp10.internal.net
  Normal   AddedInterface  162m                    multus             Add eth0 [192.128.3.190/23] from openshift-sdn
  Warning  Failed          161m (x6 over 162m)     kubelet            Error: ImagePullBackOff
  Normal   Pulling         161m (x4 over 162m)     kubelet            Pulling image "image-registry.openshift-image-registry.svc:5000/openliberty-demo/app-modernization@sha256:57db258d9db75734654890c3ecb4ddc15539cf69e7c3879fea815b3cfdea58a2"
  Warning  Failed          161m (x4 over 162m)     kubelet            Failed to pull image "image-registry.openshift-image-registry.svc:5000/openliberty-demo2/app-modernization@sha256:57db258d9db75734654890c3ecb4ddc15539cf69e7c3879fea815b3cfdea58a2": rpc error: code = Unknown desc = reading manifest sha256:57db258d9db75734654890c3ecb4ddc15539cf69e7c3879fea815b3cfdea58a2 in image-registry.openshift-image-registry.svc:5000/openliberty-demo2/app-modernization: unauthorized: authentication required
  Warning  Failed          161m (x4 over 162m)     kubelet            Error: ErrImagePull
  Normal   BackOff         2m23s (x704 over 162m)  kubelet            Back-off pulling image "image-registry.openshift-image-registry.svc:5000/openliberty-demo/app-modernization@sha256:57db258d9db75734654890c3ecb4ddc15539cf69e7c3879fea815b3cfdea58a2"

Delete the pod

oc delete pod appmod-9c768c58c-89nf2

Notice the pod starts successfully, as expected
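
Until a fix is available, the delete-and-recreate workaround above can be scripted. This is only a sketch: the helper name `stuck_pull_pods` is made up for illustration, and it assumes the default column layout of `oc get pods --no-headers` (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or the operator).
# Reads `oc get pods --no-headers` output on stdin and prints the names of
# pods whose STATUS column reports an image pull failure.
stuck_pull_pods() {
  awk '$3 == "ImagePullBackOff" || $3 == "ErrImagePull" { print $1 }'
}

# Example use against the project from this report (requires cluster access):
#   oc -n openliberty-demo get pods --no-headers | stuck_pull_pods \
#     | xargs -r oc -n openliberty-demo delete pod
```

The deployment then recreates each deleted pod, which by that point should be able to pull the image. Note that `xargs -r` (skip the command when the input is empty) is not available in every xargs implementation.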

What did you expect to see?

The pod should start normally without having to delete the pod first

What did you see instead?

The pod starts before the objects needed to pull the container image from the internal image registry are in place (e.g., secret, service account, role, and rolebinding).

Environment

OpenShift version information (if applicable):

oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.11    True        False         5h27m   Cluster version is 4.9.11

Possible solution

I believe the Open Liberty Operator may be creating the deployment/pod resource before the requisite service account, secret, and associated Role and RoleBinding are created, so the necessary authorization isn't established yet when the pod starts.
Merely deleting the pod and letting the deployment recreate it seems to work, which suggests this may be a timing/synchronization issue.
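
One way to check the timing theory is to poll the application's service account until it lists an image pull secret. The sketch below is untested against the operator and rests on assumptions: the function name `wait_for_pull_secret` is hypothetical, and the service account is assumed to be named after the CR (`appmod` here), which may not hold if `serviceAccountName` is set.

```shell
#!/bin/sh
# Hypothetical helper: poll a service account until it has at least one
# imagePullSecrets entry, then print the secret name(s).
# Usage: wait_for_pull_secret <namespace> <serviceaccount> [max_tries]
wait_for_pull_secret() {
  ns=$1
  sa=$2
  tries=${3:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # jsonpath yields an empty string while no pull secret is attached
    secrets=$(oc -n "$ns" get serviceaccount "$sa" \
      -o jsonpath='{.imagePullSecrets[*].name}' 2>/dev/null)
    if [ -n "$secrets" ]; then
      echo "$secrets"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, running `wait_for_pull_secret openliberty-demo appmod 60` right after `oc create -f olapp.yaml` and comparing the moment it succeeds with the pod's first `Pulling` event would show whether the pod started before the secret was attached.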

Additional context


leochr · 2022-01-28T15:53:21Z

This seems similar to the issue from #236. We are investigating.

m-reza-rahman · 2022-03-11T15:13:21Z

We have observed this issue as well as part of our work to enable Liberty on ARO: https://docs.microsoft.com/en-us/azure/developer/java/ee/websphere-family#open-liberty-and-websphere-liberty-on-aro. A fix would be highly appreciated.

gcharters · 2022-03-17T09:23:04Z

I'm hitting this on a regular basis so would really appreciate it being fixed. Thank you.

idlewis · 2022-03-22T11:00:18Z

We've just merged a PR into main which should fix this issue.

m-reza-rahman · 2022-03-22T11:51:23Z

Great news!

leochr · 2022-03-28T12:41:53Z

Open Liberty Operator v0.8.1 is now released with the fix for this issue. Release information is documented here.

fyi @pdfruth @m-reza-rahman @gcharters

leochr assigned idlewis Feb 11, 2022

idlewis mentioned this issue Feb 17, 2022: Timing #320 (Merged)

idlewis mentioned this issue Mar 7, 2022: Dont create deployment without a pull secret application-stacks/runtime-component-operator#343 (Merged)

leochr closed this as completed Mar 28, 2022
