TaskRun stays Running when pod goes to ImagePullBackOff #4895

Closed
dibyom opened this issue May 20, 2022 · 8 comments

@dibyom
Member

dibyom commented May 20, 2022

Expected Behavior

If the Pod is in ImagePullBackOff state, the TaskRun should fail.

Actual Behavior

The TaskRun remains "Running" with the message Pending until it eventually times out.

Steps to Reproduce the Problem

apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  generateName: imagepull-fail
spec:
  taskSpec:
    steps:
    - image: whatever
      script: 'true'

@dibyom dibyom added the kind/bug Categorizes issue or PR as related to a bug. label May 20, 2022
@dibyom dibyom changed the title TaskRun stays in Pending when pod goes to ImagePullBackOff TaskRun stays Running when pod goes to ImagePullBackOff May 20, 2022
@lbernick
Member

Duplicate of #4890

@imjasonh
Member

I'm not sure this is a duplicate of #4890, since that one deals with InvalidImageName (e.g., image: "t0T/Al+y//In\/@l1D"), whereas this one is about ImagePullBackOff (the image name is valid, but the image can't be pulled).

In the latter case, I think we've discussed this before, and decided to follow K8s' behavior where it continues to try to pull the image with backoff, until the image exists and is pullable. This can be useful if you accidentally forget to push the image you're trying to run, or don't have auth set up correctly yet.

In K8s' case it will happily keep retrying the pull forever, since there's no concept of a Pod timeout. In Tekton, it might make sense to fail fast outright rather than give users an opportunity to fix it.

I suspect this would only work at all today when the step specifies script or command, since otherwise the Pod translation code will try to look up the image's entrypoint and fail. I don't think it retries; it just gives up. If that's the case, then I think we should aim for consistency and fail fast if the image can't be pulled, even if entrypoint resolution isn't required.
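
To make the fail-fast idea concrete, here is a minimal Go sketch of how a controller could spot the condition. It is not taken from any Tekton PR. The `ContainerStatus` type is a hypothetical, trimmed-down stand-in for the waiting-state information Kubernetes reports in a Pod's container statuses, and `firstImagePullBackOff` and the container names are illustrative only.

```go
package main

import "fmt"

// ContainerStatus is a hypothetical stand-in for the per-container status
// that Kubernetes reports on a Pod. Only the waiting reason is modeled here.
type ContainerStatus struct {
	Name          string
	WaitingReason string // empty when the container is not in a waiting state
}

// imagePullBackOff is the waiting reason discussed in this issue.
const imagePullBackOff = "ImagePullBackOff"

// firstImagePullBackOff returns the name of the first container whose
// waiting reason is ImagePullBackOff. The boolean is false if none is.
func firstImagePullBackOff(statuses []ContainerStatus) (string, bool) {
	for _, s := range statuses {
		if s.WaitingReason == imagePullBackOff {
			return s.Name, true
		}
	}
	return "", false
}

func main() {
	// Assumed example data: one container still initializing, one stuck pulling.
	statuses := []ContainerStatus{
		{Name: "prepare", WaitingReason: "PodInitializing"},
		{Name: "step-build", WaitingReason: imagePullBackOff},
	}
	if name, stuck := firstImagePullBackOff(statuses); stuck {
		// A real controller would mark the TaskRun as failed at this point
		// instead of waiting for the TaskRun timeout.
		fmt.Printf("fail TaskRun: container %q is in %s\n", name, imagePullBackOff)
	}
}
```

Whether to fail on the first occurrence or only after some grace period is a policy choice this sketch leaves open, since a backoff can still recover if the image is pushed later.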

@RafaeLeal
Contributor

I'm also suffering from this same issue. There are errors at the pod level, such as ImagePullBackOff and FailedMount, for which it makes sense for k8s to keep retrying. Still, in a more ephemeral scenario like a CI/CD Task, most cases are typos and configuration errors that will not recover, and I believe it would be nice if Tekton allowed us to fail fast on those.

@chitrangpatel
Contributor

/assign

chitrangpatel added a commit to chitrangpatel/pipeline that referenced this issue May 31, 2022
Prior to this, if the Pod was in ImagePullBackOff state,
the TaskRun would remain `Running` with the message `Pending` until it eventually timed out.
This led to lots of delays. The expected behavior should have been to
terminate the TaskRun and set it to `fail`. This PR addresses issue
tektoncd#4895.
chitrangpatel added a commit to chitrangpatel/pipeline that referenced this issue Jun 8, 2022
Prior to this, if the Pod was in ImagePullBackOff state,
the TaskRun would remain `Running` with the message `Pending` until it eventually timed out.
This led to lots of delays. The expected behavior should have been to
terminate the TaskRun and set it to `fail`. This PR addresses issue
tektoncd#4895.
tekton-robot pushed a commit that referenced this issue Jun 9, 2022
Prior to this, if the Pod was in ImagePullBackOff state,
the TaskRun would remain `Running` with the message `Pending` until it eventually timed out.
This led to lots of delays. The expected behavior should have been to
terminate the TaskRun and set it to `fail`. This PR addresses issue
#4895.
@RafaeLeal
Contributor

Should I open a different issue to discuss the FailedMount error?

@dibyom
Member Author

dibyom commented Jun 22, 2022

@RafaeLeal yes please!

@dibyom
Member Author

dibyom commented Jul 12, 2022

Closing since the ImagePullBackOff case is now handled.

@dibyom dibyom closed this as completed Jul 12, 2022