
Pipeline controller shouldn't retry creating pod when the error cannot be mitigated by retry #4092

Closed
jialindai opened this issue Jul 13, 2021 · 5 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@jialindai

Expected Behavior

When executing a PipelineRun, pod creation sometimes fails with an error that cannot be mitigated by retrying. In that case, the pipeline controller should simply fail the PipelineRun instead of retrying pod creation.

In my case, the error is a pod creation failure caused by insufficient resource quota in the namespace.

Actual Behavior

The pipeline controller keeps trying to create the pod even though there is not enough quota in the namespace.

Steps to Reproduce the Problem

  1. Create a namespace with a limited resource quota
  2. Create a pipeline whose pods request more resources than the namespace allows
  3. Observe that the pipeline controller keeps retrying to create the pod
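
The reproduction can be sketched with manifests along these lines (namespace, names, and amounts are illustrative, not taken from the reporter's setup):

```yaml
# Illustrative reproduction manifests (names and amounts are made up).
# The quota caps the namespace well below what the Task's step requests,
# so the pod is rejected with an "exceeded quota" error on creation.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tiny-quota
  namespace: quota-test
spec:
  hard:
    requests.cpu: "100m"
    requests.memory: 128Mi
---
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: over-quota-task
  namespace: quota-test
spec:
  steps:
    - name: big-step
      image: busybox
      script: echo hello
      resources:
        requests:
          cpu: "1"   # more than the 100m the quota allows
```

Running this Task from a pipeline should trigger the retry loop described above.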

Additional Info

  • Kubernetes version (output of kubectl version): 1.18
  • Tekton Pipeline version: v0.22.0

@jialindai jialindai added the kind/bug Categorizes issue or PR as related to a bug. label Jul 13, 2021
@bobcatfish
Collaborator

hey @jialindai !

The pipeline controller keeps trying to create the pod even though there is not enough quota in the namespace.

I'm wondering: how would the pipeline controller know that the situation couldn't be mitigated? i.e., what if more quota became available in the namespace later? It sounds like that won't happen in your case, but I think it could for someone else (if I'm wrong, maybe you can provide some more details about your setup - e.g. is there some way to conclusively know that quota won't become available?)

You might also find #734 interesting, which is all about scheduling in resource-constrained environments - in that case we intentionally retry with backoff, waiting until resources are available.

For your specific case, it might make sense for you to create a controller (or maybe even cron?) which observes PipelineRuns in the state you are describing and cancels them.
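
A minimal sketch of the decision logic such an out-of-band controller (or cron job) might use. The condition structure mirrors a PipelineRun's `status.conditions`; the `FATAL_MARKERS` list and `should_cancel()` are illustrative names, not part of the Tekton API, and the marker strings are assumptions about what the error message contains:

```python
# Sketch of the cancellation decision an out-of-band controller could make.
# The dict shape mirrors a PipelineRun's status.conditions; FATAL_MARKERS
# and should_cancel() are illustrative, not Tekton API.

FATAL_MARKERS = (
    "exceeded quota",   # namespace ResourceQuota rejection (assumed wording)
    "forbidden",        # admission/policy denial (assumed wording)
)

def should_cancel(pipelinerun: dict) -> bool:
    """Return True if the run is still in progress but stuck on an error
    that retrying pod creation is unlikely to fix."""
    for cond in pipelinerun.get("status", {}).get("conditions", []):
        if cond.get("type") != "Succeeded":
            continue
        message = cond.get("message", "").lower()
        if cond.get("status") == "Unknown" and any(m in message for m in FATAL_MARKERS):
            return True
    return False

stuck_run = {
    "status": {
        "conditions": [{
            "type": "Succeeded",
            "status": "Unknown",
            "message": 'pods "build-pod" is forbidden: exceeded quota: compute-quota',
        }]
    }
}
print(should_cancel(stuck_run))  # True -> patch spec.status to cancel the run
```

A real controller would watch PipelineRuns via the Kubernetes API and, when this returns True, patch the run's `spec.status` to cancel it.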

@bobcatfish bobcatfish added kind/feature Categorizes issue or PR as related to a new feature. triage/needs-information Indicates an issue needs more information in order to work on it. and removed kind/bug Categorizes issue or PR as related to a bug. labels Jul 15, 2021
@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 15, 2021
@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 14, 2021
@tekton-robot
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot
Collaborator

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
