avoid requeuing taskrun in case of permanent error #3068

Merged
merged 1 commit into from
Nov 3, 2020

Conversation

pritidesai
Member

@pritidesai pritidesai commented Aug 7, 2020

Changes

When a TaskRun is rejected with a permanent error, the reconciler should not requeue it.

After a permanent error is raised in the prepare function call, the reconciler enters the tr.IsDone block. In this block, sidecars were being terminated without any check on the pod name. A rejected TaskRun has no pod name associated with it, since its pod was never created, so the Get call for the pod fails and the reconciler returns a normal error. The next reconcile cycle then runs with this normal error instead of the permanent error and keeps requeuing the TaskRun until the reconciler exhausts the allowed retries.

These changes add a check that a pod was actually created for the TaskRun before cleaning up its sidecars.

Most of the changes in this PR are implemented based on the pipelinerun.go.

I have recorded logs for the following TaskRun on master and with this PR so that it's easy to compare.

Note that on master the reconciler exhausts the number of retries for the following TaskRun even though it has been declared failed with a permanent error. With these changes, the reconciler exits after seeing the permanent error and does not requeue that TaskRun.

apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: taskrun-failure
spec:
  taskRef:
    name: does-not-exist

Closes #3045
/kind bug

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

  • Includes tests (if functionality changed/added)
  • Includes docs (if user facing)
  • Commit messages follow commit message best practices
  • Release notes block has been filled in or deleted (only if no user facing changes)

See the contribution guide for more details.

Double check this list of stuff that's easy to miss:

Reviewer Notes

If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.

Release Notes

Do not requeue a TaskRun that was rejected with a permanent error. This bug was causing incorrect values for the tekton_taskrun_count{status="failed"} metric.

@tekton-robot tekton-robot added the release-note, kind/bug, and size/L labels Aug 7, 2020
@tekton-robot tekton-robot requested review from dibyom and a user August 7, 2020 06:37
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| pkg/apis/pipeline/v1beta1/taskrun_types.go | 81.8% | 78.9% | -2.9 |
| pkg/reconciler/taskrun/taskrun.go | 78.3% | 78.9% | 0.6 |

@pritidesai
Member Author

/test pull-tekton-pipeline-integration-tests

@vdemeester vdemeester added kind/bug and removed kind/bug labels Aug 11, 2020
@tekton-robot tekton-robot added the needs-rebase label Sep 9, 2020
@tekton-robot tekton-robot removed the needs-rebase label Oct 1, 2020
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| pkg/apis/pipeline/v1beta1/taskrun_types.go | 81.8% | 78.9% | -2.9 |

@tekton-robot tekton-robot added the needs-rebase label Oct 3, 2020
@tekton-robot tekton-robot removed the needs-rebase label Oct 14, 2020
@ghost

ghost commented Oct 22, 2020

/test pull-tekton-pipeline-integration-tests

@vdemeester
Member

/test all

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| pkg/apis/pipeline/v1beta1/taskrun_types.go | 80.4% | 77.6% | -2.8 |

When a taskrun is rejected with a permanent error, the reconciler should
not try to requeue it. After a permanent error is raised in the prepare
function call, the reconciler enters the tr.IsDone block. In that block,
sidecars were being terminated without any check on the pod name.
A rejected taskrun has no pod name associated with it, since its pod is
never created. The reconciler fails to run the Get command and returns a
normal error. The next reconciler cycle runs with this normal error
instead of the permanent error and tries to requeue the taskrun until the
reconciler exhausts the allowed retries.

These changes add a check that a pod was created for a taskrun before
cleaning up the sidecars.

Most of the changes in this PR are based on pipelinerun.go.

@pritidesai
Member Author

It's ready for review, PTAL 🙏

@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbwsg

@tekton-robot tekton-robot added the approved label Nov 3, 2020
@jerop
Member

jerop commented Nov 3, 2020

thank you @pritidesai!

/lgtm

@tekton-robot tekton-robot added the lgtm label Nov 3, 2020
@afrittoli
Member

/hold

@tekton-robot tekton-robot added the do-not-merge/hold label Nov 3, 2020
@afrittoli
Member

Sorry about the hold, @pritidesai. I just wanted to clarify one bit as I was reviewing this PR.

Comment on lines -116 to -117
_, updateErr := c.updateLabelsAndAnnotations(ctx, tr)
merr = multierror.Append(cloudEventErr, updateErr)

I was confused by the removal of this one, but I think it may not be needed indeed?
It's not clear to me how this change is related to this PR though

never mind, I was confusing myself

@afrittoli
Member

/hold cancel

@tekton-robot tekton-robot removed the do-not-merge/hold label Nov 3, 2020
@afrittoli
Member

Sorry about the confusion 🙏 I was in the middle of reviewing and I thought I saw an issue, but I was wrong.
Thanks for this!!

@tekton-robot tekton-robot merged commit ce1b7b5 into tektoncd:master Nov 3, 2020
@pritidesai
Member Author

no worries @afrittoli, thanks a bunch for the review @sbwsg @afrittoli @jerop 🙏

Labels
approved, kind/bug, lgtm, release-note, size/L
Development

Successfully merging this pull request may close these issues.

Error count "failed" taskrun for metric of Prometheus
5 participants