
Account for finally TaskRun retries in PR timeouts #4508

Closed
lbernick wants to merge 1 commit

Conversation


@lbernick lbernick commented Jan 24, 2022

Changes

Prior to this commit, the PipelineRun reconciler did not account for time elapsed during a finally
TaskRun's retries when setting its timeout. This resulted in pipelinerun.timeouts.finally not being
respected when a finally TaskRun was retried.

This commit updates the finally TaskRun's timeout to account for time elapsed during retries.
Closes #4071.

Co-authored-by: Jerop Kipruto <jerop@google.com> (@jerop)

/kind bug

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Docs included if any changes are user facing
  • Tests included if any functionality added or changed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including
    functionality, content, code)
  • Release notes block below has been filled in or deleted (only if no user facing changes)

Release Notes

[Bug fix]: Account for time elapsed during finally TaskRun retries when applying PipelineRun timeouts

@tekton-robot tekton-robot added the release-note and kind/bug labels Jan 24, 2022
@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign jerop
You can assign the PR to them by writing /assign @jerop in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 24, 2022
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
|---|---|---|---|
| pkg/reconciler/pipelinerun/pipelinerun.go | 83.6% | 83.8% | 0.2 |
| pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go | 93.8% | 93.6% | -0.1 |


@bobcatfish bobcatfish added this to the Pipelines v0.33 milestone Jan 25, 2022

@jerop jerop left a comment


thanks @lbernick!

could we add some documentation updates for this? possibly in the pipelinerun timeouts section: https://github.com/tektoncd/pipeline/blob/main/docs/pipelineruns.md#configuring-a-failure-timeout

@lbernick
Member Author

> could we add some documentation updates for this? possibly in the pipelinerun timeouts section: https://github.com/tektoncd/pipeline/blob/main/docs/pipelineruns.md#configuring-a-failure-timeout

Done!


@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 4, 2022
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 4, 2022
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
|---|---|---|---|
| pkg/reconciler/pipelinerun/pipelinerun.go | 83.9% | 84.0% | 0.1 |
| pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go | 93.8% | 93.6% | -0.1 |

Comment on lines +184 to +219
```go
if t.IsCustomTask() {
	r := t.Run
	if r == nil {
		return nil
	}
	startTime = r.Status.StartTime
	if startTime.IsZero() {
		if len(r.Status.RetriesStatus) == 0 {
			return startTime
		}
		startTime = &metav1.Time{Time: c.Now()}
	}
	for _, retry := range r.Status.RetriesStatus {
		if retry.StartTime.Time.Before(startTime.Time) {
			startTime = retry.StartTime
		}
	}
	return startTime
}
tr := t.TaskRun
if tr == nil {
	return nil
}
startTime = tr.Status.StartTime
if startTime.IsZero() {
	if len(tr.Status.RetriesStatus) == 0 {
		return startTime
	}
	startTime = &metav1.Time{Time: c.Now()}
}
for _, retry := range tr.Status.RetriesStatus {
	if retry.StartTime.Time.Before(startTime.Time) {
		startTime = retry.StartTime
	}
}
return startTime
```
Member

I wonder if there's a way we can reduce the duplication here with an interface? These types seem very similar (e.g. they are the same base data with extra data layered on top), so it might make sense for them to depend on a common base with an interface that can extract out the common values.

```go
	startTime = &metav1.Time{Time: c.Now()}
}
for _, retry := range tr.Status.RetriesStatus {
	if retry.StartTime.Time.Before(startTime.Time) {
```
Member

Will this actually work? 🤔 The API documentation says: "All TaskRunStatus stored in RetriesStatus will have no date within the RetriesStatus as is redundant."

This might just be out of date documentation, but we should double check this and ideally add a test that goes through a full reconcile loop if we can - IIUC the current reconciler tests just mock this by injecting a simulated resolved Task with the retry values set.

Member Author

You're right, this actually does not work, but not for this reason (docs are out of date; retries status does have a timestamp). I think #4409 might be affecting how this works; it's possible we need to address this bug before addressing this TODO.

```diff
@@ -174,6 +176,49 @@ func (t ResolvedPipelineRunTask) IsStarted() bool {
 	return t.TaskRun != nil && t.TaskRun.Status.GetCondition(apis.ConditionSucceeded) != nil
 }
 
+// FirstAttemptStartTime returns the start time of the first time the ResolvedPipelineRunTask was attempted.
+// Returns nil if no attempt has been started.
+func (t *ResolvedPipelineRunTask) FirstAttemptStartTime(c clock.Clock) *metav1.Time {
```
Member

@wlynch wlynch Feb 4, 2022


Do we want clock based funcs to be exposed to clients, or is this an implementation detail that should be unexported?

@lbernick
Member Author

lbernick commented Feb 7, 2022

/hold

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 7, 2022
@jerop
Member

jerop commented Mar 21, 2022

@lbernick given the hold, may I move this to the next milestone? hoping to release 0.34 today

@lbernick
Member Author

> @lbernick given the hold, may I move this to the next milestone? hoping to release 0.34 today

yup sounds good!

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 27, 2022
@tekton-robot
Collaborator

@lbernick: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pritidesai
Member

releasing 0.35 today, moving this to the next milestone! 🤞

@dibyom dibyom removed this from the Pipelines v0.37 milestone Jun 14, 2022
@dibyom
Member

dibyom commented Jun 14, 2022

Closing since this is not being actively worked on.

@dibyom dibyom closed this Jun 14, 2022
Labels
- do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
- kind/bug: Categorizes issue or PR as related to a bug.
- needs-rebase: Indicates a PR cannot be merged because it has merge conflicts with HEAD.
- release-note: Denotes a PR that will be considered when it comes time to generate release notes.
- size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

task time exceeded timeouts.tasks when task retried
9 participants