
Account for finally TaskRun retries in PR timeouts #4508

Closed
lbernick wants to merge 1 commit

Conversation


@lbernick lbernick commented Jan 24, 2022

Changes

Prior to this commit, the PipelineRun reconciler did not account for time elapsed during a finally
TaskRun's retries when setting its timeout. This resulted in pipelinerun.timeouts.finally not being
respected when a finally TaskRun was retried.

This commit updates the finally TaskRun's timeout to account for time elapsed during retries.
Closes #4071.

Co-authored-by: Jerop Kipruto <jerop@google.com> (@jerop)

/kind bug

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Docs included if any changes are user facing
  • Tests included if any functionality added or changed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including
    functionality, content, code)
  • Release notes block below has been filled in or deleted (only if no user facing changes)

Release Notes

[Bug fix]: Account for time elapsed during finally TaskRun retries when applying PipelineRun timeouts

@tekton-robot tekton-robot added the release-note and kind/bug labels Jan 24, 2022
@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign jerop
You can assign the PR to them by writing /assign @jerop in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 24, 2022
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
|---|---|---|---|
| pkg/reconciler/pipelinerun/pipelinerun.go | 83.6% | 83.8% | 0.2 |
| pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go | 93.8% | 93.6% | -0.1 |


@bobcatfish bobcatfish added this to the Pipelines v0.33 milestone Jan 25, 2022

@jerop jerop left a comment


thanks @lbernick!

could we add some documentation updates for this? possibly in the pipelinerun timeouts section: https://github.com/tektoncd/pipeline/blob/main/docs/pipelineruns.md#configuring-a-failure-timeout

@lbernick
Member Author

> could we add some documentation updates for this? possibly in the pipelinerun timeouts section: https://github.com/tektoncd/pipeline/blob/main/docs/pipelineruns.md#configuring-a-failure-timeout

Done!


@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 4, 2022
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 4, 2022
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
|---|---|---|---|
| pkg/reconciler/pipelinerun/pipelinerun.go | 83.9% | 84.0% | 0.1 |
| pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go | 93.8% | 93.6% | -0.1 |

Comment on lines +184 to +219
```go
if t.IsCustomTask() {
	r := t.Run
	if r == nil {
		return nil
	}
	startTime = r.Status.StartTime
	if startTime.IsZero() {
		if len(r.Status.RetriesStatus) == 0 {
			return startTime
		}
		startTime = &metav1.Time{Time: c.Now()}
	}
	for _, retry := range r.Status.RetriesStatus {
		if retry.StartTime.Time.Before(startTime.Time) {
			startTime = retry.StartTime
		}
	}
	return startTime
}
tr := t.TaskRun
if tr == nil {
	return nil
}
startTime = tr.Status.StartTime
if startTime.IsZero() {
	if len(tr.Status.RetriesStatus) == 0 {
		return startTime
	}
	startTime = &metav1.Time{Time: c.Now()}
}
for _, retry := range tr.Status.RetriesStatus {
	if retry.StartTime.Time.Before(startTime.Time) {
		startTime = retry.StartTime
	}
}
return startTime
```
Member

I wonder if there's a way we can reduce the duplication here with an interface? These types seem very similar (e.g. they are the same base data with extra data layered on top), so it might make sense for them to depend on a common base with an interface that can extract out the common values.

```go
	startTime = &metav1.Time{Time: c.Now()}
}
for _, retry := range tr.Status.RetriesStatus {
	if retry.StartTime.Time.Before(startTime.Time) {
```
Member

Will this actually work? 🤔 The API documentation says: "All TaskRunStatus stored in RetriesStatus will have no date within the RetriesStatus as is redundant."

This might just be out of date documentation, but we should double check this and ideally add a test that goes through a full reconcile loop if we can - IIUC the current reconciler tests just mock this by injecting a simulated resolved Task with the retry values set.

Member Author

You're right, this actually does not work, but not for this reason (docs are out of date; retries status does have a timestamp). I think #4409 might be affecting how this works; it's possible we need to address this bug before addressing this TODO.

```diff
@@ -174,6 +176,49 @@ func (t ResolvedPipelineRunTask) IsStarted() bool {
 	return t.TaskRun != nil && t.TaskRun.Status.GetCondition(apis.ConditionSucceeded) != nil
 }
 
+// FirstAttemptStartTime returns the start time of the first time the ResolvedPipelineRunTask was attempted.
+// Returns nil if no attempt has been started.
+func (t *ResolvedPipelineRunTask) FirstAttemptStartTime(c clock.Clock) *metav1.Time {
```
Member

@wlynch wlynch Feb 4, 2022


Do we want clock based funcs to be exposed to clients, or is this an implementation detail that should be unexported?

@lbernick
Member Author

lbernick commented Feb 7, 2022

/hold

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 7, 2022
@jerop
Member

jerop commented Mar 21, 2022

@lbernick given the hold, may I move this to the next milestone? hoping to release 0.34 today

@lbernick
Member Author

> @lbernick given the hold, may I move this to the next milestone? hoping to release 0.34 today

yup sounds good!

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 27, 2022
@tekton-robot
Collaborator

@lbernick: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pritidesai
Member

releasing 0.35 today, moving this to the next milestone! 🤞

@dibyom dibyom removed this from the Pipelines v0.37 milestone Jun 14, 2022
@dibyom
Member

dibyom commented Jun 14, 2022

Closing since this is not being actively worked on.

@dibyom dibyom closed this Jun 14, 2022
Labels
- do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
- kind/bug: Categorizes issue or PR as related to a bug.
- needs-rebase: Indicates a PR cannot be merged because it has merge conflicts with HEAD.
- release-note: Denotes a PR that will be considered when it comes time to generate release notes.
- size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

task time exceeded timeouts.tasks when task retried
9 participants