Switch PipelineRun timeout -> TaskRun logic to instead signal the TaskRuns to stop #5134
Conversation
Force-pushed from b84fc09 to 0d32280 (compare)
/wip (marking this as WIP because I want to get a good amount of test runs in to be sure it actually fixes the flakes.) This is a rather significant change to how we actually time out TaskRuns.
Force-pushed from 0d32280 to bd19261 (compare)
I cherry-picked this onto #5077, since I'm running the e2e tests over and over on there anyway to validate kind jobs, and in the 19 …
/assign @lbernick
@abayer glad to see these results, and thank you! 😀
@@ -3259,7 +3401,7 @@ func TestGetTaskRunTimeout(t *testing.T) {
				},
			},
		},
-		expected: &metav1.Duration{Duration: 10 * time.Minute},
+		expected: &metav1.Duration{Duration: 15 * time.Minute},
	}, {
		name: "taskrun with elapsed time; timeouts.pipeline applies",
this test case is confusing, it says `timeouts.pipeline` applies but that field was not set 😕 (not caused by this change, was there before, just trying to understand the full thing)
Yeah, lemme change that.
Tweaked it to actually use `timeouts.pipeline`. =)
Force-pushed from bd19261 to 52c419e (compare)
	timeRemaining := (timeout - pElapsedTime)
	// Return the smaller of timeRemaining and rpt.pipelineTask.timeout
-	if rpt.PipelineTask.Timeout != nil && rpt.PipelineTask.Timeout.Duration < timeRemaining {
+	if rpt.PipelineTask.Timeout != nil && rpt.PipelineTask.Timeout.Duration < timeout {
the docstring for `calculateTaskRunTimeout` needs to be updated to reflect this change
👍
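For reference, a standalone sketch of the comparison the changed line expresses (standard library plus metav1 only; this is illustrative, not the PR's actual helper): with the new check, the pipeline task's own timeout wins only when it is shorter than the pipeline-level timeout, rather than shorter than the *remaining* time.

```go
package main

import (
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// effectiveTimeout is illustrative only: use the task's own timeout when it is
// shorter than the pipeline-level timeout; otherwise the pipeline-level value applies.
func effectiveTimeout(pipelineTimeout time.Duration, taskTimeout *metav1.Duration) time.Duration {
	if taskTimeout != nil && taskTimeout.Duration < pipelineTimeout {
		return taskTimeout.Duration
	}
	return pipelineTimeout
}

func main() {
	fmt.Println(effectiveTimeout(time.Hour, &metav1.Duration{Duration: 5 * time.Minute})) // 5m0s
	fmt.Println(effectiveTimeout(time.Hour, nil))                                         // 1h0m0s
}
```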
}

// timeoutPipelineTaskRuns patches `TaskRun` and `Run` with timed out status
func timeoutPipelineTaskRuns(ctx context.Context, logger *zap.SugaredLogger, pr *v1beta1.PipelineRun, clientSet clientset.Interface) []string {
this function name is confusing - made me think we were timing out taskruns, but it's also timing out runs. can we split it into two functions - `timeoutTaskRuns` and `timeoutRuns`? then we call both of them in `timeoutPipelineRun`?
I'd actually go with renaming it to just `timeoutPipelineTasks` and keeping `TaskRun` and `Run` timeout in here - the name was just copy-pasted from cancel.go and renamed from `cancelPipelineTaskRuns`.
that works too, thank you!
@@ -621,7 +631,7 @@ func (c *Reconciler) processRunTimeouts(ctx context.Context, pr *v1beta1.Pipelin
	}
	for _, rpt := range pipelineState {
		if rpt.IsCustomTask() {
-			if rpt.Run != nil && !rpt.Run.IsCancelled() && (pr.HasTimedOut(ctx, c.Clock) || (rpt.Run.HasTimedOut(c.Clock) && !rpt.Run.IsDone())) {
+			if rpt.Run != nil && !rpt.Run.IsCancelled() && rpt.Run.HasTimedOut(c.Clock) && !rpt.Run.IsDone() {
given that we're timing out runs in `timeoutPipelineTaskRuns` - i'd think that we don't need to do it in `processRunTimeouts` again - that is, we can remove this function. that way, timing out of taskruns and runs is processed at one point, or is there something i'm missing?
So this confused me a bunch at first - previous to this PR, `processRunTimeouts` handles both timing out a `Run` because its parent `PipelineRun` has timed out (which is the bit I removed here) and timing out a `Run` because the `Run` itself has timed out. There's no guarantee that a custom task reconciler will actually handle timeouts itself, so the only way we can guarantee that a `Run` with a timeout set on it in a `Pipeline` gets timed out is to do so in the `PipelineRun` reconciler, hence this function sticking around.
I'm going to add docstrings to clarify this.
aaaaah makes sense now, +1 to adding a docstring, will really help here. can we have a function named `timeoutRun` which is used here and in `timeoutPipelineTaskRuns` to remove the duplication?
It's not actually duplication any more - `processRunTimeouts` and `timeoutPipelineTasks` both end up calling `cancelRun`, but the "should I time out the `Run`?" logic is different.
// IsSpecTimedOut returns true if the TaskRun's spec status is set to TimedOut state
func (tr *TaskRun) IsSpecTimedOut() bool {
	return tr.Spec.Status == TaskRunSpecStatusTimedOut
}
please add tests for this function 🙏🏾
Good catch!
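A rough sketch of what such a table-driven test might look like (the constant names come from this PR's diff; the test file layout and case names are assumptions):

```go
package v1beta1_test

import (
	"testing"

	"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
)

// Sketch only: exercises IsSpecTimedOut for the new timed-out spec status, the
// existing cancelled status, and the empty default.
func TestTaskRunIsSpecTimedOut(t *testing.T) {
	for _, tc := range []struct {
		name string
		spec v1beta1.TaskRunSpec
		want bool
	}{
		{name: "timed out", spec: v1beta1.TaskRunSpec{Status: v1beta1.TaskRunSpecStatusTimedOut}, want: true},
		{name: "cancelled", spec: v1beta1.TaskRunSpec{Status: v1beta1.TaskRunSpecStatusCancelled}, want: false},
		{name: "unset", spec: v1beta1.TaskRunSpec{}, want: false},
	} {
		t.Run(tc.name, func(t *testing.T) {
			tr := &v1beta1.TaskRun{Spec: tc.spec}
			if got := tr.IsSpecTimedOut(); got != tc.want {
				t.Errorf("IsSpecTimedOut() = %t, want %t", got, tc.want)
			}
		})
	}
}
```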
I really like this approach and it will probably help with #4071 as well! Nice catch, this was a sneaky bug! One challenge here is that our test coverage for timeouts is not very good, and I think this will result in timeouts.tasks and timeouts.finally not working as intended.
// TaskRunSpecStatusTimedOut indicates that the PipelineRun owning this task wants to mark it as timed out,
// if not already cancelled or terminated
TaskRunSpecStatusTimedOut = "TaskRunTimedOut"
I'm not sure I understand why this is going in the spec. SpecStatusCancelled is for letting a user cancel the TaskRun, but I'm not sure we'd want to be able to set the spec to TimedOut
@jerop's idea was to leverage the same approach we use when cancelling a `PipelineRun` to get its child `TaskRun`s and `Run`s to cancel as well, which is to set `spec.status` to `TaskRunSpecStatusCancelled`. If there's another way to signal the `TaskRun` that it should stop itself and report as timed out, I'm more than happy to see it. =)
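For illustration, a minimal sketch of that signalling path, loosely modeled on the existing cancellation patch in cancel.go; the helper name and exact patch shape here are assumptions, not the PR's actual code:

```go
package timeoutsketch

import (
	"context"
	"encoding/json"

	"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
	clientset "github.com/tektoncd/pipeline/pkg/client/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// timeoutTaskRun (hypothetical name) flips the TaskRun's spec.status to the new
// TaskRunTimedOut value so the TaskRun reconciler stops it and reports it as
// timed out, mirroring how cancellation is signalled today.
func timeoutTaskRun(ctx context.Context, name, namespace string, cs clientset.Interface) error {
	// JSON patch that sets /spec/status on the child TaskRun.
	patch, err := json.Marshal([]map[string]interface{}{{
		"op":    "add",
		"path":  "/spec/status",
		"value": v1beta1.TaskRunSpecStatusTimedOut,
	}})
	if err != nil {
		return err
	}
	_, err = cs.TektonV1beta1().TaskRuns(namespace).Patch(ctx, name, types.JSONPatchType, patch, metav1.PatchOptions{})
	return err
}
```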
could we reuse the timeout strategy we use for runs?
pipeline/pkg/reconciler/pipelinerun/pipelinerun.go, lines 623 to 648 in d3b8017:
// processRunTimeouts custom tasks are requested to cancel, if they have timed out. Custom tasks can do any cleanup
// during this step.
func (c *Reconciler) processRunTimeouts(ctx context.Context, pr *v1beta1.PipelineRun, pipelineState resources.PipelineRunState) error {
	errs := []string{}
	logger := logging.FromContext(ctx)
	if pr.IsCancelled() {
		return nil
	}
	for _, rpt := range pipelineState {
		if rpt.IsCustomTask() {
			if rpt.Run != nil && !rpt.Run.IsCancelled() && (pr.HasTimedOut(ctx, c.Clock) || (rpt.Run.HasTimedOut(c.Clock) && !rpt.Run.IsDone())) {
				logger.Infof("Cancelling run task: %s due to timeout.", rpt.RunName)
				err := cancelRun(ctx, rpt.RunName, pr.Namespace, c.PipelineClientSet)
				if err != nil {
					errs = append(errs,
						fmt.Errorf("failed to patch Run `%s` with cancellation: %s", rpt.RunName, err).Error())
				}
			}
		}
	}
	if len(errs) > 0 {
		e := strings.Join(errs, "\n")
		return fmt.Errorf("error(s) from processing cancel request for timed out Run(s) of PipelineRun %s: %s", pr.Name, e)
	}
	return nil
}
We could - the tradeoff is that if we just directly cancel the `TaskRun`s, like we do with `Run`s, the `TaskRun` will not report that it was timed out. I personally think it would be better to have that information in the `TaskRun`'s status. And ideally, I'd want that in the timed-out `Run`'s status, but since we don't control the custom task reconcilers, we can't guarantee they'd actually support a new spec status.
I tend to agree with @abayer there
Oh actually I like andrea's idea, that could make a lot of sense!
👍 We'd need to add a field to `PipelineRunSpec` alongside `PipelineRunSpec.Status` for a reason, right?
I think a user shouldn't need to provide a cancellation reason, and the PR controller can provide the reason for cancelling the taskrun, so I'm not sure we need to change `pipelineRun.spec.status`, however we might need to add a field to `taskrun.spec.status`
Oops, yeah, I meant `taskRun.spec.status`. Also I forgot to delete this in my latest commit. heh.
I agree a user shouldn't need to provide a cancellation reason, but they might want to - it could be an interesting possibility to give our users for auditing. That said, the same could be achieved a bit more loosely with annotations.
Force-pushed from 52c419e to 91a14c1 (compare)
Force-pushed from 91a14c1 to fa5d05a (compare)
-	timeout = pr.TasksTimeout().Duration
+	// Subtract the current run time of the PipelineRun from the tasks timeout to ensure that the maximum timeout for this task
+	// is the remaining time in the tasks timeout since the PipelineRun was started.
+	timeout = pr.TasksTimeout().Duration - c.Since(pr.Status.StartTime.Time)
@lbernick - I think this solves the `timeouts.Tasks` issue you raised.
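As a standalone illustration of the arithmetic (standard library only, not the reconciler code itself): a TaskRun created partway through a PipelineRun now gets whatever is left of `timeouts.tasks`.

```go
package main

import (
	"fmt"
	"time"
)

// remainingTasksTimeout returns how much of the tasks timeout is left at the
// moment a new TaskRun is created.
func remainingTasksTimeout(tasksTimeout time.Duration, prStart, now time.Time) time.Duration {
	return tasksTimeout - now.Sub(prStart)
}

func main() {
	start := time.Date(2021, 12, 31, 23, 55, 0, 0, time.UTC) // PipelineRun started at 23:55
	now := time.Date(2022, 1, 1, 0, 0, 0, 0, time.UTC)       // reconciling at midnight
	fmt.Println(remainingTasksTimeout(15*time.Minute, start, now)) // 10m0s
}
```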
    name: test-pipeline
  serviceAccountName: test-sa
  timeouts:
    tasks: 15m
@lbernick - here's a test verifying that the new `TaskRun` gets a timeout of `timeouts.Tasks` minus the current run time of the `PipelineRun`.
sorry, I still think this doesn't work. The pipelinerun started 1 day before "now", but its tasks timeout is 15 minutes. This means the task has already exceeded its timeout and will be created with a timeout of 1s here. (That's a known bug I've been wrestling with.)
I think the test case we actually want is a pipelinerun with some elapsed time having occurred, but still not having passed its "tasks" timeout. Previously, we would have set the taskrun timeout to the tasks timeout minus the elapsed time (covered in the test case below called "taskrun with elapsed time; task.timeout applies"), but it sounds like now, the new approach will be for the pipelinerun to cancel all taskruns when the "tasks" timeout is exceeded.
If it would be helpful, I can create a PR adding some test cases to the PR reconciler with what I have in mind and you can rebase onto that, so it's easier to tell what your PR changes? (I think it will be helpful regardless)
Right, that's actually what this test does - the `PipelineRun` started 5 minutes before "now" - `startTime: "2021-12-31T23:55:00Z"`. =)
oh oops I didn't look at the start time carefully enough!
...dangit, I tried to prune a bunch of old coverage report comments to make the PR discussion easier to read and it decided to log each of them here, creating a big ol' list of me deleting things. Oops. =)
/test pull-tekton-pipeline-integration-tests
@@ -449,6 +487,12 @@ const (
	GracefullyStoppedSkip SkippingReason = "PipelineRun was gracefully stopped"
	// MissingResultsSkip means the task was skipped because it's missing necessary results
	MissingResultsSkip SkippingReason = "Results were missing"
+	// PipelineTimedOutSkip means the task was skipped because the PipelineRun has passed its overall timeout.
+	PipelineTimedOutSkip SkippingReason = "PipelineRun timeout has been reached"
Agreeing with @abayer
c := run.Status.GetCondition(apis.ConditionSucceeded)
runCancelled := c.IsFalse() &&
	c.Reason == v1alpha1.RunReasonCancelled &&
	run.Spec.StatusMessage == v1alpha1.RunCancelledByPipelineTimeoutMsg
This sounds like a very weird behavior, but I agree, if we want to change it, it shouldn't be in that PR.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jerop, lbernick, vdemeester. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
fwiw, it looks like that little change I made did get rid of the weird failures on the non-alpha test, so hey, we're all good. =)
Alright, thanks a bunch all of you @abayer @afrittoli @jerop @lbernick and @vdemeester. Most of my concerns are addressed; I haven't got a chance to review all the code changes, but I'm happy to cancel the hold, please feel free to merge this 🙏 A few nice-to-have items:
/cancel hold
is it not /hold cancel?
/lgtm excited to see none (or less xD) of the timeouts flakes, thank you again @abayer! 🎉
Remove PipelineRun cancelation of Runs when Pipeline Task timeout is reached: TestWaitCustomTask_PipelineRun/Wait_Task_Retries_on_Timeout has been flaky for a while. This commit stops the PipelineRun reconciler from cancelling Run when it detects that the task-level Timeout configured for the Run has passed, which will address the flake (similar to #5134 which addresses TestPipelineRunTimeout).
Changes
Fixes #5127, #3460
An alternative approach vs #5133, attempting to eliminate the problem entirely rather than just working around it.
As noted in #5127, the logic around calculating a timeout for a `PipelineRun`'s `TaskRun`, to make sure that the `TaskRun`'s timeout is always going to end before the `PipelineRun`'s timeout ends, is problematic. It can result in race conditions where a `TaskRun` gets timed out, immediately followed by the `PipelineRun` being reconciled while not yet having hit the end of its own timeout. This change gets rid of that behavior, and instead sets the `TaskRun.Spec.Status` to a new value, `TaskRunTimedOut`, with the `TaskRun` reconciler handling that in the same way it does setting `TaskRun.Spec.Status` to `TaskRunCancelled`.

By doing this, we can unify the behavior for both `TaskRun`s and `Run`s upon `PipelineRun`s timing out, given that we already cancel `Run`s upon `PipelineRun` timeout, and we can kill off a bunch of flaky outcomes for `PipelineRun`s.

Also, rather than setting a 1s timeout on tasks which are created after the `.timeouts.pipeline`, `.timeouts.tasks`, or `.timeouts.finally` timeouts have been passed, this change will skip creation of those `TaskRun`s or `Run`s completely. However, it will still report those "tasks" as failed, not skipped, since attempted creation of tasks after the appropriate timeout has been reached is a failure case, and that's the existing expected behavior for `PipelineTask`s which try to start after timeouts have elapsed.

Huge thanks to @jerop for suggesting this approach!
/kind bug
Submitter Checklist
As the author of this PR, please check off the items in this checklist:
- Meets the Tekton contributor standards (including functionality, content, code)
- Has a kind label. You can add one by adding a comment on this PR that contains `/kind <type>`. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep

Release Notes