Switch PipelineRun timeout -> TaskRun logic to instead signal the TaskRuns to stop #5134
Conversation
Force-pushed from b84fc09 to 0d32280 (compare)
/wip (marking this as WIP because I want to get a good amount of test runs in to be sure it actually fixes the flakes.) This is a rather significant change to how we actually time out TaskRuns.
Force-pushed from 0d32280 to bd19261 (compare)
I cherry-picked this onto #5077, since I'm running the e2e tests over and over on there anyway to validate kind jobs, and in the 19 …
/assign @lbernick
@abayer glad to see these results, and thank you! 😀
@@ -3259,7 +3401,7 @@ func TestGetTaskRunTimeout(t *testing.T) {
				},
			},
		},
-		expected: &metav1.Duration{Duration: 10 * time.Minute},
+		expected: &metav1.Duration{Duration: 15 * time.Minute},
	}, {
		name: "taskrun with elapsed time; timeouts.pipeline applies",
this test case is confusing, it says `timeouts.pipeline` applies but that field was not set 😕 (not caused by this change, was there before, just trying to understand the full thing)
Yeah, lemme change that.
Tweaked it to actually use `timeouts.pipeline`. =)
Force-pushed from bd19261 to 52c419e (compare)
	timeRemaining := (timeout - pElapsedTime)
	// Return the smaller of timeRemaining and rpt.pipelineTask.timeout
-	if rpt.PipelineTask.Timeout != nil && rpt.PipelineTask.Timeout.Duration < timeRemaining {
+	if rpt.PipelineTask.Timeout != nil && rpt.PipelineTask.Timeout.Duration < timeout {
the docstring for `calculateTaskRunTimeout` needs to be updated to reflect this change
👍
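For reference, a standalone sketch of the comparison the changed line expresses (standard library plus metav1 only; this is illustrative, not the PR's actual helper): with the new check, the pipeline task's own timeout wins only when it is shorter than the pipeline-level timeout, rather than shorter than the *remaining* time.

```go
package main

import (
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// effectiveTimeout is illustrative only: use the task's own timeout when it is
// shorter than the pipeline-level timeout; otherwise the pipeline-level value applies.
func effectiveTimeout(pipelineTimeout time.Duration, taskTimeout *metav1.Duration) time.Duration {
	if taskTimeout != nil && taskTimeout.Duration < pipelineTimeout {
		return taskTimeout.Duration
	}
	return pipelineTimeout
}

func main() {
	fmt.Println(effectiveTimeout(time.Hour, &metav1.Duration{Duration: 5 * time.Minute})) // 5m0s
	fmt.Println(effectiveTimeout(time.Hour, nil))                                         // 1h0m0s
}
```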
}

// timeoutPipelineTaskRuns patches `TaskRun` and `Run` with timed out status
func timeoutPipelineTaskRuns(ctx context.Context, logger *zap.SugaredLogger, pr *v1beta1.PipelineRun, clientSet clientset.Interface) []string {
this function name is confusing - made me think we were timing out taskruns, but it's also timing out runs. can we split it into two functions - `timeoutTaskRuns` and `timeoutRuns`? then we call both of them in `timeoutPipelineRun`?
I'd actually go with renaming it to just `timeoutPipelineTasks` and keeping `TaskRun` and `Run` timeout in here - the name was just copy-pasted from cancel.go and renamed from `cancelPipelineTaskRuns`.
that works too, thank you!
@@ -621,7 +631,7 @@ func (c *Reconciler) processRunTimeouts(ctx context.Context, pr *v1beta1.Pipelin
	}
	for _, rpt := range pipelineState {
		if rpt.IsCustomTask() {
-			if rpt.Run != nil && !rpt.Run.IsCancelled() && (pr.HasTimedOut(ctx, c.Clock) || (rpt.Run.HasTimedOut(c.Clock) && !rpt.Run.IsDone())) {
+			if rpt.Run != nil && !rpt.Run.IsCancelled() && rpt.Run.HasTimedOut(c.Clock) && !rpt.Run.IsDone() {
given that we're timing out runs in `timeoutPipelineTaskRuns` - i'd think that we don't need to do it in `processRunTimeouts` again - that is, we can remove this function. that way, timing out of taskruns and runs is processed at one point, or is there something i'm missing?
So this confused me a bunch at first - previous to this PR, `processRunTimeouts` handles both timing out a `Run` because its parent `PipelineRun` has timed out (which is the bit I removed here) and timing out a `Run` because the `Run` itself has timed out. There's no guarantee that a custom task reconciler will actually handle timeouts itself, so the only way we can guarantee that a `Run` with a timeout set on it in a `Pipeline` gets timed out is to do so in the `PipelineRun` reconciler, hence this function sticking around.
I'm going to add docstrings to clarify this.
aaaaah makes sense now, +1 to adding a docstring, will really help here. can we have a function named `timeoutRun` which is used here and in `timeoutPipelineTaskRuns` to remove the duplication?
It's not actually duplication any more - `processRunTimeouts` and `timeoutPipelineTasks` both end up calling `cancelRun`, but the "should I time out the `Run`?" logic is different.
// IsSpecTimedOut returns true if the TaskRun's spec status is set to TimedOut state
func (tr *TaskRun) IsSpecTimedOut() bool {
	return tr.Spec.Status == TaskRunSpecStatusTimedOut
}
please add tests for this function 🙏🏾
Good catch!
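A rough sketch of what such a table-driven test might look like (the constant names come from this PR's diff; the test file layout and case names are assumptions):

```go
package v1beta1_test

import (
	"testing"

	"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
)

// Sketch only: exercises IsSpecTimedOut for the new timed-out spec status, the
// existing cancelled status, and the empty default.
func TestTaskRunIsSpecTimedOut(t *testing.T) {
	for _, tc := range []struct {
		name string
		spec v1beta1.TaskRunSpec
		want bool
	}{
		{name: "timed out", spec: v1beta1.TaskRunSpec{Status: v1beta1.TaskRunSpecStatusTimedOut}, want: true},
		{name: "cancelled", spec: v1beta1.TaskRunSpec{Status: v1beta1.TaskRunSpecStatusCancelled}, want: false},
		{name: "unset", spec: v1beta1.TaskRunSpec{}, want: false},
	} {
		t.Run(tc.name, func(t *testing.T) {
			tr := &v1beta1.TaskRun{Spec: tc.spec}
			if got := tr.IsSpecTimedOut(); got != tc.want {
				t.Errorf("IsSpecTimedOut() = %t, want %t", got, tc.want)
			}
		})
	}
}
```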
I really like this approach and it will probably help with #4071 as well! Nice catch, this was a sneaky bug! One challenge here is that our test coverage for timeouts is not very good, and I think this will result in timeouts.tasks and timeouts.finally not working as intended.
// TaskRunSpecStatusTimedOut indicates that the PipelineRun owning this task wants to mark it as timed out,
// if not already cancelled or terminated
TaskRunSpecStatusTimedOut = "TaskRunTimedOut"
I'm not sure I understand why this is going in the spec. SpecStatusCancelled is for letting a user cancel the TaskRun, but I'm not sure we'd want to be able to set the spec to TimedOut
@jerop's idea was to leverage the same approach we use when cancelling a `PipelineRun` to get its child `TaskRun`s and `Run`s to cancel as well, which is to set `spec.status` to `TaskRunSpecStatusCancelled`. If there's another way to signal the `TaskRun` that it should stop itself and report as timed out, I'm more than happy to see it. =)
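For illustration, a minimal sketch of that signalling path, loosely modeled on the existing cancellation patch in cancel.go; the helper name and exact patch shape here are assumptions, not the PR's actual code:

```go
package timeoutsketch

import (
	"context"
	"encoding/json"

	"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
	clientset "github.com/tektoncd/pipeline/pkg/client/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// timeoutTaskRun (hypothetical name) flips the TaskRun's spec.status to the new
// TaskRunTimedOut value so the TaskRun reconciler stops it and reports it as
// timed out, mirroring how cancellation is signalled today.
func timeoutTaskRun(ctx context.Context, name, namespace string, cs clientset.Interface) error {
	// JSON patch that sets /spec/status on the child TaskRun.
	patch, err := json.Marshal([]map[string]interface{}{{
		"op":    "add",
		"path":  "/spec/status",
		"value": v1beta1.TaskRunSpecStatusTimedOut,
	}})
	if err != nil {
		return err
	}
	_, err = cs.TektonV1beta1().TaskRuns(namespace).Patch(ctx, name, types.JSONPatchType, patch, metav1.PatchOptions{})
	return err
}
```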
could we reuse the timeout strategy we use for runs?
pipeline/pkg/reconciler/pipelinerun/pipelinerun.go, lines 623 to 648 in d3b8017:
// processRunTimeouts custom tasks are requested to cancel, if they have timed out. Custom tasks can do any cleanup
// during this step.
func (c *Reconciler) processRunTimeouts(ctx context.Context, pr *v1beta1.PipelineRun, pipelineState resources.PipelineRunState) error {
	errs := []string{}
	logger := logging.FromContext(ctx)
	if pr.IsCancelled() {
		return nil
	}
	for _, rpt := range pipelineState {
		if rpt.IsCustomTask() {
			if rpt.Run != nil && !rpt.Run.IsCancelled() && (pr.HasTimedOut(ctx, c.Clock) || (rpt.Run.HasTimedOut(c.Clock) && !rpt.Run.IsDone())) {
				logger.Infof("Cancelling run task: %s due to timeout.", rpt.RunName)
				err := cancelRun(ctx, rpt.RunName, pr.Namespace, c.PipelineClientSet)
				if err != nil {
					errs = append(errs,
						fmt.Errorf("failed to patch Run `%s` with cancellation: %s", rpt.RunName, err).Error())
				}
			}
		}
	}
	if len(errs) > 0 {
		e := strings.Join(errs, "\n")
		return fmt.Errorf("error(s) from processing cancel request for timed out Run(s) of PipelineRun %s: %s", pr.Name, e)
	}
	return nil
}
We could - the tradeoff is that if we just directly cancel the `TaskRun`s, like we do with `Run`s, the `TaskRun` will not report that it was timed out. I personally think it would be better to have that information in the `TaskRun`'s status. And ideally, I'd want that in the timed-out `Run`'s status, but since we don't control the custom task reconcilers, we can't guarantee they'd actually support a new spec status.
I tend to agree with @abayer there
Oh actually I like andrea's idea, that could make a lot of sense!
👍 We'd need to add a field to `PipelineRunSpec` alongside `PipelineRunSpec.Status` for a reason, right?
I think a user shouldn't need to provide a cancellation reason, and the PR controller can provide the reason for cancelling the taskrun, so I'm not sure we need to change `pipelineRun.spec.status`, however we might need to add a field to `taskrun.spec.status`
Oops, yeah, I meant `taskRun.spec.status`. Also I forgot to delete this in my latest commit. heh.
I agree a user shouldn't need to provide a cancellation reason, but they might want to - it could be an interesting possibility to give our users for auditing. That said, the same could be achieved a bit more loosely with annotations.
Force-pushed from 52c419e to 91a14c1 (compare)
Force-pushed from 91a14c1 to fa5d05a (compare)
-	timeout = pr.TasksTimeout().Duration
+	// Subtract the current run time of the PipelineRun from the tasks timeout to ensure that the maximum timeout for this task
+	// is the remaining time in the tasks timeout since the PipelineRun was started.
+	timeout = pr.TasksTimeout().Duration - c.Since(pr.Status.StartTime.Time)
@lbernick - I think this solves the `timeouts.Tasks` issue you raised.
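As a standalone illustration of the arithmetic (standard library only, not the reconciler code itself): a TaskRun created partway through a PipelineRun now gets whatever is left of `timeouts.tasks`.

```go
package main

import (
	"fmt"
	"time"
)

// remainingTasksTimeout returns how much of the tasks timeout is left at the
// moment a new TaskRun is created.
func remainingTasksTimeout(tasksTimeout time.Duration, prStart, now time.Time) time.Duration {
	return tasksTimeout - now.Sub(prStart)
}

func main() {
	start := time.Date(2021, 12, 31, 23, 55, 0, 0, time.UTC) // PipelineRun started at 23:55
	now := time.Date(2022, 1, 1, 0, 0, 0, 0, time.UTC)       // reconciling at midnight
	fmt.Println(remainingTasksTimeout(15*time.Minute, start, now)) // 10m0s
}
```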
    name: test-pipeline
  serviceAccountName: test-sa
  timeouts:
    tasks: 15m
@lbernick - here's a test verifying that the new `TaskRun` gets a timeout of `timeouts.Tasks` minus the current run time of the `PipelineRun`.
sorry, I still think this doesn't work. The pipelinerun started 1 day before "now", but its tasks timeout is 15 minutes. This means the task has already exceeded its timeout and will be created with a timeout of 1s here. (That's a known bug I've been wrestling with.)
I think the test case we actually want is a pipelinerun with some elapsed time having occurred, but still not having passed its "tasks" timeout. Previously, we would have set the taskrun timeout to the tasks timeout minus the elapsed time (covered in the test case below called "taskrun with elapsed time; task.timeout applies"), but it sounds like now, the new approach will be for the pipelinerun to cancel all taskruns when the "tasks" timeout is exceeded.
If it would be helpful, I can create a PR adding some test cases to the PR reconciler with what I have in mind and you can rebase onto that, so it's easier to tell what your PR changes? (I think it will be helpful regardless)
Right, that's actually what this test does - the `PipelineRun` started 5 minutes before "now" - `startTime: "2021-12-31T23:55:00Z"`. =)
oh oops I didn't look at the start time carefully enough!
...dangit, I tried to prune a bunch of old coverage report comments to make the PR discussion easier to read and it decided to log each of them here, creating a big ol' list of me deleting things. Oops. =)
/test pull-tekton-pipeline-integration-tests
@@ -449,6 +487,12 @@ const (
	GracefullyStoppedSkip SkippingReason = "PipelineRun was gracefully stopped"
	// MissingResultsSkip means the task was skipped because it's missing necessary results
	MissingResultsSkip SkippingReason = "Results were missing"
+	// PipelineTimedOutSkip means the task was skipped because the PipelineRun has passed its overall timeout.
+	PipelineTimedOutSkip SkippingReason = "PipelineRun timeout has been reached"
Agreeing with @abayer
c := run.Status.GetCondition(apis.ConditionSucceeded)
runCancelled := c.IsFalse() &&
	c.Reason == v1alpha1.RunReasonCancelled &&
	run.Spec.StatusMessage == v1alpha1.RunCancelledByPipelineTimeoutMsg
This sounds like a very weird behavior, but I agree, if we want to change it, it shouldn't be in that PR.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jerop, lbernick, vdemeester. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
fwiw, it looks like that little change I made did get rid of the weird failures on the non-alpha test, so hey, we're all good. =)
Alright, thanks a bunch all of you @abayer @afrittoli @jerop @lbernick and @vdemeester. Most of my concerns are addressed; I haven't got a chance to review all the code changes, but I'm happy to cancel the hold, please feel free to merge this 🙏 A few nice-to-have items:
/cancel hold
is it not /hold cancel?
/lgtm excited to see none (or less xD) of the timeouts flakes, thank you again @abayer! 🎉
Remove PipelineRun cancelation of Runs when Pipeline Task timeout is reached: TestWaitCustomTask_PipelineRun/Wait_Task_Retries_on_Timeout has been flaky for a while. This commit stops the PipelineRun reconciler from cancelling Run when it detects that the task-level Timeout configured for the Run has passed, which will address the flake (similar to #5134 which addresses TestPipelineRunTimeout).
Changes
Fixes #5127, #3460
An alternative approach vs #5133, attempting to eliminate the problem entirely rather than just working around it.
As noted in #5127, the logic around calculating a timeout for a `PipelineRun`'s `TaskRun`, to make sure that the `TaskRun`'s timeout is always going to end before the `PipelineRun`'s timeout ends, is problematic. It can result in race conditions where a `TaskRun` gets timed out, immediately followed by the `PipelineRun` being reconciled while not yet having hit the end of its own timeout. This change gets rid of that behavior, and instead sets the `TaskRun.Spec.Status` to a new value, `TaskRunTimedOut`, with the `TaskRun` reconciler handling that in the same way it does setting `TaskRun.Spec.Status` to `TaskRunCancelled`.

By doing this, we can unify the behavior for both `TaskRun`s and `Run`s upon `PipelineRun`s timing out, given that we already cancel `Run`s upon `PipelineRun` timeout, and we can kill off a bunch of flaky outcomes for `PipelineRun`s.

Also, rather than setting a 1s timeout on tasks which are created after the `.timeouts.pipeline`, `.timeouts.tasks`, or `.timeouts.finally` timeouts have been passed, this change will skip creation of those `TaskRun`s or `Run`s completely. However, it will still report those "tasks" as failed, not skipped, since attempted creation of tasks after the appropriate timeout has been reached is a failure case, and that's the existing expected behavior for `PipelineTask`s which try to start after timeouts have elapsed.

Huge thanks to @jerop for suggesting this approach!
/kind bug
Submitter Checklist
As the author of this PR, please check off the items in this checklist:
- Meets the Tekton contributor standards (including functionality, content, code)
- Has a kind label. You can add one by adding a comment on this PR that contains `/kind <type>`. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep

Release Notes