TaskRun timeout is always set to 1h0m0s after v0.40 #6520
Comments
I believe this is a duplicate of #6137. @syan-tibco can you take a look at that issue and lmk if this answers your question? In the example above, tekton should cancel the taskruns after 30s even if their timeout is 1h. You can use pipeline.spec.tasks[].timeout to set timeouts for child taskruns. |
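For reference, a minimal sketch of what setting pipeline.spec.tasks[].timeout could look like. The pipeline name, task name, and the 4h value are hypothetical and chosen only for illustration; they are not taken from the reporter's setup.

```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: long-running-pipeline   # hypothetical name
spec:
  tasks:
    - name: long-task           # hypothetical name
      timeout: "4h0m0s"         # per-task timeout, copied onto the child TaskRun
      taskRef:
        name: long-task         # hypothetical Task
```

The PipelineRun's own timeouts.pipeline and timeouts.tasks still apply on top of this, so they would need to be at least as long as the per-task value for the task to actually run that long.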
hi @lbernick thanks for the response. It is a similar issue, but this change breaks our use case. Yes, in the sample that I gave, tekton will cancel the taskrun after 30s. But when we have longer taskruns, the task will also be canceled after 1 hour. Here are the details.
use case
We have some pipelines that need to run 6 ~ 7 hours. A task inside such a pipeline might need to run 3 ~ 4 hours. It covers cases like:
problem
The pipelinerun that we submit:
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
...
  timeouts:
    finally: 5m0s
    pipeline: 2h0m0s
    tasks: 1h55m0s
This will cover 90% of cases. The longest pipeline timeout is 7h. Here is the taskrun that I get
Then after 1 hour I will get an error like:
conditions:
- lastTransitionTime: "2023-04-11T02:05:55Z"
  message: TaskRun "generic-runner-910716586980-1681175155534-generic-runner"
    failed to finish within "1h0m0s"
  reason: TaskRunTimeout
  status: "False"
  type: Succeeded
And it will delete the pods, so we lose all logs.
workaround
No workaround for us; we had to revert our production setup to v0.39.0. Here is the sample taskrun for v0.39.0:
apiVersion: tekton.dev/v1beta1
kind: TaskRun
...
  timeout: 6h59m59.943281705s
Here is the pipelinerun setting:
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
....
  timeouts:
    pipeline: 7h0m0s |
@syan-tibco have you tried setting pipeline.spec.tasks[].timeout? another option is to change the default timeout from 1h to something longer than 7h |
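A sketch of the second option, raising the default timeout in the config-defaults ConfigMap. The value "480" (8 hours) is only an illustrative assumption picked to exceed the 7h pipeline mentioned above; the namespace is the one used in the kubectl command from the issue template.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-defaults
  namespace: tekton-pipelines
data:
  # Illustrative value: 480 minutes = 8h, longer than the 7h pipeline.
  default-timeout-minutes: "480"
```

Child TaskRuns that do not get an explicit timeout from pipeline.spec.tasks[].timeout would then be expected to pick up this default instead of 1h.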
yeah I tried all of them. I set a timeout in every place where I can set one.
I also changed the default value to
That is why I named the topic this way. In production, before I reverted to v0.39.0, I tried to set |
This situation may have started from this PR. In v0.39.0:
pipeline/pkg/reconciler/pipelinerun/pipelinerun.go Lines 1144 to 1155 in 6a789d4
After v0.39.0, it looks like the taskrun timeout can only come from pipelinetask.Timeout:
pipeline/pkg/reconciler/pipelinerun/pipelinerun.go Lines 842 to 844 in abd1849
Is this as expected? Maybe I'm missing something. |
I just tried to reproduce this on main and was unable to. Using the following PipelineRun:
a child TaskRun was created with timeout 90m (expected behavior). I also tried setting the default timeout to 10 minutes. Using the following PipelineRun:
a child TaskRun was created with timeout 10 minutes (also expected behavior).
pipelineRun.spec.timeouts.tasks applies to the cumulative time taken by all child TaskRuns. When that timeout elapses, all children will be canceled. It's no longer used to calculate the timeout applied directly to each child TaskRun. This had the unfortunate unintended consequence of being a breaking change for PipelineRuns with
I'm confused why in your initial bug you set default-timeout-minutes: "30" but didn't see this reflected on the taskrun. What version of tekton were you using when you observed this behavior, and what's the full yaml of your config-defaults configmap? |
Hi @lbernick thanks for the test case. I played with it for a while and finally understood the logic. Previously I had set the default setting wrongly. Now I have the default value working. Here is my test case:
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: timeout
spec:
  timeouts:
    pipeline: 2h
    tasks: 10s
  pipelineSpec:
    tasks:
    - name: task1
      timeout: 90m
      taskSpec:
        steps:
        - image: busybox
          timeout: "20s"
          script: |
            echo "hello"
            sleep 30
            echo "goodbye"
Here is my finding. We can set a timeout in 4 places for a pipelinerun:
The surprising part is that we have 2 hidden minimum timeouts (#1 and #3) and 1 maximum timeout (#2 and #4). In this sample, the timeout on the taskrun will be set as
In my actual use case I have 3 files
---
# file 1
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: task1
spec:
  steps:
  - image: busybox
    # timeout: "10m"
    script: |
      echo "hello"
      sleep 1200
      echo "goodbye"
---
# file 2
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: pipeline1
spec:
  tasks:
  - name: task1
    taskRef:
      name: task1
    # timeout: "20m"
---
# file 3
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: pipelinetimeout
spec:
  timeouts:
    pipeline: "2h"
    tasks: "1h55m0s"
    finally: "5m0s"
  taskRunTemplate:
    serviceAccountName: pipeline-cluster-admin
  pipelineRef:
    name: pipeline1
I guess if I set the timeout in file 2, I should get the correct maximum timeout. I will give it a try and update the issue. The testing I am doing is on v0.46.0. I will need to test on v0.41.2. |
I tested on v0.41.2 and added the timeout in file 2. It works. The confusing part is that when we have layered yaml references, normally the top layer overrides values in the inner layer. When I set this
The timeout value in the taskrun which comes from
And finally, we have two timers counting down in parallel: one in the pipelinerun and the other in the taskrun. That makes it hard to know which one is the source of truth. Thanks for the help. I now have a workaround. Just sharing feedback about timeouts. We can close this issue if nothing needs to be enhanced. |
Thanks @syan-tibco, if there are any aspects of the documentation that can be improved here, please feel free to reopen the issue or open a pull request! |
Expected Behavior
The timeout in the TaskRun should honor timeouts.tasks in the PipelineRun.
Actual Behavior
It worked up to v0.39. Since v0.40 the timeout in the TaskRun is always set to 1h0m0s.
Steps to Reproduce the Problem
PipelineRun --> pipeline --> task --> TaskRun
I intentionally changed the timeout in config-defaults to default-timeout-minutes: "30", just to make sure I can see different values in the TaskRun.
I can see the timeout is set correctly for
I can see the timeout is always 1h for
Additional Info
Kubernetes version:
Output of kubectl version:
Tekton Pipeline version:
Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'
If this is indeed a bug, we would like a patch for 0.41.x. We have a GUI dependency on the old tekton status. (Newer tekton versions move the task status into a different CR.)