-
Notifications
You must be signed in to change notification settings - Fork 220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TEP-0050 - ignore task failures #342
Conversation
b0fb546
to
07372ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/assign
teps/0050-ignore-task-failures.md
Outdated
|
||
* As a pipeline author, I would like to design a pipeline where a task running | ||
[unit tests](https://github.com/tektoncd/catalog/tree/master/task/golang-test/0.1) might fail, | ||
but can continue running integration tests, so that my pipeline can identify failures in both the tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to explain a bit more here? one way this could be addressed today is to run the unit tests and integration tests in parallel; that way the unit tests can fail but the integration tests will still complete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pritidesai i was thinking about this more and i have a couple ideas of how we can tweak the use cases so that running the tests in parallel won't solve the problem (maybe this is what you already had in mind)
- A "warning only" use case - e.g. a pipeline wants to run some tests, and in parallel it wants to run something option, e.g. maybe some really intense linting, and the pipeline doesn't need to fail if the linting fails
| |
v v
linting unit tests
|
v
integration tests
There's another very similar use case, where you have the same pipeline as above, but you DO want linting to fail the pipeline, you just don't want it to interrupt the unit tests -> integration tests branch <-- for that case i wonder if pipelines in pipelines is more appropriate? not sure!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @bobcatfish for looking into it 😻
Yes, pipeline in pipeline can be an appropriate solution to not interrupt the combo of unit tests -> integration tests
. I will add both the points (running in parallel and pipeline in pipeline) in the TEP.
But, I was thinking of a use case in which I don't want linting
failure or unit test
failure to interrupt deploying an application to a staging cluster so that the new features in the app can be shared for early feedback.
| | |
v v v
linting unit tests integration tests
|
v
deploying to staging cluster
One more use case I was thinking of having one single GitHub repo with multiple runtimes, and building/deploying an image for each runtime, and I don't want failure in one runtime interrupt the other runtime. But we can argue that this use case can fit well with pipeline in pipeline
approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's another very similar use case, where you have the same pipeline as above, but you DO want linting to fail the pipeline, you just don't want it to interrupt the unit tests -> integration tests branch <-- for that case i wonder if pipelines in pipelines is more appropriate? not sure!
@bobcatfish @pritidesai could this be handled at the pipeline level too —aka some configuration of the pipeline that says "don't interup other branches" execution ?
I see two different problem we are trying to tackle here:
- Do not fail the pipeline if a task fail (and continue the execution, …)
- Do fail the pipeline if a task fail but do not interrupt other branches (or even the current one, also this would seem weird to me)
Do we want to tackle those at the same time, in the TEP ? I also tend to agree that something might be better handle by pipeline in pipeline
approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do fail the pipeline if a task fail but do not interrupt other branches (or even the current one, also this would seem weird to me)
I agree - I think we're trying to tackle (1) and could leave (2) as separate (maybe pipelines in pipelines or something there is the best solution for (2))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @bobcatfish @vdemeester
yup sure, I just updated the TEP to include only (1)
but can continue running tests, so that my pipeline can report failures from the linting and all the tests. | ||
|
||
* As a new Tekton user, I want to migrate existing workflows from the other CI/CD systems that allowed a | ||
similar task unit of failure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to give some examples? i think a lot of systems that folks are migrating from dont have the same parallelism that Pipelines support, so I'm wondering if that might be the answer to most of these cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jenkins (a Jenkinsfile) would be an example 😛
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can include an example of a Jenkinsfile (with a failure allowed) that we'd be migrating? I'm struggling with trying to understand if one of the main motivations around the migration use case is that many CD systems do not allow for parallel execution, so it'd be great to see a counter example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I can add an example, there are other tools as well, Travis, Argo Workflow, etc
07372ce
to
7a384a4
Compare
/kind tep |
/assign |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As commented, I am a bit fuzzy if we are tackling two use cases at the same time or not, and if we should.
teps/0050-ignore-task-failures.md
Outdated
|
||
* As a pipeline author, I would like to design a pipeline where a task running | ||
[unit tests](https://github.com/tektoncd/catalog/tree/master/task/golang-test/0.1) might fail, | ||
but can continue running integration tests, so that my pipeline can identify failures in both the tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's another very similar use case, where you have the same pipeline as above, but you DO want linting to fail the pipeline, you just don't want it to interrupt the unit tests -> integration tests branch <-- for that case i wonder if pipelines in pipelines is more appropriate? not sure!
@bobcatfish @pritidesai could this be handled at the pipeline level too —aka some configuration of the pipeline that says "don't interup other branches" execution ?
I see two different problem we are trying to tackle here:
- Do not fail the pipeline if a task fail (and continue the execution, …)
- Do fail the pipeline if a task fail but do not interrupt other branches (or even the current one, also this would seem weird to me)
Do we want to tackle those at the same time, in the TEP ? I also tend to agree that something might be better handle by pipeline in pipeline
approach.
but can continue running tests, so that my pipeline can report failures from the linting and all the tests. | ||
|
||
* As a new Tekton user, I want to migrate existing workflows from the other CI/CD systems that allowed a | ||
similar task unit of failure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jenkins (a Jenkinsfile) would be an example 😛
From my end, the only thing missing to merge the problem statement is to be clear on which of the 2 cases (or both) that @vdemeester pointed out that we want to tackle and then i think we'll be gtg! |
8f9766f
to
eaecdea
Compare
/assign @vdemeester |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
/approve
just waiting for @vdemeester to approve (and maybe @jerop since you're assigned?)
/hold
} | ||
} | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!! thanks this example really helps
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bobcatfish The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Adding a tep to ignore task failures, allowing pipeline authors to unblock execution after a single failure
eaecdea
to
ff0c820
Compare
but the pipeline can continue running integration tests and deploying an application to a staging cluster, so that the | ||
application can be shared with other developers for early feedback. | ||
|
||
For example, do not fail the pipeline if `unit tests` fail. Continue deploying to a staging cluster if integration tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pritidesai does do not fail the pipeline
mean that not only does execution continue but also the pipelinerun status should not be failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or is it just about the execution flow and it doesn't affect the overall status?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @jerop the high level idea here is to ignore/allow pipelineTask
failure, continue execution and do not fail the pipelineRun
.
@pritidesai does do not fail the pipeline mean that not only does execution continue but also the pipelinerun status should not be failure?
Yes, pipelineRun
status should not be failure. I will cover this in the design proposal with potential alternatives. We could set the overall pipelineRun
status to success
, completed
, success with warning
, or something more relevant.
In this example, `linting` and `unit tests` are executed in parallel. This specific use case can be | ||
supported by `pipeline in pipeline` approach by creating two separate pipelines, | ||
(1) `linting` -> `report linter output` and (2) `unit tests` -> `integration tests`. | ||
`pipeline in pipeline` approach would be better fit when you want `linting` to fail the pipeline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the pipeline
here referring to the first sub-pipeline or the whole pipeline?
if pipelines in pipelines approach is used, it might be better for report linter output
to be a finally task which would then report the linting output and the tests sub-pipeline would execute independently in parallel so linting
failure won't affect it
what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the pipeline here referring to the first sub-pipeline or the whole pipeline?
Its referring to the sub pipeline and in turn the whole pipeline since the sub pipeline status would be reflected in the overall pipelineRun status.
what do you think?
Yup, having report linter output
under finally
won't be affected in terms of the execution by the linting
failure but that sub pipeline would still result in failure
.
Here the idea is to treat linting
as a warning. linting
is part of the pipeline but its failure has no impact on the execution of the rest of the pipeline or the overall status.
Edit: And, if we combine two features pipeline in pipeline
and ignore/allow failure of the linter sub-pipeline (when defined as a pipelineTask
) would not impact overall status as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question, how does this relates to TEP-0007 ?
- Design a task failure strategy so that the pipeline author can control the behavior of the underlying tasks
and decide whether to continue executing the rest of the graph in the event of failure.
This goal seems to be aligned with TEP-0007. I guess the only difference is the final status of the Pipeline (succeeded in this TEP, still fail in 0007).
If we go with this approach, is the other TEP required ? and vice-versa ?
yes yes, the two are aligned and that's why I'd put #258 on hold to explore failure strategies more broadly beyond allowing skipping a single task, which led to creating Pipelines in Pipelines to solve for some of the failure scenarios if you define
which one did you intend to address here @pritidesai? regardless, these issues should/will be solved together and there are some thoughts about the overlap in this doc I shared with @pritidesai |
Thanks @vdemeester and @jerop
TEP-0007 is limited to a skipped task i.e. a task is skipped and not executed. The main goal of this TEP is to ignore
Yes both the TEPs are required, as both are addressing the flow control based on the different statuses of a In the following example,
Now, in this example,
Now, with TEP-0007,
|
thanks @jerop, this TEP is focused on ignoring task failures after getting an error. |
/lgtm |
thanks for the reviews @bobcatfish @vdemeester @jerop 🙏 /hold cancel |
TEP-0040 is proposing a TEP to ignore step errors. We have identified a need for two separate proposals during the multiple rounds of discussion in
TEP-0040
:(1) ignore errors at the step level (
TEP-0040
)(2) ignore failures at the task level
Proposing a TEP for (2) to ignore failures at the task level and allow pipeline authors to unblock execution after a single task failure.