
implementing pipeline level finally #2661

Merged
merged 1 commit into from
Jun 26, 2020

Conversation

pritidesai
Member

@pritidesai pritidesai commented May 21, 2020

Changes

We can now specify a list of tasks to be executed just before the
pipeline exits (either after all non-final tasks finish successfully or after
a single failure).

Most useful for tasks such as reporting test results, cleaning up cluster resources, etc.

Summary of Changes:

  1. pipelines.md: Documents final tasks, including how to specify workspaces and params, the updates to the pipeline run status as a result of the finally clause, and known limitations (task results, from clause, pipeline results, runAfter, and conditions).
  2. pipelinerun-with-final-tasks.yaml: Example pipeline demonstrating cloning a repo into a shared workspace and then cleaning up that workspace.
  3. pipelinerun.go: Builds the graph for final tasks, builds pipelineState for both DAG tasks and final tasks, and invokes final tasks once all DAG tasks are done (the execution queue is empty).
  4. pipelinerun_test.go: Tests three major pipeline scenarios: (1) a DAG task fails but the final task succeeds, (2) all DAG tasks succeed but a final task fails, (3) one of the DAG tasks fails and a final task also fails.
  5. pipelinerunresolution.go: Replaces SuccessfulPipelineTaskNames with SuccessfulOrSkippedDAGTasks as explained here. Adds two helper functions, isDAGTasksDone and isDAGTaskFailed, and GetFinalTasks to return final tasks once they are eligible to execute.
  6. pipelinefinally_test.go: Validates pipelineRun status and taskRun status along with completion time, making sure that (1) final tasks are executed after all DAG tasks succeed, (2) the pipeline results in failure when a DAG task fails but the final tasks still execute successfully, (3) the pipeline exits with failure when final task execution fails, and (4) when a DAG task is skipped due to a condition failure, final tasks are still executed.

Example:

```
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: pipeline-with-final-tasks
spec:
  tasks:
    - name: pre-work
      taskRef:
        name: some-pre-work
    - name: unit-test
      taskRef:
        name: run-unit-test
      runAfter:
        - pre-work
    - name: integration-test
      taskRef:
        name: run-integration-test
      runAfter:
        - unit-test
  finally:
    - name: cleanup-test
      taskRef:
        name: cleanup-cluster
    - name: report-results
      taskRef:
        name: report-test-results
```
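The scheduling rule described above (finally tasks become eligible only once every DAG task has reached a terminal state) can be sketched as a minimal, illustrative Go model — the type and function names here are hypothetical, not Tekton's actual API:

```go
package main

import "fmt"

// taskState is a simplified stand-in for the per-task status the
// reconciler tracks; the names are illustrative only.
type taskState int

const (
	pending taskState = iota
	running
	succeeded
	failed
	skipped
)

// dagTasksDone reports whether every DAG task has reached a terminal
// state (succeeded, failed, or skipped) — the condition under which the
// finally tasks are scheduled.
func dagTasksDone(states map[string]taskState) bool {
	for _, s := range states {
		if s == pending || s == running {
			return false
		}
	}
	return true
}

func main() {
	states := map[string]taskState{
		"pre-work":         succeeded,
		"unit-test":        failed,
		"integration-test": skipped, // never started: its parent failed
	}
	if dagTasksDone(states) {
		fmt.Println("scheduling finally tasks: cleanup-test, report-results")
	}
}
```

Here `skipped` covers tasks that will never start, for example because a parent task failed or a condition evaluated to false.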

Design doc: https://docs.google.com/document/d/1lxpYQHppiWOxsn4arqbwAFDo4T0-LCqpNa6p-TJdHrw/edit#

Part of #1684
Fixes #2446

Depends on: #2650

Reviewer Notes

If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.

Release Notes

Users can now specify Tasks within a Pipeline that will always execute, even if Tasks fail, via the new `finally` clause

@tekton-robot tekton-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 21, 2020
@tekton-robot tekton-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 21, 2020
@tekton-robot
Collaborator

This PR cannot be merged: expecting exactly one kind/ label

Available kind/ labels are:

kind/bug: Categorizes issue or PR as related to a bug.
kind/flake: Categorizes issue or PR as related to a flaky test.
kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
kind/design: Categorizes issue or PR as related to design.
kind/documentation: Categorizes issue or PR as related to documentation.
kind/feature: Categorizes issue or PR as related to a new feature.
kind/misc: Categorizes issue or PR as a miscellaneous one.


@pritidesai
Copy link
Member Author

/kind feature

@tekton-robot tekton-robot added the kind/feature Categorizes issue or PR as related to a new feature. label May 21, 2020
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| pkg/reconciler/pipelinerun/pipelinerun.go | 75.3% | 76.1% | 0.8 |
| pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go | 90.5% | 92.6% | 2.1 |



@bobcatfish bobcatfish left a comment


I mostly just looked at the docs, so I left some comments there, but overall I think this feature is looking great!!! I was focusing on reviewing the validation PR (#2650) and can review this in more detail after :D

Resolved review threads: docs/pipelines.md (5), pkg/reconciler/pipelinerun/pipelinerun.go (1)


@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| pkg/reconciler/pipelinerun/pipelinerun.go | 75.3% | 76.2% | 0.9 |
| pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go | 90.5% | 92.7% | 2.3 |

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| pkg/reconciler/pipelinerun/pipelinerun.go | 75.9% | 76.8% | 0.9 |
| pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go | 90.5% | 92.7% | 2.3 |


@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| pkg/reconciler/pipelinerun/pipelinerun.go | 80.5% | 81.2% | 0.8 |
| pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go | 90.5% | 92.6% | 2.1 |

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| internal/builder/v1beta1/pipeline.go | 82.0% | 82.1% | 0.2 |
| pkg/reconciler/pipelinerun/pipelinerun.go | 84.7% | 84.9% | 0.2 |
| pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go | 92.2% | 93.1% | 0.9 |

@pritidesai
Member Author

/test pull-tekton-pipeline-integration-tests

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| internal/builder/v1beta1/pipeline.go | 82.0% | 82.1% | 0.2 |
| pkg/reconciler/pipelinerun/pipelinerun.go | 84.6% | 84.8% | 0.2 |
| pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go | 92.2% | 93.1% | 0.9 |

@pritidesai
Member Author

@bobcatfish @afrittoli @vdemeester I think I have addressed all the comments so far, and this is ready for review.

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| internal/builder/v1beta1/pipeline.go | 82.0% | 82.1% | 0.2 |
| pkg/reconciler/pipelinerun/pipelinerun.go | 85.0% | 85.2% | 0.2 |
| pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go | 92.2% | 93.1% | 0.9 |


@bobcatfish bobcatfish left a comment


Thanks for all the back and forth on this @pritidesai !

I'm still not sure why we have isDAGTasksStopped and checkTasksDone (details below), but since we've gone back and forth on this so much, since this is a detail completely hidden from users, and since the functions aren't even exported, I'm happy to merge as-is.

Other than that, I'm happy to merge! Consider the PR approved from me :)

@vdemeester @afrittoli do you have any other blocking feedback?


Okay about isDAGTasksStopped and checkTasksDone:

I don't understand why we need both isDAGTasksStopped and checkTasksDone - both are called with a dag.Graph and both seem to want to return true if all tasks in the dag.Graph have stopped executing, but they have different names and totally different logic

is it reasonable to have checkTasksDone be the only function you need to call to be able to tell if all tasks in a dag.Graph have finished executing?

```diff
@@ -498,7 +568,7 @@ func GetPipelineConditionStatus(pr *v1beta1.PipelineRun, state PipelineRunState,

 	// Hasn't timed out; not all tasks have finished....
 	// Must keep running then....
-	if failedTasks > 0 || cancelledTasks > 0 {
+	if cancelledTasks > 0 || (failedTasks > 0 && state.checkTasksDone(dfinally)) {
```
Collaborator


@pritidesai: this additional check makes sure that the pipeline does not exit while the final tasks are still executing.
Without this additional check, state.isFinalTasksDone(dfinally), GetPipelineConditionStatus changes the pipeline status to stopping irrespective of where the final tasks are in their execution cycle (scheduled, started, running, failed, succeeded, retrying, etc.)

Thanks for explaining - can you add a comment explaining this?

I think it might also be worth explaining this progression in the markdown docs.
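The progression discussed here — a DAG-task failure puts the run into a stopping phase, and the run only becomes Failed once the finally tasks have finished — can be sketched roughly as follows. This is an illustrative model of the guard in GetPipelineConditionStatus, with hypothetical names, not the actual code:

```go
package main

import "fmt"

// conditionStatus sketches the guard from the diff in this thread:
// cancellation fails the run immediately, but a DAG-task failure keeps
// the run in a "Running" (stopping) phase until finally tasks are done.
func conditionStatus(failedTasks, cancelledTasks int, finallyDone bool) string {
	if cancelledTasks > 0 || (failedTasks > 0 && finallyDone) {
		return "Failed"
	}
	return "Running"
}

func main() {
	fmt.Println(conditionStatus(1, 0, false)) // DAG task failed, finally still executing
	fmt.Println(conditionStatus(1, 0, true))  // DAG task failed, finally finished
}
```

Cancellation short-circuits to Failed, while a DAG-task failure leaves the run in a running/stopping state until the finally tasks complete; the real function of course also handles the success and timeout paths.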

Member Author


hey @bobcatfish I have updated the doc and added a comment explaining this.

We can now specify a list of tasks to be executed just before the
pipeline exits (either after all non-final tasks finish successfully or after
a single failure).

Most useful for tasks such as reporting test results, cleaning up cluster resources, etc.

```
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: pipeline-with-final-tasks
spec:
  tasks:
    - name: pre-work
      taskRef:
        name: some-pre-work
    - name: unit-test
      taskRef:
        name: run-unit-test
      runAfter:
        - pre-work
    - name: integration-test
      taskRef:
        name: run-integration-test
      runAfter:
        - unit-test
  finally:
    - name: cleanup-test
      taskRef:
        name: cleanup-cluster
    - name: report-results
      taskRef:
        name: report-test-results
```

@afrittoli
Member

Thanks for all the back and forth on this @pritidesai !

I'm still not sure why we have isDAGTasksStopped and checkTasksDone (details below), but since we've gone back and forth on this so much, since this is a detail completely hidden from users, and since the functions aren't even exported, I'm happy to merge as-is.

isDAGTasksStopped adds more checks compared to checkTasksDone, but I agree that it could be implemented as a single function. I think the problem will disappear once we refactor the pipelineresolution module a bit, so I don't think we need to iterate further on this one.

Other than that, I'm happy to merge! Consider the PR approved from me :)

@vdemeester @afrittoli do you have any other blocking feedback?

No, I would go ahead with this.
Once this is merged we can start the pipelineresolution refactor work.

Okay about isDAGTasksStopped and checkTasksDone:

I don't understand why we need both isDAGTasksStopped and checkTasksDone - both are called with a dag.Graph and both seem to want to return true if all tasks in the dag.Graph have stopped executing, but they have different names and totally different logic

is it reasonable to have checkTasksDone be the only function you need to call to be able to tell if all tasks in a dag.Graph have finished executing?


@afrittoli afrittoli left a comment


Thanks for the latest updates!
/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 25, 2020
@afrittoli
Member

/hold
to give @vdemeester an opportunity to comment too.

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 25, 2020
@pritidesai
Member Author

pritidesai commented Jun 25, 2020

Okay about isDAGTasksStopped and checkTasksDone:

I don't understand why we need both isDAGTasksStopped and checkTasksDone - both are called with a dag.Graph and both seem to want to return true if all tasks in the dag.Graph have stopped executing, but they have different names and totally different logic

is it reasonable to have checkTasksDone be the only function you need to call to be able to tell if all tasks in a dag.Graph have finished executing?

Sorry for making it so difficult to understand 😢 I will make one more attempt to explain it (while refactoring GetPipelineConditionStatus as we speak) 🙏

The pipeline is assigned the stopping state as soon as a DAG task failure is discovered. In this stopping mode, the pipeline waits for currently running tasks to finish and does not schedule any new tasks. Without any final tasks, the pipeline transitions from the stopping state to the failed state. isDAGTasksStopped checks whether all those running tasks are done so that the final tasks can be scheduled.

checkTasksDone works for a pipeline with all successful and/or skipped DAG tasks, or with a failed task and no parallel running tasks (this last part overlaps with isDAGTasksStopped).

This is the result of GetPipelineConditionStatus considering not-yet-started tasks as skipped on a single failure. There is no issue with that, except that it is not tracked outside of GetPipelineConditionStatus. This decision can be moved into the isSkipped function, which is also called by checkTasksDone, and then we would be able to drop isDAGTasksStopped.

And with or without the stopping state, GetPipelineConditionStatus, iterating over the pipelineState, builds a list of skippedTasks and determines that it is the end of the pipeline with no more tasks to execute. But this information is lost when the reconciler rebuilds pipelineState in the next iteration (from the informer's copy). At any time, a task in pipelineState must be aware of whether it will be scheduled in the future, was skipped because its parent was skipped, or was skipped because the pipeline is now in the stopping state.

Hope this makes sense.
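The unification described above — moving the "will never start" decision into isSkipped so that a single done-check suffices for both the happy path and the stopping path — could look roughly like this. These are toy types and names for illustration, not the actual pipelinerunresolution.go code:

```go
package main

import "fmt"

// task is a toy model of an entry in pipelineState.
type task struct {
	name          string
	started       bool // a TaskRun exists for this task
	done          bool // terminal: succeeded or failed
	parentSkipped bool // an ancestor failed or was skipped
}

// isSkipped reports whether a task will never start: either its parent
// chain was skipped/failed, or the pipeline entered the stopping state.
func isSkipped(t task, pipelineStopping bool) bool {
	if t.started || t.done {
		return false
	}
	return t.parentSkipped || pipelineStopping
}

// tasksDone is the single check: every task has either finished or will
// never start, so the finally tasks can be scheduled.
func tasksDone(ts []task, pipelineStopping bool) bool {
	for _, t := range ts {
		if !t.done && !isSkipped(t, pipelineStopping) {
			return false
		}
	}
	return true
}

func main() {
	state := []task{
		{name: "pre-work", started: true, done: true},
		{name: "unit-test", started: true, done: true}, // failed => pipeline is stopping
		{name: "integration-test", parentSkipped: true},
	}
	fmt.Println(tasksDone(state, true))
}
```

With the skip decision inside isSkipped, the same check covers an all-successful pipeline, a pipeline with skipped branches, and a pipeline in the stopping state, which is what would allow dropping isDAGTasksStopped.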

@pritidesai
Member Author

pritidesai commented Jun 26, 2020

@bobcatfish @afrittoli if it's helpful, these changes can go directly on top of this PR and get rid of the extra isDAGTasksStopped function we are concerned about. It simplifies GetPipelineConditionStatus a bit along the way (the build and all unit tests pass).

@vdemeester
Member

No blocker from my side 👼 I would have preferred not using the dag for the finally part but… it ain't such a big deal, it can be refactored and evolve, and it is relatively well tested, so all good 😉

/approve
/hold cancel

@tekton-robot tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 26, 2020
@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 26, 2020
@tekton-robot tekton-robot merged commit f7d78a1 into tektoncd:master Jun 26, 2020
@bobcatfish
Collaborator

@bobcatfish @afrittoli if it's helpful, these changes can go directly on top of this PR and get rid of the extra isDAGTasksStopped function we are concerned about. It simplifies GetPipelineConditionStatus a bit along the way (the build and all unit tests pass).

excellent!!!! thanks @pritidesai

bobcatfish added a commit to bobcatfish/pipeline that referenced this pull request Jul 15, 2020
@pritidesai has been a reviewer on many PRs (37 by the count at
https://github.com/tektoncd/pipeline/pulls?q=is%3Apr+is%3Aopen+reviewed-by%3Apritidesai)
and she has been the author of several PRs (27 at https://github.com/tektoncd/pipeline/pulls/pritidesai)
including finally (tektoncd#2661) which was one of the most substantial changes
to pipeline execution since we implemented DAGs (not to mention that she
patiently worked with the evolution of this feature over several months
from the runOn feature which evolved into this one).

Thanks so much @pritidesai!!!! So excited that you want to become even
more involved in pipelines :D :D :D
@bobcatfish bobcatfish mentioned this pull request Jul 15, 2020
tekton-robot pushed a commit that referenced this pull request Jul 16, 2020
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

"Finally" tasks for Pipeline
6 participants