Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TEP-0054 Add a step reuse proposal #369

Closed
wants to merge 4 commits into from
Closed

Conversation

jstrachan
Copy link

@jstrachan jstrachan commented Mar 4, 2021

A proposal for git and OCI based step reuse in tekton pipelines.

We have been using this approach for months on Jenkins X but using a ko and mink style preprocessor up to now. This proposal is for adding this capability natively in tekton

Check out the Step Composition User Guide

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Mar 4, 2021

CLA Signed

The committers are authorized under a signed CLA.

@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 4, 2021
@bobcatfish
Copy link
Contributor

/assign

@vdemeester
Copy link
Member

/assign

@csantanapr
Copy link

This is awesome if it can be added to Tekton and later to OpenShift Pipelines AddOn +1

@jstrachan Would this be similar to the implementation of lighthouse that is there today?

@jstrachan
Copy link
Author

jstrachan commented Mar 4, 2021

This is awesome if it can be added to Tekton and later to OpenShift Pipelines AddOn +1

@jstrachan Would this be similar to the implementation of lighthouse that is there today?

yeah - this is basically the same model we've been using in lighthouse for a while only instead of using a special image URI for a step:

        - image: uses:tektoncd/catalog/task/git-clone/0.2/git-clone.yaml@HEAD

we now use the tekton CRD instead...

        - uses:
           path: tektoncd/catalog/task/git-clone/0.2/git-clone.yaml@HEAD

so there's no magic image tag any more; which makes things even more clear really.

Hopefully if this is agreed by the community and the PR for the implementation merges then anyone using tekton can reuse it.

Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems simple enough, I have a few questions

  • If the task referenced in the uses has multiple step, what happens ? we inject all steps at the point where uses is ?
  • Looking at the opened PR, there is a usesTemplate too, we may want to detail this here too ?
  • Do we want to make it possible to override what is injection from uses (like some specific fields) ? If yes, how do we see this done with a task with multiple steps ?
  • The proposal seems to only look at tasks. Do we see value on being able to refer to pipeline too (from a Pipeline spec) in a similar fashion ? Should we discuss this here or in another TEP ?
  • This proposal seems to limit the uses keyword to only OCI bundle and git reference (using path), any reason for not allow to refer to in-cluster tasks too ?

Overall, I tend to like that approach, it seems a clean and simple enough solution to composing tasks at authoring time, as well as grouping "parts" of task in a single pod. This has to be reviewed in parallel of #318 and #316 (especially TEP-0044). It does solve Matt's use case and it also allows composing tasks in a simple fashion.

@vdemeester vdemeester added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Mar 5, 2021
@jstrachan
Copy link
Author

Seems simple enough, I have a few questions

  • If the task referenced in the uses has multiple step, what happens ? we inject all steps at the point where uses is ?

Yes. The 2 main use cases we've found with the uses approach is reusing all the steps from a task (e.g. git clone) as a single unit (injecting all the steps as-is in place) or listing each step we want to reuse so that we can override any of them or add custom steps in between any of them.

  • Do we want to make it possible to override what is injection from uses (like some specific fields) ? If yes, how do we see this done with a task with multiple steps ?

I've added a Step Composition User Guide that describes how to reuse all steps in a task or a specific step and how the overriding works with examples (linking to the unit tests).

  • The proposal seems to only look at tasks. Do we see value on being able to refer to pipeline too (from a Pipeline spec) in a similar fashion ? Should we discuss this here or in another TEP ?

This proposal only reuses steps from a Task but the Task can come from a Pipeline/PipelineRun resource too. e.g. the unit tests reuse steps from a Task, Pipeline and PipelineRun.

We could look at reusing the same git reference in the TaskRef for the case of a Pipeline referencing a Task too; maybe we can do that in a separate TEP once this work is complete?

  • This proposal seems to limit the uses keyword to only OCI bundle and git reference (using path), any reason for not allow to refer to in-cluster tasks too ?

I just didn't think about that ;) - we generally have been using git to reuse tasks so I did that first. I've just pushed a fix to support reusing the TaskRef struct to reuse resources too.

I've just refactored the implementation to reuse TaskRef for referring to named resources or resources in OCI bundles and added unit tests for in cluster resources.

  • Looking at the opened PR, there is a usesTemplate too, we may want to detail this here too ?

On reflection and after adding support for named resources and TaskRef I no longer think the usesTemplate property adds enough value and just adds confusion so I've removed it.

I originally added it to try keep things DRY when including many steps from a Task in git; but on reflection its simpler to just be explicit in each step; you can then copy/paste any step to any other Task and its self contained and still works.

Overall, I tend to like that approach, it seems a clean and simple enough solution to composing tasks at authoring time, as well as grouping "parts" of task in a single pod. This has to be reviewed in parallel of #318 and #316 (especially TEP-0044). It does solve Matt's use case and it also allows composing tasks in a simple fashion.

Thanks. Yeah, we've been loving this composition model on Jenkins X so far; its pretty simple to understand, there's no real magic involved so its easy to read and understand.

@tekton-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign bobcatfish
You can assign the PR to them by writing /assign @bobcatfish in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@michaelsauter
Copy link

Apart from the comments I added in tektoncd/pipeline#3807, I have a more general question around task parameter handling. The referenced steps might make use of parameters defined in the referenced tasks. But those parameters might not be defined in the tasks referencing to those steps. How is this supposed to be dealt with? Would that just cause an error?

@jstrachan
Copy link
Author

@michaelsauter any parameters & results from the inherited steps are inherited too so that things just work OOTB. You can obviously specify any parameter/results locally too.

e.g. see the unit tests.

For example the override steps task generates these parameters

Dealing with parameters across Tasks and TaskRun / Pipeline / PipelineRun is a pain which this reuse model really helps hide all of that noise

@michaelsauter
Copy link

So I have had some more thoughts on this over the weekend ... and I think I changed my mind from "it would be great to have something like this" to "actually I think it would be better not to have this".

Two main reasons:

  • From a user perspective, it makes it less clear what the right level of re-use it. Should I re-use tasks, or should I re-use steps? From an author perspective - should I now put all steps into one big task, because the steps can be reused? But on the other hand, tasks have the possibility to run in parallel, so that should be thought of as well. Right now, when you think of reuse, it's clear - make the task reusable. Going forward, this would be less clear - should the step be reusable? And if there are a few steps which really belong together (as in, they form a logical unit), how would I communicate this?
  • Tasks have a clear interface (parameters and results). Steps do not have a clear interface. Reusing objects that do not have a clear interface is not a good idea IMHO. Your proposal just "makes it work", but it's really not that clear what exactly happens to the user, and I'm sure there will be nitty-gritty details and edge cases because of a missing interface. Tasks on the other hand have a clear interface because they were designed for that. I think my initial concerns are rooted here - reusing sth. that was not designed to be reused feels like a hack.

I would love to see alternative solutions to the problems you're trying to solve. I see two main problems this proposal is addressing:

  • referring to resources in Git: I see this as a separate problem, because this is also missing for tasks. If it is to be added, it should be available for both resources. That said, I'm not necessarily in favour of having this in Tekton core.
  • avoiding pod (PVC) overhead for tasks that run after each other, and that could run in the same pod. I see this as a runtime problem more than an authoring problem. I also suspect that this issue leads to bloated tasks which have more steps than they logically should have, because authors are trying to minimise the overhead. Solve this issue and you'll have tasks with fewer steps that then are a better fit for reuse. One idea to fix this would be to allow tasks to be assigned into a "group" that then would be run in one pod together. I have no clue how easy / feasible this would be technically, but from an authoring perspective I think it makes sense to allow pipeline authors to change the execution strategy. if it is possible, Tekton could even do this automatically (tasks that run sequentially could be put into the same pod).

@jstrachan
Copy link
Author

@michaelsauter inheriting steps from a task (wherever it is defined) and being able to customise/override the steps when its used for a specific repository/branch is super useful.

We use this feature most days on Jenkins X. e.g. if a pipeline fails; override the command to a sleep so you can kubectl exec inside a step to diagnose it or to turn on debug logging, modify CLI arguments or add extra commands to the script or or whatever.

If you use Tekton for your CI/CD for many repositories you want to be able to reuse tasks for all of them - but at any point in any repository/branch override/replace/customise a step.

BTW folks do this all the time with GitHub Actions which supports being able to easily consume steps from git repositories into a 'task'. I'm not sure why Tekton can't support a similar mechanism?

@michaelsauter
Copy link

michaelsauter commented Mar 8, 2021

I think GH actions actually illustrate my point: it was designed like this - actions have a clear interface. However (as far as I know) you do not extract a job step out of another job. You can only refer to actions which are independent of any job, they are objects in its own right - in contrast to Tekton steps, which are not. As I see it, GH has exactly one reusable object, and that is the action. Tekton however already has one reusable object, and that is the task. Making steps reusable should be done with caution if at all IMHO ...

@vdemeester
Copy link
Member

We use this feature most days on Jenkins X. e.g. if a pipeline fails; override the command to a sleep so you can kubectl exec inside a step to diagnose it or to turn on debug logging, modify CLI arguments or add extra commands to the script or or whatever.

Note that this is what #308 tries to tackle without the need of overriding anything in the pipeline/task/step definition.

Copy link
Contributor

@bobcatfish bobcatfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this @jstrachan ! This is a very timely proposal with some of the discussions we've been having recently about task reuse. Thanks for your patience with this wall of text below!

This specifically seems like it's one way to address the problem i have been trying to flesh out in tep-0044 (@michaelsauter pointed out some alternatives that are definitely in TEP-0044 territory) - and it has a lot of overlap with some of the conversations we've had around task specialization:
* Specializing Tasks - Vision & Goals
* Task specialization - most appealing options?

My feedback below on this proposal falls into 3 areas:

  1. steps vs task (params, results, workspaces)
  2. git
  3. pipeline task vs embedded taskspec

steps vs task

what happens to the other parts of Tasks, workspaces, params and results? it seems like this proposal wants to be able to refer to just the steps of a Task and ignore "the rest" but im not sure that's actually possible - for example:

        - uses:
           path: tektoncd/catalog/task/git-clone/0.2/git-clone.yaml@HEAD

How does the pipeline provide those values and how do you use the results from that task? I think we could flesh this out pretty easily, maybe something like:

    tasks:
    - name: release-task
       steps:
        - uses:
           name: get-source
           path: tektoncd/catalog/task/git-clone/0.2/git-clone.yaml@HEAD
           params:
             url: "github.com/someorg/somerepo"
           workspaces:
             ...
        - name: after-clone-before-release
          image: something:1.2.3
        - uses:
           path: jenkins-x/jx3-pipeline-catalog/tasks/go/release.yaml@HEAD
        - name: after-release
          image: something:1.2.3
    tasks:
    - taskRef:
       name: some-other-task
       params:
       - some-param: $(tasks.release-task.get-source.results. commit

(We've had some issues opened around making steps a reusable unit, e.g. tektoncd/pipeline#1260, but in my mind the next question is like: what if you need to provide a value to a step, oh yeah you need a param, and before you know it you've got an entire task - i like that this proposal is trying to use the task as the unit of sharing, tho it looks like in the composition guide you can name individual steps - that would only work if those steps have no params/results/workspace needs?)

any parameters & results from the inherited steps are inherited too so that things just work OOTB

Could you show an example of what this looks like in a Pipeline? (you have an example of a generated task and this tep has an example pipeline but id like to see what they look like together

@michaelsauter mentioned:

Tasks have a clear interface (parameters and results). Steps do not have a clear interface. Reusing objects that do not have a clear interface is not a good idea IMHO. Your proposal just "makes it work", but it's really not that clear what exactly happens to the user, and I'm sure there will be nitty-gritty details and edge cases because of a missing interface. Tasks on the other hand have a clear interface because they were designed for that. I think my initial concerns are rooted here - reusing sth. that was not designed to be reused feels like a hack.

I think one possible way to improve this to use "pipeline tasks" instead of Task specs, see my last section here

git

👍 to what @michaelsauter pointed out - we should discuss git support separately (we've started discussing in tektoncd/pipeline#2298) (I'm all for it but there are definitely some things we need to iron out around lines of responsibility, and there's the question of how to provide the git credentials)

Task spec vs. pipelinetask

in the example, you're using an embedded taskspec - at the moment taskspecs map directly to the spec of a Task CRD, does this proposal change that? (or would you see this being something you could do in a standalone task as well?)

An alternative would be to do this in the "pipeline task" and not touch the task at all (which avoids some interesting nesting issues - tasks that nest tasks that nest tasks):

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generatestep: my-release-
spec:
  serviceAccountName: 'default'
  pipelineSpec:
    tasks:
    - taskSpec: # as an alternative to taskRef or taskSpec?
        steps:
        - uses:
           path: tektoncd/catalog/task/git-clone/0.2/git-clone.yaml@HEAD
        - name: after-clone-before-release
          image: something:1.2.3
        - uses:
           path: jenkins-x/jx3-pipeline-catalog/tasks/go/release.yaml@HEAD
        - name: after-release
          image: something:1.2.3

OR we could explore some of the possibilities here, e.g.:

 tasks:
 - name: run-unit-tests
   taskRef:
     name: just-unit-tests
   workspaces:
      - name: source-code
      - name: test-results
   init/before:
      - taskRef: git-clone
        params:
       - name: url
          value: $(params.url)
        - name: revision
          value: $(params.revision)
        workspaces:
        - name: source-code
    finally/after:
      - taskRef: gcs-upload
        params:
        - name: location
          value: gs://my-test-results-bucket/testrun-$(taskRun.name)
        workspaces:
        - name: test-results

tho you'd lose the ability to completely control the sequence of steps, you can only inject before and after (i wonder if we really need more than before and after?)

@ghost
Copy link

ghost commented Mar 9, 2021

If I'm understanding right from the proof-of-concept PR we're adding workspaces from the fetched Task to the consuming Task.

Does this mean that, if workspaces are named differently in two separate Tasks, the user needs to write glue steps to move content between them? e.g.

kind: TaskRun
spec:
  workspaces:
  - name: output # the workspace expected by git-clone
    emptyDir: {}
  - name: source # the workspace expected by golang-test
    emptyDir: {}
  taskSpec:
    steps:
    - uses:
        path: tektoncd/catalog/task/git-clone/0.2/git-clone.yaml@HEAD
    - name: glue-step
      image: alpine
      script: |
        cp -R "$(workspaces.output.path)"/* "$(workspaces.source.path)"/
    - uses:
        path: tektoncd/catalog/task/golang-test/0.1/golang-test.yaml@HEAD 

@jstrachan
Copy link
Author

steps vs task

what happens to the other parts of Tasks, workspaces, params and results? it seems like this proposal wants to be able to refer to just the steps of a Task and ignore "the rest" but im not sure that's actually possible

see my answer here: #369 (comment)

right now any parameters/results are reused

@bobcatfish
Copy link
Contributor

hey @jstrachan ! thanks for that link, i think there are still a number of outstanding questions above.

I'm thinking there are a couple of ways that you could move this forward, depending on what you're looking for

  1. One of the easiest would be to implement this as a custom task instead - custom tasks let you define whatever functionality you want without having to add it to the pipelines controller. Some examples of this at the moment are pipelines in pipelines and task loops
  2. Try to scope down this proposal a bit - for example, it's unlikely that we'd want to add the ability to reference Tasks/steps in git as part of this proposal (we've been having some separate discussions about how to do this - and actually @sbwsg just opened a proposal with an interesting approach TEP-0060: Remote Resource Resolution #389)

I think the last option would be to wait and see how some of the other TEPs progress, e.g. I think that a combo of decoupling task composition from scheduling and remote resource resolution could give you a lot of what you're looking for (but the unit would still be "tasks" and not "steps").

Also please feel free to join one of our working groups if you want to discuss this more in real time! e.g. the general working group on wednesdays or the api working group on mondays (note this coming monday we're trying out the API meeting at 4pm EST instead of the usual time)

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 30, 2021
@tekton-robot
Copy link
Contributor

@jstrachan: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@afrittoli
Copy link
Member

Ping @jstrachan

@vdemeester
Copy link
Member

@bobcatfish @afrittoli @jstrachan in the light of TEP-0044, should we close this one, if we are sure this is in the proposed alternative in there ?

@bobcatfish
Copy link
Contributor

@vdemeester sounds good! @jstrachan feel free to re-open,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

7 participants