Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drop step.group in favour of depends_on to create a DAG #1860

Closed
6543 opened this issue Jun 13, 2023 · 14 comments · Fixed by #2771
Closed

drop step.group in favour of depends_on to create a DAG #1860

6543 opened this issue Jun 13, 2023 · 14 comments · Fixed by #2771
Labels
breaking will break existing installations if no manual action happens pipeline-config related to pipeline config files (.yaml)
Milestone

Comments

@6543
Copy link
Member

6543 commented Jun 13, 2023

Support parallelism inside a pipeline via a DAG. This requires a way to define step dependencies which can then allow the runner to build said DAG. Edit: I see group provides something similar for static fan-out, but certainly not as flexible as defining dependencies directly.

Originally posted by @silverwind in #567 (comment)

@6543 6543 added pipeline-config related to pipeline config files (.yaml) breaking will break existing installations if no manual action happens labels Jun 13, 2023
@6543 6543 mentioned this issue Jun 13, 2023
7 tasks
@6543
Copy link
Member Author

6543 commented Jun 13, 2023

what about depends_on we already use it for workflow dep.-DAG ?

@lafriks
Copy link
Contributor

lafriks commented Jun 13, 2023

Yes, same naming as for pipelines would be the best way to go imho

@6543
Copy link
Member Author

6543 commented Jun 13, 2023

one big downside! and what esp. makes it breaking!!!:

now we assume that steps need to be run in sequence if not told otherwise (because it is what you normally want)
if we change this ... it would be the exact other way around

so what I think is really important: by default it should run sequential (that's just good UX-Design)

so what about a workflow config: parallel: false that if enabled would create a step dag ?!?

@6543
Copy link
Member Author

6543 commented Jun 13, 2023

so I propose:

  • pipeline/backend types (docker/k8s/...) -> always use a DAG
  • pipeline/frontend types (yaml) -> use a dag if specified and if not it creates a simpjle dag
    -> (step1->step2->(group:step3 & step 4)-> step5...) witch could still encounter the groups feature...

that way it also would be non breaking

@6543
Copy link
Member Author

6543 commented Jun 13, 2023

we should explicite write that down as one of our core values:

  • by default the config should work intuetive for beginners and not for advanced users - KISS &
  • the default should be the so that the normal pipelines dont need to configure almost anything

@silverwind
Copy link
Contributor

silverwind commented Jun 14, 2023

In drone, it is also sequential until you add the first depends_on to a step, at which point it becomes a DAG. So in a sense, it's compatible.

@6543
Copy link
Member Author

6543 commented Jun 14, 2023

That sounds nice as long as you dont have a workflow with steps that all can run paralel

@silverwind
Copy link
Contributor

silverwind commented Jun 14, 2023

Right, how about depends_on: [] on all steps to indicate they have no dependencies and can run immediately? Could also name the property dependencies, which I favor slightly because they are called the same in the React world.

@6543
Copy link
Member Author

6543 commented Jun 14, 2023

I would call that not easy to lint, parse and sanitize ...

Also it would increase yaml config more ...

The more i think and talk about this with others the more i think we should have a workflow flag "parallel: true" and depends_on ... for the DAG

@6543 6543 changed the title drop step.group in favour of need drop step.group in favour of depends_on to create a DAG Jun 14, 2023
@lumpi101
Copy link

One more idea: to execute steps in parallel or not can be decided on whether the steps in the yaml are defined as a list or a dictionary.

  • A list is sequential by nature and thus means that all listed steps are executed sequentially in that order.
  • A dictionary has no ordering (also not in the yaml-specs afaik) and thus means that all steps are run in parallel, constrained by the "depends_on" markers.

This is of course not backwards compatible und would require some versioning of the pipeline syntax.
Also, having a dictionary, step names cannot be omitted which makes some kind of sense because autonaming steps with some enumeration could pretend some ordering which is not the case then.

A modified example from the docs...

parallel execution with DAG:

 version: 2
 steps:
   backend:  # <-- step names are mandatory
     image: golang
     commands:
       - go build
       - go test
   frontend:
     image: node
     commands:
       - npm install
       - npm run test
       - npm run build
   publish:
     image: plugins/docker
     depends_on: [backend, frontend]  # <-- if we would omit this, all 3 steps would be executed in parallel
     repo: octocat/hello-world

sequential execution:

 steps:
   - name: backend  # <-- step names are optional (afaik)
     image: golang
     commands:
       - go build
       - go test
   - name: frontend
     image: node
     commands:
       - npm install
       - npm run test
       - npm run build
   - name: publish
     image: plugins/docker
     repo: octocat/hello-world

@6543 6543 added this to the 1.1.0 milestone Jun 19, 2023
@anbraten
Copy link
Member

In drone, it is also sequential until you add the first depends_on to a step, at which point it becomes a DAG. So in a sense, it's compatible.

This sounds like the best option to me atm.

@ghost91-
Copy link

ghost91- commented Jun 20, 2023

In drone, it is also sequential until you add the first depends_on to a step, at which point it becomes a DAG. So in a sense, it's compatible.

This sounds like the best option to me atm.

That’s also more or less, how it works in GitLab CI: By default, everything is sequential, and jobs in the same stage are executed in parallel (that basically matches woodpecker's group). But as soon as you define the needs property for a job, the DAG handling kicks in, and the job is executed outside of the default order, as soon as all jobs listed in needs have completed. In particular, if you specify needs: [], the job starts immediately, regardless of its position.

See https://docs.gitlab.com/ee/ci/yaml/index.html#needs.

So I think this would be a pretty good pattern, since both users of GitLab CI and Drone are already familiar with it.

@6543
Copy link
Member Author

6543 commented Jun 20, 2023

I personally think needs: [] is bad to de-serialize ... from yaml perspective

see: https://go.dev/play/p/JziUknAwKvY

and yes I know there is a golang trick to achive it ... -> https://go.dev/play/p/M1XlwhSUZn_X

but yaml do exist outside of golang to and each parser do behave bit different if it belongs to this edgecases

@ghost91-
Copy link

ghost91- commented Jun 20, 2023

I personally think needs: [] is bad to de-serialize ... from yaml perspective

see: https://go.dev/play/p/JziUknAwKvY

and yes I know there is a golang trick to achive it ... -> https://go.dev/play/p/M1XlwhSUZn_X

but yaml do exist outside of golang to and each parser do behave bit different if it belongs to this edgecases

To be honest, I think it’s weird that parsers would not implement this in a way so that combining deserialisation and serialisation results in the identity. There is obviously a difference between a property being empty, and it not existing at all in yaml, so parsers should respect that, imo.

Doing it the other way round, as you suggest, and making the "empty array" case and the "not set" case behave equally, is extremely surprising from a user perspective: Users could still write depends_on: [], but I suspect that hardly anybody would interpret this as "has the default behavior, i.e., actually depends on all previously defined steps", in particular because it has a different meaning in other (but similar) contexts (Drone, GitLab CI).

By the way, GitHub actions circumvents this problem by executing jobs in parallel, by default, instead of sequentially:

A workflow run is made up of one or more jobs, which run in parallel by default. To run jobs sequentially, you can define dependencies on other jobs using the jobs.<job_id>.needs keyword.

(https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobs)

So the empty array case and and the property missing case do have the same meaning semantically. But not sure if it would be worth a breaking change to change that default in woodpecker…

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking will break existing installations if no manual action happens pipeline-config related to pipeline config files (.yaml)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants