Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"when" expressions do not match user expectations #3345

Closed
skaegi opened this issue Oct 6, 2020 · 27 comments
Closed

"when" expressions do not match user expectations #3345

skaegi opened this issue Oct 6, 2020 · 27 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@skaegi
Copy link
Contributor

skaegi commented Oct 6, 2020

"when" expressions are great but have introduced a behaviour unfamiliar to existing CI/CD users. The issue is that Tasks set to "runAfter" a Task using a "when" expression do not run. We have many users moving from Travis and Jenkins to Tekton and this is not the way these systems work.

https://www.jenkins.io/doc/book/pipeline/syntax/#when
https://docs.travis-ci.com/user/conditional-builds-stages-jobs
https://argoproj.github.io/argo/examples/#conditionals
https://www.gnu.org/software/make/manual/html_node/Conditional-Example.html#Conditional-Example

In the above system the "when" or "conditional" only applies to the task holding the expression. So long as the Task does not produce a failure the next task executes. Looking at #3139 (comment) -- the behaviour that is consistent with what most other CI/CD provide is ... (2) Use Successful = True for the TaskRun status instead of creating a pseudo failure skipped state.

I recognize this might hurt but I think we need to change the behaviour right away as this will prevent adoption.

@skaegi skaegi added the kind/bug Categorizes issue or PR as related to a bug. label Oct 6, 2020
@pritidesai
Copy link
Member

thanks @skaegi, it will be great to discuss this sooner, will add it to next API WG discussion (10/12).

@jerop
Copy link
Member

jerop commented Oct 7, 2020

@skaegi to address how to handle tasks that runAfter a task guarded using WhenExpressions, we proposed using continueAfterSkip as described here in the TEP: https://github.com/tektoncd/community/blob/master/teps/0007-conditions-beta.md#skipping-1 -- the implementation is in progress, waiting on some refactoring we're doing on the execution logic

Does the current proposal solve your use cases?

It seems to me what you're suggesting is that continueAfterSkip defaults to true, right? Proposed to default to false to be consistent with current behavior where runAfter tasks are skipped as well, but users can set continueAfterSkip to true to allow for execution of runAfter tasks

@skaegi
Copy link
Contributor Author

skaegi commented Oct 7, 2020

Thanks @jerop and yes, continueAfterSkip true as the default does what I'm suggesting. I hear what you're saying about our current behavior but at the same time feel we have to recognize that a precedent has already been set by existing CI tools. Current practice in Jenkins etc. suggests treating skipped Tasks as successful (as opposed to a special skip state) is reasonable and useful behavior.

This change will hurt and maybe flipping a global continueAfterSkip default config property to true is a way forward. At the very least I think it's worth considering and talking through next API WG call.

@pritidesai
Copy link
Member

yup I agree, we have associated execution status success for sequential patterns. A sequence of tasks are only executed if all of them are successful. A task being skipped for any reason results in skipping the rest of the tasks in that sequence.

Having continueAfterSkip as true by default would help and allow rest of the tasks in a sequence to execute. But it becomes tricky again with resource dependency i.e. reference to task result of a skipped task. Based on the TEP such dependencies are skipped even with continueAfterSkip set to true.

@skaegi do you have a use case with combination of continueAfterSkip and a reference to task result? If so, we need to further discuss on how to handle it, may be introduce a default for task result?

@vdemeester
Copy link
Member

I think it would be nice discussing this in the next API WG call, but few thoughts

I hear what you're saying about our current behavior but at the same time feel we have to recognize that a precedent has already been set by existing CI tools. Current practice in Jenkins etc. suggests treating skipped Tasks as successful (as opposed to a special skip state) is reasonable and useful behavior.

I don't think we really made our mission to have similar behaviour than other CI tools. There is already a "conversion" buy-in when switch to tekton from existing CI tools (some concept are different, …), this is one of them.

This change will hurt and maybe flipping a global continueAfterSkip default config property to true is a way forward. At the very least I think it's worth considering and talking through next API WG call.

It's gonna do more than hurt, it would go against our API change policy : "Any backwards incompatible changes must be introduced in a backwards compatible manner first, with a deprecation warning in the release notes and migration instructions.". If we ever want to switch the default continueAfterSkip to true, we could do the same trick we do with the HOME environment, have a feature-flag that flips it by default for the whole instance, maybe ?

Having continueAfterSkip as true by default would help and allow rest of the tasks in a sequence to execute. But it becomes tricky again with resource dependency i.e. reference to task result of a skipped task. Based on the TEP such dependencies are skipped even with continueAfterSkip set to true.

Same question as @pritidesai, this is going to make the resolution of the dag more and more complex.

@bobcatfish
Copy link
Collaborator

@skaegi one other thought: the current behavior for when expressions is the same behavior we have for the conditions CRD, so this is less about "when expressions" in particular and more about how we've been treating conditional execution in general

If we ever want to switch the default continueAfterSkip to true, we could do the same trick we do with the HOME environment, have a feature-flag that flips it by default for the whole instance, maybe ?

maybe another way of solving this is having a config option for controlling the default value for a tekton installation? e.g. you could default the value of continueAfterSkip to true

this does have a big downside in portability tho: a pipeline built for a installation with this flag set would not work as expected on one that doesn't have it

@skaegi could you tell us more about what kind of scenarios you're envisioning as far as why the default being false would be insurmountable for your users?

@skaegi
Copy link
Contributor Author

skaegi commented Oct 12, 2020

The biggest issue is that you can't do a merge or a synchronize after a Task that contains a "when" expression if the condition fails.

For example, Logic like this is really challenging...

if (condition1) {
  do X
}
do Y 
if (condition2) {
  do A
}
do B

Instead you're forced to create a positive and negative Task chain to manage this. (Totally possible there is a better coding pattern here -- please feel free to pipe in)

if (condition1) {
  do X
  do Y
  if (condition2) {
    do A
    do B
  }
  if (!condition2) {
    do B
  }
}
if (!condition1) {
  do Y
  if (condition2) {
    do A
    do B
  }
  if (!condition2) {
    do B
  }
}

If you try to write that out in a pipeline CRD it's very complex. This particular shape is very common in our pipelines where the first condition is a check on the existence of a particular tagged image to determine if we need to build it. The second condition is a check on the branch to determine if we need to create an auditable change request entry before promoting to production. It's not "insurmountable" as our teams currently embed logic in steps but this means spinning up a container and not using this feature. I'm genuinely trying to help make this feature useful as it does not currently do what we need.

@jerop
Copy link
Member

jerop commented Oct 12, 2020

For example, Logic like this is really challenging...

if (condition1) {
  do X
}
do Y 
if (condition2) {
  do A
}
do B

Instead you're forced to create a positive and negative Task chain to manage this

@skaegi taking the example you provided above, we can use the continueAfterSkip, as described in TEP to create the logic needed without having to create positive and negative task chains, as such:

...
tasks:
     - name: X
        when: # condition1
          - input: foo
            operator: in
            values: ["bar"]
        continueAfterSkip: true
        taskSpec:
          ...
     - name: Y
        runAfter:
          - X
        taskSpec:
          ...
     - name: A
        runAfter:
          - Y
        when: # condition2
          - input: foo
            operator: in
            values: ["bar"]
        continueAfterSkip: true
        taskSpec:
          ...
     - name: B
        runAfter:
          - A
        taskSpec:
          ...

The only thing you have to do is set continueAfterSkip to true to allow for execution of ordering-dependent tasks after a parent task is skipped, and imo that's much better than trying to change what we already have happening after a task is skipped

@skaegi does this work for you? we can discuss further in the API WG today

@chhsia0
Copy link
Contributor

chhsia0 commented Oct 12, 2020

It seems to me that syntax-wise, when-expressions give users a false impression that they only apply to the associated PipelineTasks, but because of the DAG structure, the expressions apply to more than the associated PipelineTasks (but not all subsequent tasks), thus making it counter-intuitive.

@jerop
Copy link
Member

jerop commented Oct 12, 2020

As discussed in the WG today, another consideration when evaluating continueAfterSkip defaulting to true is how we handle resource dependencies, such as results. That is, what happens if we allow dependent tasks to execute by default after a parent task is skipped when that parent task was supposed to produce a result needed by the dependent tasks.

This concern is currently being discussed in an open pr for supporting results in finally tasks #3242

@jerop
Copy link
Member

jerop commented Oct 12, 2020

It seems to me that syntax-wise, when-expressions give users a false impression that they only apply to the associated PipelineTasks, but because of the DAG structure, the expressions apply to more than the associated PipelineTasks (but not all subsequent tasks), thus making it counter-intuitive.

@chhsia0 when the expressions evaluate to false, the pipelinetask is skipped and all the subsequent tasks are skipped by default

when a task has ordering dependencies only, continueAfterSkip can be set to true to allow for execution of all dependent tasks which are all subsequent tasks (which are ordering dependent only -- further discussed here

@pritidesai
Copy link
Member

Condition guarded the entire branch of tasks not just the task that it's defined in and when expression follows the same design principle. continueAfterSkip helps continuing even after a guarded task is skipped but comes with a cost of adding extra YAML schema in each subsequent pipelineTask.

Also, like @jerop mentioned, continueAfterSkip wouldn't work with resource dependencies such as task results. @skaegi's use case does not include task results so we can delay that discussion until we come across such use case. @skaegi do you have a use case with a mix of continueAfterSkip set to true and false (not ideal for any pipeline)? If not, we can add continueAfterSkip at the pipelineRun level instead of or in addition to pipelineTask level, thoughts?

this does have a big downside in portability tho: a pipeline built for a installation with this flag set would not work as expected on one that doesn't have it

We should avoid any such feature flags controlling the DAG execution flow.

@ghost
Copy link

ghost commented Oct 12, 2020

It seems to me that syntax-wise, when-expressions give users a false impression that they only apply to the associated PipelineTasks, but because of the DAG structure, the expressions apply to more than the associated PipelineTasks (but not all subsequent tasks), thus making it counter-intuitive.

I think this might be highly subjective? My intuition goes immediately the other way 😄 - I expect a branch of execution to be halted by default if a gating check hasn't passed. In other words, the existing behaviour.

It sounded to me like continueAfterSkip is going to solve the immediate pain. I would definitely be +1 on not changing existing behavior and not providing installation-specific differences of the DAG resolution.

@ghost
Copy link

ghost commented Oct 12, 2020

If not, we can add continueAfterSkip at the pipelineRun level instead of or in addition to pipelineTask level, thoughts?

Ah, I really like this idea. Keep the declaration of DAG semantics out of the installation configuration but still allow users to make it the default for their pipeline runs. I am +1 on in-addition-to pipelineTask level.

@skaegi
Copy link
Contributor Author

skaegi commented Oct 15, 2020

@pritidesai Yes that would work for me. I have no examples with a mixture of continueAfterSkip semantics. I wonder if that is a Pipeline thing instead of a PipelineRun thing -- e.g. you would write a Pipeline with particular semantics in mind.

@skaegi
Copy link
Contributor Author

skaegi commented Oct 15, 2020

@pritidesai @jerop Is there an issue for results behaviour in this case?

@afrittoli
Copy link
Member

@pritidesai Yes that would work for me. I have no examples with a mixture of continueAfterSkip semantics. I wonder if that is a Pipeline thing instead of a PipelineRun thing -- e.g. you would write a Pipeline with particular semantics in mind.

+1
Indeed, if we made continueAfterSkip a run option, the pipeline author would have to allow for both ways of execution, which seems overly complicated and unnatural.

If we allow for a pipelineTask to be "optional", we'll need a way to provide results values.
Those could be (1) defined statically in the Pipeline definition, (2) produced by a previous task or even (3) passed through the run.

  1. Default value defined in the pipeline.
  2. Example use case: T1 checks if an artifact already exists and writes the out come to the results. T2 checks the result from T1 in a when expression; it does not run if the artifact already exists. T3 depends on the artifact, it needs the result of T2 or it could use the result from T1 as a fallback
  3. Example use case: T1 is a long running step which produces an artifact. T1 only runs if an input params with the location of the artifact is empty. T2 depends on the result from T1. If T1 does not run, T2 takes the input param from T1 (allowing in a way to restart a pipeline from the middle, which is also an interesting use case).

@jerop
Copy link
Member

jerop commented Oct 15, 2020

I also like the idea of supporting continueAfterSkip on the PipelineRun level in addition to the PipelineTask level. However, there are some things that I'm wondering about:

  • Is it valid to specify continueAfterSkip on both levels in a given PipelineRun + Pipeline? Or do we allow it being specified in only one of the levels in a given PipelineRun + Pipeline?
  • If we allow for it to be specified on both levels in a given PipelineRun + Pipeline, which continueAfterSkip level would be dominant? For example, what happens when a PipelineRun has continueAfterSkip is set to true but one of its PipelineTask has continueAfterSkip set to false? I think that the pipelineTask level one to take precedence is better because it give more control but I'm concerned about the combination of the two levels being surprising to users

@skaegi please share your use cases so that we can find the best way to go forward

@jerop
Copy link
Member

jerop commented Oct 15, 2020

Is there an issue for results behaviour in this case?

@skaegi continueAfterSkip is for allowing execution of ordering-dependent (runAfter) tasks only -- so it won't affect the resource-dependent tasks (with results)

@bobcatfish
Copy link
Collaborator

Is it valid to specify continueAfterSkip on both levels in a given PipelineRun + Pipeline? Or do we allow it being specified in only one of the levels in a given PipelineRun + Pipeline?

I suggest we add this at the Pipeline level and NOT in a PipelineRun - PipelineRun configuration is for specifying behavior at runtime, and the continueAfterSkip decision seems fundamental to the expected behavior of a Pipeline, and I'm guessing not something you'd want to go back and forth on at runtime

(i wonder if it would make sense to codify something in our design principles about runtime vs authoring time 🤔 )

@pritidesai
Copy link
Member

Yes, continueAfterSkip is not something you would want to go back and forth at runtime unless the pipeline is shared across different teams and each team would like to write their own pipelineRun. Also, it's beyond one single deployment of Tekton, we will come across use cases where one deployment would prefer continueAfterSkip to true and others might want to set it to false. continueAfterSkip is just an example (disabling a task is another example), as we are marching for v1, we might discover more such use cases for runtime.

@jerop
Copy link
Member

jerop commented Oct 15, 2020

thank you everyone for your contributions to this discussion!

it seems to me that the options we've considered are (please let me know if i'm missing something):

  1. continueAfterSkip defaults to true, and when a guarded task is skipped:

    a. ordering-dependent tasks are executed while resource-dependent tasks are skipped -- but only a subset of the dependent tasks are executed by default, which can be confusing

    b. both ordering-dependent and resource-dependent tasks are executed -- but the resource-dependent tasks would fail by default and the whole pipeline would fail

    c. both ordering-dependent and resource-dependent tasks are executed (and results have default values) -- but supporting default results has disadvantages discussed here

  2. continueAfterSkip defaults to false, and when a guarded task is skipped:

    a. both ordering-dependent and resource-dependent tasks are skipped -- but users need to pass in an extra field to allow ordering-dependent tasks to execute and users migrating from other ci/cd systems may need to learn how this works

    b. both ordering-dependent and resource-dependent tasks are skipped (and continueAfterSkip is specified at PipelineRun level instead of PipelineTask level) -- but the skipping behavior should be specified in the Pipeline authoring level and not at runtime

    c. both ordering-dependent and resource-dependent tasks are skipped (and results have default values so if continueAfterSkip is set to true, all subsequent tasks are executed) -- but supporting default results has disadvantages discussed here

what we proposed in the tep is 2a that ensures that all the subsequent tasks are skipped, unless the user explicitly specifies that the ordering-dependent tasks should be executed

i'm wondering if anyone prefers the other options after this discussion and if there are more ci/cd use cases to consider

note:
resource-dependency: based on resources needed from parent task, which includes results, workspaces and resources
ordering-dependency: based on runAfter which provides sequencing of tasks when there may not be resource dependencies

@skaegi
Copy link
Contributor Author

skaegi commented Oct 16, 2020

I might prefer 1c, but think I would be fine with 2b even without 2a. I think 2b is a much better alternative than having a global flag. I don't love continueAfterSkip: true/false as the Pipeline field name but I can get over it 😜 . I do not want to treat order and resource dependencies differently and always run or not according to the continueAfterSkip field value.

@bobcatfish
Copy link
Collaborator

I don't love continueAfterSkip: true/false as the Pipeline field name but I can get over it 😜 .

If you have other ideas for the name this is the time! XD I'm not super in love with the name either but it's the clearest name we'd come up with so far. ("keepGoing"? XD)

I do not want to treat order and resource dependencies differently and always run or not according to the continueAfterSkip field value.

I agree that the inconsistency is not great and a bit confusing :( Note that 1c with defaults would only help with resource dependencies on "results" - we've been talking about expressing dependencies on workspaces (and we still have PipelineResource "from")

(Side note: does 1c below underneath 1? It seems like supporting defaults is something that could be done regardless of the default value of continueAfterSkip?)

Also! One nice feature of 2b is that we could still add support for defaults later; I don't think either choice for the default value of continueAfterSkip precludes this :D Since we've brought up this feature several times in different contexts I feel like it might deserve it's own issue (and one day TEP?)

@jerop
Copy link
Member

jerop commented Oct 19, 2020

I do not want to treat order and resource dependencies differently and always run or not according to the continueAfterSkip field value

we can discuss further in the API WG

(Side note: does 1c below underneath 1? It seems like supporting defaults is something that could be done regardless of the default value of continueAfterSkip?)

added 2c for option with default results when continueAfterSkip defaults to false --> #3345 (comment)

One nice feature of 2b is that we could still add support for defaults later

I think you might have meant 2a (because 2b is the one with continueAfterSkip at PipelineRun level)

@pritidesai
Copy link
Member

Going with 1 would break backward compatibility with the way conditions were implemented and brining it up here since we have documented when expression replaces conditions. Folks with conditions in their pipelines starting to migrate to when expression as its released with 0.16. Option 1, would break what we released in 0.16. Its great that we are not going that route.

@jerop
Copy link
Member

jerop commented Oct 21, 2020

thank you everyone for participating in this discussion!

will continue forward with option 2a, where continueAfterSkip defaults to false and users can set it to true at the PipelineTask level to allow execution of ordering-dependent tasks only

as @bobcatfish mentioned, implementing 2a still leaves 2c as an option for the future such that, if we support default results, we can extend it to allow execution of resource-dependent tasks as well when continueAfterSkip is set to true

please let me know if you strongly disagree :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
Status: Done
Development

No branches or pull requests

7 participants