Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HIP for running hooks in parallel #349

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
209 changes: 209 additions & 0 deletions hips/hip-0019.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,209 @@
---
hip: 0019
title: "Enable Running Hooks in Parallel"
authors: [ "Michael Morris <michael.morris@est.tech>" ]
created: "2024-06-06"
type: "feature"
status: "draft"
---

## Abstract

Helm hooks are currently executed serially. In many cases there will be no reason why they cannot be run in parallel. Running hooks in parallel can significantly improve install/upgrade times

## Motivation

Decrease install/upgrade times by executing hooks in parallel

## Specification

Only hooks of the same weight may be executed in parallel. The chart designer can specify if hooks will be run in parallel. I am proposing two options, one more fine grained that the other.

Option 1.
Add a new parameter to the chart metadata of type boolean with name 'runHooksInParallel'.
When set to 'false' (default), the behavior will be as today (hooks run in sequence).
When set to 'true', hooks of the same weight for the chart shall be run in parallel.
In the case of subcharts, hooks of the same weight from different sub charts with 'runHooksInParallel' set to 'true' will be run in parallel. Hooks from subcharts with 'runHooksInParallel' to 'false' shall be run sequentially after hooks of the same weight that are permitted to run in parallel.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I (mostly) agree with this implementation. But for subcharts, there is a hidden complexity that isn't entirely spelt out in the HIP I think.

There is a subtle behavior change here for subcharts which have not specified runHooksInParallel: true. That is, their hook order will be changed ro run after hooks of the same weight that they may have run prior to due to the kind/name alphabetical sorting. If a parent/sibling chart specifies runHooksInParallel: true. And by allowing hooks to run in parallel, we are now suddenly forced to content with e.g. parent/sibling charts which expect hooks of the same weight to run in a specific order.

Strictly, to retain compatibility. We would need keep subchart hooks of the same weight to which have runHooksInParallel: false to run in the same order as they would have previously (wrt. other parent/sibling chart looks)(But, hooks of the same weight from chart(s) which have runHooksInParallel: true could run in parallel of course)

But, IMHO, this implementation introduces a lot of complexity (including the cognitive overhead of chart authors understanding it). And it may not be worth doing so.

The rationale is, that or subcharts, I think there are two scenarios:

  1. The subchart is completely independent of the parent chart and its sibling subcharts (think e.g. nginx, which might be a subchart in many places). And by setting runHooksInParallel: true, it just wants its hooks to run in parallel. And this is fine, since it is independent, and doesn't know/care about any other hooks in that its parent/sibling charts may be executing.
  2. The subchart is somehow coupled with its parent/siblings, and "knows" that its hooks running in parallel with other chart's hooks may cause problems.

For 1. IMHO, there is no problems with the chart simply setting runHooksInParallel: true of course. And hooks running in parallel.

For 2. IMHO, the chart author(s) should be required to coordinate hook weights. So that hooks which must run in order do so. And this should be a prerequisite for being able to set runHooksInParallel: true for a (sub)chart which has coupling with parent/sibling chart hooks.

The reason I say this, is to try and limit the complexity to Helm and chart users. As, iirc, Helm just runs all hooks for all (sub)charts in a single pass today. It doesn't group/order them by subcharts, and to strictly retain order, Helm would need to introduce that concept for this HIP. IMHO the complexity for wanting to run hooks in parallel should be pushed as much as possible to the chart authors, and the implementation/concepts keep as simple as possible e.g. "hooks from a (sub)chart with the same weight run in parallel".

In summary, I think the HIP might want to call out this arkwardness with parent/subcharts hook ordering, if a given (sub)chart introduces runHooksInParallel: true. And that it will be the chart authors responsibility to organise hook weights if there is a hook ordering problem between (sub)charts in their setup? I also think this scenario is likely the uncommon one that only power users will encounter.

Another option might be to propagate ``runHooksInParallel: false|true` upwards from parent charts to subcharts? That way, a parent chart can "never be surprised" by subchart hooks suddenly running in parallel?

Thoughts?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the change in ordering for sub-charts where runHooksInParallel is not set to true w.r.t parent/siblings:
Anybody that wants full compatibility with the existing execution order of hooks can achieve that by not setting runHooksInParallel to true. Where it is set to true (in the parent or any sub charts), there is necessarily a different order (the hooks that would be run at the start of a sequential execution would be finished before hooks towards the end of the sequential run, but now that wont be the case when executing in parallel) so the HIP does not aim to maintain the order that would have applied previously. There is I guess an alternative that would closer match the existing order - you could order the hooks as today and only parallelise hooks that are adjacent in the ordering that have runHooksInParallel set to true. I think this would limit the performance gain though as hooks that could be run in parallel will not be if a non-parrallel hook finds itself in the middle of them when ordered by kind/name.

Regarding the independence of the subcharts:
There is a third scenario - the subchart is completely independent of the parent or siblings, but the hooks for the sub-chart itself must be run sequentially. It is this scenario that I proposed the 'subChartsOnly' value. Your point below that this could be achieved instead by setting the weights appropriately is a valid alternative. The downside is it will potentially require more effort on the part of the sub chart designers to ensure their hook weights are set correctly to maintain the ordering they need before switching to runHooksInParallel to 'true'. It would potentially be much easier for them to just switch runHooksInParallel to 'subChartsOnly' as they would get the same ordering of their sub chart hooks w.r.t. each other without having to worry about the weights. However, if you were approaching this from a clean start I think the behaviour you suggest (requiring the chart designer to set the weights appropriately) would be the way to go so perhaps we should go in that direction. I also agree it reduces complexity

So in summary, I think we are in agreement - option 1 and document clearly the chart designer should set the weights appropriately where there is dependencies between hooks from parent/sibling charts and also highlight the consequences of setting runHooksInParallel for the ordering of sub chart hooks (compared with the existing kind/name based algorithm)


Option 2.
Add a new parameter to the chart metadata with name 'runHooksInParallel' that accepts three values - 'false', 'otherChartsOnly' and 'true'.
Behavior for 'true' and 'false' is as per option 1. above.
The value 'otherChartsOnly' is only relevant in the case of subcharts and shall indicate that the hooks belonging to the particular subchart have to be run sequentially but, it is ok for this to happen in parallel with the execution of hooks from other subcharts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really seeing a use-case for "otherChartsOnly".

Why wouldn't a subchart just adjust its weights to be sequential, it it wanted its hooks to run sequentially?. Wouldn't that achieve the same as "otherChartsOnly"? Perhaps there is a scenario I'm not thinking of?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really just to make migration easier (as outlined in above reply), but yes, I agree, should use hook weights for this scenario instead


For example, given a chart with subcharts as follows:

|Sub chart|Hook|Weight|
|---------|----|------|
|chart A |H1 |0 |
| |H2 |1 |
|chart B |H3 |0 |
| |H4 |0 |
| |H5 |1 |
|chart C |H6 |0 |
| |H7 |1 |


* With 'runHooksInParallel' set as follows:

|chart |runHooksInParallel|
|-------|------------------|
|chart A|true |
|chart B|true |
|chart C|true |

the execution order of the hooks shall be (hooks in same column are in parallel):

<table>
<thead>
<tr>
<th colspan="2" style="border-style: solid;">Order of Execution</th>
</tr>
</thead>
<tbody>
<tr><td colspan="2" style="border-style: solid;">-------------->-----------</td>
</tr>
<tr>
<td style="border-style: solid;">H1</td>
<td style="border-style: solid;">H2</td>
</tr>
<tr>
<td style="border-style: solid;">H3</td>
<td style="border-style: solid;">H5</td>
</tr>
<tr>
<td style="border-style: solid;">H4</td>
<td rowspan="2" style="vertical-align: middle;border-style: solid;">H7</td>
</tr>
<tr>
<td style="border-style: solid;">H6</td>
</tr>
</tbody>
</table>

i.e. Hooks H1, H3, H4 and H6 all having weight 0 are run in parallel, followed by hooks H2, H5 and H7 (all of which have weight 1)

* With 'runHooksInParallel' set as follows:

|chart |runHooksInParallel|
|-------|------------------|
|chart A|true |
|chart B|** false ** |
|chart C|true |

the execution order of the hooks shall be:

<table>
<thead>
<tr>
<th colspan="5" style="border-style: solid;">Order of Execution</th>
</tr>
</thead>
<tbody>
<tr><td colspan="5" style="border-style: solid;">---->----->----->---->---</td>
</tr>
<tr>
<td style="text-align:center;border-style: solid;">H1</td>
<td rowspan="2" style="vertical-align: middle;border-style: solid;">H3</td>
<td rowspan="2" style="vertical-align: middle;border-style: solid;">H4</td>
<td style="vertical-align: middle;border-style: solid;">H2</td>
<td rowspan="2" style="vertical-align: middle;border-style: solid;">H5</td>
</tr>
<tr>
<td style="text-align:center;border-style: solid;">H6</td>
<td style="vertical-align: middle;border-style: solid;">H7</td>
</tr>
</tbody>
</table>

i.e. H3 and H4 are now run sequentially after H1 and H6 because the subchart they belong to has runHooksInParallel set to 'false'. Similarly H5 is now run after H2 and H7

* With 'runHooksInParallel' set as follows:

|chart |runHooksInParallel |
|-------|----------------------|
|chart A|true |
|chart B|** otherChartsOnly ** |
|chart C|true |

the execution order of the hooks shall be:

<table>
<thead>
<tr>
<th colspan="4" style="border-style: solid;">Order of Execution</th>
</tr>
</thead>
<tbody>
<tr><td colspan="4" style="border-style: solid;">---->------>----->----</td>
</tr>
<tr>
<td colspan="2" style="text-align:center;border-style: solid;">H1</td>
<td rowspan="1" style="border-style: solid;">H2</td>
<td rowspan="3" style="vertical-align: middle;border-style: solid;">H5</td>
</tr>
<tr>
<td style="border-style: solid;">H3</td>
<td style="border-style: solid;">H4</td>
<td rowspan="2" style="vertical-align: middle;border-style: solid;">H7</td>
</tr>
<tr>
<td colspan="2" style="text-align:center;border-style: solid;">H6</td>
</tr>
</tbody>
</table>

i.e. as runHooksInParallel is now set to 'otherChartsOnly' for chart B, H3 and H4 are still run sequentially but in parallel with H1 and H6 which are from different subcharts


### Example Chart

apiVersion: v1
kind: Pod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to be an example Chart.yaml ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, yes, will replace with chart.yaml example

metadata:
name: example-chart
runHooksInParallel: true
spec:
...

## Rationale

While it may be desirable to run hooks in parallel, particularly those of the same weight, doing so by default may cause issues for users depending on the existing serial behavior. Therefore it is proposed to keep the existing serial behavior by default and enable parallel behavior through the chart definition.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Therefore it is proposed to keep the existing serial behavior by default

I agree with this. Helm already has a specified order for running hooks (breaking on equal weight with alphabetical kind/name sorting, iirc). But IMHO, we should add to this HIP that the linter should warn if a chart has hooks with the same weight, but runHooksInParallel unset/false.

With the eye to the future, that we make parallel behaviour for the same weight the default behavior.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree


Allowing hooks of different weights to be executed in parallel is probably not consistent with the documented behavior of lower weight hooks being executed before higher weighted hooks and, therefore only hooks of the same weight shall be executed in parallel.

Option 2 makes the logic more complex, but allows the chart designer to specify the hooks for a subchart must be run sequentially without blocking the hooks for other subcharts where they are of no significance.

## Backwards compatibility

Proposed solution is fully backwards compatible with no changes in default behavior.

## Security implications

None

## How to teach this

Document in command line help and https://helm.sh/docs/topics/charts_hooks/.

## Reference implementation

PR submitted some time ago: https://github.com/helm/helm/pull/11804
May need to be updated depending on decisions taken on what is proposed here.

## Rejected ideas

Add a flag to control parallel behavior.
It is the chart developer that will have the knowledge of the application to determine if hooks can be run in parallel rather than the chart installer. Therefore, control should lie in the chart itself rather than the install (etc.) commands

## Open issues

|Issue |Proposal|Decision|
|-----------------------------------------------------------------------------------------|--------|--------|
|Should we allow hooks to be run in parallel |Yes | |
|Should the default behavior continue to be to execute in serial |Yes | |
|Should only hooks of the same weight be executed in parallel |Yes | |
|Is boolean value ok or should we allow more granularity, i.e. option1 or option 2 | |

## References

There exists an older closed HIP on hook parallelism, but its focus was on test hooks: https://github.com/helm/community/pull/165