Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HIP for running hooks in parallel #349

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

MichaelMorrisEst
Copy link

This PR opens a HIP to discuss running hooks in parallel.

A PR (helm/helm#11804) was opened some time ago to address this issue but has stalled. A comment in the PR (helm/helm#11804 (comment)) suggests a HIP would be useful to discuss the design before progressing the issue so I am creating this PR to that end.

Signed-off-by: MichaelMorris <michael.morris@est.tech>
Copy link
Contributor

@mattfarina mattfarina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I shared this in the weekly developer call. Sharing here so that it's documented.

hips/hip-0019.md Outdated

## Specification

Add a new flag to the install, upgrade, rollback and uninstall commands to enable parallel behavior for hook execution. The flag will be of integer type and will allow the user to specify the maximum number of hooks that can be executed in parallel.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the different user roles for a Helm user. The person installing a chart may have little knowledge of how the package works. Installers of a chart aren't expected to know the business logic of the application. Instead, that knowledge is held by someone distributing the application.

Consider a classic package management situation like installing postgresql on a linux server. The one installing the package may not know where all the files need to live on the operating system, what configuration files need to go where, or the way this app should be added to startup as a daemon. Without having all that knowledge they are able to use the application.

Someone using the app may likely not know if hooks can be run in parallel. The one who would know that is the developer of the chart.

Can we bake this into the chart metadata so the responsibility is on the side of the chart developer instead of the chart user?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mattfarina
Proposed solution updated to address comment. Please take a look when you get time and let me know if there are any further comments

Signed-off-by: MichaelMorris <michael.morris@est.tech>
@pull-request-size pull-request-size bot added size/L and removed size/M labels Jun 21, 2024
Signed-off-by: MichaelMorris <michael.morris@est.tech>
Copy link
Contributor

@gjenkins8 gjenkins8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, agree, Helm should allow hooks to run in parallel (and it would have been super nice, if it had simply said that hooks of the same weight run in unspecified order to start with :) )

I think there just needs to be some thought around the edge case of subcharts. Hopefully my big comment makes sense.


## Rationale

While it may be desirable to run hooks in parallel, particularly those of the same weight, doing so by default may cause issues for users depending on the existing serial behavior. Therefore it is proposed to keep the existing serial behavior by default and enable parallel behavior through the chart definition.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Therefore it is proposed to keep the existing serial behavior by default

I agree with this. Helm already has a specified order for running hooks (breaking on equal weight with alphabetical kind/name sorting, iirc). But IMHO, we should add to this HIP that the linter should warn if a chart has hooks with the same weight, but runHooksInParallel unset/false.

With the eye to the future, that we make parallel behaviour for the same weight the default behavior.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree

### Example Chart

apiVersion: v1
kind: Pod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to be an example Chart.yaml ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, yes, will replace with chart.yaml example

Option 2.
Add a new parameter to the chart metadata with name 'runHooksInParallel' that accepts three values - 'false', 'otherChartsOnly' and 'true'.
Behavior for 'true' and 'false' is as per option 1. above.
The value 'otherChartsOnly' is only relevant in the case of subcharts and shall indicate that the hooks belonging to the particular subchart have to be run sequentially but, it is ok for this to happen in parallel with the execution of hooks from other subcharts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really seeing a use-case for "otherChartsOnly".

Why wouldn't a subchart just adjust its weights to be sequential, it it wanted its hooks to run sequentially?. Wouldn't that achieve the same as "otherChartsOnly"? Perhaps there is a scenario I'm not thinking of?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really just to make migration easier (as outlined in above reply), but yes, I agree, should use hook weights for this scenario instead

Add a new parameter to the chart metadata of type boolean with name 'runHooksInParallel'.
When set to 'false' (default), the behavior will be as today (hooks run in sequence).
When set to 'true', hooks of the same weight for the chart shall be run in parallel.
In the case of subcharts, hooks of the same weight from different sub charts with 'runHooksInParallel' set to 'true' will be run in parallel. Hooks from subcharts with 'runHooksInParallel' to 'false' shall be run sequentially after hooks of the same weight that are permitted to run in parallel.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I (mostly) agree with this implementation. But for subcharts, there is a hidden complexity that isn't entirely spelt out in the HIP I think.

There is a subtle behavior change here for subcharts which have not specified runHooksInParallel: true. That is, their hook order will be changed ro run after hooks of the same weight that they may have run prior to due to the kind/name alphabetical sorting. If a parent/sibling chart specifies runHooksInParallel: true. And by allowing hooks to run in parallel, we are now suddenly forced to content with e.g. parent/sibling charts which expect hooks of the same weight to run in a specific order.

Strictly, to retain compatibility. We would need keep subchart hooks of the same weight to which have runHooksInParallel: false to run in the same order as they would have previously (wrt. other parent/sibling chart looks)(But, hooks of the same weight from chart(s) which have runHooksInParallel: true could run in parallel of course)

But, IMHO, this implementation introduces a lot of complexity (including the cognitive overhead of chart authors understanding it). And it may not be worth doing so.

The rationale is, that or subcharts, I think there are two scenarios:

  1. The subchart is completely independent of the parent chart and its sibling subcharts (think e.g. nginx, which might be a subchart in many places). And by setting runHooksInParallel: true, it just wants its hooks to run in parallel. And this is fine, since it is independent, and doesn't know/care about any other hooks in that its parent/sibling charts may be executing.
  2. The subchart is somehow coupled with its parent/siblings, and "knows" that its hooks running in parallel with other chart's hooks may cause problems.

For 1. IMHO, there is no problems with the chart simply setting runHooksInParallel: true of course. And hooks running in parallel.

For 2. IMHO, the chart author(s) should be required to coordinate hook weights. So that hooks which must run in order do so. And this should be a prerequisite for being able to set runHooksInParallel: true for a (sub)chart which has coupling with parent/sibling chart hooks.

The reason I say this, is to try and limit the complexity to Helm and chart users. As, iirc, Helm just runs all hooks for all (sub)charts in a single pass today. It doesn't group/order them by subcharts, and to strictly retain order, Helm would need to introduce that concept for this HIP. IMHO the complexity for wanting to run hooks in parallel should be pushed as much as possible to the chart authors, and the implementation/concepts keep as simple as possible e.g. "hooks from a (sub)chart with the same weight run in parallel".

In summary, I think the HIP might want to call out this arkwardness with parent/subcharts hook ordering, if a given (sub)chart introduces runHooksInParallel: true. And that it will be the chart authors responsibility to organise hook weights if there is a hook ordering problem between (sub)charts in their setup? I also think this scenario is likely the uncommon one that only power users will encounter.

Another option might be to propagate ``runHooksInParallel: false|true` upwards from parent charts to subcharts? That way, a parent chart can "never be surprised" by subchart hooks suddenly running in parallel?

Thoughts?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the change in ordering for sub-charts where runHooksInParallel is not set to true w.r.t parent/siblings:
Anybody that wants full compatibility with the existing execution order of hooks can achieve that by not setting runHooksInParallel to true. Where it is set to true (in the parent or any sub charts), there is necessarily a different order (the hooks that would be run at the start of a sequential execution would be finished before hooks towards the end of the sequential run, but now that wont be the case when executing in parallel) so the HIP does not aim to maintain the order that would have applied previously. There is I guess an alternative that would closer match the existing order - you could order the hooks as today and only parallelise hooks that are adjacent in the ordering that have runHooksInParallel set to true. I think this would limit the performance gain though as hooks that could be run in parallel will not be if a non-parrallel hook finds itself in the middle of them when ordered by kind/name.

Regarding the independence of the subcharts:
There is a third scenario - the subchart is completely independent of the parent or siblings, but the hooks for the sub-chart itself must be run sequentially. It is this scenario that I proposed the 'subChartsOnly' value. Your point below that this could be achieved instead by setting the weights appropriately is a valid alternative. The downside is it will potentially require more effort on the part of the sub chart designers to ensure their hook weights are set correctly to maintain the ordering they need before switching to runHooksInParallel to 'true'. It would potentially be much easier for them to just switch runHooksInParallel to 'subChartsOnly' as they would get the same ordering of their sub chart hooks w.r.t. each other without having to worry about the weights. However, if you were approaching this from a clean start I think the behaviour you suggest (requiring the chart designer to set the weights appropriately) would be the way to go so perhaps we should go in that direction. I also agree it reduces complexity

So in summary, I think we are in agreement - option 1 and document clearly the chart designer should set the weights appropriately where there is dependencies between hooks from parent/sibling charts and also highlight the consequences of setting runHooksInParallel for the ordering of sub chart hooks (compared with the existing kind/name based algorithm)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants