Skip to content
This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Make all Helm releases through the operator #1906

Merged
merged 6 commits into from
Apr 16, 2019
Merged

Conversation

hiddeco
Copy link
Member

@hiddeco hiddeco commented Apr 5, 2019

Fixes #1873, fixes #1820

This PR changes a couple of things in the Helm operator (details are commented in each commit), but in one sentence: it ensures all releases are made by the operator.

(If someone has a better one sentence recap of this PR; be my guest to change it)

@hiddeco hiddeco added the helm label Apr 5, 2019
@hiddeco hiddeco force-pushed the helm/refactor-hr-sync branch from c5377e9 to 0f34784 Compare April 5, 2019 17:36
@squaremo
Copy link
Member

squaremo commented Apr 8, 2019

  • Trigger HelmRelease update on repository change to trigger an update through the operator, instead of calling the blocking chs.reconcileReleaseDef(fhr). This can probably be done by adding, editing or removing an annotation on the HelmRelease, as those count as updates.

Per your comment, if we can do this via the controller/informer/thingdoer, that would be preferable I reckon .. IIRC though, it is tricky to get at the individual bits of a controller once you've constructed it, so it might require some contortions :-/

@hiddeco hiddeco force-pushed the helm/refactor-hr-sync branch 5 times, most recently from 3f60c90 to 1ed7486 Compare April 9, 2019 14:30
Before this change there was a ticker in the ChartChangeSync loop
which 'rereleased' all HelmReleases on the set interval, to prevent
mutations due to e.g. manual chart releases. This commit moves this
logic to the operator by utilizing the resync interval that can be set
on the shared informer.

On every resync interval, all objects known at that time to the
informer are redelivered as 'updates', causing the same effect as the
ticker process described above, but with several improvements.

1. The syncHandler validates if a HelmRelease still exists before
   attempting to run the release. This is an improvement compared to
   the blind run we did before, where in case a user had many
   HelmReleases, the Helm release could be removed at the time the
   release attempt was made.
2. We now act on information from one source (except for the
   `reconcileReleaseDef()` call on repository changes, see added
   comment).
@hiddeco hiddeco force-pushed the helm/refactor-hr-sync branch 3 times, most recently from 05cc815 to 0e503ed Compare April 9, 2019 16:16
@hiddeco hiddeco changed the title Refactor Helm operator sync loops Make all Helm releases through the controller Apr 9, 2019
@hiddeco hiddeco changed the title Make all Helm releases through the controller Make all Helm releases through the operator Apr 9, 2019
@hiddeco hiddeco requested review from squaremo and 2opremio April 9, 2019 16:36
@hiddeco
Copy link
Member Author

hiddeco commented Apr 9, 2019

Do not trigger update event on status changes

From my observations patching the subresource still triggers an update event @stefanprodan, from the brief discussion we had I thought this would not be the case. Is there something wrong with how I apply the patch or did we have a misunderstanding?


Besides that, PTAL @squaremo @2opremio, I think you will like it.

@squaremo
Copy link
Member

squaremo commented Apr 9, 2019

I think you will like it.

At +192 −209 that seems a near certainty :-)

Before this change, and due to the fact that we no longer compare
the spec of the given HelmRelease against the spec of the old one,
every status and condition update would enqueue a new release job.

By updating the subresource instead of the resource, we no longer
trigger the update event, causing no unwanted trigger of a release.
@hiddeco hiddeco force-pushed the helm/refactor-hr-sync branch from 1485c8c to 93a34f4 Compare April 11, 2019 12:55
@hiddeco hiddeco marked this pull request as ready for review April 11, 2019 12:56
@2opremio
Copy link
Contributor

@hiddeco Have you reviewed the large comment at the beginning of chartsync.go? I would guess it requires updating.

@hiddeco
Copy link
Member Author

hiddeco commented Apr 15, 2019

@2opremio err, no, that caught my eye but got lost in my mind. Will rewrite it, good catch.

hiddeco added 4 commits April 15, 2019 22:33
This commit changes a couple of things in how the operator processes
and acts on mirror changes.

1. On mirror changes, instead of first requesting all namespaces, and
   then collecting all HelmReleases (per namespace) from the API,
   all HelmReleases are now collected from the informer (which caches
   all HelmReleases the operator has seen), which already takes the
   namespace restriction into account.
2. Instead of looping over a set of _all_ HelmReleases, determining if
   they make use of one of the changed mirrors, and calculating what we
   should do based upon the changed mirror state; we now loop over the
   changed mirrors we received and request HelmReleases for each one of
   them, we then calculate what we should do once per mirror and
   execute it for each HelmRelease we requested earlier. This greatly
   reduces the amount of calculations we make without changing the
   outcome.
3. We no longer call the blocking `chs.reconcileReleaseDef()` for each
   HelmRelease but instead enqueue the HelmReleases in the working
   queue the operator consumes from. As the operator consuming items
   from the queue validates if a HelmRelease still exists just before
   executing release, this prevents cases where we would attempt to
   release a HelmRelease we still had in memory (due to collecting all
   HelmReleases at once), while the HelmRelease already had been
   removed.
This is a optimistic lock and prevents the operator from potentially
releasing an old revision of a `HelmRelease`. As there is a (small)
chance the spec is changed between the time we started preparing and
calculating the upgrade, and the actual upgrade itself.
And just broadcast the chart was synced, without implying success,
as the event will always fire regardless of the sync outcome.
@hiddeco hiddeco force-pushed the helm/refactor-hr-sync branch from 93a34f4 to 439787a Compare April 15, 2019 20:33
@hiddeco
Copy link
Member Author

hiddeco commented Apr 16, 2019

@2opremio I addressed the comment block in chartsync.go by rewriting history (5893fa0).

@2opremio
Copy link
Contributor

It LGTM, but I am still no expert on the helm operator so I may be missing problems.

@stefanprodan mind taking a look?

Copy link
Member

@stefanprodan stefanprodan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

great job @hiddeco

@hiddeco hiddeco merged commit ff93ae0 into master Apr 16, 2019
@hiddeco hiddeco deleted the helm/refactor-hr-sync branch April 16, 2019 11:22
hiddeco added a commit that referenced this pull request May 23, 2019
During the refactor of synchronization mechanics in #1906, this got
accidentally moved outside the condition that checks if we have a clone
or it needs updating, with the result that a release is always enqueued
on a signal from the mirror it is bound to.

Move it back inside the condition so we only enqueue releases we
actually have updates from git for.
hiddeco added a commit that referenced this pull request May 23, 2019
During the refactor of synchronization mechanics in #1906, this got
accidentally moved outside the condition that checks if we have a clone
or it needs updating, with the result that a release is always enqueued
on a signal from the mirror it is bound to.

Move it back inside the condition so we only enqueue releases we
actually have updates from git for.
hiddeco added a commit that referenced this pull request May 23, 2019
During the refactor of synchronization mechanics in #1906, this got
accidentally moved outside the condition that checks if we have a clone
or it needs updating, with the result that a release is always enqueued
on a signal from the mirror it is bound to.

Move it back inside the condition so we only enqueue releases we
actually have updates from git for.
hiddeco added a commit that referenced this pull request May 29, 2019
Before this change the operator would hang at random on "waiting for
informer caches to sync", without anything else happening or it timing
out.

After an extensive search in recently merged PRs, this seems to be
related to an change of order in #1906, where the informer would be
initialized before the operator had registered its event handlers.

In addition to reverting this change, reorder everything so that the
operator is started before anything else that may depend on it, so it
can start processing changes from the queue as soon as possible.
hiddeco added a commit that referenced this pull request May 29, 2019
Before this change the operator would hang at random on "waiting for
informer caches to sync", without anything else happening or it timing
out.

After an extensive search in recently merged PRs, this seems to be
related to an change of order in #1906, where the informer would be
initialized before the operator had registered its event handlers.

In addition to reverting this change and ensuring the informer cache is
synced before it is used, reorder everything so that the operator is
started before anything else that may depend on it, so it can start
processing changes from the queue as soon as possible.
hiddeco added a commit that referenced this pull request May 29, 2019
Before this change the operator would hang at random on "waiting for
informer caches to sync", without anything else happening or it timing
out.

After an extensive search in recently merged PRs, this seems to be
related to an change of order in #1906, where the informer would be
initialized before the operator had registered its event handlers.

In addition to reverting this change and ensuring the informer cache is
synced before it is used, reorder everything so that the operator is
started before anything else that may depend on it, so it can start
processing changes from the queue as soon as possible.
hiddeco added a commit that referenced this pull request May 29, 2019
Before this change the operator would hang at random on "waiting for
informer caches to sync", without anything else happening or it timing
out.

After an extensive search in recently merged PRs, this seems to be
related to an change of order in #1906, where the informer would be
initialized before the operator had registered its event handlers.

In addition to reverting this change and ensuring the informer cache is
synced before it is used, reorder everything so that the operator is
started before anything else that may depend on it, so it can start
processing changes from the queue as soon as possible.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
4 participants