
Initial support for CRDs (upgrade) policies #250

Merged: 4 commits merged on Apr 20, 2021


alex-berger
Contributor

This PR adds preliminary support for upgrading CRDs that are part of Helm Charts managed via HelmRelease objects.

See also fluxcd/flux2#1071 for more information.

Background & Motivation

Helm still does not provide any built-in solution to the CRD upgrade problem, which is well-known and
documented:

There is no support at this time for upgrading or deleting CRDs using Helm. This was an explicit decision after much community discussion due to the danger for unintentional data loss. Furthermore, there is currently no community consensus around how to handle CRDs and their lifecycle. As this evolves, Helm will add support for those use cases.

This limitation of Helm makes GitOps-style "day 2 operations" of Kubernetes resources managed by
HelmRelease objects very difficult, if not impossible. Currently, we have to manually
upgrade CRDs from Helm Charts referenced by HelmReleases, which is cumbersome and requires very
tight timing coordination between the commit to the GitOps repository and the manual CRD
upgrade on all systems that observe/apply that commit. This approach is not only very error-prone,
it can also cause unnecessarily long service downtime.

Note that most of the Helm Charts we install and operate are created and maintained by
third parties and are not under our control. Extracting all CRDs from every Helm Chart
upon each new release and applying them manually is, as mentioned above,
very time-consuming, error-prone and cumbersome.

Our observation is that most CRD upgrades are non-critical and only evolve an existing CRD
in a backward-compatible fashion. Therefore, I am wondering whether it would make sense to
extend the HelmRelease resource with an opt-in flag to automatically upgrade CRD objects
(if needed).
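A minimal sketch of what such an opt-in flag could look like, assuming a string-valued policy field on the HelmRelease spec. The type `CRDsPolicy`, its values `Skip`, `Create` and `CreateReplace`, and the helper `crdAction` are hypothetical names chosen for illustration, not taken from this PR. An unset field falls back to `Skip`, so nothing changes unless a user explicitly opts in.

```go
package main

import "fmt"

// CRDsPolicy is a HYPOTHETICAL opt-in setting (names are illustrative
// assumptions) controlling how CRDs shipped with a chart are handled
// when a HelmRelease is upgraded.
type CRDsPolicy string

const (
	// Skip leaves all CRDs untouched, mirroring Helm's own upgrade behavior.
	Skip CRDsPolicy = "Skip"
	// Create only creates CRDs that do not exist in the cluster yet.
	Create CRDsPolicy = "Create"
	// CreateReplace creates missing CRDs and replaces existing ones.
	CreateReplace CRDsPolicy = "CreateReplace"
)

// crdAction decides what to do with a single CRD from the chart, given
// whether it already exists. An empty policy means the user did not opt
// in, so it behaves like Skip. No policy ever deletes a CRD.
func crdAction(policy CRDsPolicy, exists bool) (string, error) {
	if policy == "" {
		policy = Skip
	}
	switch policy {
	case Skip:
		return "skip", nil
	case Create:
		if exists {
			return "skip", nil
		}
		return "create", nil
	case CreateReplace:
		if exists {
			return "replace", nil
		}
		return "create", nil
	default:
		return "", fmt.Errorf("unknown CRDs policy %q", policy)
	}
}

func main() {
	for _, p := range []CRDsPolicy{"", Create, CreateReplace} {
		action, _ := crdAction(p, true)
		fmt.Printf("policy=%q, existing CRD -> %s\n", p, action)
	}
}
```

None of the sketched policies delete a CRD, which keeps the data-loss risk described in the quote above out of scope. Deletion would need a separate, deliberate decision.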

@alex-berger
Contributor Author

Unfortunately, I have run out of ideas on how to make the e2e tests work with pull requests. The e2e tests work on my feature branches, which are real branches that a GitRepository can reference, but pull requests are not real branches, so the tests fail.

Member

@hiddeco left a comment


This already looks great; a couple of minor comments and a suggestion that I would like to discuss. 🏅

internal/runner/runner.go (outdated, resolved)
docs/spec/v2beta1/helmreleases.md (outdated, resolved)
internal/runner/runner.go (outdated, resolved)
api/v2beta1/helmrelease_types.go (outdated, resolved)
@alex-berger force-pushed the feature/crd-upgrade branch 3 times, most recently from 7dac7b8 to e32e706 on April 20, 2021 08:09
@stefanprodan added the enhancement (New feature or request) label on Apr 20, 2021
Member

@hiddeco left a comment


Looks like the tests do now work?

docs/spec/v2beta1/helmreleases.md (outdated, resolved)
internal/runner/runner.go (outdated, resolved)
@alex-berger
Contributor Author

alex-berger commented Apr 20, 2021

Looks like the tests do now work?

As mentioned above, the tests run on branches and tags, but unfortunately not on pull requests. See the test results for my feature branch (the source branch of this PR) at https://github.com/alex-berger/helm-controller/actions.

I have not yet figured out whether there is a way to reference a GitHub pull request as a branch (I spent 6 hours looking for one without success). As my time is limited, I have given up and disabled the new tests for pull requests, so that they only run on branches (and possibly tags).
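One way to express "run on branches, skip on pull requests" at the Go level, assuming the e2e suite were driven from Go on GitHub Actions, is to check the `GITHUB_EVENT_NAME` environment variable that Actions sets for each workflow run. The helper `shouldRunE2E` is hypothetical. The thread does not show how the tests were actually disabled, so this is not what the PR did.

```go
package main

import (
	"fmt"
	"os"
)

// shouldRunE2E is a HYPOTHETICAL helper (not from this PR) reporting
// whether the e2e suite should run. GitHub Actions sets GITHUB_EVENT_NAME
// for every workflow run; "pull_request" runs are skipped because their
// source is not a real branch that a GitRepository can reference.
// The getenv parameter makes the check testable without touching os env.
func shouldRunE2E(getenv func(string) string) bool {
	return getenv("GITHUB_EVENT_NAME") != "pull_request"
}

func main() {
	if !shouldRunE2E(os.Getenv) {
		fmt.Println("skipping e2e tests for pull_request event")
		return
	}
	fmt.Println("running e2e tests")
}
```

In a `_test.go` file, the same check would typically call `t.Skip` instead of returning early, so the skipped suite still shows up in the test report.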

Signed-off-by: Alexander Berger <alex-berger@gmx.ch>
Member

@hiddeco left a comment


I am OK with the e2e approach for now; we should move many of the tests to Go anyway, as has been done for much of the kustomize-controller.

Anyway, thanks a lot @alex-berger. 💯 🌻

@hiddeco merged commit 9a049a1 into fluxcd:main on Apr 20, 2021
@@ -58,7 +70,7 @@ type Runner struct {
 // namespace configured to the provided values.
 func NewRunner(getter genericclioptions.RESTClientGetter, storageNamespace string, logger logr.Logger) (*Runner, error) {
 	runner := &Runner{
-		logBuffer: NewLogBuffer(NewDebugLog(logger).Log, 0),
+		logBuffer: NewLogBuffer(NewDebugLog(logger).Log, 100),
Member


I somehow missed this during review, and this change is not related to the CRD work. Can you explain why it was changed? It raises the default number of log lines included (after de-duplication) in a failure condition and/or event from the last 5 to 100, which makes every failure event we produce quite verbose.

Contributor Author


I increased it for debugging during feature development and then forgot to decrease it again. 😳

Member


Thanks, that means my restore PR is justified and I can continue with the release :-)

@hiddeco changed the title from "Initial support for HelmRelease for upgrading CRDs" to "Initial support for CRDs (upgrade) policies" on Apr 21, 2021
@joejulian

Looks like I'm moments late to comment on this, but have the justifications for Helm itself not supporting this been considered? https://github.com/helm/community/blob/main/hips/hip-0011.md

@hiddeco
Member

hiddeco commented Apr 21, 2021

They have been considered, which is why the behavior is disabled by default, as it can potentially be disastrous.

However, given that people use the helm-controller to drive operations automatically, I think it is justified to offer a configuration option that does the right thing if you know how your chart (or the application behind it) behaves during Custom Resource Definition upgrades.

@joejulian

This feature would need to be able to add the conversion webhooks at the same time. References to them are defined in the CRD itself; see the cert-manager CRDs for an example. Without them, the API server will reject create/update requests against those CRDs.
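As a rough illustration of the dependency described here, the sketch below reads a CRD manifest (as JSON) and reports whether it relies on a conversion webhook. It uses the `spec.conversion.strategy` field (`None` or `Webhook`) and the `spec.conversion.webhook.clientConfig` field (a `service` or a `url`) of the `apiextensions.k8s.io/v1` CustomResourceDefinition API. The helper `conversionWebhook` and the sample manifest names are hypothetical and serve only as an illustration. Nothing in this thread says the PR performs such a check.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// crd mirrors only the subset of an apiextensions.k8s.io/v1
// CustomResourceDefinition needed to locate a conversion webhook.
type crd struct {
	Metadata struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Spec struct {
		Conversion *struct {
			Strategy string `json:"strategy"`
			Webhook  *struct {
				ClientConfig struct {
					URL     *string `json:"url"`
					Service *struct {
						Namespace string `json:"namespace"`
						Name      string `json:"name"`
					} `json:"service"`
				} `json:"clientConfig"`
			} `json:"webhook"`
		} `json:"conversion"`
	} `json:"spec"`
}

// conversionWebhook is a HYPOTHETICAL helper that describes the webhook a
// CRD depends on, or returns "" if it uses no conversion webhook.
func conversionWebhook(manifest []byte) (string, error) {
	var c crd
	if err := json.Unmarshal(manifest, &c); err != nil {
		return "", err
	}
	conv := c.Spec.Conversion
	if conv == nil || conv.Strategy != "Webhook" || conv.Webhook == nil {
		return "", nil // strategy "None" or unset: no webhook needed
	}
	cc := conv.Webhook.ClientConfig
	switch {
	case cc.Service != nil:
		return fmt.Sprintf("service %s/%s", cc.Service.Namespace, cc.Service.Name), nil
	case cc.URL != nil:
		return "url " + *cc.URL, nil
	default:
		return "", fmt.Errorf("CRD %s: webhook conversion without clientConfig", c.Metadata.Name)
	}
}

func main() {
	// Made-up manifest for illustration only.
	manifest := []byte(`{
  "metadata": {"name": "widgets.example.com"},
  "spec": {"conversion": {"strategy": "Webhook", "webhook": {
    "clientConfig": {"service": {"namespace": "example-system", "name": "example-webhook"}}}}}
}`)
	hook, err := conversionWebhook(manifest)
	if err != nil {
		panic(err)
	}
	fmt.Println("depends on conversion webhook:", hook)
}
```

A controller replacing such a CRD would first have to make sure the referenced service (or URL) is available. Otherwise it risks the rejected requests described above.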

@alex-berger deleted the feature/crd-upgrade branch on June 25, 2021 13:04
Labels
enhancement New feature or request
4 participants