Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore resources reconciled by older controllers #1286

Merged
merged 9 commits into from
Jul 22, 2019

Conversation

anyasabo
Copy link
Contributor

@anyasabo anyasabo commented Jul 19, 2019

Resolves #1241

  • If there is no existing annotation on a resource, it may be old (previously reconciled by a <0.9.0 controller), or it may be brand new. We update the annotation indicating which one by looking if there are any pods that belong to the CR already
  • If there is an existing annotation, we make sure it's at least a 0.9.0 pre-release version before continuing

Still need to do some manual testing but so far looks good. Opening up for early reviews now since there's a bit of time pressure.

Manual testing steps I used:
checked out 0.8.1 tag, dep ensure --vendor-only && make run, ran make samples and let it build the resources successfully.
checked out this branch, dep ensure --vendor-only && make run. Verified that it didn't act on the existing resources and added an annotation with 0.8.0-UNKNOWN.

}

// checkExistingResources returns a bool indicating if there are existing resources created for a given resource
func (r *ReconcileElasticsearch) checkExistingResources(es *elasticsearchv1alpha1.Elasticsearch) (bool, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this generic we can probably just pass in the label selector

LabelSelector: selector,
Namespace: es.Namespace,
}
pods := v1.PodList{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to select services here instead, which is common to all three APM, Kibana and ES

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that makes sense to me

}
if exist {
log.Info("Resource was previously reconciled by incompatible controller version and missing annotation, adding annotation", "controller_version", r.OperatorInfo.BuildInfo.Version, "namespace", es.Namespace, "es_name", es.Name)
err = annotation.UpdateControllerVersion(r.Client, es, "0.8.0-UNKNOWN")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to also set the common.k8s.elastic.co/pause: true annotation as well so that the association controllers stop working on these old resources. Alternatively implement version check on the annotation controllers as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with adding the version check to the kibana association controller -- I don't think the APM association controller needs it since it will only act on APM servers, which don't exist < 0.9.0 right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did include APM server in 0.8 it was just not announced because the implementation was incomplete

if err != nil {
return false, errors.Wrap(err, "Error parsing current version on resource")
}
minVersion, err := semver.NewVersion("0.9.0-ALPHA")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it is important, but could we avoid hardcoding a version here by saying version compatibility is given if major and minor are the same?

Also we have

type Version struct {
Major int
Minor int
Patch int
Label string
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I missed that version package. I am kind of okay with hardcoding package incompatibility since it should be the exception rather than the rule imo. Like in the future if we go from controller 1.2 -> 1.3 because we added a feature, we shouldn't make the assumption right now that controller 1.3 cannot reconcile existing CRs. Definitely possible I'm missing something though.

@anyasabo anyasabo changed the title (WIP) Ignore resources reconciled by older controllers Ignore resources reconciled by older controllers Jul 19, 2019
@anyasabo anyasabo marked this pull request as ready for review July 19, 2019 23:51
"github.com/elastic/cloud-on-k8s/operators/pkg/utils/k8s"
"github.com/pkg/errors"
v1 "k8s.io/api/core/v1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we usually alias this as corev1

return false, err
}
// if we listed any services successfully, then we know this cluster was reconciled by an old version since any CRs reconciled by a 0.9.0+ operator would have a label
if len(svcs.Items) != 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return len(svcs.Items) != 0, nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇

return false, err
}
// if there's no controller version annotation on the ES instance, then we need to see maybe the CR has been reconciled by an older, incompatible controller version
// TODO pass in the selector? we need to find out what object it is
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO seems redundant.

}
if exist {
log.Info("Resource was previously reconciled by incompatible controller version and missing annotation, adding annotation", "controller_version", r.OperatorInfo.BuildInfo.Version, "namespace", es.Namespace, "es_name", es.Name)
err = annotation.UpdateControllerVersion(r.Client, es, "0.8.0-UNKNOWN")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did include APM server in 0.8 it was just not announced because the implementation was incomplete

Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a few nits

if err != nil {
return reconcile.Result{}, err
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there is some potential for extracting these lines into a function of form

func(client k8s.Client, obj runtime.Object, selector labels.Selector, controllerVersion string) (*reconcile.Result, error)

But not sure if it is worth it, can definitely wait for a follow up PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There definitely is, I just went back and forth since it seemed like something that individual controllers might want to handle in their own way. But right now it is all the exact same 🤷‍♀

Copy link
Contributor

@sebgl sebgl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if err != nil {
return false, errors.Wrap(err, "Error parsing current version on resource")
}
minVersion, err := version.Parse("0.9.0-ALPHA")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would probably deserve a constant somewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that but at least right now it is only used in this one function

@pebrc pebrc merged commit 9a2ac59 into elastic:master Jul 22, 2019
sebgl pushed a commit to sebgl/cloud-on-k8s that referenced this pull request Jul 22, 2019
Ignore resources reconciled by older controllers

Backport of 9a2ac59.
pebrc pushed a commit that referenced this pull request Jul 22, 2019
Ignore resources reconciled by older controllers

Backport of 9a2ac59.
@anyasabo anyasabo deleted the 1241/ignorerevs branch July 22, 2019 12:57
@pebrc pebrc added the >enhancement Enhancement of existing functionality label Jul 22, 2019
sebgl added a commit that referenced this pull request Jul 24, 2019
* Use the setvmmaxmapcount initcontainer by default in E2E tests (#1300)

Let's keep our default defaults :)

The setting is disabled explicitly for E2E tests where we enable a
restricted security context.

* Add docs for plugins, custom configuration files and secure settings (#1298)

* Allow license secret webhook to fail (#1301)

Webhooks on core k8s objects are just too debilitating in case our
webhook service fails. This sets the failure policy for the secret
webhook to ignore to strike a balance between UX (immediate feedback)
and keeping the users k8s cluster in a working state. Also we have an
additional validation run on controller level so this does not allow
circumventing our validation logic.

* Revert "Use the setvmmaxmapcount initcontainer by default in E2E tests (#1300)" (#1302)

This reverts commit fff1526.
This commit is breaking our E2E tests chain, which deploy a
PodSecurityPolicy by default. Any privileged init container will not
work.

I'll open an issue for a longer-term fix to properly handle this.

* Update quickstart (#1307)

* Update the name of the secret for the elastic user
* Bump the Elastic Stack version from 7.1.0 to 7.2.0

* Change Kibana readiness endpoint to return a 200 OK (#1309)

The previous endpoint returned an http code 302. While this is fine for
Kubernetes, some derived systems like GCP LoadBalancers mimic the
container readiness check for their own readiness check. Except GCP
Loadbalancers only work with status 200.

It's not up to us to adapt GCP LoadBalancers to K8s, but this is a
fairly trivial fix.

* Fix pod_forwarder to support two part DNS names, adjust e2e http_client (#1297)

* Fix pod_forwarder to support two part DNS names, adjust e2e http_client url

* Revert removing .svc in e2e http_client

* [DOC] Resources management and volume claim template (#1252)

* Add resources and persistent volume templates documentation

* Ignore resources reconciled by older controllers (#1286)

* Document PodDisruptionBudget section of the ES spec (#1306)

* Document PodDisruptionBudget section of the ES spec

I suspect this might slightly change in the feature depending on how we
handle the readiness check, so I'm keeping this doc minimal for now:

* what is a PDB, briefly (with a link)
* default PDB we apply
* how to set a different PDB
* how to disable the default PDB

* Move version out from Makefile (#1312)

* Add release note generation tool (#1314)

* no external dependencies
* inspects PRs by version label
* generates structured release notes in asciidoc grouped by type label

* Add console output to standalone apm sample (#1321)

* Update Quickstart to 0.9.0 (#1317)

* Update doc (#1319)

* Update persistent storage section
* Update kibana localhost url to use https
* Update k8s resources names in accessing-services doc
* Mention SSL browser warning
* Fix bulleted list

* Add CI job for nightly builds (#1248)

* Move version to a file

* Add CI implementation

* Update VERSION

* Depend on another PR for moving out version from Makefile

* Update Jenkinsfile

* Don't build and push operator image in bootstrap-gke (#1332)

We don't need to do that anymore, since we don't use an init container
based on the operator image.

* Remove Docker image publishing from devops-ci (#1339)

* Suppress output of certain commands from Makefile (#1342)

* Document how to disable TLS (#1341)

* Use new credentials for Docker registry (#1346)

* Workaround controller-runtime webhook upsert bug (#1337)

* Fix docs build on PR job (#1351)

* Fix docs build on PR job

* Cleanup workspace before doing other steps

* APM: remove "output" element and add elasticsearchRef (#1345)

* Don't rely on buggy metaObject Kind (#1324)

* Don't rely on buggy metaObject Kind

A bug in our client implementation may clear the object's Kind on
certain scenarios. See
kubernetes-sigs/controller-runtime#406.

Let's avoid that by fixing a constant Kind returned by a method call on
the resource.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement Enhancement of existing functionality v0.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ignore resources created in 0.8.x
3 participants