-
Notifications
You must be signed in to change notification settings - Fork 717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ignore resources reconciled by older controllers #1286
Conversation
} | ||
|
||
// checkExistingResources returns a bool indicating if there are existing resources created for a given resource | ||
func (r *ReconcileElasticsearch) checkExistingResources(es *elasticsearchv1alpha1.Elasticsearch) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make this generic we can probably just pass in the label selector
LabelSelector: selector, | ||
Namespace: es.Namespace, | ||
} | ||
pods := v1.PodList{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to select services here instead, which is common to all three APM, Kibana and ES
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that makes sense to me
} | ||
if exist { | ||
log.Info("Resource was previously reconciled by incompatible controller version and missing annotation, adding annotation", "controller_version", r.OperatorInfo.BuildInfo.Version, "namespace", es.Namespace, "es_name", es.Name) | ||
err = annotation.UpdateControllerVersion(r.Client, es, "0.8.0-UNKNOWN") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to also set the common.k8s.elastic.co/pause: true
annotation as well so that the association controllers stop working on these old resources. Alternatively implement version check on the annotation controllers as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with adding the version check to the kibana association controller -- I don't think the APM association controller needs it since it will only act on APM servers, which don't exist < 0.9.0 right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did include APM server in 0.8 it was just not announced because the implementation was incomplete
if err != nil { | ||
return false, errors.Wrap(err, "Error parsing current version on resource") | ||
} | ||
minVersion, err := semver.NewVersion("0.9.0-ALPHA") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it is important, but could we avoid hardcoding a version here by saying version compatibility is given if major and minor are the same?
Also we have
cloud-on-k8s/operators/pkg/controller/common/version/version.go
Lines 15 to 20 in f7afa84
type Version struct { | |
Major int | |
Minor int | |
Patch int | |
Label string | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I missed that version package. I am kind of okay with hardcoding package incompatibility since it should be the exception rather than the rule imo. Like in the future if we go from controller 1.2 -> 1.3 because we added a feature, we shouldn't make the assumption right now that controller 1.3 cannot reconcile existing CRs. Definitely possible I'm missing something though.
"github.com/elastic/cloud-on-k8s/operators/pkg/utils/k8s" | ||
"github.com/pkg/errors" | ||
v1 "k8s.io/api/core/v1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we usually alias this as corev1
return false, err | ||
} | ||
// if we listed any services successfully, then we know this cluster was reconciled by an old version since any CRs reconciled by a 0.9.0+ operator would have a label | ||
if len(svcs.Items) != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return len(svcs.Items) != 0, nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥇
return false, err | ||
} | ||
// if there's no controller version annotation on the ES instance, then we need to see maybe the CR has been reconciled by an older, incompatible controller version | ||
// TODO pass in the selector? we need to find out what object it is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TODO seems redundant.
} | ||
if exist { | ||
log.Info("Resource was previously reconciled by incompatible controller version and missing annotation, adding annotation", "controller_version", r.OperatorInfo.BuildInfo.Version, "namespace", es.Namespace, "es_name", es.Name) | ||
err = annotation.UpdateControllerVersion(r.Client, es, "0.8.0-UNKNOWN") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did include APM server in 0.8 it was just not announced because the implementation was incomplete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a few nits
if err != nil { | ||
return reconcile.Result{}, err | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there is some potential for extracting these lines into a function of form
func(client k8s.Client, obj runtime.Object, selector labels.Selector, controllerVersion string) (*reconcile.Result, error)
But not sure if it is worth it, can definitely wait for a follow up PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There definitely is, I just went back and forth since it seemed like something that individual controllers might want to handle in their own way. But right now it is all the exact same 🤷♀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
if err != nil { | ||
return false, errors.Wrap(err, "Error parsing current version on resource") | ||
} | ||
minVersion, err := version.Parse("0.9.0-ALPHA") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would probably deserve a constant somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that but at least right now it is only used in this one function
Ignore resources reconciled by older controllers Backport of 9a2ac59.
* Use the setvmmaxmapcount initcontainer by default in E2E tests (#1300) Let's keep our default defaults :) The setting is disabled explicitly for E2E tests where we enable a restricted security context. * Add docs for plugins, custom configuration files and secure settings (#1298) * Allow license secret webhook to fail (#1301) Webhooks on core k8s objects are just too debilitating in case our webhook service fails. This sets the failure policy for the secret webhook to ignore to strike a balance between UX (immediate feedback) and keeping the users k8s cluster in a working state. Also we have an additional validation run on controller level so this does not allow circumventing our validation logic. * Revert "Use the setvmmaxmapcount initcontainer by default in E2E tests (#1300)" (#1302) This reverts commit fff1526. This commit is breaking our E2E tests chain, which deploy a PodSecurityPolicy by default. Any privileged init container will not work. I'll open an issue for a longer-term fix to properly handle this. * Update quickstart (#1307) * Update the name of the secret for the elastic user * Bump the Elastic Stack version from 7.1.0 to 7.2.0 * Change Kibana readiness endpoint to return a 200 OK (#1309) The previous endpoint returned an http code 302. While this is fine for Kubernetes, some derived systems like GCP LoadBalancers mimic the container readiness check for their own readiness check. Except GCP Loadbalancers only work with status 200. It's not up to us to adapt GCP LoadBalancers to K8s, but this is a fairly trivial fix. * Fix pod_forwarder to support two part DNS names, adjust e2e http_client (#1297) * Fix pod_forwarder to support two part DNS names, adjust e2e http_client url * Revert removing .svc in e2e http_client * [DOC] Resources management and volume claim template (#1252) * Add resources and persistent volume templates documentation * Ignore resources reconciled by older controllers (#1286) * Document PodDisruptionBudget section of the ES spec (#1306) * Document PodDisruptionBudget section of the ES spec I suspect this might slightly change in the feature depending on how we handle the readiness check, so I'm keeping this doc minimal for now: * what is a PDB, briefly (with a link) * default PDB we apply * how to set a different PDB * how to disable the default PDB * Move version out from Makefile (#1312) * Add release note generation tool (#1314) * no external dependencies * inspects PRs by version label * generates structured release notes in asciidoc grouped by type label * Add console output to standalone apm sample (#1321) * Update Quickstart to 0.9.0 (#1317) * Update doc (#1319) * Update persistent storage section * Update kibana localhost url to use https * Update k8s resources names in accessing-services doc * Mention SSL browser warning * Fix bulleted list * Add CI job for nightly builds (#1248) * Move version to a file * Add CI implementation * Update VERSION * Depend on another PR for moving out version from Makefile * Update Jenkinsfile * Don't build and push operator image in bootstrap-gke (#1332) We don't need to do that anymore, since we don't use an init container based on the operator image. * Remove Docker image publishing from devops-ci (#1339) * Suppress output of certain commands from Makefile (#1342) * Document how to disable TLS (#1341) * Use new credentials for Docker registry (#1346) * Workaround controller-runtime webhook upsert bug (#1337) * Fix docs build on PR job (#1351) * Fix docs build on PR job * Cleanup workspace before doing other steps * APM: remove "output" element and add elasticsearchRef (#1345) * Don't rely on buggy metaObject Kind (#1324) * Don't rely on buggy metaObject Kind A bug in our client implementation may clear the object's Kind on certain scenarios. See kubernetes-sigs/controller-runtime#406. Let's avoid that by fixing a constant Kind returned by a method call on the resource.
Resolves #1241
Still need to do some manual testing but so far looks good. Opening up for early reviews now since there's a bit of time pressure.
Manual testing steps I used:
checked out 0.8.1 tag,
dep ensure --vendor-only && make run
, ranmake samples
and let it build the resources successfully.checked out this branch,
dep ensure --vendor-only && make run
. Verified that it didn't act on the existing resources and added an annotation with0.8.0-UNKNOWN
.