Enforce Elastic stack upgrade order #2600
Comments
One thing mentioned in #2353 (comment) is to reject (as opposed to delay) the version upgrade at the validation webhook level. I don't think we should take that approach, since it makes the user experience a bit painful. As a user, I would expect to be able to update both the Elasticsearch and Kibana versions at the same time in the YAML manifest, then let ECK deal with it.

A while ago we tried to design the association controllers so that they could run with different managed namespace contexts. For example, the Kibana controller may not have access to the Elasticsearch namespace, so it should not deal with Elasticsearch resources at all. I'm not sure where we stand now regarding this "constraint", but I think we should aim to keep the various controllers' responsibilities decoupled so that, for example:
Here is what I have in mind. I think it's pretty close to the ideas @pebrc expressed in the first post:

Currently running version reflected in the status

Add an additional
Annotation set by the association controller in associated resources

In each association controller (e.g. the Kibana-Elasticsearch association controller), annotate each resource where the association is specified (e.g. Kibana) with the version of the resource it is associated to (e.g. Elasticsearch), retrieved from its status. For example, if an Elasticsearch resource status reports

This would be done generically by the association controller for any association:

The APM-Kibana association controller would set the following annotation on the APM resource:
The Beat-ES association controller would set the following annotation on the Beat resource:
The Beat-Kibana association controller would set the following annotation on the Beat resource:
The association controller also ensures the annotation is not set if there is no association specified.

Delaying resource version upgrades

As part of each resource reconciliation (e.g. before updating a Deployment in the Kibana controller), inspect the annotations that make sense (e.g.
This mechanism does not enforce a global stack upgrade order, but rather an implicit dependency graph between associated resources:
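The delay check described above could look roughly like the following. This is only a sketch under stated assumptions: the annotation key `example.invalid/elasticsearch-version` and the function `ShouldDelayUpgrade` are hypothetical names made up for illustration, not ECK's actual API, and the rule "wait while the desired version is higher than the associated resource's running version" is an assumption drawn from the upgrade order discussed in this issue.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// esVersionAnnotation is a HYPOTHETICAL annotation key used only in this sketch.
const esVersionAnnotation = "example.invalid/elasticsearch-version"

// parse turns "major.minor.patch" into three integers.
func parse(s string) ([3]int, error) {
	var v [3]int
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("invalid version %q: %w", s, err)
		}
		v[i] = n
	}
	return v, nil
}

// less reports whether version a is strictly lower than version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// ShouldDelayUpgrade reports whether a resource (e.g. Kibana) should wait before
// moving to the desired version. Assumption: the association controller has
// written the associated resource's running version into the annotation, and a
// missing annotation means there is no association, so nothing has to be awaited.
func ShouldDelayUpgrade(annotations map[string]string, desired string) (bool, error) {
	associated, ok := annotations[esVersionAnnotation]
	if !ok {
		return false, nil
	}
	av, err := parse(associated)
	if err != nil {
		return false, err
	}
	dv, err := parse(desired)
	if err != nil {
		return false, err
	}
	// Delay while the associated resource still runs a lower version.
	return less(av, dv), nil
}

func main() {
	annotations := map[string]string{esVersionAnnotation: "7.6.0"}
	delay, err := ShouldDelayUpgrade(annotations, "7.7.0")
	fmt.Println(delay, err)
}
```

Because the check reads only annotations on the resource being reconciled, the resource's controller would not need access to the associated resource's namespace, which fits the decoupling goal mentioned earlier in the thread.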
+1 to delaying rather than blocking. But if delaying is going to be significantly harder to implement than just blocking for now, maybe implement blocking first, then remove the blocking once delaying is ready? I didn't expect the breakage I saw when I updated both at once and Kibana upgraded significantly faster than Elasticsearch, causing an issue.
Report the currently running version in the `status.version` field of each resource. In case a version upgrade is in progress for that resource, report the lowest running version.

For Elasticsearch, we check both the Pods and the StatefulSets to report the lowest version. A StatefulSet might report a higher version while the Pods haven't been upgraded yet. It may also report a lower version while no Pods are running with that version any more. For other resources, we just check the Pods.

Part of the Status is refactored into a common `DeploymentStatus` for Kibana, ApmServer and EnterpriseSearch. Elasticsearch and Beat have different requirements which cannot fit this common place.

This is a pre-req for strict stack version upgrade ordering (elastic#2600).
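To illustrate the "report the lowest running version" rule, here is a minimal sketch. It assumes plain `major.minor.patch` version strings already collected from the Pods (and, for Elasticsearch, from the StatefulSets as well); `LowestVersion` and the helper names are hypothetical and do not reflect the actual ECK code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parse turns "major.minor.patch" into three integers.
func parse(s string) ([3]int, error) {
	var v [3]int
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("invalid version %q: %w", s, err)
		}
		v[i] = n
	}
	return v, nil
}

// less reports whether version a is strictly lower than version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// LowestVersion returns the lowest version among the given version strings.
// It returns an empty string when no versions are given (nothing is running).
func LowestVersion(versions []string) (string, error) {
	lowest := ""
	var lowestParsed [3]int
	for _, s := range versions {
		v, err := parse(s)
		if err != nil {
			return "", err
		}
		if lowest == "" || less(v, lowestParsed) {
			lowest, lowestParsed = s, v
		}
	}
	return lowest, nil
}

func main() {
	// Hypothetical mid-upgrade state: StatefulSets already target the new
	// version while one Pod still runs the old one.
	podVersions := []string{"7.7.0", "7.6.2", "7.7.0"}
	statefulSetVersions := []string{"7.7.0"}
	lowest, err := LowestVersion(append(podVersions, statefulSetVersions...))
	fmt.Println(lowest, err)
}
```

Versions are compared numerically per component rather than as strings, so `7.10.0` is treated as higher than `7.9.3`.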
Differences between the design proposal above and the actual implementation:
Supersedes #2426 and #2353 (see there for more context and discussion)
Currently, stack applications are reconciled independently of one another. When a user upgrades the version of a group of linked resources (Kibana, ES, APMServer), this violates our own documented stack upgrade procedure, which clearly defines an upgrade order. The problems described in #2426 and #2353 would be avoided if ECK enforced the upgrade order:
For that the Kibana and APM controllers would have to
Problems/Questions: