Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report the lowest running version in the status of each resource #3489

Merged
merged 5 commits into from
Jul 22, 2020

Conversation

sebgl
Copy link
Contributor

@sebgl sebgl commented Jul 20, 2020

Report the currently running version in the status.version field of
each resource. In case a version upgrade is in progress, report the
lowest running version.

For Elasticsearch, we check both the Pods and the StatefulSets to report
the lowest version. A StatefulSet might report a higher version whereas
the Pods haven't been upgraded yet. It may also report a lower version
and no Pods are running any more with that version.
For other resources, we just check the Pods.

Parts of the Status is refactored into a common DeploymentStatus for
Kibana, ApmServer and EnterpriseSearch. Elasticsearch and Beat have
different requirements which cannot fit this common place.

This is a pre-req for strict stack version upgrade ordering
(#2600).

Report the currently running version in the `status.version` field of
each resource. In case a version upgrade is in progress for that
resource, report the lowest running version.

For Elasticsearch, we check both the Pods and the StatefulSets to report
the lowest version. A StatefulSet might report a higher version whereas
the Pods haven't been upgraded yet. It may also report a lower version
and no Pods are running any more with that version.
For other resources, we just check the Pods.

Parts of the Status is refactored into a common `DeploymentStatus` for
Kibana, ApmServer and EnterpriseSearch. Elasticsearch and Beat have
different requirements which cannot fit this common place.

This is a pre-req for strict stack version upgrade ordering
(elastic#2600).
@@ -25,6 +27,8 @@ type ResourcesState struct {
CurrentPodsByPhase map[corev1.PodPhase][]corev1.Pod
// DeletingPods are all deleted Elasticsearch pods.
DeletingPods []corev1.Pod
// StatefulSets are all existing StatefulSets for the cluster.
StatefulSets sset.StatefulSetList
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may feel a bit strange to retrieve StatefulSets here for the only purpose of reporting the version in the status, but I think it's still way cleaner than propagating the k8s client to the function that updates the status data, that did not really need it.
I'm also thinking we should invest more in this struct, instead of retrieving the Pods and StatefulSets several times in a single reconciliation. See #3490.

Copy link
Contributor

@david-kow david-kow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall.

For Elasticsearch, we check both the Pods and the StatefulSets to report
the lowest version. A StatefulSet might report a higher version whereas
the Pods haven't been upgraded yet. It may also report a lower version
and no Pods are running any more with that version.

Isn't this the case for Deployments/DaemonSets too?

description: Health of the deployment.
type: string
version:
description: 'Version of the stack resource currently running. During
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this has quotes around it while other descriptions don't?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think because it is spread over more than 1 line. I tested with a single line in the comment: quotes are removed.
So I think it's fine.

pkg/controller/common/version/version.go Outdated Show resolved Hide resolved
@@ -265,6 +265,8 @@ ConfigSource references configuration settings.
|===




Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove?

Copy link
Contributor Author

@sebgl sebgl Jul 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know where it comes from, it is auto-generated this way 😕
I see other places in this file with a lot of empty lines.
@charith-elastic any idea?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They don't get rendered IIRC so I'm not sure it's worth spending too much time on

@sebgl
Copy link
Contributor Author

sebgl commented Jul 22, 2020

For Elasticsearch, we check both the Pods and the StatefulSets to report
the lowest version. A StatefulSet might report a higher version whereas
the Pods haven't been upgraded yet. It may also report a lower version
and no Pods are running any more with that version.

Isn't this the case for Deployments/DaemonSets too?

I think it is slightly different because we reconcile a single DaemonSet/Deployment per resource, whereas we have multiple StatefulSets per Elasticsearch resource:

  • We may be in a situation where a Deployment reports version N+1 and no Pods are running. In this case we keep the last observed version (eg. N). Once the Pods are running again they should be created in version N+1 so we can just use it at this point in time. I don't see a case where inspecting the Pods only would cause a version mismatch.
  • When dealing with StatefulSets we may have 2/3 StatefulSets updated to version N+1. If Pods of the third StatefulSet are temporarily not running, all Pods may report version N+1, but Pods of that 3rd StatefulSet may be recreated in version N anytime soon. So we should not report version N+1 until all Pods and StatefulSets report N+1.

Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

pkg/controller/common/version/version.go Show resolved Hide resolved
wantErr: false,
},
}
for _, tt := range tests {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, very easy to read. We should probably clean up the verbose ones I wrote above at some point (not necessarily this PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have one (or several) in mind specifically?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TestMin one that uses a lot of vertical space for the Version declarations which could be expressed in a more compact form for readability. But it is really not that important.

Copy link
Contributor

@thbkrkr thbkrkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

pkg/utils/k8s/k8sutils.go Outdated Show resolved Hide resolved
pkg/apis/common/v1/common.go Outdated Show resolved Hide resolved
@sebgl sebgl merged commit 2e06a7c into elastic:master Jul 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement Enhancement of existing functionality v1.3.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants