Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logstash - add ability to reload pipeline(s) without triggering full pod restart #6674

Merged
merged 13 commits into from
Apr 21, 2023

Conversation

robbavey
Copy link
Member

This commit adds the ability to reload logstash pipelines when the pipeline changes, without triggering a full restart of the pod, leveraging Logstash's ability to watch pipeline definitions, and reload automatically if a change is discovered.

A logstash config directory typically includes logstash.yml, pipelines.yml, jvm.options and log4j2.properties required to run logstash - while logstash can store the contents of a pipeline definition in any location, the pipelines.yml definition file, which states where these definition files are must be in the same config directory as the other setup files.

To enables us to have a mixture of copied and generated files in this config diretory, an initContainer is used, with a small script to prepare the config directory.

  • The script copies the contents of the /usr/share/logstash/config into the a shared config volume
  • Additionally softlinks are created into this volume from the pipelines.yml and logstash.yml secrets created by the logstash operator.

Additionally, we now do not include changes to pipelines in the configuration hash that triggers a pod reload, but instead write the config.reload.automatic: true setting in to logstash.yml

Note that triggering a reload is not immediate - there may be a delay measured in minutes between the pipeline definition being changed, and that being reflected in a pipeline reload.

@botelastic botelastic bot added the triage label Apr 10, 2023
@thbkrkr thbkrkr added the >enhancement Enhancement of existing functionality label Apr 11, 2023
@botelastic botelastic bot removed the triage label Apr 11, 2023
@robbavey robbavey mentioned this pull request Apr 11, 2023
@kaisecheng
Copy link
Contributor

@robbavey I tried to run logstash.yaml sample and update the Secret logstash-sample-ls-pipeline with kubectl apply -f, but unable to update the pipeline to test the reload. Do you have any trick to update the pipeline?

@robbavey
Copy link
Member Author

@kaisecheng It takes a surprisingly long time - I've seen it take up to 3 minutes for the change to get picked up. For manual testing, I typically make and apply the change then watch the pod logs.

@pebrc
Copy link
Collaborator

pebrc commented Apr 11, 2023

It takes a surprisingly long time - I've seen it take up to 3 minutes for the change to get picked up

This could be due to the kubelets sync interval which is 60 seconds plus some jitter and I guess there is probably also some delay on the Logstash side before the change is detected in the filesystem. We use a trick for Elasticsearch to force an immediate sync for the secrets where timely sync is important: we annotate the Pods (we use a timestamp but the actual annotation is irrelevant) which forces a sync.

I am not sure if pipeline changes need such urgent propagation to warrant such extra complexity but I thought I mention it:

annotation.MarkPodAsUpdated(ctx, c, pod)

Copy link
Contributor

@kaisecheng kaisecheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking care of the pipeline reload issue. It works as expected. I have tested pipeline updates through pipelines and pipelinesRef. Both take 1 ~ 2 minutes locally to reflect the change. The e2e test looks good. I run it with export E2E_TAGS=e2e; make e2e-local TESTS_MATCH=TestPipelineConfigLogstash.

The only question on my mind is whether we should enable config.reload.automatic: true in ECK by default, because it is not enabled in docker nor any distribution

@robbavey
Copy link
Member Author

@kaisecheng It's a good question, and worth discussing -

I think when a change is made to a pipeline or pipelineRef in the CRD, we can choose to

  • Restart the pipeline(s)
  • Restart the Pod
  • Do nothing

My thoughts here are that a change to a definition in the CRD, such as a change to a pipeline or pipelineRef is a deliberate action, that a user might expect to be applied on without manual intervention - similar to a change in config or configRef, which would trigger a restart of a pod.

This change allows that change to be acted upon with as little disruption as we can get away with - we can now limit the change to pipeline(s), rather than restarting the whole pod. If we don't set config.reload.automatic, a user would have to intervene manually, and delete the pod(s) to have the pipeline change come into effect, if we don't restart the pod.

But, let's discuss

@kaisecheng
Copy link
Contributor

Agree that having pipeline auto-reload is a better experience. Sadly, it is not the default in Logstash. When config.reload.automatic is set to false, changes in pipeline do not restart the pipeline nor restart the pod.

Users may expect pod restart when pipeline changes, just like changing in config configRef results in a pod restart.
I think making pod restart when reload is set to false and pipeline restart when reload is set to true is truly making reloading a configurable flag. It can be done by passing the boolean flag to reconcilePipeline() to update configHash.

I am still not sure whether changing the default reload behavior only in ECK is right, but open to this change.

@robbavey
Copy link
Member Author

@barkbay @pebrc Ready for review by ECK team. I implemented the suggested optimization to speed up pipeline loading - thanks for the tip!

@kaisecheng and I discussed the config.reload.automatic: true setting and we decided to keep the implementation as is - set the config.reload.automatic setting to true to allow pipeline changes to be reacted upon automatically, but we will note this in the documentation, and explain how to turn off automatic reloads.

@robbavey robbavey requested review from pebrc and barkbay April 20, 2023 21:31
@pebrc
Copy link
Collaborator

pebrc commented Apr 21, 2023

buildkite test this -f p=gke,t=TestLogstashPipelineReload -m s=8.7.0

Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

pkg/controller/logstash/driver.go Outdated Show resolved Hide resolved
pkg/controller/logstash/initcontainer.go Outdated Show resolved Hide resolved
)
},
})
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff --git a/pkg/controller/common/reconciler/secret.go b/pkg/controller/common/reconciler/secret.go
index 0b6026f87..50004fd80 100644
--- a/pkg/controller/common/reconciler/secret.go
+++ b/pkg/controller/common/reconciler/secret.go
@@ -30,11 +30,17 @@ const (
 	SoftOwnerKindLabel      = "eck.k8s.elastic.co/owner-kind"
 )
 
+func WithPostUpdate(f func()) func(p *Params) {
+	return func(p *Params) {
+		p.PostUpdate = f
+	}
+}
+
 // ReconcileSecret creates or updates the actual secret to match the expected one.
 // Existing annotations or labels that are not expected are preserved.
-func ReconcileSecret(ctx context.Context, c k8s.Client, expected corev1.Secret, owner client.Object) (corev1.Secret, error) {
+func ReconcileSecret(ctx context.Context, c k8s.Client, expected corev1.Secret, owner client.Object, opts ...func(*Params)) (corev1.Secret, error) {
 	var reconciled corev1.Secret
-	if err := ReconcileResource(Params{
+	params := Params{
 		Context:    ctx,
 		Client:     c,
 		Owner:      owner,
@@ -54,7 +60,11 @@ func ReconcileSecret(ctx context.Context, c k8s.Client, expected corev1.Secret,
 			reconciled.Annotations = maps.Merge(reconciled.Annotations, expected.Annotations)
 			reconciled.Data = expected.Data
 		},
-	}); err != nil {
+	}
+	for _, opt := range opts {
+		opt(&params)
+	}
+	if err := ReconcileResource(params); err != nil {
 		return corev1.Secret{}, err
 	}
 	return reconciled, nil
diff --git a/pkg/controller/logstash/pipeline.go b/pkg/controller/logstash/pipeline.go
index 6cbfee388..447ed7b8b 100644
--- a/pkg/controller/logstash/pipeline.go
+++ b/pkg/controller/logstash/pipeline.go
@@ -41,7 +41,13 @@ func reconcilePipeline(params Params) error {
 		},
 	}
 
-	if err := reconcileSecretWithFastUpdate(params, expected); err != nil {
+	if _, err := reconciler.ReconcileSecret(params.Context, params.Client, expected, &params.Logstash,
+		reconciler.WithPostUpdate(func() {
+			annotation.MarkPodsAsUpdated(params.Context, params.Client,
+				client.InNamespace(params.Logstash.Namespace),
+				NewLabelSelectorForLogstash(params.Logstash),
+			)
+		})); err != nil {
 		return err
 	}
 	return nil

If we want to reuse the existing secret reconciliation we could add a slice of option functions at the end

Co-authored-by: Peter Brachwitz <peter.brachwitz@gmail.com>
Copy link
Contributor

@thbkrkr thbkrkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damn, I had reviewed this two days ago but didn't submit it sorry.
Looks good! I spotted a few minor things, nothing blocking to merge.

pkg/controller/logstash/initcontainer.go Outdated Show resolved Hide resolved
pkg/controller/logstash/initcontainer.go Outdated Show resolved Hide resolved
pkg/controller/logstash/initcontainer.go Outdated Show resolved Hide resolved
"github.com/elastic/cloud-on-k8s/v2/pkg/controller/common/labels"
"github.com/elastic/cloud-on-k8s/v2/pkg/controller/common/reconciler"
"github.com/elastic/cloud-on-k8s/v2/pkg/controller/common/tracing"
"github.com/elastic/cloud-on-k8s/v2/pkg/controller/logstash/pipelines"

"github.com/elastic/cloud-on-k8s/v2/pkg/utils/maps"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: group imports (max 3 groups: stdlib / external deps / internal deps).

Comment on lines 50 to 51
// This function reconciles the secret, but then adds a postUpdate step to mark the pods as updated
// to trigger a quicker reload of the updated secret than waiting for the kubelet sync interval to kick in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// This function reconciles the secret, but then adds a postUpdate step to mark the pods as updated
// to trigger a quicker reload of the updated secret than waiting for the kubelet sync interval to kick in
// This function reconciles the secret, but then adds a postUpdate step to mark the pods as updated
// to trigger a quicker reload of the updated secret rather than waiting for the kubelet sync to kick in.

Comment on lines +91 to +94
// We intentionally DO NOT pass the configHash here. We don't want to consider the pipeline definitions in the
// hash of the config to ensure that a pipeline change does not automatically trigger a restart
// of the pod, but allows Logstash's automatic reload of pipelines to take place
if err := reconcilePipeline(params); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we pass the configHash when config.reload.automaticequals false?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good question. I'm erring on the side of 'no' at the moment, but I think this is something that could change after the technical preview depending on feedback.

My reasoning on this is that the false (default) value of non-k8s logstash doesn't react to pipeline changes at all, and to change this semantic to restart logstash completely on pipeline changes feels like very different behaviour.

Thinking about how we could add flexibility, I wonder if we might want to introduce something for ECK here, along the lines of:

config.reload.restart_policy: detected_only|all|none, which would either set config.reload.automatic: true for detected_only, and false for all or none, passing the configHash if the value is all, and not if it is none.

cc @flexitrev, @roaksoax, @jsvd

Co-authored-by: Thibault Richard <thbkrkr@users.noreply.github.com>
@robbavey
Copy link
Member Author

Ready for another round @pebrc @thbkrkr

@robbavey robbavey merged commit f984355 into elastic:feature/logstash Apr 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement Enhancement of existing functionality :logstash
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants