Add cron job that does defrag without considering threshold #37

walle · 2022-06-13T13:41:41Z

It can be beneficial to defrag even if the threshold is not yet met, so
we add a command line flag that can take the values hourly, daily or
weekly and will force a defrag at this interval. The default value is
set to weekly.
The implementation is not robust, defrags can be missed due to
restarting etc.

A new failure scenario is added as we validate the interval and fail to
start if the value is invalid.

It can be beneficial to defrag even if the threshold is not yet met, so we add a command line flag that can take the values hourly, daily or weekly and will force a defrag at this interval. The default value is set to weekly. The implementation is not robust, defrags can be missed due to restarting etc. A new failure scenario is added as we validate the interval and fail to start if the value is invalid.

Mojachieee

I've left a few comments about context's and tests, but it looks good overall

Mojachieee · 2022-06-13T15:08:40Z

controllers/etcdcluster_controller.go

@@ -1025,6 +1048,14 @@ func nextOutdatedPeer(cluster *etcdv1alpha1.EtcdCluster, peers *etcdv1alpha1.Etc
 	return nil
 }

+func scheduleMapKeyFor(cluster *etcdv1alpha1.EtcdCluster) string {
+	return string(cluster.UID)


nit: it might be worth making this string(cluster.UID) + defrag.

Just so it's clear what this key does and to stop us accidentally overlapping keys if we add more cron jobs in the future

Good idea, will add it.

Mojachieee · 2022-06-13T15:13:57Z

controllers/etcdcluster_controller.go

-func (r *EtcdClusterReconciler) defragCronJob(log logr.Logger, cluster *etcdv1alpha1.EtcdCluster, members *[]etcd.Member) {
-	ctx, cancel := context.WithTimeout(context.Background(), time.Second*25)
+func (r *EtcdClusterReconciler) defragWithThresholdCronJob(log logr.Logger, cluster *etcdv1alpha1.EtcdCluster) {
+	ctx, cancel, etcdClient, members, ok := r.setupDefragDeps(log, cluster)


I think it's a bit cleaner to create both the required context's outside of setupDefragDeps, so it's really clear what context is being used where and what timeouts are set.

It would also be nice to add a comment on why we need a new context (i.e "we can't use reconciles ctx, as it will have likely expired when this cronjob is ran").

I agree that it was a bit strange to return the ctx, cancel from the method (also I think there was a bug with defer cancel() as well). I've moved the creation of the defrag context to each method https://github.com/storageos/etcd-cluster-operator/pull/37/files#diff-4d256d176300fb45905cd320c72758685ad4f268407980cfd0564bb59504bd26R1444 I did not move the tls context out though, imo it belongs in the method as a implementation detail. But happy to move it outside and pass it in if you think it's clearer.

Mojachieee · 2022-06-13T15:14:35Z

controllers/etcdcluster_controller.go

+}
+
+func (r *EtcdClusterReconciler) defragCronJob(log logr.Logger, cluster *etcdv1alpha1.EtcdCluster) {
+	ctx, cancel, etcdClient, members, ok := r.setupDefragDeps(log, cluster)


Same point about creating the context's outside of setupDefragDeps

Mojachieee · 2022-06-13T15:14:59Z

controllers/etcdcluster_controller.go

+	defer cancel()
+	defer etcdClient.Close()
+
+	ctx = context.Background()


Here we overwrite the ctx we're meant to be using, which means we're not setting a timeout on our defrag call

Mojachieee · 2022-06-13T15:21:44Z

internal/interval/validate_test.go

+		t.Run(tc.name, func(t *testing.T) {
+			out, err := ValidateForDefrag(tc.input)
+
+			if tc.err != nil && err == nil {


I would recommend using testify's assertions here
https://github.com/stretchr/testify

assert.Equal(t, tc.err, err)

The v1.6.1 version did not support wrapped errors, I updated the library to v.1.7.2 which has methods for checking wrapped errors. That said, the methods do not handle both nil and error cases in the same method so I added a small helper method https://github.com/storageos/etcd-cluster-operator/pull/37/files#diff-f4cb5f477ae509d72e01582438b6e1b118c6695ac2287f765d3e0f10c19460c3R31

Mojachieee · 2022-06-13T15:21:51Z

internal/interval/validate_test.go

+			if tc.err != nil && err == nil {
+				t.Errorf("expected an error")
+			}
+			if err != nil && tc.err == nil {


Same point about testify assertions here

Mojachieee · 2022-06-13T15:21:55Z

internal/interval/validate_test.go

+				}
+			}
+
+			if out != tc.output {


Same point about testify assertions here

Mojachieee · 2022-06-13T15:22:36Z

main.go

@@ -91,13 +99,21 @@ func main() {
 	// See https://github.com/improbable-eng/etcd-cluster-operator/issues/171
 	ctrl.SetLogger(zap.New(zap.UseDevMode(true)))

+	validatedInterval, err := interval.ValidateForDefrag(defragWithoutThresholdInterval)
+	if err != nil {
+		setupLog.Error(err, "invalid defrag-without-threshold-interval configuration")


It might be worth printing out the supplied value in this error message

It is part of the error we are getting e.g.

ERROR setup invalid defrag-without-threshold-interval configuration {"error": "invalid interval: every 5s"}

Should we move it up to log it as key:value here as well?

Ah yep, I misread.

Ignore me :)

Update the version of the testify library to v1.7.2 (from v1.6.1).

Mojachieee

LGTM - except the one point raised

Mojachieee · 2022-06-14T11:13:21Z

controllers/etcdcluster_controller.go

+	// create a new context with a long timeout for the defrag operation.
+	ctx, cancel := context.WithTimeout(context.Background(), defragTimeout)
+	defer cancel()
+
 	err := defragger.Defrag(ctx, members, etcdClient, log)
 	if err != nil {
 		log.Error(err, "failed to defrag")
 	}
 }

 func (r *EtcdClusterReconciler) setupDefragDeps(log logr.Logger, cluster *etcdv1alpha1.EtcdCluster) (


Might be worth adding a comment that "it's the callers responsibility to close the return etcd client"

Mojachieee

🚢

walle requested a review from a team as a code owner June 13, 2022 13:41

Mojachieee reviewed Jun 13, 2022

View reviewed changes

walle added 4 commits June 13, 2022 17:39

Do not overwrite context

cbb6ab1

Update map key function to include label

871d992

Move defrag context out from setup deps method

383b519

Use assert library in test

35ead41

Update the version of the testify library to v1.7.2 (from v1.6.1).

Mojachieee previously approved these changes Jun 14, 2022

View reviewed changes

Add comment about caller responsibility

1a05b8d

walle dismissed Mojachieee’s stale review via 1a05b8d June 14, 2022 11:28

Mojachieee self-requested a review June 14, 2022 11:29

Mojachieee approved these changes Jun 14, 2022

View reviewed changes

walle merged commit 07c92ea into main Jun 14, 2022

walle deleted the add-defrag-without-threshold-cron-job branch June 14, 2022 12:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add cron job that does defrag without considering threshold #37

Add cron job that does defrag without considering threshold #37

walle commented Jun 13, 2022

Mojachieee left a comment

Mojachieee Jun 13, 2022

walle Jun 13, 2022

Mojachieee Jun 13, 2022

walle Jun 14, 2022

Mojachieee Jun 13, 2022

Mojachieee Jun 13, 2022

Mojachieee Jun 13, 2022

walle Jun 14, 2022

Mojachieee Jun 13, 2022

Mojachieee Jun 13, 2022

Mojachieee Jun 13, 2022

walle Jun 13, 2022

Mojachieee Jun 13, 2022

Mojachieee left a comment

Mojachieee Jun 14, 2022

Mojachieee left a comment

Add cron job that does defrag without considering threshold #37

Add cron job that does defrag without considering threshold #37

Conversation

walle commented Jun 13, 2022

Mojachieee left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Mojachieee left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Mojachieee left a comment

Choose a reason for hiding this comment