Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

es-monitoring-association controller should not update the status when no association #4985

Closed
thbkrkr opened this issue Oct 21, 2021 · 4 comments · Fixed by #4986
Closed
Labels
>bug Something isn't working

Comments

@thbkrkr
Copy link
Contributor

thbkrkr commented Oct 21, 2021

Deploy the quickstart, you will see this event:

> k describe es quickstart | tail -1
  Normal  AssociationStatusChange  5m10s  es-monitoring-association-controller  Association status changed from [] to []

The es-monitoring association controller updated the Association status because the status changed, from no association to no association. When we compare if the status has changed:

if !reflect.DeepEqual(oldStatus, newStatus) {

We get that the statuses are not equal because:

It seems harmless and not to be a problem, though it remains totally useless and pollutes events for nothing.

And now is where it gets interesting. On minishift, this is problematic. I don't know why yet. The es-monitoring association controller enters an infinite loop of reconciliation loop preventing the Elasticsearch resource from being reconciled by the elasticsearch controller.

💥

log of the controllers
2021-10-21T10:08:01.360+0200    INFO    elasticsearch-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 5, "namespace": "myproject", "es_name": "abcd"}
2021-10-21T10:08:01.434+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 67, "namespace": "myproject", "es-mon_name": "abcd", "took": "121.080041ms"}
2021-10-21T10:08:01.434+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 68, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:01.463+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 68, "namespace": "myproject", "es-mon_name": "abcd", "took": "29.196069ms"}
2021-10-21T10:08:01.463+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 69, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:01.511+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 69, "namespace": "myproject", "es-mon_name": "abcd", "took": "47.805473ms"}
2021-10-21T10:08:01.511+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 70, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:01.565+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 70, "namespace": "myproject", "es-mon_name": "abcd", "took": "53.831072ms"}
2021-10-21T10:08:01.565+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 71, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:01.614+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 71, "namespace": "myproject", "es-mon_name": "abcd", "took": "49.01398ms"}
2021-10-21T10:08:01.614+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 72, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:01.666+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 72, "namespace": "myproject", "es-mon_name": "abcd", "took": "51.674537ms"}
2021-10-21T10:08:01.666+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 73, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:01.707+0200    INFO    driver  Creating master node    {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "namespace": "myproject", "statefulset_name": "abcd-es-master", "actualReplicas": 0, "targetReplicas": 1}
2021-10-21T10:08:01.708+0200    INFO    zen2    Setting `cluster.initial_master_nodes`  {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "namespace": "myproject", "es_name": "abcd", "cluster.initial_master_nodes": "abcd-es-master-0"}
2021-10-21T10:08:01.711+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 73, "namespace": "myproject", "es-mon_name": "abcd", "took": "44.505225ms"}
2021-10-21T10:08:01.711+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 74, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:01.822+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 74, "namespace": "myproject", "es-mon_name": "abcd", "took": "111.273826ms"}
2021-10-21T10:08:01.822+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 75, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:01.866+0200    INFO    elasticsearch-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 5, "namespace": "myproject", "es_name": "abcd", "took": "505.699683ms"}
2021-10-21T10:08:01.866+0200    INFO    elasticsearch-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 6, "namespace": "myproject", "es_name": "abcd"}
2021-10-21T10:08:01.926+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 75, "namespace": "myproject", "es-mon_name": "abcd", "took": "103.379065ms"}
2021-10-21T10:08:01.926+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 76, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:01.965+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 76, "namespace": "myproject", "es-mon_name": "abcd", "took": "38.896823ms"}
2021-10-21T10:08:01.965+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 77, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:02.012+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 77, "namespace": "myproject", "es-mon_name": "abcd", "took": "47.613014ms"}
2021-10-21T10:08:02.012+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 78, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:02.061+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 78, "namespace": "myproject", "es-mon_name": "abcd", "took": "48.717603ms"}
2021-10-21T10:08:02.061+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 79, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:02.114+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 79, "namespace": "myproject", "es-mon_name": "abcd", "took": "52.550053ms"}
2021-10-21T10:08:02.114+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 80, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:02.161+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 80, "namespace": "myproject", "es-mon_name": "abcd", "took": "46.949866ms"}
2021-10-21T10:08:02.161+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 81, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:02.210+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 81, "namespace": "myproject", "es-mon_name": "abcd", "took": "49.697927ms"}
2021-10-21T10:08:02.210+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 82, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:02.244+0200    INFO    driver  Creating master node    {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "namespace": "myproject", "statefulset_name": "abcd-es-master", "actualReplicas": 0, "targetReplicas": 1}
2021-10-21T10:08:02.244+0200    INFO    zen2    Setting `cluster.initial_master_nodes`  {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "namespace": "myproject", "es_name": "abcd", "cluster.initial_master_nodes": "abcd-es-master-0"}
2021-10-21T10:08:02.261+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 82, "namespace": "myproject", "es-mon_name": "abcd", "took": "50.786008ms"}
2021-10-21T10:08:02.261+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 83, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:02.363+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 83, "namespace": "myproject", "es-mon_name": "abcd", "took": "101.882346ms"}
2021-10-21T10:08:02.363+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 84, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:02.411+0200    INFO    elasticsearch-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 6, "namespace": "myproject", "es_name": "abcd", "took": "544.430907ms"}
2021-10-21T10:08:02.411+0200    INFO    elasticsearch-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 7, "namespace": "myproject", "es_name": "abcd"}
2021-10-21T10:08:02.461+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 84, "namespace": "myproject", "es-mon_name": "abcd", "took": "97.690607ms"}
2021-10-21T10:08:02.461+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 85, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:02.511+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 85, "namespace": "myproject", "es-mon_name": "abcd", "took": "50.284633ms"}
2021-10-21T10:08:02.511+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 86, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:02.562+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 86, "namespace": "myproject", "es-mon_name": "abcd", "took": "50.861007ms"}
2021-10-21T10:08:02.562+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 87, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:02.613+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 87, "namespace": "myproject", "es-mon_name": "abcd", "took": "50.56976ms"}
2021-10-21T10:08:02.613+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 88, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:02.661+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 88, "namespace": "myproject", "es-mon_name": "abcd", "took": "48.048538ms"}
2021-10-21T10:08:02.661+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 89, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:02.711+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 89, "namespace": "myproject", "es-mon_name": "abcd", "took": "50.446649ms"}
2021-10-21T10:08:02.711+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 90, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:02.731+0200    INFO    driver  Creating master node    {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "namespace": "myproject", "statefulset_name": "abcd-es-master", "actualReplicas": 0, "targetReplicas": 1}
2021-10-21T10:08:02.731+0200    INFO    zen2    Setting `cluster.initial_master_nodes`  {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "namespace": "myproject", "es_name": "abcd", "cluster.initial_master_nodes": "abcd-es-master-0"}
2021-10-21T10:08:02.765+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 90, "namespace": "myproject", "es-mon_name": "abcd", "took": "53.144913ms"}
2021-10-21T10:08:02.765+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 91, "namespace": "myproject", "es-mon_name": "abcd"}
2021-10-21T10:08:02.870+0200    INFO    association.es-monitoring-association-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 91, "namespace": "myproject", "es-mon_name": "abcd", "took": "105.735779ms"}
2021-10-21T10:08:02.871+0200    INFO    association.es-monitoring-association-controller        Starting reconciliation run     {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 92, "namespace": "myproject", "es-mon_name": "abcd"}
2021-1i0-21T10:08:02.911+0200    INFO    elasticsearch-controller        Ending reconciliation run       {"service.version": "2.0.0-SNAPSHOT+af14eaf8", "iteration": 7, "namespace": "myproject", "es_name": "abcd", "took": "499.955276ms"}

In addition to fixing the issue, we should check that wherever we use reflect.DeepEqual with variables of type Map, we are protected from this.

@thbkrkr
Copy link
Contributor Author

thbkrkr commented Nov 3, 2021

I spent a little time trying to figure out the root cause of this bug but got nothing but I wanted to share what I saw as it is quite confusing.

I kept only the EsMonitoring association controller:

diff --git a/cmd/manager/main.go b/cmd/manager/main.go
index 6fb174105..d0913476d 100644
--- a/cmd/manager/main.go
+++ b/cmd/manager/main.go
@@ -569,7 +569,7 @@ func startOperator(ctx context.Context) error {
                accessReviewer = rbac.NewPermissiveAccessReviewer()
        }
 
-       if err := registerControllers(mgr, params, accessReviewer); err != nil {
+       if err := associationctl.AddEsMonitoring(mgr, accessReviewer, params); err != nil {

And I replaced the Reconcile(ctx, req) method of the association controller with this (get the object and print its resource version, update its status, get the object again and print its resource version):

func (r *Reconciler) Reconcile(ctx context.Context, request reconcile.Request) (reconcile.Result, error) {
	uuid := uuid2.NewUUID()
	associated := r.AssociatedObjTemplate()

	if err := r.Client.Get(context.Background(), request.NamespacedName, associated); err != nil {
		return reconcile.Result{}, err
	}
	fmt.Println("------")
	fmt.Println("@@ 1 GetResourceVersion", uuid, " - ", associated.GetResourceVersion())

	if err := r.Status().Update(ctx, associated); err != nil {
		return reconcile.Result{}, err
	}

	if err := r.Client.Get(context.Background(), request.NamespacedName, associated); err != nil {
		return reconcile.Result{}, err
	}
	fmt.Println("@@ 2 GetResourceVersion", uuid, " - ", associated.GetResourceVersion())

	return reconcile.Result{}, nil
}

Here is the result:

@@ 1 GetResourceVersion cdaa3ee2-1f3e-40b5-b026-830be061379a  -  196410
@@ 2 GetResourceVersion cdaa3ee2-1f3e-40b5-b026-830be061379a  -  196661                                    
------                                                                                                     
@@ 1 GetResourceVersion e222ed06-585e-4dc3-be72-f59dd3a9f5bf  -  196661
@@ 2 GetResourceVersion e222ed06-585e-4dc3-be72-f59dd3a9f5bf  -  196662                                    
------                                                                                                     
@@ 1 GetResourceVersion 92ecf984-9068-4110-9f36-6320e5a7f5f6  -  196662
@@ 2 GetResourceVersion 92ecf984-9068-4110-9f36-6320e5a7f5f6  -  196662                                    
------                                                                                                     
@@ 1 GetResourceVersion 445bb0f3-51af-4c79-9228-625ff8531231  -  196663
@@ 2 GetResourceVersion 445bb0f3-51af-4c79-9228-625ff8531231  -  196664                                    
------                                                                                                     
@@ 1 GetResourceVersion 3f1effc3-aef4-42e5-90a0-471f56c996c3  -  196664
@@ 2 GetResourceVersion 3f1effc3-aef4-42e5-90a0-471f56c996c3  -  196665                                    
------                                                                                                     
@@ 1 GetResourceVersion 8d19d2ef-e6ef-4200-909b-daa2f610f42d  -  196665
@@ 2 GetResourceVersion 8d19d2ef-e6ef-4200-909b-daa2f610f42d  -  196666                                    
------                                                                                                     
@@ 1 GetResourceVersion 1fb15338-3f7f-42cd-9fac-34c0e9da85cc  -  196666
@@ 2 GetResourceVersion 1fb15338-3f7f-42cd-9fac-34c0e9da85cc  -  196667                                    
------                                                                                                     
@@ 1 GetResourceVersion 1037b4b0-b1b9-47b2-95b9-f1ba7b35c9b5  -  196667
@@ 2 GetResourceVersion 1037b4b0-b1b9-47b2-95b9-f1ba7b35c9b5  -  196667                                    
------                                                                                                     
@@ 1 GetResourceVersion 9a17b8c1-0df3-4f26-8593-130089cfe40d  -  196668
@@ 2 GetResourceVersion 9a17b8c1-0df3-4f26-8593-130089cfe40d  -  196669                                    
------                                                                                                     
@@ 1 GetResourceVersion 93f2c37d-14be-4f67-99a6-24b2227e0e7b  -  196669
@@ 2 GetResourceVersion 93f2c37d-14be-4f67-99a6-24b2227e0e7b  -  196670                                    
------                                                                                                     
@@ 1 GetResourceVersion a8a158a4-5376-49d6-a9e9-8eb064ea3561  -  196670
@@ 2 GetResourceVersion a8a158a4-5376-49d6-a9e9-8eb064ea3561  -  196672

The resource version appears to be updated from 1 in most cases, sometimes from 2 and sometimes from zero, which may indicate that something else has updated the object? Honestly, I don't know what to think.

To compare, on GKE for example, the resource version is never incremented and I get only once:

@@ 1 GetResourceVersion 728dc619-6200-40d6-8ddb-54a65a0aaab2  -  130107251
@@ 2 GetResourceVersion 728dc619-6200-40d6-8ddb-54a65a0aaab2  -  130107251

@barkbay
Copy link
Contributor

barkbay commented Nov 3, 2021

To compare, on GKE for example, the resource version is never incremented and I get only once

I think this is the expected behaviour, updating a subresource should not update generation or resourceVersion. It should still trigger watchers though.

The way the issue has been fixed (using len to compare the actual and the expected status) reminds me this kind of issue: kubernetes/kubernetes#43203 but I can't tell if OCP 3.11 is affected.

@thbkrkr
Copy link
Contributor Author

thbkrkr commented Nov 3, 2021

OCP 3.11 is not affected, only minishift.

@mgiritli
Copy link

Landed here from https://discuss.elastic.co/t/elastic-operator-not-creating-elasticsearch-in-openshift-3-11/286847/3

This is happening on bare metal k8s and eventually we are unable to install es. Pod remains in "Init:CrashLoopBackOff" state. Is there any way to go around this before the new release?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants