Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky TestSmoke failure #1048

Closed
thbkrkr opened this issue Jun 11, 2019 · 6 comments
Closed

Flaky TestSmoke failure #1048

thbkrkr opened this issue Jun 11, 2019 · 6 comments
Assignees
Labels
>bug Something isn't working

Comments

@thbkrkr
Copy link
Contributor

thbkrkr commented Jun 11, 2019

Sometimes TestSmoke fails.

21:52:27      --- FAIL: TestSmoke/Kibana_deployment_should_be_set (300.00s)
21:52:27          require.go:794: 
21:52:27              	Error Trace:	testutils.go:42
21:52:27              	Error:      	Received unexpected error:
21:52:27              	            	deployments.apps "kibana-sample-kb" not found
21:52:27              	Test:       	TestSmoke/Kibana_deployment_should_be_set

https://devops-ci.elastic.co/job/cloud-on-k8s-pr/148/console

10:01:47          require.go:794: 
10:01:47              	Error Trace:	testutils.go:42
10:01:47              	Error:      	Received unexpected error:
10:01:47              	            	0 addresses found for endpoint kibana-sample-kb-http, expected 1
10:01:47              	Test:       	TestSmoke/Services_should_have_endpoints

https://devops-ci.elastic.co/job/cloud-on-k8s-pr/149/console

@thbkrkr thbkrkr changed the title TestSmoke failure Flaky TestSmoke failure Jun 11, 2019
@thbkrkr thbkrkr self-assigned this Jun 11, 2019
@thbkrkr thbkrkr added the >bug Something isn't working label Jun 11, 2019
@thbkrkr
Copy link
Contributor Author

thbkrkr commented Jun 11, 2019

Looks like Kibana is not ready because ES is not ready. ES logs:

{"type": "server", "timestamp": "2019-06-11T10:01:43,288+0000", "level": "WARN", 
	"component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "elasticsearch-sample", 
	"node.name": "elasticsearch-sample-es-5bfsk74wcn",  
	"message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered
	[{elasticsearch-sample-es-2d974hzqt9}{TwjUUrubQO-geZA5WKvVXg}{ivV_oUkiQOqSbLS_b0twuQ}{10.20.5.17}{10.20.5.17:9300}{ml.machine_memory=27394752512, ml.max_open_jobs=20, xpack.installed=true}, {elasticsearch-sample-es-9r4mgjd9cw}{ELN1D1uYSmGK1U9fG78bCA}{pHURpUzMSYKmCISaTrTaJw}{10.20.4.16}{10.20.4.16:9300}{ml.machine_memory=27394752512, ml.max_open_jobs=20, xpack.installed=true}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 10.20.5.17:9300, 10.20.4.16:9300] from hosts providers and [{elasticsearch-sample-es-5bfsk74wcn}{HDUtj2UxRzeODVWSX2LKJQ}{Lf7dd-5mQLCuKIrNzo2MZA}{10.20.3.18}{10.20.3.18:9300}{ml.machine_memory=27394752512, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0"  }
{"type": "server", "timestamp": "2019-06-11T10:01:48,887+0000", "level": "DEBUG",
	 "component": "o.e.a.a.i.g.TransportGetIndexAction", "cluster.name": "elasticsearch-sample", 
	 "node.name": "elasticsearch-sample-es-5bfsk74wcn",  "message": "no known master node, scheduling a retry"  }
{"type": "server", "timestamp": "2019-06-11T10:01:48,902+0000", "level": "DEBUG", 
	"component": "o.e.a.a.i.t.g.TransportGetIndexTemplatesAction", "cluster.name": "elasticsearch-sample", 
	"node.name": "elasticsearch-sample-es-5bfsk74wcn",  "message": "timed out while retrying [indices:admin/template/get] after failure (timeout [30s])"  }
{"type": "server", "timestamp": "2019-06-11T10:01:48,903+0000", "level": "WARN", 
	"component": "r.suppressed", "cluster.name": "elasticsearch-sample", 
	"node.name": "elasticsearch-sample-es-5bfsk74wcn",  "message": "path: /_template/.kibana_task_manager, params: {name=.kibana_task_manager, filter_path=*.version}" , 
"stacktrace": ["org.elasticsearch.discovery.MasterNotDiscoveredException: null",
"at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:259) [elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:322) [elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:249) [elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:555) [elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-7.1.0.jar:7.1.0]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
"at java.lang.Thread.run(Thread.java:835) [?:?]"] }
{"type": "server", "timestamp": "2019-06-11T10:01:51,909+0000", "level": "WARN", 
	"component": "r.suppressed", "cluster.name": "elasticsearch-sample", 
	"node.name": "elasticsearch-sample-es-5bfsk74wcn",  "message": "path: /.reporting-*/_search, params: {index=.reporting-*}" , 
"stacktrace": ["org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];",
"at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:191) ~[elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:177) ~[elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.action.search.TransportSearchAction.executeSearch(TransportSearchAction.java:466) ~[elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.action.search.TransportSearchAction.executeLocalSearch(TransportSearchAction.java:399) ~[elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.action.search.TransportSearchAction.lambda$doExecute$3(TransportSearchAction.java:211) ~[elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) [elasticsearch-7.1.0.jar:7.1.0]",
...

@thbkrkr
Copy link
Contributor Author

thbkrkr commented Jun 11, 2019

Stranger things:

  • 1/3 master node has initial_master_nodes configured.
  • the initial_master_nodes list contains two pods that does not exist.
> k get pods
NAME                                 READY   STATUS             RESTARTS   AGE
elasticsearch-sample-es-6xlrcwbbqs   1/1     Running            0          13m
elasticsearch-sample-es-dl95qqg4vw   1/1     Running            0          13m
elasticsearch-sample-es-nnhrwckh8j   1/1     Running            0          13m

> k get secret $(po 1)-config -o json | jq '.data["elasticsearch.yml"] | @base64d' -r | y2j | jq .cluster,.discovery
{
  "initial_master_nodes": [
    "elasticsearch-sample-es-6xlrcwbbqs",
    "elasticsearch-sample-es-qsxp9qstv4",
    "elasticsearch-sample-es-vfng5dmkv2"
  ],
  "name": "elasticsearch-sample"
}
{
  "zen": {
    "hosts_provider": "file",
    "minimum_master_nodes": "2"
  }
}

> k get secret $(po 2)-config -o json | jq '.data["elasticsearch.yml"] | @base64d' -r | y2j | jq .cluster,.discovery
{
  "name": "elasticsearch-sample"
}
{
  "zen": {
    "hosts_provider": "file",
    "minimum_master_nodes": "2"
  }
}


> k get secret $(po 3)-config -o json | jq '.data["elasticsearch.yml"] | @base64d' -r | y2j | jq .cluster,.discovery
{
  "name": "elasticsearch-sample"
}
{
  "zen": {
    "hosts_provider": "file",
    "minimum_master_nodes": "2"
}

@thbkrkr
Copy link
Contributor Author

thbkrkr commented Jun 11, 2019

Adding loggers shows a problem where PVC may be stolen by an other pod:

2019-06-11T15:20:55.386+0200    INFO    version7 ClusterInitialMasterNodesEnforcer 
	{"changes": ["elasticsearch-sample-es-j94srzpqnj", "elasticsearch-sample-es-m82zrl25h9", "elasticsearch-sample-es-s92pmqg896"], "states": []}
2019-06-11T15:20:55.747+0200    INFO    driver   Created pod     {"name": "elasticsearch-sample-es-j94srzpqnj", "namespace": "e2e"}
2019-06-11T15:20:55.747+0200    INFO    pvc      Found some orphaned PVC   {"name": "elasticsearch-sample-es-j94srzpqnj-elasticsearch-data"}
2019-06-11T15:20:55.814+0200    INFO    elasticsearch-controller        Recording event {EventType:Normal Reason:Created Message: Created pod elasticsearch-sample-es-j94srzpqnj}
2019-06-11T15:20:55.814+0200    ERROR   kubebuilder.controller  Reconciler error        
	{"controller": "elasticsearch-controller", "request": "e2e/elasticsearch-sample", "error": "pods \"elasticsearch-sample-es-j94srzpqnj\" already exists", "errorCauses": [{"error": "pods \"elasticsearch-sample-es-j94srzpqnj\" already exists"}]}
2019-06-11T15:20:55.814+0200    DEBUG   kubebuilder.manager.events      Normal  
	{"object": {"kind":"Elasticsearch","namespace":"e2e","name":"elasticsearch-sample","uid":"bd5345e5-8c4b-11e9-9d90-42010a840048","apiVersion":"elasticsearch.k8s.elastic.co/v1alpha1","resourceVersion":"1397385"}, "reason": "Created", "message": " Created pod elasticsearch-sample-es-j94srzpqnj"}

2019-06-11T15:20:57.437+0200    INFO    version7 ClusterInitialMasterNodesEnforcer 
	{"changes": ["elasticsearch-sample-es-h22zvtgdsp", "elasticsearch-sample-es-lxk9b8n75v"], "states": ["elasticsearch-sample-es-j94srzpqnj"]}
2019-06-11T15:20:57.726+0200    INFO    driver   Created pod     {"name": "elasticsearch-sample-es-h22zvtgdsp", "namespace": "e2e"}
2019-06-11T15:20:57.726+0200    INFO    pvc      Found some orphaned PVC   {"name": "elasticsearch-sample-es-h22zvtgdsp-elasticsearch-data"}
2019-06-11T15:20:57.793+0200    INFO    elasticsearch-controller        Recording event {EventType:Normal Reason:Created Message: Created pod elasticsearch-sample-es-h22zvtgdsp}
2019-06-11T15:20:57.794+0200    ERROR   kubebuilder.controller  Reconciler error        
	{"controller": "elasticsearch-controller", "request": "e2e/elasticsearch-sample", "error": "pods \"elasticsearch-sample-es-h22zvtgdsp\" already exists", "errorCauses": [{"error": "pods \"elasticsearch-sample-es-h22zvtgdsp\" already exists"}]}
2019-06-11T15:20:57.794+0200    DEBUG   kubebuilder.manager.events      Normal  
	{"object": {"kind":"Elasticsearch","namespace":"e2e","name":"elasticsearch-sample","uid":"bd5345e5-8c4b-11e9-9d90-42010a840048","apiVersion":"elasticsearch.k8s.elastic.co/v1alpha1","resourceVersion":"1397387"}, "reason": "Created", "message": " Created pod elasticsearch-sample-es-h22zvtgdsp"}

2019-06-11T15:20:59.392+0200    INFO    version7 ClusterInitialMasterNodesEnforcer 
	{"changes": ["elasticsearch-sample-es-vrbktpw5x2"], "states": ["elasticsearch-sample-es-h22zvtgdsp", "elasticsearch-sample-es-j94srzpqnj"]}
2019-06-11T15:20:59.704+0200    INFO    driver   Created pod     {"name": "elasticsearch-sample-es-vrbktpw5x2", "namespace": "e2e"}
2019-06-11T15:20:59.705+0200    INFO    elasticsearch-controller        Recording event {EventType:Normal Reason:Created Message: Created pod elasticsearch-sample-es-vrbktpw5x2}
2019-06-11T15:20:59.705+0200    DEBUG   kubebuilder.manager.events      Normal  
	{"object": {"kind":"Elasticsearch","namespace":"e2e","name":"elasticsearch-sample","uid":"bd5345e5-8c4b-11e9-9d90-42010a840048","apiVersion":"elasticsearch.k8s.elastic.co/v1alpha1","resourceVersion":"1397387"}, "reason": "Created", "message": " Created pod elasticsearch-sample-es-vrbktpw5x2"}
2019-06-11T15:21:00.403+0200    INFO    version7 ClusterInitialMasterNodesEnforcer 
	{"changes": [], "states": ["elasticsearch-sample-es-h22zvtgdsp", "elasticsearch-sample-es-j94srzpqnj", "elasticsearch-sample-es-vrbktpw5x2"]}

@thbkrkr
Copy link
Contributor Author

thbkrkr commented Jun 11, 2019

Should be fixed by #1052. I'm testing.

@thbkrkr
Copy link
Contributor Author

thbkrkr commented Jun 11, 2019

#1052 fixes this issue.

@thbkrkr
Copy link
Contributor Author

thbkrkr commented Jun 11, 2019

Closed by #1052.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant