clientv3/ordering TestUnresolvableOrderViolation fails #8624

gyuho · 2017-09-28T18:42:54Z

TestUnresolvableOrderViolation goes into infinite leader elections...

https://semaphoreci.com/coreos/etcd/branches/master/builds/2379

xiang90 · 2017-10-04T22:11:41Z

Is this already fixed?

gyuho · 2017-10-04T22:16:38Z

Let me try to reproduce.

gyuho · 2017-10-05T14:07:26Z

Still happens: https://semaphoreci.com/coreos/etcd/branches/master

log-ordering.txt

xiang90 · 2017-10-05T17:51:40Z

/cc @lorneli would you like to take a look at this test failure?

lorneli · 2017-10-06T01:16:21Z

@xiang90 Yeah. I'll follow up this weekend.

lorneli · 2017-10-08T17:19:18Z

I haven't reproduced this failure locally, so I have to depend on the log from Semaphore.

Based on log-ordering.txt, TestUnresolvableOrderViolation blocks while putting a key-value pair to the first member of the cluster, which is not actually related to the clientv3/ordering code. Looks like an integration/etcd-server bug.

cfg := clientv3.Config{
    // ...
}
cli, err := clientv3.New(cfg)
// ...
cli.SetEndpoints(clus.Members[0].GRPCAddr())
time.Sleep(1 * time.Second)
_, err = cli.Put(ctx, "foo", "bar") // blocks here; the server never responds

I see many grpc warning lines about connections closing, printed by grpclog. For example, lines 5163-5168 in log-ordering.txt (see below). Is there a guarantee that logs generated by capnslog and grpclog appear in sequence? The log shows that three nodes can't be dialed after the etcd servers are published, which is a little confusing...

2017-10-05 11:41:31.385241 I | etcdserver: published {Name:3998243288755946524 ClientURLs:[unix://127.0.0.1:2105415012]} to cluster 77a8bf0cb4e3ab13
2017-10-05 11:41:31.389455 I | etcdserver: published {Name:337497985577860612 ClientURLs:[unix://127.0.0.1:2104815012]} to cluster 77a8bf0cb4e3ab13
2017-10-05 11:41:31.393393 I | etcdserver: published {Name:3210302211912965408 ClientURLs:[unix://127.0.0.1:2105015012]} to cluster 77a8bf0cb4e3ab13
2017-10-05 11:41:31.393679 I | etcdserver: published {Name:3123809328254117252 ClientURLs:[unix://127.0.0.1:2105215012]} to cluster 77a8bf0cb4e3ab13
2017-10-05 11:41:31.399699 I | etcdserver: published {Name:1904611785361370535 ClientURLs:[unix://127.0.0.1:2105615012]} to cluster 77a8bf0cb4e3ab13
2017-10-05 11:41:31.414984 I | etcdserver: setting up the initial cluster version to 3.2
2017-10-05 11:41:31.423877 N | etcdserver/membership: set the initial cluster version to 3.2
2017-10-05 11:41:31.425623 N | etcdserver/membership: set the initial cluster version to 3.2
2017-10-05 11:41:31.430211 N | etcdserver/membership: set the initial cluster version to 3.2
2017-10-05 11:41:31.431297 N | etcdserver/membership: set the initial cluster version to 3.2
2017-10-05 11:41:31.432081 N | etcdserver/membership: set the initial cluster version to 3.2
WARNING: 2017/10/05 11:41:31 Failed to dial localhost:31238093282541172520: grpc: the connection is closing; please retry.
WARNING: 2017/10/05 11:41:31 Failed to dial localhost:31238093282541172520: grpc: the connection is closing; please retry.
WARNING: 2017/10/05 11:41:31 Failed to dial localhost:39982432887559465240: grpc: the connection is closing; please retry.
WARNING: 2017/10/05 11:41:31 Failed to dial localhost:39982432887559465240: grpc: the connection is closing; please retry.
WARNING: 2017/10/05 11:41:31 Failed to dial localhost:19046117853613705350: grpc: the connection is closing; please retry.
WARNING: 2017/10/05 11:41:31 Failed to dial localhost:19046117853613705350: grpc: the connection is closing; please retry.

xiang90 · 2017-10-09T03:17:24Z

We probably need to investigate why there is a 1 second sleep. It seems pretty random and arbitrary.

xiang90 · 2017-10-09T03:18:26Z

/cc @mangoslicer

mangoslicer · 2017-10-09T17:32:53Z

The 1 second sleep was to ensure that the endpoint was set to the first member. I'll try to reproduce the error locally and see if changing the 1 second delay or not setting the endpoint to the first member changes anything.

mkumatag · 2017-10-10T13:26:35Z

This is failing consistently on the ppc64le platform: https://jenkins-etcd-public.prod.coreos.systems/job/etcd-ci-ppc64/

mkumatag · 2017-10-11T15:11:18Z

@gyuho I still see the tests failing. Any idea why this issue is closed?

gyuho · 2017-10-11T16:00:39Z

@mkumatag client integration tests aren't stable yet.
We are fixing those failures now with highest priority (#8678 and #8677).

Sorry!

gyuho added the area/testing label Sep 28, 2017

gyuho changed the title ~~clientv3/ordering unit test fails, TestUnresolvableOrderViolation~~ clientv3/ordering TestUnresolvableOrderViolation fails Oct 3, 2017

xiang90 added this to the v3.4.0 milestone Oct 6, 2017

gyuho mentioned this issue Oct 9, 2017

clientv3/balancer: handle network partition in health check #8669

Merged

gyuho self-assigned this Oct 9, 2017

gyuho mentioned this issue Oct 10, 2017

clientv3: reset unhealthy on updateAddrs #8674

Merged

gyuho closed this as completed Oct 10, 2017

gyuho mentioned this issue Oct 13, 2017

clientv3/ordering TestUnresolvableOrderViolation fails (after 'unhealthy' patch) #8694

Closed
