clientv3: only update initReq.rev == 0 with watch revision #7795

heyitsanthony · 2017-04-21T07:32:18Z

Always updating the initReq.rev on watch create will resume from the wrong
revision if initReq.rev is ever nonzero.

xiang90 · 2017-04-21T18:15:08Z

clientv3/watch.go

@@ -615,11 +615,17 @@ func (w *watchGrpcStream) serveSubstream(ws *watcherStream, resumec chan struct{
 					// send first creation event only if requested
 					if ws.initReq.createdNotify {
 						ws.outc <- *wr
+						if ws.initReq.rev == 0 {


Add a comment here? (When resuming after a disconnection, the watcher restarts from a known previous revision, not from the current revision, so initReq.rev should not be updated to header.revision.)

xiang90 · 2017-04-21T18:26:21Z

@heyitsanthony How did you find this bug? from a test failure?

heyitsanthony · 2017-04-21T18:27:28Z

@xiang90 from investigating #7709, but likely an unrelated issue.

xiang90 · 2017-04-21T18:29:41Z

clientv3/integration/watch_test.go

+	testWatchResumeCompacted(t, nil, nil)
+}
+
+func TestWatchResumeCompactedFrequentDisconnect(t *testing.T) {


is it possible to develop a dedicated test for the case we fix?

I feel TestWatchResumeCompacted is already hard enough to reason about (it is not deterministic, and it has two sub-cases to care about internally).

It's non-deterministic by nature since it relies on disconnect behavior. To make it deterministic I'd need to add failpoints to the testing infrastructure. If I wrote a separate test it'd essentially duplicate what's there now.

I can try some ways to make it deterministic without failpoints, but that will change the behavior that's being tested. OK, whatever.

xiang90 · 2017-04-21T18:32:03Z

The fix looks good to me.

fanminshi · 2017-04-21T22:10:34Z

This pr fixes my issue as well.

My issue:

1. Send a create request with initReq{key:foo, rev:2} and receive the create response.
2. initReq.rev is updated to resp.Header.Revision, say 100.
3. The connection is dropped.
4. watchGrpcStream resends initReq{key:foo, rev:100}.
5. wch hangs because the watch now starts at rev 100 instead of rev 2.

It would be nice to have a simple test case that covers the above situation.
I can write one in a different pr when this gets merged.

fanminshi · 2017-04-21T22:26:01Z

clientv3/integration/watch_test.go

+	if resp, ok := <-wch; !ok || resp.Header.Revision != 4 {
+		t.Fatalf("got (%v, %v), expected create notification rev=4", resp, ok)
+	}
+	// pause wch


is it possible that before you DropConnections, watch receives and buffers event ("a", rev=3)?

yes, but I tested it without the fix and it fails reliably.

Would the following code fail due to having events buffered?

clus.Members[0].DropConnections()
clus.Members[0].PauseConnections()
select {
case resp, ok := <-wch:
	t.Fatalf("wch should block, got (%+v, %v)", resp, ok)
case <-time.After(100 * time.Millisecond):
}

yes, I can make it skip instead of fatal, but CI is typically slower rather than faster

heyitsanthony · 2017-04-21T22:26:05Z

@fanminshi the current test essentially already does that, all that would have to change is:

if _, err := cli.Put(context.TODO(), "a", "4"); err != nil {

instead of having:

if _, err := cli.Put(context.TODO(), "b", "4"); err != nil {

Testing based on a hang won't fail fast: it has to wait for the timeout to catch the failure, instead of failing as soon as it retrieves a bad value. I don't understand how a second test would catch any new failure modes, since if there's a timeout for some other reason, the current test will still fail. Seems redundant.

fanminshi · 2017-04-21T22:35:46Z

@heyitsanthony understood!

xiang90 · 2017-04-21T22:38:32Z

@heyitsanthony LGTM. Thanks!

fanminshi · 2017-04-21T22:41:19Z

lgtm, thanks!

heyitsanthony added the backport/v3.3 label Apr 21, 2017

heyitsanthony force-pushed the dont-force-initrev branch from 2946c73 to fd7e951 Compare April 21, 2017 16:00

xiang90 reviewed Apr 21, 2017

heyitsanthony force-pushed the dont-force-initrev branch from fd7e951 to fad4de4 Compare April 21, 2017 18:21

xiang90 reviewed Apr 21, 2017

heyitsanthony force-pushed the dont-force-initrev branch from fad4de4 to 317a75a Compare April 21, 2017 20:46

heyitsanthony force-pushed the dont-force-initrev branch from 317a75a to f9efacb Compare April 21, 2017 22:19

fanminshi reviewed Apr 21, 2017

heyitsanthony force-pushed the dont-force-initrev branch from f9efacb to 6fea1c5 Compare April 21, 2017 22:40

heyitsanthony force-pushed the dont-force-initrev branch 3 times, most recently from 98203d4 to cf08d28 Compare April 22, 2017 03:20

Anthony Romano added 3 commits April 21, 2017 20:22

integration: add pause/unpause to client bridge

fe1ce3a

Resetting connections sometimes isn't enough; need to stop/resume accepting connections for some tests while keeping the member up.

clientv3/integration: test watch resume with disconnect before first …

ec47094

…event

clientv3: only update initReq.rev == 0 with creation watch revision

4ab818a

Always updating the initReq.rev on watch create will resume from the wrong revision if initReq.rev is ever nonzero.

heyitsanthony force-pushed the dont-force-initrev branch from cf08d28 to 4ab818a Compare April 22, 2017 03:41

heyitsanthony merged commit 7da4516 into etcd-io:master Apr 22, 2017

heyitsanthony deleted the dont-force-initrev branch April 22, 2017 09:51

gyuho mentioned this pull request Apr 25, 2017

clientv3: set current revision to create rev regardless of CreateNotify #7804

Merged

gyuho removed the backport/v3.3 label Nov 22, 2017
