
clientv3: correct nextRev on receiving progress notification response #15248

Merged: 1 commit merged into main on Feb 10, 2023

Conversation

@ahrtr ahrtr (Member) commented Feb 6, 2023

The manual ProgressRequest has an issue: it may send an inconsistent rev back to the client. We should guard watchRequest.rev from decreasing; otherwise, the client may receive duplicate events.

Confirmed that this PR fixes the duplicated-event issue.

Signed-off-by: Benjamin Wang wachao@vmware.com


@ahrtr ahrtr (Member Author) commented Feb 6, 2023

cc @mitake

@codecov-commenter commented Feb 6, 2023

Codecov Report

Merging #15248 (586eacc) into main (8f54d38) will decrease coverage by 0.18%.
The diff coverage is 100.00%.

❗ Current head 586eacc differs from pull request most recent head a9b508c. Consider uploading reports for the commit a9b508c to get more accurate results


@@            Coverage Diff             @@
##             main   #15248      +/-   ##
==========================================
- Coverage   74.94%   74.76%   -0.18%     
==========================================
  Files         416      416              
  Lines       34351    34352       +1     
==========================================
- Hits        25744    25683      -61     
- Misses       6981     7029      +48     
- Partials     1626     1640      +14     
Flag Coverage Δ
all 74.76% <100.00%> (-0.18%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
server/storage/wal/wal.go 57.45% <100.00%> (+0.07%) ⬆️
server/proxy/grpcproxy/register.go 69.76% <0.00%> (-20.94%) ⬇️
server/storage/mvcc/watchable_store.go 85.30% <0.00%> (-7.53%) ⬇️
client/v3/namespace/watch.go 87.87% <0.00%> (-6.07%) ⬇️
client/pkg/v3/testutil/leak.go 62.83% <0.00%> (-4.43%) ⬇️
server/etcdserver/txn/util.go 96.42% <0.00%> (-3.58%) ⬇️
client/v3/concurrency/election.go 79.68% <0.00%> (-2.35%) ⬇️
server/etcdserver/api/rafthttp/msgappv2_codec.go 73.04% <0.00%> (-1.74%) ⬇️
pkg/proxy/server.go 61.01% <0.00%> (-1.02%) ⬇️
server/proxy/grpcproxy/lease.go 88.23% <0.00%> (-0.91%) ⬇️
... and 4 more


@serathius serathius (Member) commented Feb 6, 2023

Not sure what the purpose of this change is. Bumping the revision to handle a manual progress request might have the adverse effect of confusing the client.
A server reporting a lower revision is not expected and is a strong indication that something is really wrong. The client should not try to patch it over.

It would be better if the client just hard-failed and refused to handle an invalid response from the server (panic, return an error, or reset the watch).
cc @liggitt @wojtek-t @ptab

@ahrtr ahrtr marked this pull request as draft February 6, 2023 11:06
@ahrtr ahrtr (Member Author) commented Feb 6, 2023

Note: I am still debugging this issue; I have just marked the PR as draft.

@ahrtr ahrtr marked this pull request as ready for review February 7, 2023 02:48
@ahrtr ahrtr changed the title from "clientv3: guard watchRequest.rev from decreasing due to RequestProgress issue" to "clientv3: correct nextRev on receiving progress notification response" on Feb 7, 2023
@ahrtr ahrtr (Member Author) commented Feb 7, 2023

It turns out to be a clientv3 issue.

When clientv3 receives a progress notification response, it sets nextRev to the current revision (wr.Header.Revision), which isn't correct. It should set nextRev to wr.Header.Revision + 1. Otherwise, the client side may receive duplicated events when etcdserver restarts for whatever reason.

Let's work through an example (a minimal code sketch of the intended behavior follows the list):

  1. Assume the latest revision is 79, and the client side has already received the event with rev 79;
  2. etcdserver keeps sending progress notifications with 79, and the client side updates nextRev to 79 (this is the root cause of the issue);
  3. If etcdserver crashes for whatever reason, the client side re-sends the watchRequest with nextRev 79;
  4. When etcdserver starts again, it sends watchResponses starting from rev 79. Accordingly, the client side receives the event with rev 79 twice.
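
The sketch below is illustrative only (it is not the actual clientv3 code, and updateNextRev is a made-up helper name); it shows how the resume revision should be advanced for both normal events and progress notifications:

package watchsketch

import clientv3 "go.etcd.io/etcd/client/v3"

// updateNextRev returns the revision the watcher should resume from after
// processing a watch response.
func updateNextRev(wr clientv3.WatchResponse) int64 {
	if len(wr.Events) > 0 {
		// Normal case: resume right after the last delivered event.
		return wr.Events[len(wr.Events)-1].Kv.ModRevision + 1
	}
	// Progress notification: everything up to and including Header.Revision
	// has already been delivered, so resume at Header.Revision + 1.
	// Resuming at Header.Revision would replay the last event after a restart.
	return wr.Header.Revision + 1
}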

Note that it's pretty easy to reproduce this issue using #15235. After applying this PR, it can't be reproduced anymore.

We need to backport this PR to both 3.4 and 3.5. @mitake @ptabor @serathius PTAL.

@ahrtr ahrtr (Member Author) commented Feb 7, 2023

@wojtek-t does this issue have any impact on K8s?

@serathius serathius (Member) commented

@ahrtr As we have established, K8s doesn't use manual progress notify so it's not affected.

@ahrtr ahrtr (Member Author) commented Feb 7, 2023

@ahrtr As we have established, K8s doesn't use manual progress notify so it's not affected.

This isn't correct. Both manual progress notifications and timed progress notifications have this issue. Note that it can only happen when etcdserver restarts (crashes and starts again) and the last watch response received by the client side is a progress notification. I have already reproduced it using #15235 (with some minor modifications). Please see the example workflow in #15248 (comment).

Could you evaluate the impact on K8s? cc @wojtek-t @dims @liggitt @nikhita @neolit123

@serathius serathius (Member) commented

I don't understand. #15235 just reproduces the manual progress notification issue. If you made any modifications to it, please publish them, because otherwise we are again playing "she said he said".

I have already added verification for duplicated revisions in #15250 and validated timed progress notifications in #15257.

@@ -880,12 +880,13 @@ func (w *watchGrpcStream) serveSubstream(ws *watcherStream, resumec chan struct{
 			}
 		} else {
 			// current progress of watch; <= store revision
-			nextRev = wr.Header.Revision
+			nextRev = wr.Header.Revision + 1
A contributor commented:

I haven't read this code deeply... but please make sure the 'fragment' field is properly interpreted:

bool fragment = 7 [(versionpb.etcd_version_field)="3.4"];

I assume that in the case of a fragmented response, there might be multiple responses with the same revision.

@ahrtr ahrtr (Member Author) replied:

It's a good question. The fragments have already been merged before the merged response is dispatched to serveSubstream. The rough workflow is as below (see the sketch after this list):

  1. First, the client side merges all fragments if needed. See watch.go#L644-L648;
  2. It dispatches the merged response. See watch.go#L651;
  3. serveSubstream forwards the response to the subscriber. See watch.go#L845.
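
For illustration only (a hypothetical helper, not etcd's actual merging code), the idea behind step 1 is roughly:

package watchsketch

import pb "go.etcd.io/etcd/api/v3/etcdserverpb"

// mergeFragment appends a fragment's events to the pending response and
// reports whether the response is now complete, i.e. the last fragment has
// arrived and the merged response can be dispatched to serveSubstream.
func mergeFragment(pending, frag *pb.WatchResponse) (complete bool) {
	pending.Events = append(pending.Events, frag.Events...)
	return !frag.Fragment
}

Because only the fully merged response reaches serveSubstream, fragments sharing the same revision cannot cause nextRev to be advanced more than once per revision.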

Another reviewer commented:

Er... this is going to break Kubernetes, which manually does this already.

A contributor commented:

agreed, please don't change the meaning of the data returned to offset by one

@lavalamp lavalamp commented Feb 8, 2023:

Actually this change may be correct -- look at line 887, which is the normal case. It adds one.

@ahrtr ahrtr (Member Author) commented Feb 8, 2023:

ok, if this doesn't change the revisions returned to clients

Right. It doesn't change anything being returned to the client (again, the etcd clientv3 user).

Another reviewer commented:

I would expect test changes to come in the same PR as the fix -- like did this not break any unit tests? That would be very surprising. (but I'm not familiar with etcd practice)

@ahrtr ahrtr (Member Author) replied:

I would expect test changes to come in the same PR as the fix -- like did this not break any unit tests? That would be very surprising. (but I'm not familiar with etcd practice)

Right. I am planning to add a test case today.

@ahrtr ahrtr (Member Author) added:

I was thinking of relying on the generic linearizability test case, but it makes more sense to add a specific test case to reproduce this issue.

Another reviewer commented:

That there isn't a unit test exercising this line is probably most of the problem...

@ahrtr ahrtr (Member Author) commented Feb 8, 2023

I don't understand. #15235 just reproduces manual progress notification issue. If you did any modifications to it please publish it because again we are playing "she said he said".

First of all, please let me know if you need any clarification on the example workflow I mentioned in #15248 (comment). If it's clear to you, then I suppose it's pretty easy to intentionally create the scenario.

To avoid any confusion, please see #15262; only the third commit is relevant.

Please see my local reproduction.

$ go test -run TestLinearizability
panic: duplicate operation in two responses on revision 40
---
first:
{
    "Header": {
        "cluster_id": 238453183653593855,
        "member_id": 14578408409545168728,
        "revision": 40,
        "raft_term": 3
    },
    "Events": [
        {
            "kv": {
                "key": "MQ==",
                "create_revision": 25,
                "mod_revision": 40,
                "version": 4,
                "value": "NDg="
            }
        }
    ],
    "CompactRevision": 0,
    "Canceled": false,
    "Created": false
}
---
second:
{
    "Header": {
        "cluster_id": 238453183653593855,
        "member_id": 14578408409545168728,
        "revision": 40,
        "raft_term": 4
    },
    "Events": [
        {
            "kv": {
                "key": "MQ==",
                "create_revision": 25,
                "mod_revision": 40,
                "version": 4,
                "value": "NDg="
            }
        }
    ],
    "CompactRevision": 0,
    "Canceled": false,
    "Created": false
}

goroutine 77 [running]:
go.etcd.io/etcd/tests/v3/linearizability.collectMemberWatchEvents({0x1e61d10, 0xc0003ba600}, 0xc0001a23f0?, 0xc0000421e0)
	/Users/wachao/go/src/github.com/ahrtr/etcd/tests/linearizability/watch.go:120 +0xb4a
go.etcd.io/etcd/tests/v3/linearizability.collectClusterWatchEvents.func1(0x0, 0x0?)
	/Users/wachao/go/src/github.com/ahrtr/etcd/tests/linearizability/watch.go:52 +0xc5
created by go.etcd.io/etcd/tests/v3/linearizability.collectClusterWatchEvents
	/Users/wachao/go/src/github.com/ahrtr/etcd/tests/linearizability/watch.go:49 +0x4cd
exit status 2
FAIL	go.etcd.io/etcd/tests/v3/linearizability	5.483s

@ahrtr ahrtr (Member Author) commented Feb 8, 2023

To give everyone more confidence, I also manually reproduced this issue against the main branch using the simple program below.

package main

import (
	"context"
	"encoding/json"
	"fmt"

	"go.etcd.io/etcd/client/v3"
)

func main() {
	cfg := clientv3.Config{
		Endpoints: []string{"http://127.0.0.1:2379"},
		LogConfig: nil,
	}

	cli, err := clientv3.New(cfg)
	if err != nil {
		panic(err)
	}

	for wr := range cli.Watch(context.Background(), "k1", clientv3.WithProgressNotify()) {
		if wr.Err() != nil {
			fmt.Printf("Error: %v\n", wr.Err())
		} else {
			fmt.Println("------")
			respMsg, err := json.MarshalIndent(wr, "", " ")
			if err != nil {
				fmt.Printf("Failed to marshal response, error: %v\n", err)
			} else {
				fmt.Printf("%s\n", respMsg)
			}
		}
	}
}

The steps are:

  1. Start etcd: ./etcd --experimental-watch-progress-notify-interval="5s";
  2. Run the program above: go run main.go. You will see a progress notification every 5 seconds;
  3. Execute ./etcdctl put k1 v1. The program started at step 2 receives the response below:
------
{
 "Header": {
  "cluster_id": 14841639068965178418,
  "member_id": 10276657743932975437,
  "revision": 2,
  "raft_term": 3
 },
 "Events": [
  {
   "kv": {
    "key": "azE=",
    "create_revision": 2,
    "mod_revision": 2,
    "version": 1,
    "value": "djE="
   }
  }
 ],
 "CompactRevision": 0,
 "Canceled": false,
 "Created": false
}
  4. Wait 6 seconds (make sure it is more than 5s so that at least one progress notification is received), then stop etcd.
  5. Start etcd again; the client (the program started at step 2) will receive exactly the same response as in step 3.

@ahrtr ahrtr (Member Author) commented Feb 8, 2023

It's also reproducible on etcd 3.5.7. Is this a big problem for K8s? @liggitt

Signed-off-by: Benjamin Wang <wachao@vmware.com>
@ahrtr ahrtr (Member Author) commented Feb 9, 2023

Added an integration test case. Without the fix in this PR, the test case always fails. After applying the fix, the test case always succeeds.

I'm pretty sure it fixes the duplicated-event issue.
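
For readers who want the gist of what such a test has to exercise, here is a rough sketch. It is not the actual test from this PR; the server restart itself is assumed to be done by the test harness outside this helper, and the function and key names are illustrative.

package watchsketch

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// verifyNoDuplicateEvents consumes a watch with progress notifications
// enabled and fails if any mod revision is delivered twice. The surrounding
// test writes a key, waits for at least one progress notification, restarts
// the etcd member, writes again, and then checks the result.
func verifyNoDuplicateEvents(ctx context.Context, cli *clientv3.Client) error {
	seen := make(map[int64]bool)
	for wr := range cli.Watch(ctx, "k1", clientv3.WithProgressNotify()) {
		for _, ev := range wr.Events {
			if seen[ev.Kv.ModRevision] {
				return fmt.Errorf("duplicate event at revision %d", ev.Kv.ModRevision)
			}
			seen[ev.Kv.ModRevision] = true
		}
	}
	return nil
}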

@ahrtr ahrtr (Member Author) commented Feb 9, 2023

cc @mitake @ptabor @spzala as well, PTAL.

@serathius serathius (Member) left a comment:

Looks good! With the integration test it's clear that there was a bug here. Thanks for going the additional mile.

Any inclination as to what is missing in the linearizability tests that prevented reproducing this? I expect we didn't catch it because it's an etcd-failure scenario, which is not as well covered by integration/e2e tests.

Failure injection is the domain of functional/linearizability tests, so I would like to identify why we didn't reproduce it that way.

@liggitt liggitt (Contributor) commented Feb 9, 2023

It's also reproduced on etcd 3.5.7. Is this a big problem to K8s? @liggitt

umm... I'm trying to think how bad repeating the latest event in a watch stream could be... it's definitely not as bad as replaying non-latest historical events. It could definitely confuse a watcher that was taking action on observing transitions and didn't expect to see the same transition occur two times, but a quick sweep didn't turn up any obvious issues with in-tree watchers in kubernetes (we almost universally work on the latest observed object from a locally maintained cache, and replaying the latest event doesn't break that).

for this scenario to occur, all of these would have to happen:

  1. the API server opts into progress notifications (default behavior)
  2. no writes to etcd have occurred since the last object create/update/delete write received by the etcd watcher (so the latest revision is the same as the revision on the last CRUD event received)
  3. a progress notification is the last event received by the etcd watcher
  4. something would have to trigger the etcd watcher reestablishing the watch (connection loss, etcd server restart, etc)

@ahrtr ahrtr (Member Author) commented Feb 9, 2023

Any inclination what is missing in linearizability test that prevented reproducing this?

The linearizability test is a sort of generic test, so I'm not surprised that it can't reliably reproduce this issue. That's exactly why I implemented a dedicated integration test for this issue. For more details, please refer to #15248 (comment) and the third commit in #15262.

umm... I'm trying to think how bad repeating the latest event in a watch stream could be... it's definitely not as bad as replaying non-latest historical events

Yes, it only replays the latest event.

for this scenario to occur, all of these would have to happen:

1. the API server opts into progress notifications (default behavior)

2. no writes to etcd have occurred since the last object create/update/delete write received by the etcd watcher (so the latest revision is the same as the revision on the last CRUD event received)

3. a progress notification is the last event received by the etcd watcher

4. something would have to trigger the etcd watcher reestablishing the watch (connection loss, etcd server restart, etc)

Exactly!

@ahrtr ahrtr (Member Author) commented Feb 9, 2023

umm... I'm trying to think how bad repeating the latest event in a watch stream could be... it's definitely not as bad as replaying non-latest historical events. It could definitely confuse a watcher that was taking action on observing transitions and didn't expect to see the same transition occur two times, but a quick sweep didn't turn up any obvious issues with in-tree watchers in kubernetes (we almost universally work on the latest observed object from a locally maintained cache, and replaying the latest event doesn't break that).

Thanks @liggitt for the feedback.

@serathius serathius (Member) commented Feb 10, 2023

FYI, the linearizability tests have reproduced the issue even after the fix was merged!

Run https://github.com/etcd-io/etcd/actions/runs/4142560023/jobs/7163370423

From run actions/checkout

/usr/bin/git log -1 --format='%H'
'36fc3cae65cac1b844d2561471dc95b44753618d'

From logs

2023-02-10T09:53:31.8967095Z     logger.go:130: 2023-02-10T09:53:30.726Z	INFO	Recorded operations	{"count": 690}
2023-02-10T09:53:31.8969114Z     logger.go:130: 2023-02-10T09:53:30.726Z	INFO	Average traffic	{"qps": 278.4656727398674}
2023-02-10T09:53:31.8970286Z     logger.go:130: 2023-02-10T09:53:31.726Z	INFO	killing server...	{"name": "TestLinearizabilityIssue14370-test-0"}
2023-02-10T09:53:31.8971440Z     logger.go:130: 2023-02-10T09:53:31.727Z	INFO	stopping server...	{"name": "TestLinearizabilityIssue14370-test-0"}
2023-02-10T09:53:31.8972771Z     logger.go:130: 2023-02-10T09:53:31.730Z	INFO	stopped server.	{"name": "TestLinearizabilityIssue14370-test-0"}
2023-02-10T09:53:31.8973344Z     watch.go:99: Expect revision to grow by 1, last: 166, mod: 166
2023-02-10T09:53:31.8974457Z     logger.go:130: 2023-02-10T09:53:31.863Z	INFO	Saving visualization	{"path": "/tmp/linearizability/TestLinearizability_Issue14370/history.html"}
2023-02-10T09:53:31.8975158Z     logger.go:130: 2023-02-10T09:53:31.889Z	INFO	closing test cluster...
2023-02-10T09:53:31.8975976Z     logger.go:130: 2023-02-10T09:53:31.889Z	INFO	stopping server...	{"name": "TestLinearizabilityIssue14370-test-0"}
2023-02-10T09:53:31.8976770Z     logger.go:130: 2023-02-10T09:53:31.889Z	INFO	closing server...	{"name": "TestLinearizabilityIssue14370-test-0"}
2023-02-10T09:53:31.8977579Z     logger.go:130: 2023-02-10T09:53:31.889Z	INFO	stopping server...	{"name": "TestLinearizabilityIssue14370-test-0"}
2023-02-10T09:53:31.8978441Z     logger.go:130: 2023-02-10T09:53:31.889Z	INFO	removing directory	{"data-dir": "/tmp/TestLinearizabilityIssue143702810931773/001"}
2023-02-10T09:53:31.8979302Z     logger.go:130: 2023-02-10T09:53:31.892Z	INFO	closed test cluster.
2023-02-10T09:53:31.8979814Z --- FAIL: TestLinearizability (25.73s)
2023-02-10T09:53:31.9976172Z �[1m�[0;31mFAIL: (code:1):
2023-02-10T09:53:31.9977152Z     --- PASS: TestLinearizability/ClusterOfSize1/LowTraffic (3.95s)
2023-02-10T09:53:31.9978328Z     --- PASS: TestLinearizability/ClusterOfSize3/LowTraffic (6.70s)
2023-02-10T09:53:31.9979892Z     --- PASS: TestLinearizability/ClusterOfSize1/HighTraffic (4.49s)
2023-02-10T09:53:31.9980995Z     --- PASS: TestLinearizability/ClusterOfSize3/HighTraffic (6.29s)
2023-02-10T09:53:31.9981901Z     --- FAIL: TestLinearizability/Issue14370 (4.29s)

@ptabor ptabor (Contributor) commented Feb 10, 2023

@serathius Do you mean they reproduced the bigger issue of #15271, or something else?

@ahrtr ahrtr (Member Author) commented Feb 10, 2023

2023-02-10T09:53:31.8973344Z watch.go:99: Expect revision to grow by 1, last: 166, mod: 166

I saw a similar error message with the help of #15272. I haven't had time to dig into it so far, so it isn't confirmed yet whether it's a test issue or a production issue. I will take a deep dive later.

@ahrtr ahrtr (Member Author) commented Feb 10, 2023

It's my top priority next week.

@serathius serathius (Member) replied:

@serathius Do you mean they reproduced the bigger issue of #15271 or something else ?

It's totally a different beast as they have different failpoint triggers.

@ahrtr ahrtr (Member Author) commented Feb 12, 2023

Please, everyone in this loop, do not panic; it's confirmed that this is just a test issue. FYI: #15288.

Just to be clear, I mean the issue mentioned in #15271 is just a test issue.
