
clientv3: correct nextRev on receiving progress notification response #15248

Merged: 1 commit merged into main on Feb 10, 2023

Conversation

@ahrtr ahrtr (Member) commented Feb 6, 2023

The manual ProgressRequest has an issue: it may send an inconsistent rev back to the client. We should guard watchRequest.rev from decreasing; otherwise, the client may receive duplicate events.

Confirmed that this PR fixes the duplicated-event issue.

Signed-off-by: Benjamin Wang wachao@vmware.com


@ahrtr ahrtr (Member Author) commented Feb 6, 2023

cc @mitake

@codecov-commenter commented Feb 6, 2023

Codecov Report

Merging #15248 (586eacc) into main (8f54d38) will decrease coverage by 0.18%.
The diff coverage is 100.00%.

❗ Current head 586eacc differs from pull request most recent head a9b508c. Consider uploading reports for the commit a9b508c to get more accurate results


@@            Coverage Diff             @@
##             main   #15248      +/-   ##
==========================================
- Coverage   74.94%   74.76%   -0.18%     
==========================================
  Files         416      416              
  Lines       34351    34352       +1     
==========================================
- Hits        25744    25683      -61     
- Misses       6981     7029      +48     
- Partials     1626     1640      +14     
Flag Coverage Δ
all 74.76% <100.00%> (-0.18%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
server/storage/wal/wal.go 57.45% <100.00%> (+0.07%) ⬆️
server/proxy/grpcproxy/register.go 69.76% <0.00%> (-20.94%) ⬇️
server/storage/mvcc/watchable_store.go 85.30% <0.00%> (-7.53%) ⬇️
client/v3/namespace/watch.go 87.87% <0.00%> (-6.07%) ⬇️
client/pkg/v3/testutil/leak.go 62.83% <0.00%> (-4.43%) ⬇️
server/etcdserver/txn/util.go 96.42% <0.00%> (-3.58%) ⬇️
client/v3/concurrency/election.go 79.68% <0.00%> (-2.35%) ⬇️
server/etcdserver/api/rafthttp/msgappv2_codec.go 73.04% <0.00%> (-1.74%) ⬇️
pkg/proxy/server.go 61.01% <0.00%> (-1.02%) ⬇️
server/proxy/grpcproxy/lease.go 88.23% <0.00%> (-0.91%) ⬇️
... and 4 more


@serathius serathius (Member) commented Feb 6, 2023

Not sure what the purpose of this change is. Bumping the revision to handle a manual progress request might have the adverse effect of confusing the client.
A server reporting a lower revision is not expected and is a strong indication that something is really wrong. The client should not try to patch it over.

It would be better if the client just hard-failed and refused to handle an invalid response from the server (panic, return an error, or reset the watch).
cc @liggitt @wojtek-t @ptab

@ahrtr ahrtr marked this pull request as draft February 6, 2023 11:06
@ahrtr ahrtr (Member Author) commented Feb 6, 2023

Note: I am still debugging this issue; I have just marked the PR as draft.

@ahrtr ahrtr marked this pull request as ready for review February 7, 2023 02:48
@ahrtr ahrtr changed the title from "clientv3: guard watchRequest.rev from decreasing due to RequestProgress issue" to "clientv3: correct nextRev on receiving progress notification response" on Feb 7, 2023
@ahrtr ahrtr (Member Author) commented Feb 7, 2023

It turns out to be a clientv3 issue.

When clientv3 receives a progress notification response, it sets nextRev to the current revision (wr.Header.Revision), which isn't correct. It should set nextRev to wr.Header.Revision + 1. Otherwise, the client side may receive duplicated events when etcdserver restarts for whatever reason.

Let's work through an example (a minimal code sketch of the intended behavior follows the list):

  1. Assume the latest revision is 79, and the client side has already received the event with rev 79;
  2. etcdserver keeps sending progress notifications with 79, and the client side updates nextRev to 79 (this is the root cause of the issue);
  3. If etcdserver crashes for whatever reason, the client side re-sends the watchRequest with nextRev 79;
  4. When etcdserver starts again, it sends watchResponses starting from rev 79. Accordingly, the client side receives the event with rev 79 twice.
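
The sketch below is illustrative only (it is not the actual clientv3 code, and updateNextRev is a made-up helper name); it shows how the resume revision should be advanced for both normal events and progress notifications:

package watchsketch

import clientv3 "go.etcd.io/etcd/client/v3"

// updateNextRev returns the revision the watcher should resume from after
// processing a watch response.
func updateNextRev(wr clientv3.WatchResponse) int64 {
	if len(wr.Events) > 0 {
		// Normal case: resume right after the last delivered event.
		return wr.Events[len(wr.Events)-1].Kv.ModRevision + 1
	}
	// Progress notification: everything up to and including Header.Revision
	// has already been delivered, so resume at Header.Revision + 1.
	// Resuming at Header.Revision would replay the last event after a restart.
	return wr.Header.Revision + 1
}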

Note that it's pretty easy to reproduce this issue using #15235. After applying this PR, it can't be reproduced anymore.

We need to backport this PR to both 3.4 and 3.5. @mitake @ptabor @serathius PTAL.

@ahrtr ahrtr (Member Author) commented Feb 7, 2023

@wojtek-t does this issue have any impact on K8s?

@serathius serathius (Member) commented

@ahrtr As we have established, K8s doesn't use manual progress notify so it's not affected.

@ahrtr ahrtr (Member Author) commented Feb 7, 2023

@ahrtr As we have established, K8s doesn't use manual progress notify so it's not affected.

This isn't correct. Both manual progress notifications and timed progress notifications have this issue. Note that it can only happen when etcdserver restarts (crashes and starts again) and the last watch response received by the client side is a progress notification. I have already reproduced it using #15235 (with some minor modifications). Please see the example workflow in #15248 (comment).

Could you evaluate the impact on K8s? cc @wojtek-t @dims @liggitt @nikhita @neolit123

@serathius serathius (Member) commented

I don't understand. #15235 just reproduces the manual progress notification issue. If you made any modifications to it, please publish them, because otherwise we are again playing "she said he said".

I have already added verification for duplicated revisions in #15250 and validated timed progress notifications in #15257.

@@ -880,12 +880,13 @@ func (w *watchGrpcStream) serveSubstream(ws *watcherStream, resumec chan struct{
 			}
 		} else {
 			// current progress of watch; <= store revision
-			nextRev = wr.Header.Revision
+			nextRev = wr.Header.Revision + 1
A contributor commented:

I haven't read this code deeply... but please make sure the 'fragment' field is properly interpreted:

bool fragment = 7 [(versionpb.etcd_version_field)="3.4"];

I assume that in the case of a fragmented response, there might be multiple responses with the same revision.

@ahrtr ahrtr (Member Author) replied:

It's a good question. The fragments have already been merged before the merged response is dispatched to serveSubstream. The rough workflow is as below (see the sketch after this list):

  1. First, the client side merges all fragments if needed. See watch.go#L644-L648;
  2. It dispatches the merged response. See watch.go#L651;
  3. serveSubstream forwards the response to the subscriber. See watch.go#L845.
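
For illustration only (a hypothetical helper, not etcd's actual merging code), the idea behind step 1 is roughly:

package watchsketch

import pb "go.etcd.io/etcd/api/v3/etcdserverpb"

// mergeFragment appends a fragment's events to the pending response and
// reports whether the response is now complete, i.e. the last fragment has
// arrived and the merged response can be dispatched to serveSubstream.
func mergeFragment(pending, frag *pb.WatchResponse) (complete bool) {
	pending.Events = append(pending.Events, frag.Events...)
	return !frag.Fragment
}

Because only the fully merged response reaches serveSubstream, fragments sharing the same revision cannot cause nextRev to be advanced more than once per revision.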

Another reviewer commented:

Er... this is going to break Kubernetes, which manually does this already.

A contributor commented:

agreed, please don't change the meaning of the data returned to offset by one

@lavalamp lavalamp commented Feb 8, 2023:

Actually this change may be correct -- look at line 887, which is the normal case. It adds one.

@ahrtr ahrtr (Member Author) commented Feb 8, 2023:

ok, if this doesn't change the revisions returned to clients

Right. It doesn't change anything being returned to the client (again, the etcd clientv3 user).

Another reviewer commented:

I would expect test changes to come in the same PR as the fix -- like did this not break any unit tests? That would be very surprising. (but I'm not familiar with etcd practice)

@ahrtr ahrtr (Member Author) replied:

I would expect test changes to come in the same PR as the fix -- like did this not break any unit tests? That would be very surprising. (but I'm not familiar with etcd practice)

Right. I am planning to add a test case today.

@ahrtr ahrtr (Member Author) added:

I was thinking of relying on the generic linearizability test case, but it makes more sense to add a specific test case to reproduce this issue.

Another reviewer commented:

That there isn't a unit test exercising this line is probably most of the problem...

@ahrtr ahrtr (Member Author) commented Feb 8, 2023

I don't understand. #15235 just reproduces manual progress notification issue. If you did any modifications to it please publish it because again we are playing "she said he said".

First of all, please let me know if you need any clarification on the example workflow I mentioned in #15248 (comment). If it's clear to you, then I suppose it's pretty easy to intentionally create the scenario.

To avoid any confusion, please see #15262; only the third commit is relevant.

Please see my local reproduction.

$ go test -run TestLinearizability
panic: duplicate operation in two responses on revision 40
---
first:
{
    "Header": {
        "cluster_id": 238453183653593855,
        "member_id": 14578408409545168728,
        "revision": 40,
        "raft_term": 3
    },
    "Events": [
        {
            "kv": {
                "key": "MQ==",
                "create_revision": 25,
                "mod_revision": 40,
                "version": 4,
                "value": "NDg="
            }
        }
    ],
    "CompactRevision": 0,
    "Canceled": false,
    "Created": false
}
---
second:
{
    "Header": {
        "cluster_id": 238453183653593855,
        "member_id": 14578408409545168728,
        "revision": 40,
        "raft_term": 4
    },
    "Events": [
        {
            "kv": {
                "key": "MQ==",
                "create_revision": 25,
                "mod_revision": 40,
                "version": 4,
                "value": "NDg="
            }
        }
    ],
    "CompactRevision": 0,
    "Canceled": false,
    "Created": false
}

goroutine 77 [running]:
go.etcd.io/etcd/tests/v3/linearizability.collectMemberWatchEvents({0x1e61d10, 0xc0003ba600}, 0xc0001a23f0?, 0xc0000421e0)
	/Users/wachao/go/src/github.com/ahrtr/etcd/tests/linearizability/watch.go:120 +0xb4a
go.etcd.io/etcd/tests/v3/linearizability.collectClusterWatchEvents.func1(0x0, 0x0?)
	/Users/wachao/go/src/github.com/ahrtr/etcd/tests/linearizability/watch.go:52 +0xc5
created by go.etcd.io/etcd/tests/v3/linearizability.collectClusterWatchEvents
	/Users/wachao/go/src/github.com/ahrtr/etcd/tests/linearizability/watch.go:49 +0x4cd
exit status 2
FAIL	go.etcd.io/etcd/tests/v3/linearizability	5.483s

@ahrtr ahrtr (Member Author) commented Feb 8, 2023

To give everyone more confidence, I also manually reproduced this issue against the main branch using the simple program below.

package main

import (
	"context"
	"encoding/json"
	"fmt"

	"go.etcd.io/etcd/client/v3"
)

func main() {
	cfg := clientv3.Config{
		Endpoints: []string{"http://127.0.0.1:2379"},
		LogConfig: nil,
	}

	cli, err := clientv3.New(cfg)
	if err != nil {
		panic(err)
	}

	for wr := range cli.Watch(context.Background(), "k1", clientv3.WithProgressNotify()) {
		if wr.Err() != nil {
			fmt.Printf("Error: %v\n", wr.Err())
		} else {
			fmt.Println("------")
			respMsg, err := json.MarshalIndent(wr, "", " ")
			if err != nil {
				fmt.Printf("Failed to marshal response, error: %v\n", err)
			} else {
				fmt.Printf("%s\n", respMsg)
			}
		}
	}
}

The steps are:

  1. Start etcd: ./etcd --experimental-watch-progress-notify-interval="5s";
  2. Run the program above: go run main.go. You will see a progress notification every 5 seconds;
  3. Execute ./etcdctl put k1 v1. The program started at step 2 receives the response below:
------
{
 "Header": {
  "cluster_id": 14841639068965178418,
  "member_id": 10276657743932975437,
  "revision": 2,
  "raft_term": 3
 },
 "Events": [
  {
   "kv": {
    "key": "azE=",
    "create_revision": 2,
    "mod_revision": 2,
    "version": 1,
    "value": "djE="
   }
  }
 ],
 "CompactRevision": 0,
 "Canceled": false,
 "Created": false
}
  4. Wait 6 seconds (make sure it is more than 5s so that at least one progress notification is received), then stop etcd.
  5. Start etcd again; the client (the program started at step 2) will receive exactly the same response as in step 3.

@ahrtr ahrtr (Member Author) commented Feb 8, 2023

It's also reproducible on etcd 3.5.7. Is this a big problem for K8s? @liggitt

Signed-off-by: Benjamin Wang <wachao@vmware.com>
@ahrtr ahrtr (Member Author) commented Feb 9, 2023

Added an integration test case. Without the fix in this PR, the test case always fails. After applying the fix, the test case always succeeds.

I'm pretty sure it fixes the duplicated-event issue.
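
For readers who want the gist of what such a test has to exercise, here is a rough sketch. It is not the actual test from this PR; the server restart itself is assumed to be done by the test harness outside this helper, and the function and key names are illustrative.

package watchsketch

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// verifyNoDuplicateEvents consumes a watch with progress notifications
// enabled and fails if any mod revision is delivered twice. The surrounding
// test writes a key, waits for at least one progress notification, restarts
// the etcd member, writes again, and then checks the result.
func verifyNoDuplicateEvents(ctx context.Context, cli *clientv3.Client) error {
	seen := make(map[int64]bool)
	for wr := range cli.Watch(ctx, "k1", clientv3.WithProgressNotify()) {
		for _, ev := range wr.Events {
			if seen[ev.Kv.ModRevision] {
				return fmt.Errorf("duplicate event at revision %d", ev.Kv.ModRevision)
			}
			seen[ev.Kv.ModRevision] = true
		}
	}
	return nil
}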

@ahrtr ahrtr (Member Author) commented Feb 9, 2023

cc @mitake @ptabor @spzala as well, PTAL.

@serathius serathius (Member) left a comment:

Looks good! With the integration test it's clear that there was a bug here. Thanks for going the additional mile.

Any inclination as to what is missing in the linearizability tests that prevented reproducing this? I expect we didn't catch it because it's an etcd-failure scenario, which is not as well covered by integration/e2e tests.

Failure injection is the domain of functional/linearizability tests, so I would like to identify why we didn't reproduce it that way.

@liggitt liggitt (Contributor) commented Feb 9, 2023

It's also reproduced on etcd 3.5.7. Is this a big problem to K8s? @liggitt

umm... I'm trying to think how bad repeating the latest event in a watch stream could be... it's definitely not as bad as replaying non-latest historical events. It could definitely confuse a watcher that was taking action on observing transitions and didn't expect to see the same transition occur two times, but a quick sweep didn't turn up any obvious issues with in-tree watchers in kubernetes (we almost universally work on the latest observed object from a locally maintained cache, and replaying the latest event doesn't break that).

for this scenario to occur, all of these would have to happen:

  1. the API server opts into progress notifications (default behavior)
  2. no writes to etcd have occurred since the last object create/update/delete write received by the etcd watcher (so the latest revision is the same as the revision on the last CRUD event received)
  3. a progress notification is the last event received by the etcd watcher
  4. something would have to trigger the etcd watcher reestablishing the watch (connection loss, etcd server restart, etc)

@ahrtr ahrtr (Member Author) commented Feb 9, 2023

Any inclination what is missing in linearizability test that prevented reproducing this?

The linearizability test is a sort of generic test, so I'm not surprised that it can't reliably reproduce this issue. That's exactly why I implemented a dedicated integration test for this issue. For more details, please refer to #15248 (comment) and the third commit in #15262.

umm... I'm trying to think how bad repeating the latest event in a watch stream could be... it's definitely not as bad as replaying non-latest historical events

Yes, it only replays the latest event.

for this scenario to occur, all of these would have to happen:

1. the API server opts into progress notifications (default behavior)

2. no writes to etcd have occurred since the last object create/update/delete write received by the etcd watcher (so the latest revision is the same as the revision on the last CRUD event received)

3. a progress notification is the last event received by the etcd watcher

4. something would have to trigger the etcd watcher reestablishing the watch (connection loss, etcd server restart, etc)

Exactly!

@ahrtr ahrtr (Member Author) commented Feb 9, 2023

umm... I'm trying to think how bad repeating the latest event in a watch stream could be... it's definitely not as bad as replaying non-latest historical events. It could definitely confuse a watcher that was taking action on observing transitions and didn't expect to see the same transition occur two times, but a quick sweep didn't turn up any obvious issues with in-tree watchers in kubernetes (we almost universally work on the latest observed object from a locally maintained cache, and replaying the latest event doesn't break that).

Thanks @liggitt for the feedback.

@serathius serathius (Member) commented Feb 10, 2023

FYI, the linearizability tests have reproduced the issue even after the fix was merged!

Run https://github.com/etcd-io/etcd/actions/runs/4142560023/jobs/7163370423

From run actions/checkout

/usr/bin/git log -1 --format='%H'
'36fc3cae65cac1b844d2561471dc95b44753618d'

From logs

2023-02-10T09:53:31.8967095Z     logger.go:130: 2023-02-10T09:53:30.726Z	INFO	Recorded operations	{"count": 690}
2023-02-10T09:53:31.8969114Z     logger.go:130: 2023-02-10T09:53:30.726Z	INFO	Average traffic	{"qps": 278.4656727398674}
2023-02-10T09:53:31.8970286Z     logger.go:130: 2023-02-10T09:53:31.726Z	INFO	killing server...	{"name": "TestLinearizabilityIssue14370-test-0"}
2023-02-10T09:53:31.8971440Z     logger.go:130: 2023-02-10T09:53:31.727Z	INFO	stopping server...	{"name": "TestLinearizabilityIssue14370-test-0"}
2023-02-10T09:53:31.8972771Z     logger.go:130: 2023-02-10T09:53:31.730Z	INFO	stopped server.	{"name": "TestLinearizabilityIssue14370-test-0"}
2023-02-10T09:53:31.8973344Z     watch.go:99: Expect revision to grow by 1, last: 166, mod: 166
2023-02-10T09:53:31.8974457Z     logger.go:130: 2023-02-10T09:53:31.863Z	INFO	Saving visualization	{"path": "/tmp/linearizability/TestLinearizability_Issue14370/history.html"}
2023-02-10T09:53:31.8975158Z     logger.go:130: 2023-02-10T09:53:31.889Z	INFO	closing test cluster...
2023-02-10T09:53:31.8975976Z     logger.go:130: 2023-02-10T09:53:31.889Z	INFO	stopping server...	{"name": "TestLinearizabilityIssue14370-test-0"}
2023-02-10T09:53:31.8976770Z     logger.go:130: 2023-02-10T09:53:31.889Z	INFO	closing server...	{"name": "TestLinearizabilityIssue14370-test-0"}
2023-02-10T09:53:31.8977579Z     logger.go:130: 2023-02-10T09:53:31.889Z	INFO	stopping server...	{"name": "TestLinearizabilityIssue14370-test-0"}
2023-02-10T09:53:31.8978441Z     logger.go:130: 2023-02-10T09:53:31.889Z	INFO	removing directory	{"data-dir": "/tmp/TestLinearizabilityIssue143702810931773/001"}
2023-02-10T09:53:31.8979302Z     logger.go:130: 2023-02-10T09:53:31.892Z	INFO	closed test cluster.
2023-02-10T09:53:31.8979814Z --- FAIL: TestLinearizability (25.73s)
2023-02-10T09:53:31.9976172Z �[1m�[0;31mFAIL: (code:1):
2023-02-10T09:53:31.9977152Z     --- PASS: TestLinearizability/ClusterOfSize1/LowTraffic (3.95s)
2023-02-10T09:53:31.9978328Z     --- PASS: TestLinearizability/ClusterOfSize3/LowTraffic (6.70s)
2023-02-10T09:53:31.9979892Z     --- PASS: TestLinearizability/ClusterOfSize1/HighTraffic (4.49s)
2023-02-10T09:53:31.9980995Z     --- PASS: TestLinearizability/ClusterOfSize3/HighTraffic (6.29s)
2023-02-10T09:53:31.9981901Z     --- FAIL: TestLinearizability/Issue14370 (4.29s)

@ptabor ptabor (Contributor) commented Feb 10, 2023

@serathius Do you mean they reproduced the bigger issue of #15271, or something else?

@ahrtr ahrtr (Member Author) commented Feb 10, 2023

2023-02-10T09:53:31.8973344Z watch.go:99: Expect revision to grow by 1, last: 166, mod: 166

I saw a similar error message with the help of #15272. I haven't had time to dig into it so far, so it isn't confirmed yet whether it's a test issue or a production issue. I will take a deep dive later.

@ahrtr ahrtr (Member Author) commented Feb 10, 2023

It's my top priority next week.

@serathius serathius (Member) replied:

@serathius Do you mean they reproduced the bigger issue of #15271 or something else ?

It's totally a different beast as they have different failpoint triggers.

@ahrtr ahrtr (Member Author) commented Feb 12, 2023

Please, everyone in this loop, do not panic; it's confirmed that this is just a test issue. FYI: #15288.

Just to be clear, I mean the issue mentioned in #15271 is just a test issue.
