
etcdserver: Send requested progress notifications through watchStream (fix #15220) #15237

Merged

merged 3 commits into main from guarantee_progress_notify_order
Apr 5, 2023

Conversation

@scpmw (Contributor) commented Feb 2, 2023

Progress notifications requested using ProgressRequest were sent directly using the ctrlStream, which means that they could race against watch responses in the watchStream.

This would especially happen when the stream was not synced - e.g. if you requested a progress notification on a freshly created unsynced watcher, the notification would typically arrive indicating a revision for which not all watch responses had been sent.

This changes the behaviour so that v3rpc always goes through the watch stream, using a new RequestProgressAll function that closely matches the behaviour of the v3rpc code - i.e.

  1. Generate a message with WatchId -1, indicating the revision for all watchers in the stream

  2. Guarantee that a response is (eventually) sent

The latter might require us to defer the response until all watchers are synced, which is likely how it should be. Note that we do not guarantee that the number of progress notifications matches the number of requests, only that eventually at least one gets sent.
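For illustration, here is a minimal, self-contained Go sketch of that scheme. The names (streamGroup, requestProgressAll, syncOne) are hypothetical stand-ins, not etcd's actual types; it only models the two guarantees above:

package main

import "fmt"

// clusterWatchID mirrors the sentinel WatchId (-1) used for progress
// notifications that apply to every watcher on the stream.
const clusterWatchID int64 = -1

type watchResponse struct {
	WatchID  int64
	Revision int64
	Events   []string
}

// streamGroup is a toy stand-in for one watch stream's watcher group.
type streamGroup struct {
	currentRev      int64
	unsynced        int  // watchers still catching up
	progressPending bool // a progress request arrived while unsynced
	out             chan watchResponse
}

// requestProgressAll sends a WatchId -1 notification immediately if every
// watcher on the stream is synced; otherwise it defers the notification so
// it can never overtake pending watch responses.
func (g *streamGroup) requestProgressAll() {
	if g.unsynced == 0 {
		g.out <- watchResponse{WatchID: clusterWatchID, Revision: g.currentRev}
		return
	}
	g.progressPending = true
}

// syncOne simulates one unsynced watcher catching up; once the last one
// syncs, the deferred notification is finally emitted (eventually, at
// least once -- not necessarily once per request).
func (g *streamGroup) syncOne(events []string) {
	g.out <- watchResponse{WatchID: 0, Revision: g.currentRev, Events: events}
	g.unsynced--
	if g.unsynced == 0 && g.progressPending {
		g.progressPending = false
		g.out <- watchResponse{WatchID: clusterWatchID, Revision: g.currentRev}
	}
}

func main() {
	g := &streamGroup{currentRev: 332, unsynced: 1, out: make(chan watchResponse, 4)}
	g.requestProgressAll() // deferred: one watcher is still unsynced
	g.syncOne([]string{"put /a"})
	close(g.out)
	for r := range g.out {
		fmt.Printf("watchID=%d rev=%d events=%v\n", r.WatchID, r.Revision, r.Events)
	}
}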

@scpmw (Contributor, Author) commented Feb 2, 2023

Previous discussion on this commit: scpmw@8de2177#r98961857

@ahrtr (Member) commented Feb 3, 2023

Please sign off the commit and resolve the pipeline errors.

cc @mitake

@ahrtr (Member) commented Feb 3, 2023

$ git rebase HEAD~1 --signoff
$ git push origin guarantee_progress_notify_order -f

@scpmw force-pushed the guarantee_progress_notify_order branch from 785735a to 7e589e5 on February 3, 2023 08:23
@codecov-commenter commented Feb 3, 2023

Codecov Report

Merging #15237 (c055f3c) into main (9505d72) will decrease coverage by 0.17%.
The diff coverage is 73.33%.


@@            Coverage Diff             @@
##             main   #15237      +/-   ##
==========================================
- Coverage   74.76%   74.59%   -0.17%     
==========================================
  Files         416      416              
  Lines       34351    34369      +18     
==========================================
- Hits        25682    25638      -44     
- Misses       7029     7075      +46     
- Partials     1640     1656      +16     
Flag Coverage Δ
all 74.59% <73.33%> (-0.17%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
server/storage/mvcc/watchable_store.go 87.11% <61.90%> (-2.85%) ⬇️
server/etcdserver/api/v3rpc/watch.go 84.12% <100.00%> (-1.59%) ⬇️
server/storage/mvcc/watcher.go 100.00% <100.00%> (ø)
client/v3/concurrency/mutex.go 61.03% <0.00%> (-5.20%) ⬇️
client/v3/concurrency/session.go 85.10% <0.00%> (-4.26%) ⬇️
server/storage/mvcc/kvstore_compaction.go 95.74% <0.00%> (-4.26%) ⬇️
server/etcdserver/api/v3rpc/util.go 70.96% <0.00%> (-3.23%) ⬇️
client/v3/leasing/txn.go 88.09% <0.00%> (-3.18%) ⬇️
server/etcdserver/api/v3rpc/interceptor.go 74.47% <0.00%> (-3.13%) ⬇️
server/etcdserver/api/v3election/election.go 66.66% <0.00%> (-2.78%) ⬇️
... and 21 more


@scpmw force-pushed the guarantee_progress_notify_order branch from 7e589e5 to c055f3c on February 3, 2023 09:16
@scpmw (Contributor, Author) commented Feb 3, 2023

Done, I think.

@serathius (Member) left a comment

Didn't dig into the code, however I ran #15235 on this and found an issue with the implementation.

It looks like the same watch event is emitted twice. From the local run:

panic: duplicate operation in two responses on revision 332
---
first:
{
    "Header": {
        "cluster_id": 4512387775551231959,
        "member_id": 11173105686495536258,
        "revision": 332,
        "raft_term": 2
    },
    "Events": [
        {
            "kv": {
                "key": "MA==",
                "create_revision": 2,
                "mod_revision": 332,
                "version": 78,
                "value": "MzI4"
            }
        }
    ],
    "CompactRevision": 0,
    "Canceled": false,
    "Created": false
}
---
second:
{
    "Header": {
        "cluster_id": 4512387775551231959,
        "member_id": 11173105686495536258,
        "revision": 333,
        "raft_term": 3
    },
    "Events": [
        {
            "kv": {
                "key": "MA==",
                "create_revision": 2,
                "mod_revision": 332,
                "version": 78,
                "value": "MzI4"
            }
        },
        {
            "kv": {
                "key": "Mw==",
                "create_revision": 3,
                "mod_revision": 333,
                "version": 92,
                "value": "MzMw"
            }
        }
    ],
    "CompactRevision": 0,
    "Canceled": false,
    "Created": false
}

This shows two consecutive watch responses that include the same event: the first response returned events for revisions 332-332 and the second for 332-333.
This situation doesn't happen on the main branch, nor when manual progress notifications are disabled.
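For reference, a minimal Go sketch of the invariant this robustness check enforces (hypothetical names, not the actual test code):

package main

import "fmt"

type event struct {
	Key         string
	ModRevision int64
}

// checkNoDuplicates enforces the invariant the robustness test tripped on:
// the same (key, mod_revision) pair must never appear in more than one
// watch response on a single stream.
func checkNoDuplicates(responses [][]event) error {
	seen := map[event]bool{}
	for _, resp := range responses {
		for _, ev := range resp {
			if seen[ev] {
				return fmt.Errorf("duplicate operation for key %q on revision %d", ev.Key, ev.ModRevision)
			}
			seen[ev] = true
		}
	}
	return nil
}

func main() {
	// The failure above: revision 332 for key "MA==" appears in two
	// consecutive responses (332-332, then 332-333).
	responses := [][]event{
		{{Key: "MA==", ModRevision: 332}},
		{{Key: "MA==", ModRevision: 332}, {Key: "Mw==", ModRevision: 333}},
	}
	fmt.Println(checkNoDuplicates(responses))
}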

@scpmw (Contributor, Author) commented Feb 3, 2023

Uh, that's concerning. This doesn't happen for my test script - how do I reproduce this? I tried make test-linearizability for your branch, but that seems to yield me a "connection refused" error from a goPanicFailpoint. I hope I'm not overlooking something obvious here?

@serathius (Member) commented
Sorry, I still need to document the linearizability tests properly. Linearizability tests require an etcd server binary compiled with gofail enabled. To do that, please run:

make gofail-enable
make build
make gofail-disable
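(After rebuilding with gofail enabled, make test-linearizability should then find the failpoint-enabled binary instead of failing with "connection refused".)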

@scpmw (Contributor, Author) commented Feb 5, 2023

Ah, thanks, that made it work. Yes, I am getting the same error, but this might be a separate issue... I disabled the progress notification entirely server-side (i.e. commented out the code answering ProgressRequest) for both my branch and the version from your branch, and after disabling the "Expected at least one notify" check it still reproduced your error. So I'm fairly confident that my fix is not what made this appear (that would have been really puzzling).

I'll see whether I can dig some more to find out why it happened in the first place. Using my test script I was unable to reproduce responses getting sent twice, though interestingly enough I found other instances where the revision went backwards. This time it was on actual watch responses, not progress notifications. There could even be a connection there - if your script does reconnects, a wrong revision in the header might lead to duplicates? In any case, I need to make sure.

@ahrtr marked this pull request as draft on February 6, 2023 07:07
@ahrtr (Member) commented Feb 6, 2023

PR #15248 should fix the duplicate event issue. No matter how we fix the RequestProgress issue, we should guarantee that the watchRequest.rev never decreases. cc @mitake
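A minimal Go sketch of that invariant (nextStartRev is a hypothetical illustration, not etcd's actual code):

package main

import "fmt"

// nextStartRev clamps a watcher's resume revision so it never moves
// backwards past what the stream has already delivered, which would
// re-deliver (duplicate) already-seen events.
func nextStartRev(requested, lastDelivered int64) int64 {
	if requested == 0 || requested <= lastDelivered {
		return lastDelivered + 1 // resume strictly after delivered progress
	}
	return requested
}

func main() {
	fmt.Println(nextStartRev(332, 332)) // 333: revision 332 was already delivered
	fmt.Println(nextStartRev(340, 332)) // 340: an explicit future start is kept
}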

@ahrtr (Member) commented Feb 6, 2023

Marking this PR as draft until it's confirmed that it indeed fixes the RequestProgress issue.

@ahrtr (Member) commented Feb 7, 2023

FYI. The root cause of the duplicated event issue: #15248 (comment)

@ahrtr (Member) commented Feb 8, 2023

Confirmed that this PR can indeed fix issue #15220. But the solution has a major issue: it requires all watchers (from all clients) to be synced, which obviously isn't good. Instead, we should only require the watchers which share the same gRPC stream to be synced.

We can manage to resolve it, but I don't see a good reason to continue supporting it, as we already have timed progress notifications. It may complicate the overall implementation, and it is also error prone. The source of the issue is the addition of RequestProgress. Please read #9855 and kubernetes/kubernetes#59848 for more background. The motivation for adding RequestProgress was to answer the question "what's my progress?", and accordingly to know whether the local cache is up to date. Read kubernetes/kubernetes#59848 (comment) for a quick reference. But the existing timed progress notification should be enough.

I am thinking we should probably deprecate the manual RequestProgress. Actually, K8s doesn't use the manual RequestProgress at all. See #15260. @mitake @ptabor @serathius @spzala @wojtek-t @liggitt @dims any objections or comments?

@liggitt (Contributor) commented Feb 8, 2023

@wojtek-t or @jpbetz might know more about the plans for requested progress notification than I do... I thought there was still a desire to use that in Kubernetes and it just hadn't happened yet

@scpmw force-pushed the guarantee_progress_notify_order branch from c055f3c to 3e79020 on February 8, 2023 19:32
@scpmw (Contributor, Author) commented Feb 8, 2023

@ahrtr - ah, good point. I had overlooked that the watchable is shared between multiple watchStreams. That obviously meant that the fix was wrong. Should be fixed now, also addressing @mitake's comments.

Deprecating manual progress notifications would send me back to the drawing board on how to determine the database state of watched keys from a watch stream with multiple watchers - this was the only feature in the API that looked like it would cross-synchronise watchers (after all, the documentation effectively claimed it would!). So not a fan, but obviously not up to me in the end.

@scpmw force-pushed the guarantee_progress_notify_order branch 2 times, most recently from 416eda7 to 1725aa9 on February 8, 2023 21:18
server/storage/mvcc/watchable_store.go (outdated review thread; resolved)
@@ -58,6 +58,13 @@ type WatchStream interface {
// of the watchers since the watcher is currently synced.
RequestProgress(id WatchID)

// RequestProgressAll requests a progress notification for the
// entire watcher group. The responses will be sent through
@ahrtr (Member) commented Feb 8, 2023

Please change the comment to something like below (not a native English speaker, so please feel free to adjust the wording):

RequestProgressAll requests a progress notification for all
the watchers (substreams) which share the same gRPC stream. 
......

@scpmw (Contributor, Author) replied

Re-worded it. Felt like it might be the wrong level of abstraction to talk about gRPC (strictly speaking we don't know where the WatchResponses go) and substreams (that's a client-side idea?), so I left it just at "share the same stream".

server/storage/mvcc/watchable_store.go (outdated review thread; resolved)
@ahrtr (Member) commented Feb 8, 2023

this was the only feature in the API that looked like it would cross-synchronise watchers

Yes, this is still true. The overall complexity seems acceptable, which is good news, so I no longer insist on deprecating it. But the manual progress notification needs to acquire the write lock, so I think it may reduce performance if users use it too much.

So the timed progress notification is still recommended.
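For illustration, a minimal clientv3 sketch of opting into timed notifications instead (assumes a reachable endpoint at localhost:2379; not part of this PR):

package main

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// WithProgressNotify asks the server for periodic progress updates at
	// its configured interval, avoiding the write lock taken by each
	// manual RequestProgress call.
	wch := cli.Watch(context.Background(), "foo", clientv3.WithProgressNotify())
	for resp := range wch {
		if resp.IsProgressNotify() {
			fmt.Println("progress at revision", resp.Header.Revision)
		}
	}
}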

@ahrtr (Member) commented Apr 5, 2023

@scpmw could you please backport this PR to 3.5 and 3.4?

Note: we need to get the fix included in 3.5.8 and 3.4.25.

@scpmw (Contributor, Author) commented Apr 6, 2023

I can give it a try - separate PRs, I assume? And I'd assume I can skip the linearisability / robustness tests, because that would basically require also backporting the entire infrastructure?

@ahrtr (Member) commented Apr 6, 2023

I can give it a try - separate PRs, I assume? And I'd assume I can skip the linearisability / robustness tests, because that would basically require also backporting the entire infrastructure?

Thanks @scpmw. Yes, please submit two separate PRs, one for the release-3.5 branch and the other for release-3.4. Regarding the robustness test, it's OK with me to skip it, but I defer to @serathius to comment.

@ahrtr (Member) commented Apr 6, 2023

@scpmw please backport this PR to release-3.5 first, because we need to release 3.5.8 first. Thank you!

@ahrtr (Member) commented Apr 7, 2023

@scpmw Please let me know if you need my assistance on this.

My goal is to get it done by the end of next Monday, so that we can get 3.5.8 out before KubeCon. @serathius do you have any concerns about this?

@serathius (Member) commented Apr 7, 2023

If the original PR is done, I don't see anything wrong with others finishing work by backporting them. Let's wait till Monday for @scpmw to respond.

@ahrtr (Member) commented Apr 7, 2023

If the original PR is done, I don't see anything wrong with others finishing work by backporting them

Agreed, but @scpmw already said he can give it a try. I will take over if there is no response by next Monday.

@ahrtr mentioned this pull request on Apr 10, 2023
@scpmw (Contributor, Author) commented Apr 10, 2023

You guys move fast :)

Haven't got to it yet, and likely won't until the end of the week. I'm not confident enough in my general Go skills to squeeze something like this into the margins around Easter... So if you need this done more quickly, please don't wait for me.

@ahrtr (Member) commented Apr 10, 2023

Thanks @scpmw for the feedback. Then let me take care of it.

serathius added a commit to serathius/etcd that referenced this pull request Mar 8, 2024
…t doesn't match watch range.

When implementing the fix for progress notifications
(etcd-io#15237) we made an incorrect
assumption that unsynced watches will always get at least one event.

Unsynced watches include not only slow watchers, but also newly created
watches that requested an old revision. In case none of the events
match the watch filter, those newly created watches might become synced
without any event going through.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
@serathius (Member) commented Mar 8, 2024
This PR introduced another bug #17557

@scpmw @ahrtr do you know why we decided to defer the progress notification until an event? It makes sense that we don't send a progress notification if any watch is unsynced, but why do we defer the notification? There is no guarantee that an event ever comes.

The latter might require us to defer the response until all watchers are synced, which is likely how it should be. Note that we do not guarantee that the number of progress notifications matches the number of requests, only that eventually at least one gets sent.

"Eventually" is good, but currently we get "never".
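A toy Go model of that failure mode (hypothetical names; it only mimics the deferral logic, not etcd's actual code):

package main

import "fmt"

// stream models the deferral bug: the server defers a requested progress
// notification and releases it only when the next event is sent -- but a
// freshly created watcher whose filter matches no events can become synced
// without any event, so the deferred notification is never released.
type stream struct {
	deferred  bool     // a progress request arrived while unsynced
	delivered []string // everything sent to the client
}

// sendEvent piggybacks the deferred notification on event traffic --
// the assumption that broke in practice.
func (s *stream) sendEvent(ev string) {
	s.delivered = append(s.delivered, ev)
	if s.deferred {
		s.deferred = false
		s.delivered = append(s.delivered, "progress-notify")
	}
}

func main() {
	s := &stream{deferred: true}
	// A filtered watcher catches up without emitting a single event, so
	// sendEvent is never called and the notification stays pending forever.
	fmt.Println("delivered:", s.delivered, "pending:", s.deferred)
}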

@scpmw (Contributor, Author) commented Mar 8, 2024

Right. Mea culpa - I actually found this issue myself. The only situation where this really happens is indeed on creation, because typically the reason we aren't synchronised is that there are messages left to send. So I tested that it can be fixed by just sending an "empty" message downstream in that particular case (i.e. at https://github.com/etcd-io/etcd/blob/main/server/storage/mvcc/watchable_store.go#L378). But this struck me as a fairly ugly way to approach this, and I subsequently got a bit stuck trying to think of a better way.

Your proposed solution is to just not send a response to a progress request if the stream isn't fully synchronised? I think the key comment is in your patch:

As the client is unsynchronised, the server will not send any notification, as the client can infer progress from events.

With your patch that is not quite true, for two reasons. Firstly - ironically - in the problematic scenario (create a new watcher on a "quiet" key, immediately send a progress notification request) the client gets nothing back at all, so the client can in fact not infer progress (inconsequential as it would be in this case).

Secondly - and more subtly, so apologies in advance if I get this slightly wrong - if you see "progress" as defined in https://github.com/etcd-io/etcd/blob/main/api/etcdserverpb/rpc.proto:

// For watch progress responses, the header.revision indicates progress. All future events
// received in this stream are guaranteed to have a higher revision number than the
// header.revision number.

Then that is something you cannot tell from standard events, as in a situation where we are watching many keys it is always possible that you will see more events for a certain revision. Therefore there are situations where you really want to force a progress notification to get that particular guarantee, not because you are particularly worried that you might be falling behind enough that compaction might have caught up with you.

Now admittedly this could be characterised as somewhat misusing progress notifications, and your change isn't even catastrophic (you'd just have to poll a bit until you get a notification back). There might even be conceptually better ways to do this - e.g. some sort of flag in the response indicating whether there are more events at this revision? However, that introduces a protocol change and wouldn't even be that much easier to implement. So that's where I got stuck on this question...
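For illustration, a minimal clientv3 sketch of forcing a progress notification to obtain exactly that guarantee (assumes a reachable endpoint at localhost:2379; not part of this PR):

package main

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx := context.Background()
	wch := cli.Watch(ctx, "foo")

	// Force a progress notification: once a notify with header.revision R
	// arrives, all future events on this stream have revision > R.
	if err := cli.RequestProgress(ctx); err != nil {
		panic(err)
	}

	for resp := range wch {
		if resp.IsProgressNotify() {
			fmt.Println("no more events at or below revision", resp.Header.Revision)
			return
		}
		for _, ev := range resp.Events {
			fmt.Println("event at revision", ev.Kv.ModRevision)
		}
	}
}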

@serathius (Member) commented

Then that is something you cannot tell from standard events, as in a situation where we are watching many keys it is always possible that you will see more events for a certain revision. Therefore there are situations where you really want to force a progress notification to get that particular guarantee, not because you are particularly worried that you might be falling behind enough that compaction might have caught up with you.

In standard events you just use event.Kv.ModRevision to get progress.

serathius added a commit to serathius/etcd that referenced this pull request Mar 11, 2024