etcdserver: Send requested progress notifications through watchStream (fix #15220) #15237
Conversation
Previous discussion on this commit: scpmw@8de2177#r98961857
Please sign off the commit and resolve the pipeline errors. cc @mitake
Force-pushed from 785735a to 7e589e5 (compare)
Codecov Report
@@ Coverage Diff @@
## main #15237 +/- ##
==========================================
- Coverage 74.76% 74.59% -0.17%
==========================================
Files 416 416
Lines 34351 34369 +18
==========================================
- Hits 25682 25638 -44
- Misses 7029 7075 +46
- Partials 1640 1656 +16
Force-pushed from 7e589e5 to c055f3c (compare)
Done, I think.
I didn't dig into the code, but I ran #15235 on this and found an issue with the implementation.
It looks like the same watch event is emitted twice. From the local run:
panic: duplicate operation in two responses on revision 332
---
first:
{
"Header": {
"cluster_id": 4512387775551231959,
"member_id": 11173105686495536258,
"revision": 332,
"raft_term": 2
},
"Events": [
{
"kv": {
"key": "MA==",
"create_revision": 2,
"mod_revision": 332,
"version": 78,
"value": "MzI4"
}
}
],
"CompactRevision": 0,
"Canceled": false,
"Created": false
}
---
second:
{
"Header": {
"cluster_id": 4512387775551231959,
"member_id": 11173105686495536258,
"revision": 333,
"raft_term": 3
},
"Events": [
{
"kv": {
"key": "MA==",
"create_revision": 2,
"mod_revision": 332,
"version": 78,
"value": "MzI4"
}
},
{
"kv": {
"key": "Mw==",
"create_revision": 3,
"mod_revision": 333,
"version": 92,
"value": "MzMw"
}
}
],
"CompactRevision": 0,
"Canceled": false,
"Created": false
}
This shows two consecutive watch responses that include the same event. The first watch response returned events for revisions 332-332 and the second for revisions 332-333.
This situation doesn't happen on the main branch, nor when manual progress notifications are disabled.
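For reference, the duplicate can be detected mechanically by comparing consecutive watch responses on (key, mod_revision) pairs. The following is a minimal sketch of such a check, not the actual robustness-test code; the clientv3 types are real, but the function and package names are invented for illustration:

```go
package watchcheck

import (
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// eventID identifies a watch event by the key it touched and the revision of
// the change; the same pair appearing in two responses means the server
// re-sent an event, which is the bug shown in the panic above.
type eventID struct {
	key         string
	modRevision int64
}

// checkNoDuplicateEvents returns an error if the second response repeats an
// event already delivered by the first one.
func checkNoDuplicateEvents(first, second clientv3.WatchResponse) error {
	seen := make(map[eventID]struct{}, len(first.Events))
	for _, ev := range first.Events {
		seen[eventID{string(ev.Kv.Key), ev.Kv.ModRevision}] = struct{}{}
	}
	for _, ev := range second.Events {
		id := eventID{string(ev.Kv.Key), ev.Kv.ModRevision}
		if _, dup := seen[id]; dup {
			return fmt.Errorf("duplicate operation in two responses on revision %d", id.modRevision)
		}
	}
	return nil
}
```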
Uh, that's concerning. This doesn't happen for my test script - how do I reproduce this? I tried
Sorry, I still need to document the linearizability tests properly. Linearizability tests require an etcd server binary compiled with gofail enabled. To do that please run
Ah, thanks, that made it work. Yes, I am getting the same error, but this might be a separate issue... I disabled the progress notification entirely server-side (i.e. commented out the code answering progress requests). I'll see whether I can dig some more to find out why it happened in the first place. Using my test script I was unable to reproduce responses getting sent twice, though interestingly enough I found other instances where the revision went backwards - this time on actual watch responses, not progress notifications. There could even be a connection there: if your script does reconnects, a wrong revision in the header might lead to duplicates? In any case, I need to make sure.
The PR #15248 should fix the duplicate event issue. No matter how we fix the RequestProgress issue, we should guarantee that the
Mark this PR as draft until it's confirmed that it indeed fixes the
FYI. The root cause of the duplicated event issue: #15248 (comment)
Confirmed that this PR can indeed fix issue #15220. But the solution has a major issue: it requires all watchers (from all clients) to be synced, which obviously isn't good. Instead, we should only require the watchers which share the same gRPC stream to be synced. We could manage to resolve that, but I don't see a good reason to continue to support it, as we already have the timed progress notification. It may complicate the overall implementation, and it is also error prone. The source of the issue is the addition of RequestProgress. Please read #9855 and kubernetes/kubernetes#59848 to get more background on the motivation for adding it. I am thinking we should probably deprecate the manual progress notification.
Force-pushed from c055f3c to 3e79020 (compare)
@ahrtr - ah, good point. I had overlooked that. Deprecating manual progress notifications would send me back to the drawing board on how to determine the database state of watched keys from a watch stream with multiple watchers - this was the only feature in the API that looked like it would cross-synchronise watchers (after all, the documentation effectively claimed it would!). So not a fan, but obviously not up to me in the end.
Force-pushed from 416eda7 to 1725aa9 (compare)
server/storage/mvcc/watcher.go
Outdated
@@ -58,6 +58,13 @@ type WatchStream interface {
// of the watchers since the watcher is currently synced.
RequestProgress(id WatchID)

// RequestProgressAll requests a progress notification for the
// entire watcher group. The responses will be sent through
Please change the comment to something like below (not a native English speaker, please feel free to adjust the wording):
RequestProgressAll requests a progress notification for all
the watchers (substreams) which share the same gRPC stream.
......
Re-worded it. Felt like it might be the wrong level of abstraction to talk about gRPC (strictly speaking we don't know where the WatchResponses go) and substreams (that's a client-side idea?), so I left it just at "share the same stream".
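For readers following along, here is a rough sketch of the resulting interface excerpt. The method names follow the diff above, while the doc comments and the bool return value are reconstructed from the discussion and may not match the merged code word for word:

```go
package mvcc

// WatchID identifies a watcher within a stream (same name as the real package).
type WatchID int64

// WatchStream sketch: only the two progress-related methods are shown here;
// the real interface has several more.
type WatchStream interface {
	// RequestProgress requests the progress of the watcher with the given ID.
	// The response will only be sent if the watcher is currently synced.
	RequestProgress(id WatchID)

	// RequestProgressAll requests a progress notification for all watchers
	// sharing the same stream. The response is sent through the stream's
	// watch response channel to ensure correct ordering, and may be deferred
	// until all watchers are synced.
	RequestProgressAll() bool
}
```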
Yes, this is still true. The overall complexity seems acceptable, and that's good news, so I don't insist on deprecating it. But the manual progress notification needs to acquire the write lock, so it may reduce performance if users use it too much. The timed progress notification is still recommended.
@scpmw could you please backport this PR to 3.5 and 3.4? Note: we need to get the fix included in 3.5.8 and 3.4.25.
I can give it a try - separate PRs, I assume? And I'd assume I can skip the linearisability / robustness tests, because that would basically require also backporting the entire infrastructure?
Thanks @scpmw. Yes, please submit two separate PRs, one for the release-3.5 branch and the other for release-3.4. Regarding the robustness test, it's OK with me to skip it, but I defer to @serathius to comment.
@scpmw please backport this PR to release-3.5 first, because we need to release
@scpmw Please let me know if you need my assistance on this. My goal is to get it done by the end of next Monday, so that we can get 3.5.8 out before KubeCon. @serathius Do you have any concerns about this?
If the original PR is done, I don't see anything wrong with others finishing the work by backporting it. Let's wait till Monday for @scpmw to respond.
Agreed, but @scpmw already said he can give it a try. I will take over if there is no response by next Monday.
You guys move fast :) I haven't got to it yet, and likely won't until the end of the week. I'm not confident enough in my general Go skills to squeeze something like this into the margins around Easter... So if you need this done more quickly, please don't wait for me.
Thanks @scpmw for the feedback. Then let me take care of it.
…t doesn't match watch range. When implementing the fix for progress notifications (etcd-io#15237) we made an incorrect assumption that unsynced watches will always get at least one event. Unsynced watches include not only slow watchers, but also newly created watches that requested a current or older revision. In case none of the events match the watch filter, those newly created watches might become synced without any event going through. Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
This PR introduced another bug: #17557. @scpmw @ahrtr do you know why we decided to defer the progress notification until an event arrives? It makes sense that we don't send a progress notification while any watch is unsynced, but why do we defer the notification? There is no guarantee that an event comes.
Eventually is good, but currently we may never get one.
Right. Mea culpa - I actually found this issue myself. The only situation where this really happens is indeed on creation, because typically the reason we aren't synchronised is that there are messages left to send. I tested that it can be fixed by just sending an "empty" message downstream in that particular case (i.e. at https://github.com/etcd-io/etcd/blob/main/server/storage/mvcc/watchable_store.go#L378). But this struck me as a fairly ugly way to approach it, and I subsequently got a bit stuck trying to think of a better way. Your proposed solution is to just not send a response to a progress request if the stream isn't fully synchronised? I think the key comment is in your patch:
With your patch that is not quite true, for two reasons. Firstly - ironically - in the problematic scenario (create a new watcher on a "quiet" key, immediately send a progress notification request) the client gets nothing back at all, so the client can in fact not infer progress (inconsequential as it would be in this case). Secondly - and more subtly, so apologies in advance if I get this slightly wrong - if you see "progress" as defined in https://github.com/etcd-io/etcd/blob/main/api/etcdserverpb/rpc.proto:
Then that is something you cannot tell from standard events, as in a situation where we are watching many keys it is always possible that you will receive more events for a certain revision. Therefore there are situations where you really want to force a progress notification to get that particular guarantee, not because you are particularly worried that you might be falling behind far enough for compaction to have caught up with you. Now admittedly this could be characterised as somewhat misusing progress notifications, and your change isn't even catastrophic (you'd just have to poll a bit until you get a notification back). There might even be conceptually better ways to do this - e.g. have some sort of flag in the response indicating whether there are more events at this revision? However, that introduces a protocol change and wouldn't even be that much easier to implement. So that's where I got stuck on this question...
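For illustration, a minimal client-side sketch of the "poll a bit until you get a notification back" approach. clientv3.Watcher.RequestProgress and WatchResponse.IsProgressNotify are real clientv3 calls; the wrapper function and its behaviour of skipping intermediate event batches are invented for this example:

```go
package watchprogress

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// waitForProgress keeps requesting a manual progress notification until one
// arrives on the watch channel, and returns the revision it reports. Regular
// event batches received in between are skipped here; a real caller would
// process them instead of dropping them.
func waitForProgress(ctx context.Context, cli *clientv3.Client, wch clientv3.WatchChan) (int64, error) {
	for {
		// Ask the server for a progress notification on the watch stream
		// backing this client.
		if err := cli.RequestProgress(ctx); err != nil {
			return 0, err
		}
		select {
		case resp, ok := <-wch:
			if !ok {
				return 0, fmt.Errorf("watch channel closed")
			}
			if resp.IsProgressNotify() {
				// All events up to and including this revision have been delivered.
				return resp.Header.Revision, nil
			}
			// A normal event batch arrived first; poll again.
		case <-ctx.Done():
			return 0, ctx.Err()
		}
	}
}
```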
In standard events you just use
Progress notifications requested using ProgressRequest were sent directly using the ctrlStream, which means that they could race against watch responses in the watchStream. This would especially happen when the stream was not synced - e.g. if you requested a progress notification on a freshly created unsynced watcher, the notification would typically arrive indicating a revision for which not all watch responses had been sent.

This changes the behaviour so that v3rpc always goes through the watch stream, using a new RequestProgressAll function that closely matches the behaviour of the v3rpc code - i.e.

Generate a message with WatchId -1, indicating the revision for all watchers in the stream
Guarantee that a response is (eventually) sent

The latter might require us to defer the response until all watchers are synced, which is likely as it should be. Note that we do not guarantee that the number of progress notifications matches the number of requests, only that eventually at least one gets sent.
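As an illustration of the "defer until synced" behaviour described above, here is a minimal sketch. All type, field, and method names (progressWatchGroup, markSynced, notify, and so on) are invented for the example and do not claim to mirror the actual etcd data structures:

```go
package sketch

import "sync"

// progressWatchGroup models a group of watchers sharing one watch stream, for
// the purpose of showing how a progress request can be deferred until the
// whole group is synced.
type progressWatchGroup struct {
	mu      sync.Mutex
	synced  bool            // every watcher in the group has caught up
	pending bool            // a progress request arrived while unsynced
	rev     int64           // revision the group is synced up to
	notify  func(rev int64) // sends a WatchId -1 progress notification on the watch stream
}

// RequestProgressAll answers immediately if the group is synced, otherwise it
// records the request so it can be answered later. The bool reports whether a
// notification was sent right away.
func (g *progressWatchGroup) RequestProgressAll() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.synced {
		g.notify(g.rev) // safe: no watch responses at or below rev are still queued
		return true
	}
	g.pending = true
	return false
}

// markSynced is called once all watchers have been brought up to date. It
// delivers at most one deferred notification, matching the "at least one
// response eventually" guarantee rather than one response per request.
func (g *progressWatchGroup) markSynced(rev int64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.synced, g.rev = true, rev
	if g.pending {
		g.pending = false
		g.notify(rev)
	}
}
```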