
Reproduce watch starvation and event loss #17535

Conversation

@chaochn47 (Member) commented on Mar 6, 2024:


The test box is an m5.4xlarge instance (16 vCPU, 64 GiB memory).

export GOWORK=off
make build
rm $HOME/watch-lost.log

pushd tests/e2e && EXPECT_DEBUG=true go test -v -timeout=2h -run TestWatchDelayOnStreamMultiplex | tee $HOME/watch-lost.log

With numOfDirectWatches set to 800, there is a gap in the events being sent out:

13006     logger.go:130: 2024-03-06T01:21:32.267Z     INFO    watch-cache     got watch response      {"event-type": "DELETE", "key": "/registry/pods/196/33", "rev": 13001}
13007     logger.go:130: 2024-03-06T01:21:32.267Z     INFO    watch-cache     got watch response      {"event-type": "PUT", "key": "/registry/pods/197/33", "rev": 13002}
13008     logger.go:130: 2024-03-06T01:21:32.569Z     DEBUG   compaction      compacted rev   {"compact-revision": 79244}
13009     logger.go:130: 2024-03-06T01:21:33.819Z     INFO    watch-cache     got watch response      {"event-type": "PUT", "key": "/registry/pods/102/438", "rev": 174748}
13010     logger.go:130: 2024-03-06T01:21:33.820Z     INFO    watch-cache     got watch response      {"event-type": "PUT", "key": "/registry/pods/149/439", "rev": 174749}
13011     logger.go:130: 2024-03-06T01:21:33.823Z     INFO    watch-cache     got watch response      {"event-type": "PUT", "key": "/registry/pods/64/437", "rev": 174750}
13012     logger.go:130: 2024-03-06T01:21:38.505Z     INFO    watch-cache     got watch response      {"event-type": "PUT", "key": "/registry/pods/173/438", "rev": 174751}

The watcher then gets stuck after receiving 85669 events, even though at least 247214 mutation requests were sent to the server successfully.

logger.go:130: 2024-03-06T01:25:50.702Z	WARN	compareEvents	watch events received is lagging behind	{"watch-events-received": 85669, "events-triggered": 247214}

After changing numOfDirectWatches to 100, all events are delivered and the test passes:

    logger.go:130: 2024-03-06T03:27:47.664Z	INFO	compareEvents	The number of events watch cache received is high than or equal to events triggered on client side	{"watch-cache-received": 191156, "traffic-triggered": 190962}
    logger.go:130: 2024-03-06T03:27:47.664Z	DEBUG	watch-cache	watcher exited
    logger.go:130: 2024-03-06T03:27:47.664Z	INFO	closing test cluster...
    logger.go:130: 2024-03-06T03:27:47.665Z	INFO	stoping server...	{"name": "test-0"}
    logger.go:130: 2024-03-06T03:27:47.684Z	INFO	stopped server.	{"name": "test-0"}
    logger.go:130: 2024-03-06T03:27:47.684Z	INFO	closing server...	{"name": "test-0"}
    logger.go:130: 2024-03-06T03:27:47.684Z	INFO	stoping server...	{"name": "test-0"}
    logger.go:130: 2024-03-06T03:27:47.684Z	INFO	removing directory	{"data-dir": "/tmp/TestWatchDelayOnStreamMultiplex1761534390/002"}
    logger.go:130: 2024-03-06T03:27:47.715Z	INFO	closed test cluster.
--- PASS: TestWatchDelayOnStreamMultiplex (80.68s)
PASS

@k8s-ci-robot commented:
Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@chaochn47 force-pushed the release-3.5-reproduce-watch-event-starvation branch 2 times, most recently from 334e373 to 6e370d3 on March 6, 2024 01:39
Signed-off-by: Chao Chen <chaochn@amazon.com>
@chaochn47 force-pushed the release-3.5-reproduce-watch-event-starvation branch from 6e370d3 to 49447be on March 6, 2024 05:23
@chaochn47 (Member, Author) commented:
After the commit that logs more information about which events are being skipped:

etcd server log

{
    "level": "info",
    "ts": "2024-03-06T07:21:38.691929Z",
    "caller": "mvcc/watchable_store.go:371",
    "msg": "watcher has skipped events",
    "watcher-next-rev-to-accept": 2726,
    "watcher-min-rev-of-next-events-batch-to-sent": 184694,
    "watch-id": 0
}

Test log

   2723    2024-03-06T07:21:25.354Z   watch-cache     got watch response      {"event-type": "DELETE", "key": "/registry/pods/17/7", "rev": 2724}
   2724     2024-03-06T07:21:25.354Z  watch-cache     got watch response      {"event-type": "PUT", "key": "/registry/pods/35/7", "rev": 2725}
   2725     2024-03-06T07:21:38.695Z  watch-cache     got watch response      {"event-type": "DELETE", "key": "/registry/pods/64/463", "rev": 184694}
   2726     2024-03-06T07:21:38.695Z  watch-cache     got watch response      {"event-type": "PUT", "key": "/registry/pods/59/463", "rev": 184695}

The problem is:

If the shared watch response channel is full and this watcher's next event revision has already been compacted, this slow/unsynced watcher skips all the events pending to be sent out, due to the logic below.

if w.minRev < compactRev {
	select {
	case w.ch <- WatchResponse{WatchID: w.id, CompactRevision: compactRev}:
		w.compacted = true
		wg.delete(w)
	default:
		// retry next time
	}
}
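As a standalone illustration of why this drops the notification (a minimal sketch, not etcd code): a non-blocking send on a full buffered channel falls straight into the default branch, so the CompactRevision response is simply not delivered on this pass.

package main

import "fmt"

func main() {
	// w.ch in the snippet above is a buffered channel; assume it is already full.
	ch := make(chan int, 1)
	ch <- 1 // fills the buffer

	select {
	case ch <- 2:
		fmt.Println("compaction response sent")
	default:
		fmt.Println("channel full, retry next time") // this branch runs
	}
}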


func TestWatchDelayOnStreamMultiplex(t *testing.T) {
	e2e.BeforeTest(t)
	clus, err := e2e.NewEtcdProcessCluster(t, &e2e.EtcdProcessClusterConfig{ClusterSize: 1, LogLevel: "info"})
Member:
I can reproduce it as well when ClientHttpSeparate: true. I was thinking that CPU resources or network bandwidth might affect this.

@chaochn47 (Author):
Thanks for confirming.

@fuweid (Member) commented on Mar 6, 2024:

If the shared watch response channel is full and this watcher's next event revision has already been compacted, this slow/unsynced watcher skips all the events pending to be sent out.

Maybe the client should close the watch and restart watching from the compact revision? It's impossible to get the data between minRev and compactRevision anyway.
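A minimal sketch of what that client-side handling could look like, assuming the standard clientv3 API; the helper name and retry loop below are illustrative, not part of this PR:

package main

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// watchFromCompact (hypothetical helper) restarts the watch from the reported
// compact revision whenever the server signals that the requested revision
// has been compacted, instead of leaving the watcher permanently behind.
func watchFromCompact(ctx context.Context, c *clientv3.Client, prefix string, rev int64) {
	for ctx.Err() == nil {
		wch := c.Watch(ctx, prefix, clientv3.WithPrefix(), clientv3.WithRev(rev))
		for wres := range wch {
			if wres.CompactRevision != 0 {
				// Events in [rev, CompactRevision) are unrecoverable; resume
				// from the compact revision, which the server still retains.
				rev = wres.CompactRevision
				break
			}
			for _, ev := range wres.Events {
				rev = ev.Kv.ModRevision + 1
			}
		}
	}
}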

rev := gresp.Header.Revision
watchCacheWatchOpts := append([]clientv3.OpOption{clientv3.WithCreatedNotify(), clientv3.WithRev(rev), clientv3.WithProgressNotify()}, commonWatchOpts...)
//lastTimeGotResponse := time.Now()
for wres := range c.Watch(rootCtx, watchCacheWatchKeyPrefix, watchCacheWatchOpts...) {
Member:

In my local run, I didn't receive a compact revision from the watch channel.

@fuweid (Member) commented on Mar 6, 2024:

Assume that all the watchers need to send a compact revision and watcherGroup.choose returns math.MaxInt64.
If the watch channel is full, as @chaochn47 mentioned, the client won't receive the ErrCompacted error.
And minRev will be updated to curRev + 1, so next time the watcher will pick up the [lastCurRev+1, curRev] events and miss [compactRevision, lastCurRev].

for w := range wg.watchers {
	w.minRev = curRev + 1

If watcherGroup.choose returns math.MaxInt64, maybe we should skip syncWatchers and retry next time.
At the very least, the client should receive the ErrCompacted error.

@chaochn47 (Author):

It is not guaranteed that chooseAll returns math.MaxInt64.

For example, some of the 512 picked watchers may have progressed past compactRev; the returned rev is then the minimum watcher revision that is bigger than compactRev.

if minRev > w.minRev {
	minRev = w.minRev
}

Member:

Yeah. I was trying the following change, which skips updating minRev for a compacted watcher.

// server/mvcc/watchable_store.go

package mvcc

func (s *watchableStore) syncWatchers() int {
	// ...
	for w := range wg.watchers {
		if w.minRev < compactionRev {
			// skip it and retry it later since w.ch is full now
			continue
		}
		w.minRev = curRev + 1
		// ...
	}
}

@chaochn47 (Author):

Aha, got it. This should be a correct change. After testing, though, the compaction response can still be delayed by as much as a few minutes, and the watch cache stays stale for that long.

dev-dsk-chaochn-2c-a26acd76 % grep "compacted rev" $HOME/watch-lost.log
    logger.go:130: 2024-03-08T05:40:46.304Z	DEBUG	compaction	compacted rev	{"compact-revision": 45704}
    logger.go:130: 2024-03-08T05:40:57.027Z	DEBUG	compaction	compacted rev	{"compact-revision": 90291}
    logger.go:130: 2024-03-08T05:41:07.175Z	DEBUG	compaction	compacted rev	{"compact-revision": 131525}
    logger.go:130: 2024-03-08T05:41:16.660Z	DEBUG	compaction	compacted rev	{"compact-revision": 170723}

(24-03-08 5:42:17) <0> [~]
dev-dsk-chaochn-2c-a26acd76 % grep "got watch response error" $HOME/watch-lost.log
    logger.go:130: 2024-03-08T05:41:26.364Z	WARN	watch-cache	got watch response error	{"last-received-events-kv-mod-revision": 10644, "compact-revision": 170723, "error": "etcdserver: mvcc: required revision has been compacted"}

I was thinking we should send the compacted-error watch response on a separate channel so it won't be blocked for that long.

case wresp, ok := <-sws.watchStream.Chan():

But anyway, we can make incremental changes to improve this.
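A rough, self-contained sketch of that idea (purely illustrative, not the actual etcd code): give compaction responses their own channel and drain it with priority, so a congested event channel cannot delay the compaction notification. The dedicated compactions channel below is an assumption.

package main

import "fmt"

// watchResponse is a stand-in for mvcc.WatchResponse with only the fields
// this sketch needs.
type watchResponse struct {
	watchID         int64
	compactRevision int64
}

func main() {
	events := make(chan watchResponse, 1)      // shared channel; may be full
	compactions := make(chan watchResponse, 1) // hypothetical dedicated channel

	events <- watchResponse{watchID: 0}
	compactions <- watchResponse{watchID: 0, compactRevision: 170723}

	for i := 0; i < 2; i++ {
		// Check the dedicated compaction channel first so the "required
		// revision has been compacted" response is never queued behind
		// ordinary events.
		select {
		case cr := <-compactions:
			fmt.Printf("send compaction response, compact-revision=%d\n", cr.compactRevision)
			continue
		default:
		}
		select {
		case cr := <-compactions:
			fmt.Printf("send compaction response, compact-revision=%d\n", cr.compactRevision)
		case ev := <-events:
			fmt.Printf("send event response, watch-id=%d\n", ev.watchID)
		}
	}
}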

Signed-off-by: Chao Chen <chaochn@amazon.com>
@chaochn47 force-pushed the release-3.5-reproduce-watch-event-starvation branch from 9df8bb5 to fa93dc3 on March 8, 2024 05:57
@chaochn47 closed this on Mar 8, 2024