tests: deflake TestV3WatchRestoreSnapshotUnsync #15667

Conversation

@fuweid (Member) commented Apr 7, 2023

The TestV3WatchRestoreSnapshotUnsync test sets up a three-member cluster. After the leader is elected and before any client update requests are served, each member's log is at index 8: 3 x ConfChange + 3 x ClusterMemberAttrSet + 1 x ClusterVersionSet.

Based on the config (SnapshotCount: 10, SnapshotCatchUpEntries: 5), we need to issue enough update requests to trigger a snapshot at least twice.

T1: L(snapshot-index: 11, compacted-index: 6) F_m0(index: 8)
T2: L(snapshot-index: 22, compacted-index: 17) F_m0(index: 8, out of date)

After member0 recovers from the network partition, it rejects the leader's append request and returns a hint (index: 8, term: x). If this happens after the second snapshot, the leader finds that index 8 is out of date and is forced to transfer a snapshot.

However, the client issues only 15 update requests, and the leader doesn't finish the snapshot process in time. Since the last compacted index is still 6, the leader can still replicate index 9 onward to member0 instead of sending a snapshot.
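For concreteness, a toy replay of that arithmetic (a sketch assuming, as described above, that a snapshot fires once the applied index minus the last snapshot index exceeds SnapshotCount, and that the raft log is then compacted to snapshot-index minus SnapshotCatchUpEntries) reproduces T1 and T2:

```go
package main

import "fmt"

// Toy replay of the snapshot/compaction arithmetic described above.
// The trigger rule is a hedged simplification, not etcd's exact source.
func main() {
	const (
		snapshotCount  = 10 // ClusterConfig.SnapshotCount in the test
		catchUpEntries = 5  // ClusterConfig.SnapshotCatchUpEntries
	)
	applied, snapi := 8, 0 // index 8 after bootstrap, before any puts
	for put := 1; put <= 15; put++ {
		applied++
		if applied-snapi > snapshotCount {
			snapi = applied
			fmt.Printf("put #%d: snapshot-index=%d compacted-index=%d\n",
				put, snapi, snapi-catchUpEntries)
		}
	}
	// Prints:
	//   put #3: snapshot-index=11 compacted-index=6   (T1)
	//   put #14: snapshot-index=22 compacted-index=17  (T2)
}
```

Member0's index 8 stays above the first compacted index 6, which is exactly why the leader can keep replicating entries instead of sending a snapshot.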

```bash
cd tests/integration
CLUSTER_DEBUG=true go test -v -count=1 -run TestV3WatchRestoreSnapshotUnsync ./
...

INFO    m2.raft 3da8ba707f1a21a4 became leader at term 2        {"member": "m2"}
...
INFO    m2      triggering snapshot     {"member": "m2", "local-member-id": "3da8ba707f1a21a4", "local-member-applied-index": 22, "local-member-snapshot-index": 11, "local-member-snapshot-count": 10, "snapshot-forced": false}
...

cluster.go:1359: network partition between: 99626fe5001fde8b <-> 1c964119da6db036
cluster.go:1359: network partition between: 99626fe5001fde8b <-> 3da8ba707f1a21a4
cluster.go:416: WaitMembersForLeader

INFO    m0.raft 99626fe5001fde8b became follower at term 2      {"member": "m0"}
INFO    m0.raft raft.node: 99626fe5001fde8b elected leader 3da8ba707f1a21a4 at term 2   {"member": "m0"}
DEBUG   m2.raft 3da8ba707f1a21a4 received MsgAppResp(rejected, hint: (index 8, term 2)) from 99626fe5001fde8b for index 23      {"member": "m2"}
DEBUG   m2.raft 3da8ba707f1a21a4 decreased progress of 99626fe5001fde8b to [StateReplicate match=8 next=9 inflight=15]  {"member": "m2"}

DEBUG   m0      Applying entries        {"member": "m0", "num-entries": 15}
DEBUG   m0      Applying entry  {"member": "m0", "index": 9, "term": 2, "type": "EntryNormal"}

....

INFO    m2      saved snapshot  {"member": "m2", "snapshot-index": 22}
INFO    m2      compacted Raft logs     {"member": "m2", "compact-index": 17}
```

To fix this issue, the patch uses a log monitor to watch for the "compacted Raft logs" message and expects the two remaining members to compact their raft logs twice.
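As a rough sketch of that idea (illustrative names, not the exact types merged in this PR): the monitor feeds every t.Logf line to registered subscribers, and a count-based expecter unblocks the test once the marker line has been seen the expected number of times.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"time"
)

// lineCountExpecter waits until a substring has appeared in the test
// log a given number of times. Illustrative sketch only.
type lineCountExpecter struct {
	pattern string // e.g. "compacted Raft logs"
	want    int
	mu      sync.Mutex
	got     int
	done    chan struct{}
}

func newLineCountExpecter(pattern string, want int) *lineCountExpecter {
	return &lineCountExpecter{pattern: pattern, want: want, done: make(chan struct{})}
}

// Logf would be invoked by the log monitor for every t.Logf call.
func (e *lineCountExpecter) Logf(format string, args ...interface{}) {
	if !strings.Contains(fmt.Sprintf(format, args...), e.pattern) {
		return
	}
	e.mu.Lock()
	defer e.mu.Unlock()
	e.got++
	if e.got == e.want {
		close(e.done)
	}
}

func (e *lineCountExpecter) wait(timeout time.Duration) error {
	select {
	case <-e.done:
		return nil
	case <-time.After(timeout):
		e.mu.Lock()
		defer e.mu.Unlock()
		return fmt.Errorf("saw %q only %d/%d times", e.pattern, e.got, e.want)
	}
}

func main() {
	e := newLineCountExpecter("compacted Raft logs", 2)
	e.Logf("INFO m1 compacted Raft logs {%q: %q}", "member", "m1")
	e.Logf("INFO m2 compacted Raft logs {%q: %q}", "member", "m2")
	fmt.Println(e.wait(time.Second)) // <nil>
}
```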

Fixes: #15545


```diff
@@ -69,7 +73,7 @@ func TestV3WatchRestoreSnapshotUnsync(t *testing.T) {
 		t.Fatal(errW)
 	}
 	if err := wStream.Send(&pb.WatchRequest{RequestUnion: &pb.WatchRequest_CreateRequest{
-		CreateRequest: &pb.WatchCreateRequest{Key: []byte("foo"), StartRevision: 5}}}); err != nil {
+		CreateRequest: &pb.WatchCreateRequest{Key: []byte("foo"), StartRevision: 25}}}); err != nil {
```
Member:

Just curious, is watching from revision 25 counted as an "old revision that was created in the synced watcher group in the first place"?

Member:

Background is a 5-year-old issue: #9281

@fuweid (Member Author) Apr 8, 2023:

Thanks for the link! Not sure I understand it correctly: we don't defrag/compact it, so the revision is still available in the kv store. Is that correct?

@chaochn47 (Member) Apr 8, 2023:

Yeah, from my understanding, as long as the requested revision is not compacted, the key value is still available.


Here is a simple diagram (attached as an image in the original comment). /cc @ahrtr @serathius in case the understanding or direction is wrong.
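(For illustration, a hedged clientv3 sketch of that behavior, based on my reading of the client API rather than code from this PR: a watch from a not-yet-compacted revision replays the old events; once the revision is compacted, the response reports the compact revision instead.)

```go
package main

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Endpoint is a placeholder; any reachable cluster works.
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Revision 5 is replayed as long as the keyspace has not been
	// compacted past it; otherwise the response carries CompactRevision.
	for resp := range cli.Watch(context.Background(), "foo", clientv3.WithRev(5)) {
		if resp.CompactRevision != 0 {
			fmt.Println("revision 5 already compacted; resync from", resp.CompactRevision)
			return
		}
		for _, ev := range resp.Events {
			fmt.Printf("%s %q mod_revision=%d\n", ev.Type, ev.Kv.Key, ev.Kv.ModRevision)
		}
	}
}
```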

@chaochn47 (Member):

Great find!

```go
// T2: L(snapshot-index: 22, compacted-index: 17), F_m0(index:8, out of date)
// T3: L(snapshot-index: 33, compacted-index: 28), F_m0(index:8, out of date)
//
// Since the snapshot is handled in GoAttach, we need to trigger
```

The key part for the leader to send a snapshot is to get the leader to compact index 8. If you change SnapshotCatchUpEntries to 1, this can be achieved in one call of EtcdServer.snapshot (the first snapshot at index 11 would compact up to index 10, already past member0's index 8).

Member Author:

Correct. That was my first patch, but I ran into another error.

```diff
diff --git a/tests/integration/v3_watch_restore_test.go b/tests/integration/v3_watch_restore_test.go
index bdebeacfc..552f4aecf 100644
--- a/tests/integration/v3_watch_restore_test.go
+++ b/tests/integration/v3_watch_restore_test.go
@@ -57,7 +57,7 @@ func TestV3WatchRestoreSnapshotUnsync(t *testing.T) {
        clus := integration.NewCluster(t, &integration.ClusterConfig{
                Size:                   3,
                SnapshotCount:          10,
-               SnapshotCatchUpEntries: 5,
+               SnapshotCatchUpEntries: 1,
        })
        defer clus.Terminate(t)
```

```bash
➜  integration git:(main) ✗ taskset -c 0,1 go test -v -count=1000 -timeout=700m -failfast -run TestV3WatchRestoreSnapshotUnsync ./

... 
    logger.go:130: 2023-04-08T09:24:52.907+0800 INFO    m1      saved snapshot  {"member": "m1", "snapshot-index": 22}
    logger.go:130: 2023-04-08T09:24:52.907+0800 INFO    m1      compacted Raft logs     {"member": "m1", "compact-index": 21}
    logger.go:130: 2023-04-08T09:24:52.907+0800 INFO    m0.raft d6a69b975d0a7658 became follower at term 2      {"member": "m0"}
    logger.go:130: 2023-04-08T09:24:52.907+0800 INFO    m0.raft raft.node: d6a69b975d0a7658 elected leader c0c978b536d663c2 at term 2   {"member": "m0"}
{"level":"warn","ts":"2023-04-08T09:24:52.90773+0800","logger":"etcd-client","caller":"v3/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00233d860/localhost:m0","method":"/etcdserverpb.KV/Range","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: leader changed"}
    logger.go:130: 2023-04-08T09:24:52.911+0800 INFO    m2      saved snapshot  {"member": "m2", "snapshot-index": 22}
    logger.go:130: 2023-04-08T09:24:52.911+0800 INFO    m2      skip compaction since there is an inflight snapshot     {"member": "m2"}
    logger.go:130: 2023-04-08T09:24:52.911+0800 INFO    m2      sending database snapshot       {"member": "m2", "snapshot-index": 22, "remote-peer-id": "d6a69b975d0a7658", "bytes": 33666, "size": "34 kB"}
    logger.go:130: 2023-04-08T09:24:52.911+0800 INFO    m2      sending merged snapshot {"member": "m2", "from": "c0c978b536d663c2", "to": "d6a69b975d0a7658", "bytes": 33666, "size": "34 kB"}
    logger.go:130: 2023-04-08T09:24:52.911+0800 INFO    m2      sent database snapshot to writer        {"member": "m2", "bytes": 24576, "size": "25 kB"}
    logger.go:130: 2023-04-08T09:24:52.911+0800 INFO    m0      receiving database snapshot     {"member": "m0", "local-member-id": "d6a69b975d0a7658", "remote-snapshot-sender-id": "c0c978b536d663c2", "incoming-snapshot-index": 22, "incoming-snapshot-message-size-bytes": 9090, "incoming-snapshot-message-size": "9.1 kB"}

...

    logger.go:130: 2023-04-08T09:24:52.915+0800 INFO    m0      restoring lease store   {"member": "m0"}
    logger.go:130: 2023-04-08T09:24:52.917+0800 INFO    m0      restored lease store    {"member": "m0"}
    logger.go:130: 2023-04-08T09:24:52.917+0800 INFO    m0      restoring mvcc store    {"member": "m0"}
    logger.go:130: 2023-04-08T09:24:52.917+0800 INFO    m0      kvstore restored        {"member": "m0", "current-rev": 15}
    logger.go:130: 2023-04-08T09:24:52.917+0800 INFO    m0      restored mvcc store     {"member": "m0", "consistent-index": 22}
    logger.go:130: 2023-04-08T09:24:52.917+0800 INFO    m0      restoring alarm store   {"member": "m0"}
    logger.go:130: 2023-04-08T09:24:52.917+0800 INFO    m0      closing old backend file        {"member": "m0"}
    logger.go:130: 2023-04-08T09:24:52.918+0800 INFO    m0      restored alarm store    {"member": "m0"}
    logger.go:130: 2023-04-08T09:24:52.918+0800 INFO    m0      restoring auth store    {"member": "m0"}
    logger.go:130: 2023-04-08T09:24:52.918+0800 INFO    m0      restored auth store     {"member": "m0"}
    logger.go:130: 2023-04-08T09:24:52.918+0800 INFO    m0      restoring v2 store      {"member": "m0"}
    logger.go:130: 2023-04-08T09:24:52.919+0800 INFO    m0      restored v2 store       {"member": "m0"}

...

    v3_watch_restore_test.go:128: sleeping for 2 seconds DONE
    v3_watch_restore_test.go:153: wStream.Recv error: expected 12 events, got [kv:<key:"foo" create_revision:2 mod_revision:5 version:4 value:"bar" >  kv:<key:"foo" create_revision:2 mod_revision:6 version:5 value:"bar" >  kv:<key:"foo" create_revision:2 mod_revision:7 version:6 value:"bar" >  kv:<key:"foo" create_revision:2 mod_revision:8 version:7 value:"bar" >  kv:<key:"foo" create_revision:2 mod_revision:9 version:8 value:"bar" >  kv:<key:"foo" create_revision:2 mod_revision:10 version:9 value:"bar" >  kv:<key:"foo" create_revision:2 mod_revision:11 version:10 value:"bar" >  kv:<key:"foo" create_revision:2 mod_revision:12 version:11 value:"bar" >  kv:<key:"foo" create_revision:2 mod_revision:13 version:12 value:"bar" >  kv:<key:"foo" create_revision:2 mod_revision:14 version:13 value:"bar" >  kv:<key:"foo" create_revision:2 mod_revision:15 version:14 value:"bar" > ]
```

m0 should recover to (consistent-index: 23, current-rev: 16), but it got (consistent-index: 22, current-rev: 15).
It is a timing issue.

```go
merged := s.createMergedSnapshotMessage(m, ep.appliedt, ep.appliedi, ep.confState)
```

The server uses an old index which doesn't match the snapshot content. Not sure whether that is a bug or fault tolerance. It can eventually catch up, but the watch events are not as expected. cc @ahrtr

So I tried this patch to give it time to handle the in-memory compaction.
I am still testing it with limited cores via taskset -c. Will update later.
(If it doesn't work, I will switch to watching the log...)

Member:

However, the watch events are not as expected. cc @ahrtr

Sorry, I don't get the point, and I haven't had time to dig into it yet. Please file a separate issue if you think it's an issue, or something you can't explain or understand.

```go
		_, err := kvc.Put(context.TODO(), &pb.PutRequest{Key: []byte("foo"), Value: []byte("bar")})
		if err != nil {
			t.Errorf("#%d: couldn't put key (%v)", i, err)
		}
	}
	// The 33 is the latest snapshot index.
	ensureTriggerSnapshot(t, clus.Members[initialLead], 33, 5*time.Second)
```

Ideally this should wait for the leader to compact index 8, but there is no way to check for that.
Writing snapshot files isn't enough to trigger a snapshot send. For example, if you set SnapshotCatchUpEntries to something high, the new version of the test will still fail. Maybe you can add a comment noting that implicitly it's also waiting for compaction.

Member Author:

I switched to a solution that monitors the testing log to ensure the cluster has compacted the raft log. Please take a look. Thanks!

@fuweid fuweid force-pushed the deflake-issue-15545-TestV3WatchRestoreSnapshotUnsync branch 2 times, most recently from c179c95 to 4632d29, April 8, 2023 09:04
@fuweid fuweid marked this pull request as ready for review April 8, 2023 13:38
@lavacat left a comment:

LGTM, some nitpicks

```go
}

// testingLogfMonitor is to monitor t.Logf output.
type testingLogfMonitor struct {
```

Nice implementation, I didn't know it's possible to get at all logs in integration tests (there is AssertProcessLogs for e2e).

It seems like this is implemented to be reused in other tests. Should this be moved to util_test.go? cc @serathius

Member Author:

```go
func memberLogger(t testutil.TB, name string) *zap.Logger {
```

I think we can extend the zaptest.Core with a tee function. It can be aligned with AssertProcessLogs for the process expecter~

@fuweid fuweid force-pushed the deflake-issue-15545-TestV3WatchRestoreSnapshotUnsync branch 2 times, most recently from 193eead to df83241 Compare April 9, 2023 12:30
Comment on lines 98 to 99
```go
// NOTE: In 3 members cluster, after initial lead has been elected,
// there are 3xConfChange + 3xMemberAttrSet + 1xClusterVersionSet logs.
```
Member:

This isn't correct. Note that when starting a new cluster with 3 members, each member will apply the 3 ConfChange entries directly at the beginning, before a leader is elected.

Please read https://github.com/etcd-io/raft/blob/918455b897764d8e4d12af70657280365ffaaa04/bootstrap.go#L76-L78

Member Author:

Thanks. I will update it later.

Member Author:

Updated!

Comment on lines 136 to 138
```go
if lead != initialLead {
	t.Fatalf("expected leader index (%v), but got (%v)", initialLead, lead)
}
```
Member:

In theory, the leader might change, although usually it will not. So I suggest removing this check.

Member Author:

Will revert this part.

Member Author:

Updated! Please take a look. Thanks!

@ahrtr (Member) commented Apr 10, 2023

Thanks @fuweid

The lineCountExpecter and testingLogfMonitor seem like a generic solution; they should be moved into the test framework so that they can be reused.

But my immediate feeling is that it's a little over-complicated. Can we just increase the timeout (e.g. to 20s)? And also wait longer when receiving events: (1) add a timeout, e.g. 15s, and (2) cache the partial events.

```go
go func() {
	cresp, cerr := wStream.Recv()
	if cerr != nil {
		errc <- cerr
		return
	}
	// from start revision 5 to latest revision 16
	if len(cresp.Events) != 12 {
		errc <- fmt.Errorf("expected 12 events, got %+v", cresp.Events)
		return
	}
	errc <- nil
}()
```

@fuweid (Member Author) commented Apr 10, 2023

The lineCountExpecter and testingLogfMonitor seem like a generic solution; they should be moved into the test framework so that they can be reused.

Hi @ahrtr, I can do it in a follow-up. This PR is to fix the flaky test, so I want to keep it small. And I also want to add a helper for the in-process member so that we can capture the log for a given member, instead of at the test-case level. Does it make sense to you?

But my immediate feeling is that it's a little over-complicated. Can we just increase the timeout (e.g. to 20s)? And also wait longer when receiving events: (1) add a timeout, e.g. 15s, and (2) cache the partial events.

Basically, the problem is that after the network partition recovers, the leader hasn't compacted the raft log yet. Member 0 receives the heartbeat but refuses the appendEntries and returns a hint (term: 2, index: 8). The leader can still find index 8 in its log, so it replicates indexes 9, 10, ... to member 0 instead of sending a snapshot.

We could sleep a few seconds before recovering the network partition, but that might still be flaky and makes the case take more time.
And there is no way to check that the server has compacted the raft log without watching the log output. So I think we should use an expect-style check to verify the log, just like what we do in e2e. WDYT?
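Hypothetically, the wiring then looks like this fragment (names follow the illustrative sketch in the PR description above, not the merged code):

```go
// Hypothetical wiring inside the test: register the expecter before
// recovering the partition, then block until the compactions happened.
e := newLineCountExpecter("compacted Raft logs", 4) // assumed count: 2 members x 2 compactions
monitor.addSubscriber("compaction", e)              // monitor intercepts the cluster's t.Logf
// ... the 15 Put requests happen here ...
if err := e.wait(10 * time.Second); err != nil {
	t.Fatal(err) // compaction never observed; the old test raced here
}
// Only now recover the network partition: member0's index 8 is
// guaranteed to be below the leader's compacted index.
```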

@ahrtr (Member) commented Apr 10, 2023

The lineCountExpecter and testingLogfMonitor seem like a generic solution; they should be moved into the test framework so that they can be reused.

Hi @ahrtr, I can do it in a follow-up.

OK.

Basically, the problem is that after the network partition recovers, the leader hasn't compacted the raft log yet.

Got it. The reason is that etcdserver creates the snapshot and compacts the raft log asynchronously.
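A toy model of that asynchrony (all names hypothetical; in etcd the snapshot path runs EtcdServer.snapshot on a background goroutine via GoAttach):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	compacted := make(chan struct{})
	go func() { // stands in for the background snapshot goroutine
		time.Sleep(50 * time.Millisecond) // snapshot I/O, compaction, etc.
		fmt.Println("compacted Raft logs")
		close(compacted)
	}()
	fmt.Println("applied index advanced") // the apply loop reports progress immediately
	// A test that proceeds as soon as the applied index advances races the
	// goroutine above; the deflaked test instead waits for the log line.
	<-compacted
}
```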

@ahrtr (Member) commented Apr 10, 2023

Overall looks good to me. Great finding. @fuweid Please resolve the minor comments.

The TestV3WatchRestoreSnapshotUnsync test sets up a three-member
cluster. After the leader is elected and before any client update
requests are served, each member's log is at index 8: 3 x ConfChange +
3 x ClusterMemberAttrSet + 1 x ClusterVersionSet.

Based on the config (SnapshotCount: 10, SnapshotCatchUpEntries: 5), we
need to issue enough update requests to trigger a snapshot at least
twice.

T1: L(snapshot-index: 11, compacted-index:  6) F_m0(index: 8)
T2: L(snapshot-index: 22, compacted-index: 17) F_m0(index: 8, out of date)

After member0 recovers from the network partition, it rejects the
leader's append request and returns a hint (index: 8, term: x). If this
happens after the second snapshot, the leader finds that index 8 is out
of date and is forced to transfer a snapshot.

However, the client issues only 15 update requests, and the leader
doesn't finish the snapshot process in time. Since the last compacted
index is still 6, the leader can still replicate index 9 onward to
member0 instead of sending a snapshot.

```bash
cd tests/integration
CLUSTER_DEBUG=true go test -v -count=1 -run TestV3WatchRestoreSnapshotUnsync ./
...

INFO    m2.raft 3da8ba707f1a21a4 became leader at term 2        {"member": "m2"}
...
INFO    m2      triggering snapshot     {"member": "m2", "local-member-id": "3da8ba707f1a21a4", "local-member-applied-index": 22, "local-member-snapshot-index": 11, "local-member-snapshot-count": 10, "snapshot-forced": false}
...

cluster.go:1359: network partition between: 99626fe5001fde8b <-> 1c964119da6db036
cluster.go:1359: network partition between: 99626fe5001fde8b <-> 3da8ba707f1a21a4
cluster.go:416: WaitMembersForLeader

INFO    m0.raft 99626fe5001fde8b became follower at term 2      {"member": "m0"}
INFO    m0.raft raft.node: 99626fe5001fde8b elected leader 3da8ba707f1a21a4 at term 2   {"member": "m0"}
DEBUG   m2.raft 3da8ba707f1a21a4 received MsgAppResp(rejected, hint: (index 8, term 2)) from 99626fe5001fde8b for index 23      {"member": "m2"}
DEBUG   m2.raft 3da8ba707f1a21a4 decreased progress of 99626fe5001fde8b to [StateReplicate match=8 next=9 inflight=15]  {"member": "m2"}

DEBUG   m0      Applying entries        {"member": "m0", "num-entries": 15}
DEBUG   m0      Applying entry  {"member": "m0", "index": 9, "term": 2, "type": "EntryNormal"}

....

INFO    m2      saved snapshot  {"member": "m2", "snapshot-index": 22}
INFO    m2      compacted Raft logs     {"member": "m2", "compact-index": 17}
```

To fix this issue, the patch uses a log monitor to watch for the
"compacted Raft logs" message and expects the two remaining members to
compact their raft logs twice.

Fixes: etcd-io#15545

Signed-off-by: Wei Fu <fuweid89@gmail.com>
@fuweid fuweid force-pushed the deflake-issue-15545-TestV3WatchRestoreSnapshotUnsync branch from df83241 to 536953e, April 10, 2023 14:45
@chaochn47 (Member):

LGTM.

FYI: the robustness test that triggers snapshots could also be migrated to this raft-log-compaction log-assertion style if the 100-millisecond wait turns out to be flaky.

```go
// Have to refresh blackholedMemberRevision. It can still increase as
// blackholedMember processes changes that are received but not yet applied.
blackholedMemberRevision, err := latestRevisionForEndpoint(ctx, blackholedMemberClient)
if err != nil {
	return err
}
clusterRevision, err := latestRevisionForEndpoint(ctx, clusterClient)
if err != nil {
	return err
}
t.Logf("clusterRevision: %d, blackholedMemberRevision: %d", clusterRevision, blackholedMemberRevision)
// Blackholed member has to be sufficiently behind to trigger snapshot transfer.
// Need to make sure leader compacted latest revBlackholedMem inside EtcdServer.snapshot.
// That's why we wait for clus.Cfg.SnapshotCount (to trigger snapshot) +
// clus.Cfg.SnapshotCatchUpEntries (EtcdServer.snapshot compaction offset).
if clusterRevision-blackholedMemberRevision > int64(clus.Cfg.SnapshotCount+clus.Cfg.SnapshotCatchUpEntries) {
	break
}
time.Sleep(100 * time.Millisecond)
```

@ahrtr (Member) left a comment:

LGTM with a couple of minor comments, which can be resolved in a followup PR.

```go
	}
}

func (m *testingLogfMonitor) addSubscriber(id string, sub testingLogfSubscriber) {
```
Member:

minor comment: change addSubscriber to register, and delSubscriber to deregister

```go
	delete(m.subscribers, id)
}

func (m *testingLogfMonitor) Logf(format string, args ...interface{}) {
```
Member:

As a generic solution, probably we should implement a Log method as well.
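(For context, a hypothetical shape of such a fan-out, covering Logf; a Log method would forward the same way. This is illustrative, not the merged implementation:)

```go
// Hypothetical fan-out: forward each line to the wrapped testing.TB and
// to every registered subscriber (field names assumed for illustration).
func (m *testingLogfMonitor) Logf(format string, args ...interface{}) {
	m.mu.RLock()
	subs := make([]testingLogfSubscriber, 0, len(m.subscribers))
	for _, sub := range m.subscribers {
		subs = append(subs, sub)
	}
	m.mu.RUnlock()

	for _, sub := range subs {
		sub.Logf(format, args...)
	}
	m.TB.Logf(format, args...)
}
```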

@ahrtr ahrtr merged commit 1683231 into etcd-io:main Apr 10, 2023
@fuweid fuweid deleted the deflake-issue-15545-TestV3WatchRestoreSnapshotUnsync branch April 11, 2023 08:17
@fuweid (Member Author) commented Apr 11, 2023

Thanks @ahrtr @chaochn47 @lavacat for the review. I will file a PR for the follow-up.

fuweid added a commit to fuweid/etcd that referenced this pull request Apr 14, 2023
It's a followup for etcd-io#15667.

This patch is to use zaptest/observer as a base to provide a function
similar to pkg/expect.Expect. By default, it's disabled to reduce
memory cost.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
fuweid added a commit to fuweid/etcd that referenced this pull request Apr 14, 2023
It's a followup of etcd-io#15667.

This patch is to use zaptest/observer as a base to provide a function
similar to pkg/expect.Expect. By default, it's disabled to reduce
memory cost.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
fuweid added a commit to fuweid/etcd that referenced this pull request Apr 15, 2023
It's a followup of etcd-io#15667.

This patch is to use zaptest/observer as a base to provide a function
similar to pkg/expect.Expect. By default, it's disabled to reduce
memory cost.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
fuweid added a commit to fuweid/etcd that referenced this pull request Apr 15, 2023
It's a followup of etcd-io#15667.

This patch is to use zaptest/observer as a base to provide a function
similar to pkg/expect.Expect.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
fuweid added a commit to fuweid/etcd that referenced this pull request Apr 15, 2023
It's a followup of etcd-io#15667.

This patch is to use zaptest/observer as a base to provide a function
similar to pkg/expect.Expect.

Before change

```bash
11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
/usr/bin/time -v  taskset -c 0,1,2 go test -count=1 ./integration/...
        Elapsed (wall clock) time (h:mm:ss or m:ss): 6:53.11
        Maximum resident set size (kbytes): 331736
```

After change

```bash
11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
 /usr/bin/time -v  taskset -c 0,1,2 go test -count=1 ./integration/...
        Elapsed (wall clock) time (h:mm:ss or m:ss): 6:59.73
        Maximum resident set size (kbytes): 325832
```

It won't increase integration run-time too much.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
fuweid added a commit to fuweid/etcd that referenced this pull request Apr 17, 2023
It's a followup of etcd-io#15667.

This patch is to use zaptest/observer as a base to provide a function
similar to pkg/expect.Expect.

Before change

```bash
11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
/usr/bin/time -v  taskset -c 0,1,2 go test -count=1 ./integration/...
        Elapsed (wall clock) time (h:mm:ss or m:ss): 6:53.11
        Maximum resident set size (kbytes): 331736
```

After change

```bash
11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
 /usr/bin/time -v  taskset -c 0,1,2 go test -count=1 ./integration/...
        Elapsed (wall clock) time (h:mm:ss or m:ss): 6:59.73
        Maximum resident set size (kbytes): 325832
```

Signed-off-by: Wei Fu <fuweid89@gmail.com>
fuweid added a commit to fuweid/etcd that referenced this pull request Apr 17, 2023
It's a followup of etcd-io#15667.

This patch is to use zaptest/observer as a base to provide a function
similar to pkg/expect.Expect.

The test env

```bash
11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
mkdir /sys/fs/cgroup/etcd-followup-15667
echo 0-2 | tee /sys/fs/cgroup/etcd-followup-15667/cpuset.cpus # three cores
```

Before change:

* memory.peak: ~ 681 MiB
* Elapsed (wall clock) time (h:mm:ss or m:ss): 6:14.04

After change:

* memory.peak: ~ 671 MiB
* Elapsed (wall clock) time (h:mm:ss or m:ss): 6:13.07

Based on the test result, I think it's safe to be enabled by default.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
fuweid added a commit to fuweid/etcd that referenced this pull request Apr 18, 2023
It's a followup of etcd-io#15667.

This patch is to use zaptest/observer as a base to provide a function
similar to pkg/expect.Expect.

The test env

```bash
11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
mkdir /sys/fs/cgroup/etcd-followup-15667
echo 0-2 | tee /sys/fs/cgroup/etcd-followup-15667/cpuset.cpus # three cores
```

Before change:

* memory.peak: ~ 681 MiB
* Elapsed (wall clock) time (h:mm:ss or m:ss): 6:14.04

After change:

* memory.peak: ~ 671 MiB
* Elapsed (wall clock) time (h:mm:ss or m:ss): 6:13.07

Based on the test result, I think it's safe to be enabled by default.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
serathius added a commit that referenced this pull request Apr 18, 2023
tests: make log monitor as common helper (followup #15667
@jmhbnz jmhbnz mentioned this pull request Sep 25, 2023