Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests/robustness: enhance compact failpoint #16310

Merged
merged 1 commit into from
Jul 26, 2023
Merged

Conversation

fuweid
Copy link
Member

@fuweid fuweid commented Jul 26, 2023

If the cluster serves requests slowly, the database has few revision number and then Compact won't trigger BatchCommit. Add a loop to check the last revision is big enough to trigger panic.

REF: #14691 (comment)

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

@@ -103,6 +103,18 @@ func validateFailpoint(clus *e2e.EtcdProcessCluster, failpoint Failpoint) error
return nil
}

func addMinimalRevForCompactFailpoint(failpoint Failpoint, batchNum int) Failpoint {
switch failpoint.Name() {
case "compactAfterCommitBatch", "compactBeforeCommitBatch":
Copy link
Member

@serathius serathius Jul 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using this decorator and doing name matching, can you add a flag?

var (
	CompactBeforeCommitBatchPanic            Failpoint = goPanicFailpoint{"compactBeforeCommitBatch", triggerCompact{waitForRevision: true}, AnyMember}
	CompactAfterCommitBatchPanic             Failpoint = goPanicFailpoint{"compactAfterCommitBatch", triggerCompact{waitForRevision: true}, AnyMember}
)

type triggerCompact struct {
	waitForRevision bool
}

	
func (t triggerCompact) Trigger(_ *testing.T, ctx context.Context, member e2e.EtcdProcess, _ *e2e.EtcdProcessCluster) error {
	cc, err := clientv3.New(clientv3.Config{
		Endpoints:            member.EndpointsGRPC(),
		Logger:               zap.NewNop(),
		DialKeepAliveTime:    10 * time.Second,
		DialKeepAliveTimeout: 100 * time.Millisecond,
	})
	if err != nil {
		return fmt.Errorf("failed creating client: %w", err)
	}
	defer cc.Close()
	if t.waitForRevision {
		for {
			resp, err := cc.Get(ctx, "/")
			if err != nil {
				return err
			}
			if resp.Header.Revision > member.Cfg.CompactionBatchLimit {
				break
			}
			time.Sleep(50 * time.Millisecond)
		}	
	}

	_, err = cc.Compact(ctx, rev)
	if err != nil && !strings.Contains(err.Error(), "error reading from server: EOF") {
		return err
	}
	return nil
}

@serathius
Copy link
Member

Idea is good, just need to improve implementation.

@@ -338,9 +338,11 @@ func (t triggerDefrag) Available(e2e.EtcdProcessClusterConfig, e2e.EtcdProcess)
return true
}

type triggerCompact struct{}
type triggerCompact struct {
waitForRevision bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

waitForRevision was a working name. Something like multiBatchCompaction would be better as we are basically waiting for revision to be big enough so compaction needs to be done in multiple batches.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multiBatchCompaction looks better and more accurate. let me update it.

If the cluster serves requests slowly, the database has few revision
number and then Compact won't trigger BatchCommit. Add a loop to check
the last revision is big enough to trigger panic.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
@serathius serathius merged commit 6aad508 into etcd-io:main Jul 26, 2023
27 checks passed
@jmhbnz jmhbnz mentioned this pull request Sep 25, 2023
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants