Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove path from gofailpoints #14727

Closed
Tracked by #14045
serathius opened this issue Nov 10, 2022 · 10 comments
Closed
Tracked by #14045

Remove path from gofailpoints #14727

serathius opened this issue Nov 10, 2022 · 10 comments
Assignees
Labels
help wanted priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. type/feature

Comments

@serathius
Copy link
Member

What would you like to be added?

compiling etcd with FAILPOINTS=true make build results in expanding etcd with failpoints thanks to https://github.com/etcd-io/gofail library. Problem is that gofail injects package path to name of failpoint. This causes two problems.

First gofail was created before go1.12 and uses path from GOPATH and not go.mod. As result path is different between CI and local development.
Second code path changes as code is refactored between etcd releases. This is a problem as we want to be able to reproduce issues on previous supported releases and it creates a lot of toil to update paths.

This leads into problems where name of failpoint changes depending on your local machine setup or etcd version. For example same fail raftBeforeSave is:

  • etcdserver/raftBeforeSave - main branch in CI.
  • etcd.io/etcd/server/etcdserver/raftBeforeSave - main branch on my local machine.
  • etcd.io/etcd/etcdserver/raftBeforeSave - release-3.4 on local machine

Proposal:
Remove path from failpoints. Failpoints name should be unique, but we can achieve it by validating user provided names during binary bootstrap instead of generating unique by adding prefix.

Why is this needed?

This will allow to run linearizability tests on every developer setup and every etcd version without requiring code changes.

@serathius serathius added type/feature help wanted priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Nov 10, 2022
@lavacat
Copy link

lavacat commented Nov 14, 2022

I'll take a look at this if nobody else is looking. Conceptually I understand what you mean, but need to spend more time understanding the setup.

@serathius
Copy link
Member Author

Thanks for looking into it. I think the change should be pretty simple. Feel free to reach out on slack if you have any questions.

@lavacat
Copy link

lavacat commented Nov 24, 2022

Quick update - I think I got this working but need to wait for an extra approval internally to submit PR since it's not part of etcd repo. If somebody wants to take over, that's fine with me. The change isn't very large. Otherwise I'll submit in a week or so.

@ahrtr
Copy link
Member

ahrtr commented Nov 27, 2022

I tend to keep the package path. It's clearer which package each failpoint belongs to, and Users can define duplicated failpoints in different packages.

There is already an issue , and already a PR for it.

@serathius
Copy link
Member Author

I proposed to drop path name as it will really complicate maintaining failure injection for older etcd releases. Instead of unifying all failpoints and reusing code, we will need to have a separate failpoint list for each release.

@ahrtr
Copy link
Member

ahrtr commented Nov 27, 2022

For anyone information... all the existing failpoints are as below,

$ curl -L http://localhost:22381
backend/afterCommit=
backend/afterStartDBTxn=
backend/afterWritebackBuf=
backend/beforeCommit=
backend/beforeStartDBTxn=
backend/beforeWritebackBuf=
backend/commitAfterPreCommitHook=
backend/commitBeforePreCommitHook=
backend/defragBeforeCopy=
backend/defragBeforeRename=
etcdserver/raftAfterApplySnap=
etcdserver/raftAfterSave=
etcdserver/raftAfterSaveSnap=
etcdserver/raftAfterWALRelease=
etcdserver/raftBeforeApplySnap=
etcdserver/raftBeforeFollowerSend=
etcdserver/raftBeforeLeaderSend=
etcdserver/raftBeforeSave=
etcdserver/raftBeforeSaveSnap=
mvcc/compactAfterCommitBatch=
mvcc/compactAfterCommitScheduledCompact=
mvcc/compactAfterSetFinishedCompact=
mvcc/compactBeforeCommitBatch=
mvcc/compactBeforeCommitScheduledCompact=
mvcc/compactBeforeSetFinishedCompact=

Three proposals.

1. Remove the package

It will look like below.

$ curl -L http://localhost:22381
afterCommit=
afterStartDBTxn=
afterWritebackBuf=
...

It looks concise and consistent across releases, but obviously it isn't clear which package each failpoint belong to based on the output.

It also requires unique failpoint name across the repo; in other words, even different packages can't use the same failpoint name. So it raises additional requirement/limitation on the failpoint names.

2. Keep the packge

It will look like below,

$ curl -L http://localhost:22381
go.etcd.io/etcd/server/v3/storage/backend/afterCommit=
go.etcd.io/etcd/server/v3/storage/backend/afterStartDBTxn=
go.etcd.io/etcd/server/v3/storage/backend/afterWritebackBuf=
...

It's clearer. Of course, different releases may have different package structures, but it seems not a big problem.

Usually we don't change the package structures frequently, and it makes sense to update the test case (e.g. linearizablity) when we change the package structure. We can change the test implementation to make the impact minimum.

There are 25 failpoints for now, it seems it isn't too much effort to maintain different failpoints for different releases(3.5 and probably 3.4).

3. Keep the base directory name

It will look like below. It's actually the current behavior, but the code has bug. We need to fix it and make it explicit. This is a compromised solution. Usually we can tell the package by the directory name.

$ curl -L http://localhost:22381
backend/afterCommit=
backend/afterStartDBTxn=
backend/afterWritebackBuf=
...

Personally I prefer to # 2, and fallback to # 3.

@serathius
Copy link
Member Author

It looks concise and consistent across releases, but obviously it isn't clear which package each failpoint belong to based on the output.

Knowing which package failpoints belongs doesn't seem like a real con. It doesn't diminish any value and knowing it's package doesn't help in any case. You could say that it's easier to discover but I disagree. I don't expect anyone to use package name to find the failpoints, I expect them to grep the code with a failpoint name.

However, if you grep codebase for go.etcd.io/etcd/server/v3/storage/backend/afterCommit or backend/afterCommit you will not find anything because failpoint is identified by // gofail: var afterCommit struct{}. Prefix is artificial.

I think it's important that failpoint name in comment and the resulting failpoint in binary is the same. It's makes it easy to find without knowing which part of failpoint name is autogenerated which not.

Unique naming across packages is also not an downside. I think you don't distinguish between autogenerated failpoints name and self set naming convention. You can set a convention that failpoints have name based on local context and it can be more accurate than directory name. For example etcdserver/raftAfterSave. raft prefix is explicitly added because directory name doesn't provide any context. For failpoints like backend/afterCommit you can just rename them to backendAfterCommit. This way we improve the naming as names are shorter while providing same amount of context.

@ahrtr
Copy link
Member

ahrtr commented Nov 28, 2022

The points do not stand at all to me. Partially auto-generated prefix isn't a problem at all, as long as the rule is open and clear. Automatically generating a prefix is even better to me, because it guarantees that each failpoint always has a "category" prefixed, and users do not need to worry about it at all.

For me, backend/afterCommit definitely looks better and clearer than backendAfterCommit. Generally speaking, it's more like a problem of personal taste, so it isn't a big deal. Let's see what other maintainers say. cc @ptabor @spzala

@lavacat
Copy link

lavacat commented Dec 4, 2022

Submitted draft PR with the version of the code without package. I had this implemented before discussion about keeping package name.
I don't have a strong preference about keeping or removing package, happy to modify PR when discussion is finalized. Will add tests as well.

@serathius
Copy link
Member Author

Implemented in etcd-io/gofail#44

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. type/feature
Development

No branches or pull requests

3 participants