Remove path from gofailpoints #14727

serathius · 2022-11-10T18:56:30Z

What would you like to be added?

compiling etcd with FAILPOINTS=true make build results in expanding etcd with failpoints thanks to https://github.com/etcd-io/gofail library. Problem is that gofail injects package path to name of failpoint. This causes two problems.

First gofail was created before go1.12 and uses path from GOPATH and not go.mod. As result path is different between CI and local development.
Second code path changes as code is refactored between etcd releases. This is a problem as we want to be able to reproduce issues on previous supported releases and it creates a lot of toil to update paths.

This leads into problems where name of failpoint changes depending on your local machine setup or etcd version. For example same fail raftBeforeSave is:

etcdserver/raftBeforeSave - main branch in CI.
etcd.io/etcd/server/etcdserver/raftBeforeSave - main branch on my local machine.
etcd.io/etcd/etcdserver/raftBeforeSave - release-3.4 on local machine

Proposal:
Remove path from failpoints. Failpoints name should be unique, but we can achieve it by validating user provided names during binary bootstrap instead of generating unique by adding prefix.

Why is this needed?

This will allow to run linearizability tests on every developer setup and every etcd version without requiring code changes.

The text was updated successfully, but these errors were encountered:

lavacat · 2022-11-14T10:05:16Z

I'll take a look at this if nobody else is looking. Conceptually I understand what you mean, but need to spend more time understanding the setup.

serathius · 2022-11-14T11:55:09Z

Thanks for looking into it. I think the change should be pretty simple. Feel free to reach out on slack if you have any questions.

lavacat · 2022-11-24T17:45:05Z

Quick update - I think I got this working but need to wait for an extra approval internally to submit PR since it's not part of etcd repo. If somebody wants to take over, that's fine with me. The change isn't very large. Otherwise I'll submit in a week or so.

ahrtr · 2022-11-27T08:50:35Z

I tend to keep the package path. It's clearer which package each failpoint belongs to, and Users can define duplicated failpoints in different packages.

There is already an issue , and already a PR for it.

serathius · 2022-11-27T19:57:28Z

I proposed to drop path name as it will really complicate maintaining failure injection for older etcd releases. Instead of unifying all failpoints and reusing code, we will need to have a separate failpoint list for each release.

ahrtr · 2022-11-27T21:44:23Z

For anyone information... all the existing failpoints are as below,

$ curl -L http://localhost:22381
backend/afterCommit=
backend/afterStartDBTxn=
backend/afterWritebackBuf=
backend/beforeCommit=
backend/beforeStartDBTxn=
backend/beforeWritebackBuf=
backend/commitAfterPreCommitHook=
backend/commitBeforePreCommitHook=
backend/defragBeforeCopy=
backend/defragBeforeRename=
etcdserver/raftAfterApplySnap=
etcdserver/raftAfterSave=
etcdserver/raftAfterSaveSnap=
etcdserver/raftAfterWALRelease=
etcdserver/raftBeforeApplySnap=
etcdserver/raftBeforeFollowerSend=
etcdserver/raftBeforeLeaderSend=
etcdserver/raftBeforeSave=
etcdserver/raftBeforeSaveSnap=
mvcc/compactAfterCommitBatch=
mvcc/compactAfterCommitScheduledCompact=
mvcc/compactAfterSetFinishedCompact=
mvcc/compactBeforeCommitBatch=
mvcc/compactBeforeCommitScheduledCompact=
mvcc/compactBeforeSetFinishedCompact=

Three proposals.

1. Remove the package

It will look like below.

$ curl -L http://localhost:22381
afterCommit=
afterStartDBTxn=
afterWritebackBuf=
...

It looks concise and consistent across releases, but obviously it isn't clear which package each failpoint belong to based on the output.

It also requires unique failpoint name across the repo; in other words, even different packages can't use the same failpoint name. So it raises additional requirement/limitation on the failpoint names.

2. Keep the packge

It will look like below,

$ curl -L http://localhost:22381
go.etcd.io/etcd/server/v3/storage/backend/afterCommit=
go.etcd.io/etcd/server/v3/storage/backend/afterStartDBTxn=
go.etcd.io/etcd/server/v3/storage/backend/afterWritebackBuf=
...

It's clearer. Of course, different releases may have different package structures, but it seems not a big problem.

Usually we don't change the package structures frequently, and it makes sense to update the test case (e.g. linearizablity) when we change the package structure. We can change the test implementation to make the impact minimum.

There are 25 failpoints for now, it seems it isn't too much effort to maintain different failpoints for different releases(3.5 and probably 3.4).

3. Keep the base directory name

It will look like below. It's actually the current behavior, but the code has bug. We need to fix it and make it explicit. This is a compromised solution. Usually we can tell the package by the directory name.

$ curl -L http://localhost:22381
backend/afterCommit=
backend/afterStartDBTxn=
backend/afterWritebackBuf=
...

Personally I prefer to # 2, and fallback to # 3.

serathius · 2022-11-28T09:13:38Z

It looks concise and consistent across releases, but obviously it isn't clear which package each failpoint belong to based on the output.

Knowing which package failpoints belongs doesn't seem like a real con. It doesn't diminish any value and knowing it's package doesn't help in any case. You could say that it's easier to discover but I disagree. I don't expect anyone to use package name to find the failpoints, I expect them to grep the code with a failpoint name.

However, if you grep codebase for go.etcd.io/etcd/server/v3/storage/backend/afterCommit or backend/afterCommit you will not find anything because failpoint is identified by // gofail: var afterCommit struct{}. Prefix is artificial.

I think it's important that failpoint name in comment and the resulting failpoint in binary is the same. It's makes it easy to find without knowing which part of failpoint name is autogenerated which not.

Unique naming across packages is also not an downside. I think you don't distinguish between autogenerated failpoints name and self set naming convention. You can set a convention that failpoints have name based on local context and it can be more accurate than directory name. For example etcdserver/raftAfterSave. raft prefix is explicitly added because directory name doesn't provide any context. For failpoints like backend/afterCommit you can just rename them to backendAfterCommit. This way we improve the naming as names are shorter while providing same amount of context.

ahrtr · 2022-11-28T19:40:09Z

The points do not stand at all to me. Partially auto-generated prefix isn't a problem at all, as long as the rule is open and clear. Automatically generating a prefix is even better to me, because it guarantees that each failpoint always has a "category" prefixed, and users do not need to worry about it at all.

For me, backend/afterCommit definitely looks better and clearer than backendAfterCommit. Generally speaking, it's more like a problem of personal taste, so it isn't a big deal. Let's see what other maintainers say. cc @ptabor @spzala

lavacat · 2022-12-04T20:32:29Z

Submitted draft PR with the version of the code without package. I had this implemented before discussion about keeping package name.
I don't have a strong preference about keeping or removing package, happy to modify PR when discussion is finalized. Will add tests as well.

serathius · 2022-12-24T19:41:14Z

Implemented in etcd-io/gofail#44

serathius added type/feature help wanted priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Nov 10, 2022

serathius mentioned this issue Nov 10, 2022

Introduce etcd linearizability tests #14045

Closed

33 tasks

serathius assigned lavacat Nov 14, 2022

serathius mentioned this issue Nov 26, 2022

Release v0.1.0 etcd-io/gofail#33

Closed

lavacat mentioned this issue Dec 4, 2022

remove path from failpoint etcd-io/gofail#44

Merged

serathius closed this as completed Dec 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove path from gofailpoints #14727

Remove path from gofailpoints #14727

serathius commented Nov 10, 2022

lavacat commented Nov 14, 2022

serathius commented Nov 14, 2022

lavacat commented Nov 24, 2022

ahrtr commented Nov 27, 2022

serathius commented Nov 27, 2022

ahrtr commented Nov 27, 2022 •

edited

Loading

serathius commented Nov 28, 2022

ahrtr commented Nov 28, 2022

lavacat commented Dec 4, 2022

serathius commented Dec 24, 2022

Remove path from gofailpoints #14727

Remove path from gofailpoints #14727

Comments

serathius commented Nov 10, 2022

What would you like to be added?

Why is this needed?

lavacat commented Nov 14, 2022

serathius commented Nov 14, 2022

lavacat commented Nov 24, 2022

ahrtr commented Nov 27, 2022

serathius commented Nov 27, 2022

ahrtr commented Nov 27, 2022 • edited Loading

Three proposals.

1. Remove the package

2. Keep the packge

3. Keep the base directory name

serathius commented Nov 28, 2022

ahrtr commented Nov 28, 2022

lavacat commented Dec 4, 2022

serathius commented Dec 24, 2022

ahrtr commented Nov 27, 2022 •

edited

Loading