Improve support for failed requests in linearizability tests #14880
Conversation
Codecov Report
@@            Coverage Diff             @@
##             main   #14880      +/-   ##
==========================================
- Coverage   74.53%   74.46%   -0.07%
==========================================
  Files         415      415
  Lines       34335    34335
==========================================
- Hits        25590    25569      -21
- Misses       7096     7120      +24
+ Partials     1649     1646      -3
Did some testing and found that the new model is faster on linearizable histories, but can be much, much slower on non-linearizable ones. I expect this is due to the fact that we have multiple failure injections in one history, which results in a lot of timed-out requests and gives an exponential cost. Before merging this PR we would need to rewrite the testing to trigger only one failure per history.
When you want to implement multiple keys at some point, don't forget to implement the partition function: porcupine can parallelize over each partition, making it even quicker.
This is a good suggestion. It will not help us in this case (partitioning works over state, not over time as in this case). Based on different usage patterns I was considering adding a separate key when testing leases. cc @geetasg
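For reference, a minimal sketch of what such per-key partitioning could look like once the tests use multiple keys (e.g. a separate key for leases). Porcupine's Model.Partition field is real; the request type and its Key field below are hypothetical stand-ins for the etcd model's actual input type:

```go
package validate

import "github.com/anishathalye/porcupine"

// request is a hypothetical input type; the actual etcd model defines its own.
type request struct {
	Key   string
	Op    string
	Value string
}

// partitionByKey splits the history per key so porcupine can check each
// partition independently and in parallel.
func partitionByKey(history []porcupine.Operation) [][]porcupine.Operation {
	byKey := map[string][]porcupine.Operation{}
	for _, op := range history {
		key := op.Input.(request).Key
		byKey[key] = append(byKey[key], op)
	}
	partitions := make([][]porcupine.Operation, 0, len(byKey))
	for _, ops := range byKey {
		partitions = append(partitions, ops)
	}
	return partitions
}

var multiKeyModel = porcupine.Model{
	Partition: partitionByKey,
	// Init, Step and Equal stay the same as in the single-key model.
}
```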
OK, fixed the issue with linearizability verification going into exponential time. The solution was to shorten test runs to only one failpoint injection. We still do 60 iterations, but each of them recreates the cluster. This is a little costly for a 3-node cluster, but it was the quickest way to shorten the request history and guarantee correctness. We might consider reusing the cluster in the future.
Re-running linearizability tests to check for flakes.
Recreating the cluster on every run makes the tests unstable. It is required for this change to work, so I decided to split it out into #14885 to be able to iterate on it more quickly.
/lgtm (non-binding) overall, just a couple of nits around comments and naming.
state.LastRevision = response.revision
delete(state.FailedWrites, response.getData)
return true, state
if state.FailedWrite != nil && state.LastRevision < response.Revision {
Suggested change:
- if state.FailedWrite != nil && state.LastRevision < response.Revision {
+ if state.FailedWrite != nil && state.LastRevision <= response.Revision {
This is not true; this code is responsible for restoring a failed write. state.LastRevision < response.Revision is correct, as writes always increase the revision.
Regarding "writes always increase revision": with the addition of the delete operation, this no longer holds. Deleting a non-existent key does not increase the revision.
Yes, but that should be changed by the PR that introduced delete.
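To illustrate the point about delete, a minimal sketch against a local etcd endpoint (the endpoint address and key name are assumptions): deleting a key that does not exist reports zero deleted keys and leaves the store revision unchanged, which is why a strict LastRevision < response.Revision check would reject such a response.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	// Deleting a key that was never written deletes nothing, so the
	// store revision in the response header does not advance.
	resp, err := cli.Delete(ctx, "key-that-does-not-exist")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("deleted=%d revision=%d\n", resp.Deleted, resp.Header.Revision)
}
```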
// Failed requests may still take effect after the test, so, following the
// suggestion in anishathalye/porcupine#10, treat each of them as returning
// after the last observed event. This keeps them concurrent with every
// later operation.
for _, op := range h.failed {
	if op.Call > maxTime {
		continue
	}
	op.Return = maxTime + 1
	operations = append(operations, op)
}
If I understand correctly, you followed the suggestion in anishathalye/porcupine#10? If so, please add a comment here; otherwise it's hard for other contributors to understand.
Added the comment
Overall looks good to me. I expect to merge #14802 first, then update and merge this PR afterwards.
Rebased on the delete changes.
Based on recommendations in the porcupine issues, failed requests should be marked as if they failed at the end of the test. This allows us to stop aggregating errors within the model. The only downside is that the history visualization is much less readable.
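A minimal sketch of the resulting flow, assuming helpers that already collected successful and failed operations (the function and variable names are illustrative, not the PR's exact code): failed calls whose outcome was never observed get a return time just past the last event, and the combined history is then checked and visualized with porcupine.

```go
package validate

import (
	"time"

	"github.com/anishathalye/porcupine"
)

// mergeFailed appends failed requests to the history, treating each one as
// if it returned after every observed event (anishathalye/porcupine#10),
// since a request that timed out may still take effect later.
func mergeFailed(successful, failed []porcupine.Operation) []porcupine.Operation {
	operations := successful
	var maxTime int64
	for _, op := range operations {
		if op.Return > maxTime {
			maxTime = op.Return
		}
	}
	for _, op := range failed {
		if op.Call > maxTime {
			continue
		}
		op.Return = maxTime + 1
		operations = append(operations, op)
	}
	return operations
}

func checkHistory(model porcupine.Model, successful, failed []porcupine.Operation) bool {
	operations := mergeFailed(successful, failed)
	result, info := porcupine.CheckOperationsVerbose(model, operations, 5*time.Minute)
	// The visualization is less readable with this scheme, because every
	// failed request is drawn as concurrent with everything after its call.
	_ = porcupine.VisualizePath(model, info, "history.html")
	return result == porcupine.Ok
}
```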