Improve support for failed requests in linearizability tests #14880
Conversation
Codecov Report
@@            Coverage Diff             @@
##             main   #14880      +/-   ##
==========================================
- Coverage   74.53%   74.46%   -0.07%
==========================================
  Files         415      415
  Lines       34335    34335
==========================================
- Hits        25590    25569      -21
- Misses       7096     7120      +24
+ Partials     1649     1646      -3
Did some testing and found that the new model is faster on linearizable histories, but can be much, much slower on non-linearizable ones. I expect this is due to the fact that we have multiple failure injections in one history, which results in a lot of timed-out requests and gives an exponential cost. Before merging this PR we would need to rewrite the testing to trigger only one failure per history.
When you want to implement multiple keys at some point, don't forget to implement the partition function: porcupine can parallelize over each partition, making it even quicker.
This is a good suggestion. It will not help us in this case (partitioning works over state, not over time as in this case). Based on different usage patterns I was considering adding a separate key when testing leases. cc @geetasg
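For reference, a minimal sketch of what such per-key partitioning could look like once the tests use multiple keys (e.g. a separate key for leases). Porcupine's Model.Partition field is real; the request type and its Key field below are hypothetical stand-ins for the etcd model's actual input type:

```go
package validate

import "github.com/anishathalye/porcupine"

// request is a hypothetical input type; the actual etcd model defines its own.
type request struct {
	Key   string
	Op    string
	Value string
}

// partitionByKey splits the history per key so porcupine can check each
// partition independently and in parallel.
func partitionByKey(history []porcupine.Operation) [][]porcupine.Operation {
	byKey := map[string][]porcupine.Operation{}
	for _, op := range history {
		key := op.Input.(request).Key
		byKey[key] = append(byKey[key], op)
	}
	partitions := make([][]porcupine.Operation, 0, len(byKey))
	for _, ops := range byKey {
		partitions = append(partitions, ops)
	}
	return partitions
}

var multiKeyModel = porcupine.Model{
	Partition: partitionByKey,
	// Init, Step and Equal stay the same as in the single-key model.
}
```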
OK, fixed the issue with linearizability verification going into exponential time. The solution was to shorten test runs to only one failpoint injection. We still do 60 iterations, but each of them recreates the cluster. This is a little costly for a 3-node cluster, but it was the quickest way to shorten the request history and guarantee correctness. We might consider reusing the cluster in the future.
Re-running linearizability tests to check for flakes.
Recreating the cluster on every run makes the tests unstable. It is required for this change to work, so I decided to split it out into #14885 to be able to iterate on it more quickly.
/lgtm (non-binding) overall, just a couple of nits around comments and naming.
state.LastRevision = response.revision
delete(state.FailedWrites, response.getData)
return true, state
if state.FailedWrite != nil && state.LastRevision < response.Revision {
Suggested change:
- if state.FailedWrite != nil && state.LastRevision < response.Revision {
+ if state.FailedWrite != nil && state.LastRevision <= response.Revision {
This is not true; this code is responsible for restoring a failed write. state.LastRevision < response.Revision is correct, as writes always increase the revision.
Regarding "writes always increase revision": with the addition of the delete operation, this no longer holds. Deleting a non-existent key does not increase the revision.
Yes, but that should be changed by the PR that introduced delete.
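To illustrate the point about delete, a minimal sketch against a local etcd endpoint (the endpoint address and key name are assumptions): deleting a key that does not exist reports zero deleted keys and leaves the store revision unchanged, which is why a strict LastRevision < response.Revision check would reject such a response.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	// Deleting a key that was never written deletes nothing, so the
	// store revision in the response header does not advance.
	resp, err := cli.Delete(ctx, "key-that-does-not-exist")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("deleted=%d revision=%d\n", resp.Deleted, resp.Header.Revision)
}
```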
// Failed requests may still take effect after the test, so, following the
// suggestion in anishathalye/porcupine#10, treat each of them as returning
// after the last observed event. This keeps them concurrent with every
// later operation.
for _, op := range h.failed {
	if op.Call > maxTime {
		continue
	}
	op.Return = maxTime + 1
	operations = append(operations, op)
}
If I understand correctly, you followed the suggestion in anishathalye/porcupine#10? If so, please add a comment here; otherwise it's hard for other contributors to understand.
Added the comment
Overall looks good to me. I expect to merge #14802 first, then update and merge this PR afterwards.
Rebased on the delete changes.
Based on recommendations in the porcupine issues, failed requests should be marked as if they failed at the end of the test. This allows us to stop aggregating errors within the model. The only downside is that the history visualization is much less readable.
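A minimal sketch of the resulting flow, assuming helpers that already collected successful and failed operations (the function and variable names are illustrative, not the PR's exact code): failed calls whose outcome was never observed get a return time just past the last event, and the combined history is then checked and visualized with porcupine.

```go
package validate

import (
	"time"

	"github.com/anishathalye/porcupine"
)

// mergeFailed appends failed requests to the history, treating each one as
// if it returned after every observed event (anishathalye/porcupine#10),
// since a request that timed out may still take effect later.
func mergeFailed(successful, failed []porcupine.Operation) []porcupine.Operation {
	operations := successful
	var maxTime int64
	for _, op := range operations {
		if op.Return > maxTime {
			maxTime = op.Return
		}
	}
	for _, op := range failed {
		if op.Call > maxTime {
			continue
		}
		op.Return = maxTime + 1
		operations = append(operations, op)
	}
	return operations
}

func checkHistory(model porcupine.Model, successful, failed []porcupine.Operation) bool {
	operations := mergeFailed(successful, failed)
	result, info := porcupine.CheckOperationsVerbose(model, operations, 5*time.Minute)
	// The visualization is less readable with this scheme, because every
	// failed request is drawn as concurrent with everything after its call.
	_ = porcupine.VisualizePath(model, info, "history.html")
	return result == porcupine.Ok
}
```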