Jepsen transient failures under network partition conditions #7549
Comments
Hi @pilvitaneli, thanks for the testing results! We're actively investigating with the Jepsen tests on top of our own tests, which resulted in #7572. The Jepsen tests helped verify that we fixed the split-brain issue (it no longer happens). In all of our runs, though, we couldn't reproduce a result similar to your first run (the [...]). I'll let you know how our continued testing with Jepsen goes. Thanks again for your results!
Running just isolate-self-primaries-nemesis 50 times in succession results in 22 failures:
@pilvitaneli circling back to this after a while: do you happen to have the commit SHA of the Jepsen version you are using to run your tests? I'd like to make sure we run the same tests.
I haven't run the tests in a while, but the last run was with jepsen-io/jepsen@761693b. There do not appear to be considerable changes after that commit, but I could try to re-run with current master.
Going to close this, as it's been almost 2 years and we are tracking this work for the 5.0 release in a different issue: #20031.
Hi! The Jepsen tests include five nemeses (test scenarios) that introduce different types of network partitions (see here). The tests add documents to an index before, during and after these partitions, and verify that the documents which were acknowledged during the partitions are retrievable afterwards. Sometimes the tests indicate that a number of documents were indexed but are not retrievable. However, this does not happen on every run of the same scenario. For example, running each scenario 20 times (against 598854d) reported the following :lost-frac amounts:
isolate-self-primaries-nemesis 244/361, 2/733, 1/607, 1/603, 1/213, 65/216 (and 14 times 0)
nemesis/partition-random-halves 1/355, 1/226, 4/733, 1/433 (and 16 times 0)
nemesis/partition-halves 1/65, 1/438, 4/715, 2/457, 6/731, 1/435, 9/433 (and 13 times 0)
nemesis/partitioner nemesis/bridge 2/415, 3/253, 2/383, 7/754, 1/786, 1/767 (and 14 times 0)
nemesis/partition-random-node does not report any lost documents.
In total, 23 out of 100 runs failed.
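To make the arithmetic behind these totals easy to re-check, here is a small Python sketch that tallies the figures listed above. It is not part of Jepsen (which is written in Clojure). The names `reported`, `lost_documents` and `summarize` are made up for illustration, and the second number of each pair is simply taken to be the denominator that :lost-frac reported.

```python
from fractions import Fraction

# (lost, total) pairs for the failing runs, copied from the list above.
# Each scenario was run 20 times; runs that are not listed lost nothing.
RUNS_PER_NEMESIS = 20
reported = {
    "isolate-self-primaries-nemesis": [
        (244, 361), (2, 733), (1, 607), (1, 603), (1, 213), (65, 216)],
    "nemesis/partition-random-halves": [
        (1, 355), (1, 226), (4, 733), (1, 433)],
    "nemesis/partition-halves": [
        (1, 65), (1, 438), (4, 715), (2, 457), (6, 731), (1, 435), (9, 433)],
    "nemesis/partitioner nemesis/bridge": [
        (2, 415), (3, 253), (2, 383), (7, 754), (1, 786), (1, 767)],
    "nemesis/partition-random-node": [],
}


def lost_documents(acknowledged, retrievable):
    """Hypothetical helper: acknowledged ids that cannot be read back."""
    return set(acknowledged) - set(retrievable)


def summarize(reported, runs_per_nemesis):
    """Count failing and clean runs per scenario and find the worst loss."""
    summary = {}
    for name, failures in reported.items():
        worst = max((Fraction(lost, total) for lost, total in failures),
                    default=Fraction(0))
        summary[name] = {
            "failed": len(failures),
            "clean": runs_per_nemesis - len(failures),
            "worst": worst,
        }
    return summary


summary = summarize(reported, RUNS_PER_NEMESIS)
total_failed = sum(s["failed"] for s in summary.values())
total_runs = RUNS_PER_NEMESIS * len(reported)
print(f"{total_failed} of {total_runs} runs lost documents")
```

The `lost_documents` helper only mirrors the check described above (acknowledged documents that are not retrievable afterwards); the real checker lives in the Jepsen test suite.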