Possible aborted reads with process crashes #14092

aphyr · 2022-06-06T21:56:33Z

What happened?

With etcd 3.5.3 and jetcd 4.1.74.Final, I suspect that the effects of failed transactions can be visible to later readers and writers when processes are killed with kill -9. For example, take this pair of operations from one Jepsen run, in a five-node cluster running on Debian 11 LXC nodes:

            :anomalies {:G1a ({:op {:type :ok,
                                    :f :txn,
                                    :value [[:append 4325 21]
                                            [:r
                                             4325
                                             [5 7 10 11 14 21]]
                                            [:r
                                             4325
                                             [5 7 10 11 14 21]]],
                                    :time 224493456205,
                                    :process 143,
                                    :index 224833},
                               :mop [:r 4325 [5 7 10 11 14 21]],
                               :writer {:type :fail,
                                        :f :txn,
                                        :value [[:append 4325 14]
                                                [:r 4325 nil]
                                                [:r 4324 nil]
                                                [:append 4324 12]],
                                        :time 226412418904,
                                        :process 145,
                                        :error :didn't-succeed,
                                        :index 227733},
                               :element 14}

The :writer transaction began by appending (via CaS) the element 14 to a list stored in key 4325. That transaction returned a TxnResponse to the client rather than throwing, but response.isSucceeded() returned false--we assume that since it did not succeed, it must have aborted. However, the :op transaction performed a read of key 4325 and observed element 14! If the writer did in fact abort, this op would constitute an aborted read.

Here's the full logs from this test run.

What did you expect to happen?

I expect that if a transaction does not succeed, and also does not throw an exception (as jetcd typically does for indefinite errors), it would not be visible to later readers.

How can we reproduce it (as minimally and precisely as possible)?

Clone https://github.com/jepsen-io/etcd at a1bf380a1c09d62bf6bf2e7b97bd02a35902ed36, and run:

lein run test-all -w append --concurrency 2n --time-limit 1000 --rate 1000 --test-count 5 --nemesis kill

This is basically the same test suite we used for the etcd testing back in late 2019--I've just updated it to the latest jetcd, bumped dependencies, and pointed it at 3.5.3 instead.

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
etcd Version: 3.5.3
Git SHA: 0452feec7
Go Version: go1.16.15
Go OS/Arch: linux/amd64

$ etcdctl version
etcdctl version: 3.5.3
API version: 3.5

Etcd configuration (command line flags or environment variables)

Each node is started without a config file, using CLI flags like ``` etcd --enable-v2 --log-outputs stderr --logger zap --name n1 --listen-peer-urls http://192.168.122.101:2380 --listen-client-urls http://192.168.122.101:2379 --advertise-client-urls http://192.168.122.101:2379 --initial-cluster-state new --initial-advertise-peer-urls http://192.168.122.101:2380 --initial-cluster ```

Etcd debug information (please run commands blow, feel free to obfuscate the IP address or FQDN in the output)

root@n1:/opt/etcd# ./etcdctl member list -w table --endpoints=n1:2379
+------------------+---------+------+-----------------------------+-----------------------------+------------+
|        ID        | STATUS  | NAME |         PEER ADDRS          |        CLIENT ADDRS         | IS LEARNER |
+------------------+---------+------+-----------------------------+-----------------------------+------------+
| 1153c9690d2b2284 | started |   n3 | http://192.168.122.103:2380 | http://192.168.122.103:2379 |      false |
| 4824313a421b2502 | started |   n5 | http://192.168.122.105:2380 | http://192.168.122.105:2379 |      false |
| 4d6e27d122507e9c | started |   n4 | http://192.168.122.104:2380 | http://192.168.122.104:2379 |      false |
| a1ffd5acd6a88a6a | started |   n2 | http://192.168.122.102:2380 | http://192.168.122.102:2379 |      false |
| afa39e55dee6dc2e | started |   n1 | http://192.168.122.101:2380 | http://192.168.122.101:2379 |      false |
+------------------+---------+------+-----------------------------+-----------------------------+------------+

root@n1:/opt/etcd# ./etcdctl --endpoints=http://192.168.122.105:2379,http://192.168.122.103:2379,http://192.168.122.104:2379,http://192.168.122.102:2379,http://192.168.122.101:2379 endpoint status -w table
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://192.168.122.105:2379 | 4824313a421b2502 |   3.5.3 |  4.2 MB |      true |      false |        20 |      60226 |              60226 |        |
| http://192.168.122.103:2379 | 1153c9690d2b2284 |   3.5.3 |  4.2 MB |     false |      false |        20 |      60226 |              60226 |        |
| http://192.168.122.104:2379 | 4d6e27d122507e9c |   3.5.3 |  4.2 MB |     false |      false |        20 |      60228 |              60228 |        |
| http://192.168.122.102:2379 | a1ffd5acd6a88a6a |   3.5.3 |  4.2 MB |     false |      false |        20 |      60228 |              60228 |        |
| http://192.168.122.101:2379 | afa39e55dee6dc2e |   3.5.3 |  4.2 MB |     false |      false |        20 |      60228 |              60228 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Relevant log output

Full logs are in the attached zip file.

The text was updated successfully, but these errors were encountered:

serathius · 2022-06-07T09:29:31Z

From brief read this looks like bug in jetcd and not etcd. Please file it to jetcd repo or provide explanation what is the error in Etcd.

aphyr · 2022-06-07T14:42:11Z

Could you tell me a bit more about why you suspect jetcd is to blame here, rather than etcd itself?

Here's a protocol-level view of the same kind of anomaly. We make a request to conditionally set key 693 to a new value ending in ... 29 iff its revision is still 7191:

Four milliseconds later, the server responded with an HTTP2 200 OK response, with grpc-status: 0, containing a mostly-blank Txn,Response message--notably without a succeeded field. I'm not sure, but since I can't find any examples of responses which have succeeded = 0, I suspect that the absence of the succeeded field entirely might be how etcd's protocol encodes a transaction guard failure.

Nevertheless, this write of 29 appeared in later reads, which suggests this is an aborted read.

                                    :f :txn,
                                    :value [[:r 697 [4 5 11 12 13 16 17 18 20 22 24 25 27 28 30]]
                                            [:r 693 [1 2 3 4 6 8 11 12 13 15 16 18 20 21 22 23 25 27 29]]
                                            [:r 698 [5 6 7 4 19 20 24 30]]
                                            [:append 697 9]],
                                    :time 28156744830,
                                    :process 8,
                                    :index 36577},

To confirm this, here's a transaction which definitely did fail--again, I believe, because its precondition guard failed to match. Again, HTTP 200, GRPC status 0, a Txn,response message without a succeeded field:

Compare this to what happens during a successful request: we get back a Txn,Response which does have a succeeded field, as well as kvs structures containing the results of that transaction:

Also compare this to how the protocol represents an exceptional response--for instance, here's a malformed request which caused etcd to return an HTTP 200 with gprc-status: 3 (INVALID_ARGUMENT) and a grpc-message of etcdserver: duplicate key given in txn request:

Here's a zip file with tcpdump captures of the client-server traffic on all nodes, if you'd like to see for yourself. The write causing an aborted read is in n5/tcpdump.pcap.

20220607T093354.000-0400.zip

ahrtr · 2022-06-12T17:14:22Z

response.isSucceeded() returned false--we assume that since it did not succeed, it must have aborted

It isn't correct. It only means whether the compare is evaluated to true or false. See rpc.pb.go#L1371-L1372

aphyr · 2022-06-13T12:05:32Z

Begging your pardon, but are you saying that a transaction whose guard clause evaluates to false should go on to evaluate the transaction's success block? That seems to be at odds with both the transaction documentation and etcd's behavior in, well, pretty much every other response in these tests...

ahrtr · 2022-06-13T18:49:46Z

Two comments/points:

The TxnResponse.Succeeded only means the result of compare (The If statement);
When the compare is evaluated to false, the Failure block is processed. Otherwise, the Success block is processed.

aphyr · 2022-06-13T18:53:32Z

OK, so... to repeat, we have multiple cases where a transaction's compare failed, but etcd apparently executed the success block anyway--or if it didn't "really" execute, other transactions are able to observe state which could only have arisen from executing the success block. You're saying this is not a bug?

ahrtr · 2022-06-13T20:15:02Z

If TxnResponse has the header (cluster_id, member_id, revision and raft_term), then it must contain the field succeeded, because its type is bool and it always has a value no matter a program explicitly assigns a value to it or not.

where a transaction's compare failed, but etcd apparently executed the success block anyway

I still am still not convinced. Could you show me the evidence? How did you know transaction's compare failed? Did you get the conclusion because you did not see the field succeeded ?

aphyr · 2022-06-13T21:02:30Z

If TxnResponse has the header (cluster_id, member_id, revision and raft_term), then it must contain the field succeeded, because its type is bool and it always has a value no matter a program explicitly assigns a value to it or not.

Hmm. At the JVM level, in the client, this value is false. That's because it's not present in the protocol-buffer message, and the protocol buffer spec states that the default value for bools is false.

As for why there's no succeeded field in the protobuf-encoded messages... Ah, I've got some speculation there, but I don't think it's actually on-topic. The core problem remains: etcd returns responses without a succeeded field set to true even when they apparently did succeed.

Could you show me the evidence?

I've told you what the Java client is doing, provided full logs of test runs, given you a reproducible automated test suite, shown you Wireshark disassembly of the request and response, and provided pcap files for you to confirm yourself. I'm honestly at a loss: what other kind of evidence are you looking for?

ahrtr · 2022-06-14T07:41:38Z

OK, based on the proto3 doc, if a scalar message field is set to its default, the value will not be serialized on the wire. It could be the reason why you did not see the field "succeeded" in the TxnResponse.

We have both integration and e2e test cases for transaction, and I have never seen such issue so far. I suspect it's client side issue, including jetcd or the test case.

stale · 2022-09-21T02:36:44Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

serathius · 2022-09-28T08:34:00Z

Closing as a jetcd issue. Feel free to reopen if you find evidence that issue is on etcd server side.

This client shells out to etcdctl and supports the list-append workload only--I haven't implemented any other features. I'm only doing this because the etcd team refuse to believe a bug I reported could actually be in etcd; they insist it must be in jetcd, and the jetcd maintainers haven't found anything, and <sigh> there goes my entire Sunday etcd-io/etcd#14092 (comment)

aphyr added the type/bug label Jun 6, 2022

ahrtr added type/question and removed type/bug labels Jun 12, 2022

This was referenced Jun 14, 2022

Possible aborted reads with process crashes etcd-io/jetcd#1072

Closed

Introduce etcd linearizability tests #14045

Closed

stale bot added the stale label Sep 21, 2022

serathius closed this as completed Sep 28, 2022

aphyr mentioned this issue Dec 5, 2022

Lost update and cyclic information flow on process crash #14890

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible aborted reads with process crashes #14092

Possible aborted reads with process crashes #14092

aphyr commented Jun 6, 2022 •

edited

Loading

serathius commented Jun 7, 2022 •

edited

Loading

aphyr commented Jun 7, 2022

ahrtr commented Jun 12, 2022

aphyr commented Jun 13, 2022

ahrtr commented Jun 13, 2022

aphyr commented Jun 13, 2022

ahrtr commented Jun 13, 2022

aphyr commented Jun 13, 2022 •

edited

Loading

ahrtr commented Jun 14, 2022

stale bot commented Sep 21, 2022

serathius commented Sep 28, 2022

Possible aborted reads with process crashes #14092

Possible aborted reads with process crashes #14092

Comments

aphyr commented Jun 6, 2022 • edited Loading

What happened?

What did you expect to happen?

How can we reproduce it (as minimally and precisely as possible)?

Anything else we need to know?

Etcd version (please run commands below)

Etcd configuration (command line flags or environment variables)

Etcd debug information (please run commands blow, feel free to obfuscate the IP address or FQDN in the output)

Relevant log output

serathius commented Jun 7, 2022 • edited Loading

aphyr commented Jun 7, 2022

ahrtr commented Jun 12, 2022

aphyr commented Jun 13, 2022

ahrtr commented Jun 13, 2022

aphyr commented Jun 13, 2022

ahrtr commented Jun 13, 2022

aphyr commented Jun 13, 2022 • edited Loading

ahrtr commented Jun 14, 2022

stale bot commented Sep 21, 2022

serathius commented Sep 28, 2022

aphyr commented Jun 6, 2022 •

edited

Loading

serathius commented Jun 7, 2022 •

edited

Loading

aphyr commented Jun 13, 2022 •

edited

Loading