node_confirms on GET #1750
There are some peculiarities with a specific customer that make this more relevant than it might seem. For this customer, inbound updates are persisted first to a Riak msgStore, then queued from the store to be applied, and periodically re-queued until the apply processing loop succeeds. Success is determined by: a) either making the update; The desire in adding node_confirms to GET requests is to make part (b) stricter. We want this process to work (safely) even when multiple nodes have temporary failures, but not when a rogue node has been added to a load-balanced group through admin error, and we want it always to be resilient to data loss when any single node suffers permanent failure and loss of persisted storage.
No changes yet to support the option on a specific request, just respecting the bucket property. See #1750. Tested with https://github.com/basho/riak_test/blob/mas-i1750-nodeconfirms/tests/node_confirms_vs_pw.erl
The node_confirms bucket property is respected on PUT. A PUT may succeed (in that it has been written to Riak) but will return an error when responses have not come from vnodes on enough different nodes. This allows us to be sure that the PUT will not disappear on a single node failure.
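As a rough illustration, here is a minimal sketch of how an application might use this from the Erlang client (riakc). The ability to set node_confirms via set_bucket and to pass it as a per-request option are assumptions based on the bucket property described above; the host, port, bucket, key and values are placeholders.

```erlang
%% Sketch: a PUT that should only report success once vnodes on two
%% distinct nodes have confirmed the write. Whether the client plumbs
%% a node_confirms option/property through is an assumption.
{ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),

%% Option A (assumed): set node_confirms as a bucket property, so every
%% PUT to the bucket is checked for node diversity.
ok = riakc_pb_socket:set_bucket(Pid, <<"msgstore">>, [{node_confirms, 2}]),

Obj = riakc_obj:new(<<"msgstore">>, <<"update-123">>, <<"payload">>),
%% Option B (assumed): pass node_confirms per request, alongside w.
case riakc_pb_socket:put(Pid, Obj, [{w, 2}, {node_confirms, 2}]) of
    ok ->
        %% Vnodes on at least two nodes acknowledged the write, so a
        %% single node failure cannot lose it.
        ok;
    {error, Reason} ->
        %% The object may still have been written to Riak, but node
        %% diversity was not confirmed - retry or alert.
        {retry_later, Reason}
end.
```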
More information - https://github.com/basho/riak_kv/blob/develop-2.9/docs/Node-Diversity.md
If a PUT is made and errors due to node_confirms, a GET made after the PUT can still succeed - but at that stage the object may exist on only a single node. How does the application know whether it has safely stored the data on two nodes - especially as it may now be a different process reading the object, one unaware of the potentially failed PUT, so re-PUTting until a success response is not an option.
It would be useful to have a GET request where success indicates that, even on a single node failure, it can safely be assumed that the change just read will not be lost from the history.
PR=2 can be used as an alternative, but this will fail in many cases where node diversity does exist. The other issue with PR=2 and PW=2 is that when a node is started but not joined to the cluster, and perhaps through admin error is introduced into a load-balanced group, PR=2 and PW=2 will still succeed even though we haven't actually ensured the data is stored in more than one place.
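For comparison, a sketch of the PR/PW approach using standard riakc options; the comments note the caveat above rather than anything the options themselves guarantee. Connection details, bucket and key are placeholders.

```erlang
%% Sketch: PR=2 / PW=2 count vnode responses, not distinct physical
%% nodes. If a client is routed to a node that was started but never
%% joined to the cluster, that node's ring places all primaries on
%% itself, so both quorums can be met on a single machine.
{ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
Obj = riakc_obj:new(<<"msgstore">>, <<"update-123">>, <<"payload">>),
ok = riakc_pb_socket:put(Pid, Obj, [{pw, 2}]),
{ok, _Got} = riakc_pb_socket:get(Pid, <<"msgstore">>, <<"update-123">>, [{pr, 2}]),
%% Both calls succeeded, yet nothing here proves the data lives on more
%% than one node; conversely, after a real node failure PR=2 may error
%% even though two nodes do hold the data.
```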
It is possible to add node_confirms to GET. However, this would not naturally confirm that the PUT has reached two nodes - counting parameters on GETs count the number of nodes consulted, but the answer may still come from only one node. However, if node_confirms=2 were applied on GET, then on success the application would know that:
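A sketch of what the proposed per-request option might look like from the reading process (which may not be the process that did the PUT). The node_confirms GET option is the feature being requested here, not something the client exposes today; option name, bucket and key are assumptions.

```erlang
%% Sketch: proposed GET that only succeeds if the read was confirmed by
%% vnodes on >= 2 distinct nodes (hypothetical per-request option).
{ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
case riakc_pb_socket:get(Pid, <<"msgstore">>, <<"update-123">>, [{node_confirms, 2}]) of
    {ok, Obj} ->
        %% Under the requested semantics, the change just read cannot be
        %% lost to a single node failure - safe to mark the queued
        %% update as applied.
        {applied, riakc_obj:get_value(Obj)};
    {error, Reason} ->
        %% Not enough node diversity (or not found) - leave the update
        %% on the queue and re-apply later.
        {requeue, Reason}
end.
```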
A cluster using node_confirms=2 on GET and PUT would still have a degree of partition tolerance. However, some preflists on some partitions (especially in minority partitions) may not have node diversity and would error.