
Mas i981 resethashtreetokens #1712

Merged

Conversation

martinsumner
Contributor

This is mitigation for the broader problems referred to in basho/riak#981.

This change is tested in:
https://github.com/nhs-riak/riak_test/blob/mas-i981-resethashtreetokens/tests/verify_aae_resettoken.erl

The idea is that in some circumstances we want to temporarily ensure that writes aren't blocked by AAE hashtree_token depletion. This might be to prove that token depletion isn't the cause of slow writes, or because coordinated AAE tree rebuilds are needed to mitigate some other issue.
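The mechanism being mitigated can be pictured with a toy model. The following is an illustrative Python sketch, not riak's Erlang implementation (in riak the pool lives in the vnode process, and `VnodeTokenPool`, `handle_write` and `reset` are invented names): each vnode holds a token count, each write consumes a token, and a depleted pool means the write is held up behind hashtree work before the pool refills. Raising the token range makes depletion effectively impossible for the duration of the mitigation.

```python
# Illustrative model only -- NOT riak's actual implementation.
import random

class VnodeTokenPool:
    def __init__(self, min_tokens, max_tokens):
        self.min_tokens = min_tokens
        self.max_tokens = max_tokens
        self.tokens = self._refill()

    def _refill(self):
        # Refill to a value somewhere in the configured range.
        return random.randint(self.min_tokens, self.max_tokens)

    def handle_write(self):
        """Return True if the write proceeded without being blocked."""
        if self.tokens > 0:
            self.tokens -= 1
            return True
        # Pool depleted: in riak this is where the write would wait
        # behind hashtree work; here we just refill and note the block.
        self.tokens = self._refill() - 1
        return False

    def reset(self, new_min, new_max):
        # Analogue of riak_kv_util:reset_hashtree_tokens(Min, Max).
        self.min_tokens = new_min
        self.max_tokens = new_max
        self.tokens = self._refill()

pool = VnodeTokenPool(90, 100)
blocked = sum(not pool.handle_write() for _ in range(1000))
pool.reset(200000, 250000)   # raise the cap, as in the mitigation
blocked_after = sum(not pool.handle_write() for _ in range(1000))
print(blocked > 0, blocked_after == 0)  # prints: True True
```

With the small range, some of the 1000 writes hit an empty pool; with the raised range, none do.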

This can be managed through riak attach:

Get the current min and max token count across the cluster:

{Min, Max} = riak_kv_util:return_hashtree_tokens().

Set the max token count to be in a range between very large numbers on each vnode:

riak_kv_util:reset_hashtree_tokens(200000, 250000).

After completing any related work, reset back to the original range:

riak_kv_util:reset_hashtree_tokens(Min, Max).

In a healthy cluster this works cluster-wide from any single node; there is no need to run the commands on each node.

Simple utility to report and reset the AAE hashtree tokens
Currently we are unable to pinpoint delays or to understand potential issues.
@Bob-The-Marauder

The code looks good and the functionality makes sense. An ideal additional feature would be an automated option that, when enabled, acts as follows:

AAE hashtree clearing begins:

  1. See how many tokens there are in the pool.
  2. Use the tools here to set token value for the pool to a large number capped at a sensible value.
  3. Perform full AAE hashtree clearing process.
  4. Finish clearing, write out AAE queue.
  5. Reset hashtree tokens back to the value in 1.

This would allow users who regularly encounter basho/riak#981 to avoid the problem without needing to write creative scripts that invoke the above controls every time AAE needs to clear the hashtrees.
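As a sketch, the proposed five-step sequence (whether automated inside riak or in an external script) could look like the following. This is hypothetical: `get_hashtree_tokens`, `set_hashtree_tokens` and `clear_hashtrees` are placeholder names for the real operations (the `riak_kv_util` calls from this PR plus the hashtree clearing process); only the control flow is the point.

```python
# Hypothetical wrapper for the five steps above -- placeholder API,
# not real riak code.
TOKEN_CAP = 250_000  # "capped at a sensible value" (step 2)

def clear_with_raised_tokens(cluster, temp_tokens=200_000):
    saved = cluster.get_hashtree_tokens()             # step 1
    cluster.set_hashtree_tokens(min(temp_tokens, TOKEN_CAP),
                                TOKEN_CAP)            # step 2
    try:
        cluster.clear_hashtrees()                     # steps 3 and 4
    finally:
        cluster.set_hashtree_tokens(*saved)           # step 5

class FakeCluster:
    """Stand-in cluster, used only to exercise the control flow."""
    def __init__(self):
        self.tokens = (90, 100)
        self.cleared = False
    def get_hashtree_tokens(self):
        return self.tokens
    def set_hashtree_tokens(self, lo, hi):
        self.tokens = (lo, hi)
    def clear_hashtrees(self):
        self.cleared = True

c = FakeCluster()
clear_with_raised_tokens(c)
print(c.cleared, c.tokens)  # prints: True (90, 100)
```

The `try/finally` mirrors step 5: the saved token range is restored even if the clearing step fails part-way through.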

@martinsumner
Contributor Author

@Bob-The-Marauder

Automated coordination between the hashtree clearing/rebuild process and the AAE token pool (which is on the process dictionary of the vnode process, not the hashtree process) would, I think, be quite hard. There is significant potential for race (or deadlock) conditions if these two processes try to coordinate activity. Resolving these would require new states to be defined (on the kv_index_hashtree process), and then consideration of how all messages should be handled in those new states. Lots of the sort of risky, hard-to-test work that one would prefer to avoid.

I think effort would be better spent on the root causes (e.g. the need for co-ordinated rebuilds due to AAE/TTL conflict, the need for separate AAE stores, the long time spent clearing trees), than on automating the mitigation.

@Bob-The-Marauder

That makes sense. So far I am only aware of two companies that have run into this issue, and one of them simply opted to turn off AAE (even though they were not using TTL). The tools above provide enough functionality for a workaround to be added to a custom AAE script, automating the mitigation. That should be good enough for now. Unless you have any further edits, +1 from me.

@martinsumner martinsumner merged commit 70889c5 into basho:develop-2.9 Sep 2, 2019
@martinsumner martinsumner deleted the mas-i981-resethashtreetokens branch September 2, 2019 08:52