Mas i1775 ttaaesafety #1776

Merged
52 commits merged into develop-3.0 on Dec 14, 2020
Conversation

martinsumner
Contributor

Bug fixes and improvements to TTAAE full-sync

Re-calibrate defaults
This also appears to be much quicker within `riak_test`; why is not clear. Perhaps it is related to queueing at the aae_runner, as the performance log within aae_runner is fast even when fetch_clocks_nval returns a slow response.

This change is leading towards the possibility of adding modified_range to fetch_clocks_nval, and so allowing for hourcheck and daycheck as well as allcheck in ttaaefs schedules.
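
As a rough illustration of the idea (not the implemented API), a schedule entry could translate the check type into a modified-range filter on the clock fold. The three-element `{fetch_clocks_nval, NVal, ModRange}` tuple shape below is an assumption made purely for this sketch:

```erlang
%% Sketch only: the 3-tuple query shape with a modified range is an
%% assumption for illustration, not the shipped riak_kv query format.
-module(ttaaefs_check_sketch).
-export([query_for_check/2]).

-spec query_for_check(allcheck | daycheck | hourcheck, pos_integer()) -> tuple().
query_for_check(allcheck, NVal) ->
    %% No filter - fold over every clock for this n_val.
    {fetch_clocks_nval, NVal};
query_for_check(daycheck, NVal) ->
    query_with_range(NVal, 86400);
query_for_check(hourcheck, NVal) ->
    query_with_range(NVal, 3600).

query_with_range(NVal, WindowSeconds) ->
    Now = os:system_time(second),
    %% Only objects modified inside the window are folded over.
    {fetch_clocks_nval, NVal, {Now - WindowSeconds, Now}}.
```
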
Remote and local clients have different expectations as to whether `data` is included as the first element of the tuple.
To allow for the option to add a modified range on "full" exchanges.
To allow for modified ranges on nval_checks.
It is expecting a large tree (as cached trees are always large).
@martinsumner
Contributor Author

By default have extra capacity for aae_folds used in replication.

Pretty print buckets and keys in repair logs where possible.
Why do deletes not get reaped - but only on one site?
There exists a race condition here, and we want to take the minimal step to mitigate this condition. There is no hurry at this point, as the client has already been informed of the success/failure of the delete.

The assumption is that a pause approximately equal to the intra-cluster network delay will be sufficient.

Note that the race can be cleared up even if it happens (e.g. by repl or by manual reap). The aim is to control the volume at which it happens (to make it easier for repl to clear up without manual intervention).
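
A minimal sketch of the mitigation, with hypothetical names (`delayed_reap/2` and `reap/1` are illustrative, not the riak_kv functions):

```erlang
%% Sketch of the pause-before-reap mitigation described above.
-module(delay_reap_sketch).
-export([delayed_reap/2]).

%% DelayMS should approximate the intra-cluster network delay, so
%% replicas have a chance to agree the delete before the tombstone
%% is removed.
-spec delayed_reap(term(), non_neg_integer()) -> ok.
delayed_reap(BucketKey, DelayMS) ->
    %% The client has already been told the result of the delete,
    %% so this pause costs nothing on the request path.
    ok = timer:sleep(DelayMS),
    reap(BucketKey).

%% Placeholder for the actual reap of the tombstone.
reap(_BucketKey) ->
    ok.
```
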
This reverts commit 33a963b.
Make this log more operator-friendly - especially as an operator may need to run a range_repl to recover.
When a work item doesn't find much repair work, quieten the max_results so that it does less work. This should allow for more frequent scheduling (e.g. without having lots of work initiated from a cluster which is behind).
Also increase the timeout for exchanges to 1 hour (half the 2-hour crash timeout).
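
A sketch of the back-off idea, under assumed names and thresholds (the constants are illustrative, not the shipped defaults):

```erlang
%% Sketch only: when a sync work item repairs fewer keys than it
%% asked for, shrink max_results for the next run; when it maxes
%% out, grow it back towards the ceiling.
-module(max_results_sketch).
-export([next_max_results/2]).

-define(FLOOR, 64).
-define(CEILING, 2048).

-spec next_max_results(non_neg_integer(), pos_integer()) -> pos_integer().
next_max_results(RepairsFound, MaxResults) when RepairsFound < MaxResults ->
    %% Quiet period - do less work so checks can be scheduled more often.
    max(?FLOOR, MaxResults div 2);
next_max_results(_RepairsFound, MaxResults) ->
    %% The last fold hit its cap - there may be a real backlog.
    min(?CEILING, MaxResults * 2).
```
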
Otherwise the comparison will all be done in repair - and take a long time.
This may well be the right thing to do (quietening the more complicated queries) - but the resultant behaviour isn't intuitive.
Replaces the boost concept.  Now range_checks can be added to the schedule, and when a delta is discovered the range of the delta will be set to be used in the next range_check query.

If there is no range set, then the range_check is a null op.  The range is always unset before each range_check, and will only be reset if sufficient results are found.

The expectation is that fewer hour/day/all checks should need to be configured as the bulk of the work will be managed by range_check.  As the range_check is handling an identified problem - it will be run with boosted max_results.
Also expose the set/get/clear functions for the range - making it easier to control this from a remote_console and/or tests (a sketch of such an interface follows below).

Need to account for the fact that range queries may now be requested on NVal tests - and so NVal may be an integer rather than a range on fetch_clocks_range.
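
A sketch of what a set/get/clear interface for the check range might look like; the module name, function names, and the application-environment key are all assumptions made to keep the example self-contained:

```erlang
%% Sketch only: this is not the exported riak_kv interface.
-module(check_range_sketch).
-export([set_range/4, get_range/0, clear_range/0]).

-type ts() :: non_neg_integer().

%% Set the bucket, key range, and modified-time range for the next
%% range_check to use.
-spec set_range(binary() | all, {binary(), binary()} | all, ts(), ts()) -> ok.
set_range(Bucket, KeyRange, LowTS, HighTS) ->
    application:set_env(riak_kv, ttaaefs_check_range,
                        {Bucket, KeyRange, {LowTS, HighTS}}).

-spec get_range() -> none | {binary() | all, {binary(), binary()} | all, {ts(), ts()}}.
get_range() ->
    application:get_env(riak_kv, ttaaefs_check_range, none).

-spec clear_range() -> ok.
clear_range() ->
    application:unset_env(riak_kv, ttaaefs_check_range).
```

From a remote_console, `check_range_sketch:clear_range().` would then reset the state before the next check.
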
Also

- splitting full-sync folds onto different queues was confusing, and perhaps unnecessary with range_check. Revert this, and expand the default size of the af3 queue.

- use the kv_index_tictactree feature to log the purpose of exchanges, to more easily differentiate between inter- and intra-cluster AAE in logs.
When AAE is working (i.e. the clusters are in sync), still use range_check, just syncing since the last success. This will find ranges quicker after short outages.

If there has not been success, and there has not been a range set (i.e. this site is behind), then don't do range_check.  The range_check will have to wait for another check to either succeed, or set a range so that range_checks can continue.
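
The gating logic reads roughly as follows; this is a sketch under assumed state names, not the riak_kv implementation:

```erlang
%% Sketch of the range_check gating described above: use a range set
%% by a previous delta if there is one; otherwise fall back to
%% "everything since the last successful sync"; with neither, skip.
-module(range_check_sketch).
-export([range_for_check/2]).

-spec range_for_check(none | {pos_integer(), pos_integer()},
                      none | pos_integer()) ->
          skip | {pos_integer(), pos_integer()}.
range_for_check({LowTS, HighTS}, _LastSuccess) ->
    %% A delta was discovered earlier - re-check just that range.
    {LowTS, HighTS};
range_for_check(none, LastSuccess) when is_integer(LastSuccess) ->
    %% In sync until LastSuccess - only check what changed since then.
    {LastSuccess, os:system_time(second)};
range_for_check(none, none) ->
    %% No range and no known success: this site is behind, so the
    %% range_check is a null op until another check succeeds or
    %% sets a range.
    skip.
```
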
Otherwise very low to start AAE exchanges
Tombstones do not currently have a last modified date.  This has not caused any issues, as last modified dates are only checked:

https://github.com/basho/riak_kv/blame/d17495741ed001dcb54fa9e75e0dff026baaefde/src/riak_kv_vnode.erl#L3048-L3049

This would previously have blown up before the relatively safe get_last_modified function was added, as opposed to doing a dict:fetch for the tag (which would have triggered an exception).

This is necessary for time-based replication to work correctly. Currently there can be issues where recent tombstone discrepancies can only be picked up by all_check (even if they are recent). This is confusing.

Should we just live with the confusion rather than risk changing this? But it feels like tombstones not having a LMD is just a mistake, i.e. its usefulness is simply a scenario which was missed, and there are no issues.

Note, as a side effect, tombstones also get a vtag.

Perhaps some client somewhere will find this weird/unexpected?
Contributor Author

@martinsumner martinsumner left a comment


There is at least a switch to disable this: `{riak_kv, tombstone_timestamp}` can be set to false - so anyone having an issue can simply flick that switch.
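
Assuming the standard advanced.config mechanism, flicking that switch would look something like this:

```erlang
%% advanced.config - disable last-modified timestamps on tombstones
[
 {riak_kv, [
            {tombstone_timestamp, false}
           ]}
].
```
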

@martinsumner martinsumner marked this pull request as ready for review December 4, 2020 19:32
@Bob-The-Marauder

Not sure if this comment is entirely relevant but including it just in case.

As part of system tuning, it's recommended to set the disks to noatime mounts https://www.tiot.jp/riak-docs/riak/kv/2.2.6/using/performance/#mounts

I know the question in https://postriak.slack.com/archives/C6R0LPH4N/p1607107910139900 uses "last_modified_date" and that noatime is only relevant for "last accessed", but I thought I should mention it in case there is any potential for people to tune Riak and, in doing so, inadvertently break functions.

@martinsumner
Contributor Author

@Bob-The-Marauder - last modified time is the time on an individual object (which comes from the OS system time), and not related to times on files etc. So this won't be an issue.

@martinsumner martinsumner mentioned this pull request Dec 6, 2020
@martinsumner martinsumner added the 3.0.2 For Riak release 3.0.2 label Dec 8, 2020
@martinsumner martinsumner merged commit 75e142e into develop-3.0 Dec 14, 2020