Mas i1775 ttaaesafety #1776

Merged
52 commits merged into develop-3.0 on Dec 14, 2020
Conversation

martinsumner
Contributor

Bug fixes and improvements to TTAAE full-sync

Re-calibrate defaults
This also appears to be much quicker within `riak_test`; why is not clear. Perhaps it is related to queueing at the aae_runner, as the performance log within aae_runner is fast even when fetch_clocks_nval returns a slow response.

This change is leading towards the possibility of adding modified_range to fetch_clocks_nval, and so allowing for hourcheck and daycheck as well as allcheck in ttaaefs schedules.
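
As a rough illustration of the idea (not the implemented API), a schedule entry could translate the check type into a modified-range filter on the clock fold. The three-element `{fetch_clocks_nval, NVal, ModRange}` tuple shape below is an assumption made purely for this sketch:

```erlang
%% Sketch only: the 3-tuple query shape with a modified range is an
%% assumption for illustration, not the shipped riak_kv query format.
-module(ttaaefs_check_sketch).
-export([query_for_check/2]).

-spec query_for_check(allcheck | daycheck | hourcheck, pos_integer()) -> tuple().
query_for_check(allcheck, NVal) ->
    %% No filter - fold over every clock for this n_val.
    {fetch_clocks_nval, NVal};
query_for_check(daycheck, NVal) ->
    query_with_range(NVal, 86400);
query_for_check(hourcheck, NVal) ->
    query_with_range(NVal, 3600).

query_with_range(NVal, WindowSeconds) ->
    Now = os:system_time(second),
    %% Only objects modified inside the window are folded over.
    {fetch_clocks_nval, NVal, {Now - WindowSeconds, Now}}.
```
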
Remote and local clients have different expectations as to whether `data` is included as the first element of the tuple.
To allow for the option to add a modified range on "full" exchanges.
To allow for modified ranges on nval_checks.
It is expecting a large tree (as cached trees are always large).
@martinsumner
Contributor Author

By default have extra capacity for aae_folds used in replication.

Pretty print buckets and keys in repair logs where possible.
Why do deletes not get reaped - but only on one site?
There exists a race condition here, and we want to take the minimal step to mitigate this condition. There is no hurry at this point, as the client has already been informed of the success/failure of the delete.

The assumption is that a pause approximately equal to the intra-cluster network delay will be sufficient.

Note that the race can be cleared up even if it happens (e.g. by repl or by manual reap). The aim is to control the volume at which it happens (to make it easier for repl to clear up without manual intervention).
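
A minimal sketch of the mitigation, with hypothetical names (`delayed_reap/2` and `reap/1` are illustrative, not the riak_kv functions):

```erlang
%% Sketch of the pause-before-reap mitigation described above.
-module(delay_reap_sketch).
-export([delayed_reap/2]).

%% DelayMS should approximate the intra-cluster network delay, so
%% replicas have a chance to agree the delete before the tombstone
%% is removed.
-spec delayed_reap(term(), non_neg_integer()) -> ok.
delayed_reap(BucketKey, DelayMS) ->
    %% The client has already been told the result of the delete,
    %% so this pause costs nothing on the request path.
    ok = timer:sleep(DelayMS),
    reap(BucketKey).

%% Placeholder for the actual reap of the tombstone.
reap(_BucketKey) ->
    ok.
```
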
This reverts commit 33a963b.
Make this log more operator-friendly - especially as an operator may need to run a range_repl to recover.
When a work item doesn't find much repair work, quieten the max_results so that it does less work. This should allow for more frequent scheduling (e.g. without having lots of work initiated from a cluster which is behind).
Also increase the timeout for exchanges to 1 hour (half the 2-hour crash timeout).
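
A sketch of the back-off idea, under assumed names and thresholds (the constants are illustrative, not the shipped defaults):

```erlang
%% Sketch only: when a sync work item repairs fewer keys than it
%% asked for, shrink max_results for the next run; when it maxes
%% out, grow it back towards the ceiling.
-module(max_results_sketch).
-export([next_max_results/2]).

-define(FLOOR, 64).
-define(CEILING, 2048).

-spec next_max_results(non_neg_integer(), pos_integer()) -> pos_integer().
next_max_results(RepairsFound, MaxResults) when RepairsFound < MaxResults ->
    %% Quiet period - do less work so checks can be scheduled more often.
    max(?FLOOR, MaxResults div 2);
next_max_results(_RepairsFound, MaxResults) ->
    %% The last fold hit its cap - there may be a real backlog.
    min(?CEILING, MaxResults * 2).
```
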
Otherwise the comparison will all be done in repair - and take a long time.
This may well be the right thing to do (quietening the more complicated queries) - but the resultant behaviour isn't intuitive.
Replaces the boost concept.  Now range_checks can be added to the schedule, and when a delta is discovered the range of the delta will be set to be used in the next range_check query.

If there is no range set, then the range_check is a null op.  The range is always unset before each range_check, and will only be reset if sufficient results are found.

The expectation is that fewer hour/day/all checks should need to be configured as the bulk of the work will be managed by range_check.  As the range_check is handling an identified problem - it will be run with boosted max_results.
Also expose the set/get/clear functions for the range - making it easier to control this from a remote_console and/or tests (a sketch of such an interface follows below).

Need to account for the fact that range queries may now be requested on NVal tests - and so NVal may be an integer rather than a range on fetch_clocks_range.
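
A sketch of what a set/get/clear interface for the check range might look like; the module name, function names, and the application-environment key are all assumptions made to keep the example self-contained:

```erlang
%% Sketch only: this is not the exported riak_kv interface.
-module(check_range_sketch).
-export([set_range/4, get_range/0, clear_range/0]).

-type ts() :: non_neg_integer().

%% Set the bucket, key range, and modified-time range for the next
%% range_check to use.
-spec set_range(binary() | all, {binary(), binary()} | all, ts(), ts()) -> ok.
set_range(Bucket, KeyRange, LowTS, HighTS) ->
    application:set_env(riak_kv, ttaaefs_check_range,
                        {Bucket, KeyRange, {LowTS, HighTS}}).

-spec get_range() -> none | {binary() | all, {binary(), binary()} | all, {ts(), ts()}}.
get_range() ->
    application:get_env(riak_kv, ttaaefs_check_range, none).

-spec clear_range() -> ok.
clear_range() ->
    application:unset_env(riak_kv, ttaaefs_check_range).
```

From a remote_console, `check_range_sketch:clear_range().` would then reset the state before the next check.
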
Also

- splitting full-sync folds onto different queues was confusing, and perhaps unnecessary with range_check. Revert this, and expand the default size of the af3 queue.

- use the kv_index_tictactree feature to log the purpose of exchanges, to more easily differentiate between inter- and intra-cluster AAE in logs.
When AAE is working (i.e. the clusters are in sync), still use range_check, just syncing since the last success. This will find ranges quicker after short outages.

If there has not been success, and there has not been a range set (i.e. this site is behind), then don't do range_check.  The range_check will have to wait for another check to either succeed, or set a range so that range_checks can continue.
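
The gating logic reads roughly as follows; this is a sketch under assumed state names, not the riak_kv implementation:

```erlang
%% Sketch of the range_check gating described above: use a range set
%% by a previous delta if there is one; otherwise fall back to
%% "everything since the last successful sync"; with neither, skip.
-module(range_check_sketch).
-export([range_for_check/2]).

-spec range_for_check(none | {pos_integer(), pos_integer()},
                      none | pos_integer()) ->
          skip | {pos_integer(), pos_integer()}.
range_for_check({LowTS, HighTS}, _LastSuccess) ->
    %% A delta was discovered earlier - re-check just that range.
    {LowTS, HighTS};
range_for_check(none, LastSuccess) when is_integer(LastSuccess) ->
    %% In sync until LastSuccess - only check what changed since then.
    {LastSuccess, os:system_time(second)};
range_for_check(none, none) ->
    %% No range and no known success: this site is behind, so the
    %% range_check is a null op until another check succeeds or
    %% sets a range.
    skip.
```
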
Otherwise very low to start AAE exchanges
Tombstones do not currently have a last modified date.  This has not caused any issues, as last modified dates are only checked:

https://github.com/basho/riak_kv/blame/d17495741ed001dcb54fa9e75e0dff026baaefde/src/riak_kv_vnode.erl#L3048-L3049

This would previously have blown up before the relatively safe get_last_modified function was added, as opposed to doing a dict:fetch for the tag (which would have triggered an exception).

This is necessary for time-based replication to work correctly. Currently there can be issues where recent tombstone discrepancies can only be picked up by all_check (even if they are recent). This is confusing.

Should we just live with the confusion rather than risk changing this? But it feels like tombstones not having a LMD is just a mistake, i.e. its usefulness is simply a scenario which was missed, and there are no issues.

Note, as a side effect, tombstones also get a vtag.

Perhaps some client somewhere will find this weird/unexpected?
Contributor Author

@martinsumner martinsumner left a comment


There is at least a switch to disable this: `{riak_kv, tombstone_timestamp}` can be set to false - so anyone having an issue can simply flick that switch.
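
Assuming the standard advanced.config mechanism, flicking that switch would look something like this:

```erlang
%% advanced.config - disable last-modified timestamps on tombstones
[
 {riak_kv, [
            {tombstone_timestamp, false}
           ]}
].
```
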

@martinsumner martinsumner marked this pull request as ready for review December 4, 2020 19:32
@Bob-The-Marauder

Not sure if this comment is entirely relevant but including it just in case.

As part of system tuning, it's recommended to set the disks to noatime mounts https://www.tiot.jp/riak-docs/riak/kv/2.2.6/using/performance/#mounts

I know the question in https://postriak.slack.com/archives/C6R0LPH4N/p1607107910139900 uses "last_modified_date" and that noatime is only relevant for "last accessed", but I thought I should mention it in case there is any potential for people to tune Riak and, in doing so, inadvertently break functions.

@martinsumner
Contributor Author

@Bob-The-Marauder - last modified time is the time on an individual object (which comes from the OS system time), and not related to times on files etc. So this won't be an issue.

@martinsumner martinsumner mentioned this pull request Dec 6, 2020
@martinsumner martinsumner added the 3.0.2 For Riak release 3.0.2 label Dec 8, 2020
@martinsumner martinsumner merged commit 75e142e into develop-3.0 Dec 14, 2020