TictacAAE fullsync - need for further safety measures #1775
After further investigation, some issues have arisen.
The |
If the intention is not to repair tree caches as part of fetch_clocks when running full ttaaefs fullsync, then it becomes easier to consider supporting a schedule that includes hour and day syncs. In an hour or day sync, a complete comparison would be made between the AAE tree caches, but the hour sync will assume any delta was in the last hour of changes (by modified date). Likewise, the day sync will assume any delta is in the last day of changes. These fetch_clocks queries can then be run with a last-modified-date range, and on very large stores this will be much faster than a full nval compare.
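The hour and day windows described above could be derived along these lines. This is only a rough Python sketch of the idea, not the Riak implementation: the function name `modified_range`, the sync type strings, and the use of epoch seconds for the range are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sync types, mirroring the hour/day schedule discussed above.
WINDOWS = {"hour": timedelta(hours=1), "day": timedelta(days=1)}


def modified_range(sync_type, now):
    """Return an assumed (low, high) last-modified range in epoch seconds.

    An "all" sync returns None, meaning no date filter is applied and the
    clock fetch covers every modified date.
    """
    if sync_type == "all":
        return None
    window = WINDOWS[sync_type]
    return int((now - window).timestamp()), int(now.timestamp())


now = datetime(2020, 1, 2, tzinfo=timezone.utc)
hour_low, hour_high = modified_range("hour", now)
day_low, day_high = modified_range("day", now)
```

The narrower the range, the less of the key store a clock fetch has to scan, which is where the saving on very large stores would come from.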
In part related to #1765. When testing intra-cluster AAE with very large vnodes and large deltas (either genuine deltas or false deltas while awaiting cache repairs), the behaviour of Tictac AAE was neither efficient nor sufficiently conservative.
Some of the measures related to intra-cluster AAE will also lead to improvement in inter-cluster AAE (such as active pruning of the AAE runner queue). However, there remains a risk that inter-cluster AAE behaviour may not be predictable, conservative or efficient in the face of large deltas with large stores.
Following changes proposed:
- Update the NextGenREPL doc to make explicit the potential need for operator intervention to use `aae_fold` `repl_keys_range` to resolve large deltas, rather than waiting for this to be resolved via fullsync. Currently this fullsync method is optimised for reconciliation (i.e. confirming no deltas) and resolving small deltas. Deltas of <10K may take an unexpected length of time to resolve.
- Update the NextGenREPL doc to make explicit that the `participate_in_coverage` option should be used when recovering a crashed node, if using TictacAAE fullsync. Disabling coverage participation for a crashed node until the tree caches have been built will prevent excessive work due to false deltas before the cache rebuilds have completed.
- Allow the `max_results` and `scan_timeout` on AAE exchanges to be configurable for those prompted by inter-cluster AAE, just as they are for intra-cluster AAE.
- Back off further in the `riak_kv_ttaaefs_manager` gen_server when a previous exchange completes but ends in the state `waiting_all_results` (the state is returned to the gen_server via the reply function), given that this state as a final state indicates that the exchange did not complete (probably due to timeout).
- Change the default configurations to be more friendly to repair than reconciliation (e.g. in large clusters reconciling every 15 minutes per node is fine, but repairing at that frequency is likely to lead to backlogs).
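The proposed backoff for exchanges that end in `waiting_all_results` could take a shape like the following. This is a hedged Python sketch rather than the `riak_kv_ttaaefs_manager` logic: the function `next_wait`, the doubling policy, and the `max_multiplier` cap are assumptions, since the proposal only states that the manager should back off further.

```python
def next_wait(base_wait, final_state, consecutive_incomplete, max_multiplier=8):
    """Return (wait_seconds, updated_incomplete_count) for the next exchange.

    Assumed policy: each consecutive exchange that ends in
    "waiting_all_results" doubles the wait, up to max_multiplier times the
    base wait. Any other final state resets the wait to the base value.
    """
    if final_state == "waiting_all_results":
        count = consecutive_incomplete + 1
        multiplier = min(2 ** count, max_multiplier)
        return base_wait * multiplier, count
    return base_wait, 0


# With a 900 second (15 minute) base wait, one incomplete exchange
# doubles the delay before the next attempt.
wait, count = next_wait(900, "waiting_all_results", 0)
```

Capping the multiplier keeps a persistently slow peer from pushing the next exchange out indefinitely, while still relieving pressure when exchanges are timing out.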