Journal compaction overhead with KeyChanges #306
There is some initial mitigation for this issue. If a Journal evolves with large numbers of KeyChanges objects, then scoring is expensive, in part because the list of file positions to score for each file is unordered and randomly distributed. This means that the average jump is beyond any read-ahead, and there is a 50% chance that the next jump will be backwards rather than forwards. To improve this, rather than fetching 100 positions, 100 * SomeAdjustmentFactor positions could be selected, then sorted, and then an adjacent (within the sample) subset of 100 positions chosen. This would ensure that the next position is always ahead, and that the average gap shrinks in proportion to the SomeAdjustmentFactor. Fetching positions this way takes more CPU, but fetching the keys for scoring could then require less disk head movement and be more friendly to OS cache-filling behaviour.
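The over-sample-and-sort selection described above can be sketched as follows. This is illustrative Python (leveled itself is Erlang), and the function name and parameters are invented for the sketch:

```python
import random

def choose_score_positions(positions, sample_size=100, adjustment_factor=8):
    """Pick file positions to score so that reads always move forwards.

    Rather than taking sample_size positions at random (which makes the
    average seek large and ~50% of jumps backwards), over-sample by
    adjustment_factor, sort the sample, and take one contiguous run from
    it.  Every subsequent read is then ahead of the previous one, and the
    average gap shrinks roughly in proportion to the over-sampling.
    """
    oversample = random.sample(
        positions, min(len(positions), sample_size * adjustment_factor))
    oversample.sort()
    if len(oversample) <= sample_size:
        return oversample
    # Choose a random contiguous window of sample_size sorted positions
    start = random.randrange(len(oversample) - sample_size + 1)
    return oversample[start:start + sample_size]
```

The returned positions are monotonically increasing, so each subsequent read lands ahead of the previous one and within read-ahead range more often.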
There is also a PR to provide some further building blocks for improvement. This PR changes the penciller request that fetches the sequence number so that it returns current/replaced/missing rather than true/false, so that in the future alternative actions may be taken based on replaced/missing. There is also a piece of technical debt related to rebuilding a ledger from a journal: there is a cost per file in the Journal from never updating the MinSQN, so the process has to loop aggressively at the start of each file before the first SQN occurs in a batch. This is corrected in the same PR. The tests have also been expanded in this PR to ensure better coverage of the combinations of updates, deletes, compaction and rebuilds.
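The current/replaced/missing distinction can be sketched as below. This is illustrative Python, not leveled's actual Erlang API; the function name and the flat dict standing in for the penciller's ledger lookup are assumptions:

```python
from enum import Enum

class KeyStatus(Enum):
    CURRENT = "current"    # ledger SQN matches this journal entry's SQN
    REPLACED = "replaced"  # ledger holds a newer SQN for this key
    MISSING = "missing"    # key is no longer present in the ledger

def check_sqn(ledger_sqns, key, journal_sqn):
    """Classify a journal entry, instead of a bare true/false 'is current?'.

    A three-state answer lets compaction treat 'replaced' (an older version
    of a live key) differently from 'missing' (a key removed entirely).
    """
    ledger_sqn = ledger_sqns.get(key)
    if ledger_sqn is None:
        return KeyStatus.MISSING
    if ledger_sqn == journal_sqn:
        return KeyStatus.CURRENT
    return KeyStatus.REPLACED
```

The point of the richer return value is exactly this downstream branching: a future compaction strategy can act on `REPLACED` and `MISSING` independently.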
In order to properly resolve this issue, there is potentially a need for another compaction type. It is safe to compact away the KeyChanges objects for a key once the key has been removed. However, it is only safe if all the history for a given key is removed in the same compaction event. With partial removal, the problem would be that another part of the history may not be removed (because it is not in a compactable file, or because a new object is later created with the same key). Then on a rebuild event (i.e. after a corrupted ledger has been wiped) there would be rogue 2i entries. There are three ways forward here:
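The all-or-nothing safety rule above can be sketched as a selection predicate. This is illustrative Python with invented names and data shapes (a dict of key to `(filename, position)` entries), not leveled's actual structures:

```python
def compactable_keychange_keys(keychange_positions, ledger_keys, files_in_run):
    """Return the keys whose KeyChanges entries may all be dropped.

    A key's KeyChanges history may only be compacted away if (a) the key
    has been removed from the ledger and (b) *every* KeyChanges entry for
    that key sits in a file included in this compaction run.  Partial
    removal would leave rogue entries that resurface as bogus 2i terms
    when the ledger is rebuilt from the journal.
    """
    safe = set()
    for key, entries in keychange_positions.items():
        if key in ledger_keys:
            continue  # key is still live: its history must be kept
        if all(filename in files_in_run for filename, _pos in entries):
            safe.add(key)  # whole history is inside this compaction run
    return safe
```

A key with even one entry outside the run is kept in full, which is what makes the compaction safe against later rebuilds.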
Currently, of the three options, the alternative compaction seems the easiest to implement. The scenario in which this overhead arises tends to take months to build up, so it shouldn't be necessary for this compaction to run frequently. It should be enough for there to be an additional config parameter like:
This would be a more expensive compaction (although scoring would be the same cost), but run at a very low frequency, and with natural protection against concurrent compaction events, the expected overhead should not be excessive.
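The concrete parameter name was lost from the issue text above, so purely as an illustration, such a setting might look like the following. Every name here is hypothetical, not leveled's actual configuration:

```python
# Hypothetical configuration sketch -- these parameter names are invented
# for illustration and do not correspond to leveled's real options.
compaction_config = {
    # existing-style control: score one file in N on each compaction pass
    "journal_compaction_score_one_in": 24,
    # run the all-or-nothing KeyChanges compaction only very rarely
    "keychange_full_compaction_one_in": 5000,
}
```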
Some issues:
Since then, the need to merge Riak logic into the leveled database was deemed to be unavoidable.
There is technical debt in leveled when using the `retain` recovery strategy. This permanently retains a history of an object in the Journal to help with tracking key changes (i.e. index updates). When used in Riak this can be cleared up through handoffs (e.g. join/leave operations).
This garbage has more side effects than originally anticipated. Especially when using spinning HDDs with regular journal compaction, it creates a background load of random read activity.
This needs to be improved.