
Journal compaction overhead with KeyChanges #306

Closed · martinsumner opened this issue Mar 6, 2020 · 7 comments

@martinsumner (Owner)

There is technical debt in leveled when using the retain recovery strategy. This permanently retains a history of an object in the Journal to help with tracking key changes (i.e. index updates).

When used in Riak this can be cleared up through handoffs (e.g. join/leave operations).

This garbage has more side effects than originally anticipated, especially when using spinning HDDs with regular journal compaction, where it creates a background load of random read activity.

This needs to be improved.

@martinsumner (Owner Author)

There is some initial mitigation for this issue.

#307

If a Journal evolves with large numbers of KeyChanges objects, then scoring is expensive. This is in part because the list of file positions to score within each file is unordered and randomly distributed: the average jump is beyond any read-ahead, and there is a 50% chance that the next jump will be backwards rather than forwards.

To improve this, rather than fetching 100 positions, 100 * SomeAdjustmentFactor positions could be selected, then sorted, and then an adjacent (within the sample) subset of 100 positions chosen. This would ensure that the next position is always ahead of the previous one, with the gap between positions shrinking in proportion to SomeAdjustmentFactor. Fetching positions takes more CPU, but fetching the keys for scoring then has the potential to require less disk head movement and to be more friendly to OS cache-filling behaviour.
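A minimal sketch of this sampling idea, assuming a plain list of integer file positions and using illustrative module/function names (this is not the leveled implementation):

```erlang
%% Over-sample positions, sort them, then score a contiguous window so
%% that every subsequent read within the sample moves forward in the file.
-module(score_sample_sketch).
-export([choose_positions/3]).

%% Positions: unordered file positions within a Journal file.
%% SampleSize: e.g. 100.  AdjustmentFactor: e.g. 4.
choose_positions(Positions, SampleSize, AdjustmentFactor) ->
    WideSample = take_random(Positions, SampleSize * AdjustmentFactor),
    Sorted = lists:sort(WideSample),
    %% Take a random contiguous window of SampleSize sorted positions,
    %% so the next position to read is always ahead of the last one.
    MaxStart = max(1, length(Sorted) - SampleSize + 1),
    Start = rand:uniform(MaxStart),
    lists:sublist(Sorted, Start, SampleSize).

%% Shuffle and take the first N elements.
take_random(List, N) ->
    Shuffled = [X || {_R, X} <- lists:sort([{rand:uniform(), X} || X <- List])],
    lists:sublist(Shuffled, N).
```

The extra positions cost CPU only; the disk benefit comes from the monotonic read order within the chosen window.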

@martinsumner (Owner Author)

There is also a PR to provide some further building blocks for improvement:

#308

This PR changes the penciller request that fetches the sequence number so that it returns current/replaced/missing rather than true/false, so that in the future alternative actions may be possible based on replaced/missing.
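As an illustration only (the names below are assumptions, not the actual penciller API), the three-valued result allows compaction to distinguish cases that a boolean could not:

```erlang
%% Sketch of acting on the three-valued SQN check rather than a boolean.
-module(sqn_status_sketch).
-export([compaction_action/1]).

-spec compaction_action(current | replaced | missing) ->
        retain_object | retain_key_changes.
compaction_action(current) ->
    retain_object;       %% this Journal entry holds the live value - keep it
compaction_action(replaced) ->
    retain_key_changes;  %% superseded - the retain strategy keeps the KeyChanges
compaction_action(missing) ->
    retain_key_changes.  %% today's behaviour; a future strategy could drop these
```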

There is also some tech debt related to rebuilding a ledger from a journal. In this case there is a cost per file in the Journal related to never updating the MinSQN, so that the process has to loop aggressively at the start of each file before the first SQN occurs in a batch. This is corrected here.

The tests have also been expanded in this PR to ensure that there is better coverage of the combinations of updates, deletes, compaction and rebuilds.

@martinsumner (Owner Author)

martinsumner commented Mar 10, 2020

In order to properly resolve this issue, there is potentially a need for another compaction type.

It is safe to compact away KeyChanges objects for a key once the key has been removed. However, it is only safe if all the history for a given key is removed in the same compaction event.

With partial removal, the problem would be that another part of the history may not be removed (due to not being in a compactable file, or due to a new object being created in the future with the same key). Then on a rebuild event (i.e. due to a corrupted ledger being wiped) there would be rogue 2i entries.

There are three ways forward here:

  • Address the scenario created by partial removal, and partially remove during standard compaction. Addressing the problem would mean some form of AAE between secondary index terms and object state.

  • Have an alternative compaction event which is a complete rebuild of the Journal, removing object history where the object is missing at a given snapshot. In this alternative compaction the process must first snapshot, then score based on the potential to clean history, then compact the full Journal, and then switch to the new Journal (see the sketch after this list).

  • Revisit the recalc model of journal compaction. Originally there were intended to be three mechanisms for Journal compaction - recovr, retain and recalc. The recalc mechanism would actually avoid this problem. It had been set aside as it requires the injection of logic from the domain into the store (i.e. how to create diff'd IndexSpecs); however, this may be better than the existing bodges. It would also resolve the issue for replaced objects as well as missing objects.
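A high-level sketch of that second option, with every helper a placeholder (the real flow would use leveled's own snapshot, scoring and journal-switch machinery, none of which is shown here):

```erlang
-module(missing_compact_sketch).
-export([maybe_run/1]).

maybe_run(JournalFiles) ->
    %% 1. Snapshot first, so "missing" is judged at a fixed point in time.
    LedgerSnapshot = placeholder_snapshot(),
    %% 2. Score on the potential to clean history for keys missing at the snapshot.
    Score = score_for_missing(JournalFiles, LedgerSnapshot),
    case Score > 0 of
        true ->
            %% 3. Compact the full Journal in one event, so a key's history
            %%    is removed entirely or not at all.
            NewJournal = rewrite_without_missing(JournalFiles, LedgerSnapshot),
            %% 4. Only switch to the new Journal once the rewrite is complete.
            {switch_to, NewJournal};
        false ->
            no_compaction
    end.

%% Placeholders standing in for real leveled operations.
placeholder_snapshot() -> ledger_snapshot.
score_for_missing(_Files, _Snapshot) -> 0.
rewrite_without_missing(Files, _Snapshot) -> Files.
```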

@martinsumner (Owner Author)

martinsumner commented Mar 10, 2020

Currently, of the three options, the alternative compaction seems the easiest to implement. The scenario where this overhead affects the system tends to take months to build up, so it shouldn't be necessary for this compaction to run frequently. It should be enough for there to be an additional config parameter like:

leveled.journal_missingcompact_perc, which should be an integer 0..100. This would determine the proportion of compaction events that would run the special "compact for missing objects" compaction, and could default to 0. If an operator has this problem, this percentage can be increased (and in most cases 1 would be a high enough value).

This would be a more expensive compaction (although scoring would cost the same), but run at a very low frequency and with natural protection against concurrent compaction events, it is expected that the overhead should not be excessive.
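A minimal sketch of how such a setting might be applied when a compaction event fires (leveled.journal_missingcompact_perc is the proposed name above; the module and function below are illustrative):

```erlang
-module(compaction_choice_sketch).
-export([choose_compaction_type/1]).

%% MissingCompactPerc is the proposed 0..100 setting, defaulting to 0.
-spec choose_compaction_type(0..100) -> standard | compact_for_missing.
choose_compaction_type(MissingCompactPerc) ->
    %% rand:uniform(100) returns 1..100, so 0 never selects the special
    %% compaction and 100 always does; 1 selects it roughly 1% of the time.
    case rand:uniform(100) =< MissingCompactPerc of
        true  -> compact_for_missing;
        false -> standard
    end.
```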

@martinsumner (Owner Author)

The recalc option is potentially the most complete solution.

Some issues:

  • It is possible to migrate forwards (it appears to be theoretically possible to switch from running a Journal in retain mode to recalc mode - as you simply have to ignore additional information in recalc mode), but not back (without relying on external anti-entropy).

  • The load time on startup, and on rebuild of the ledger, will be slower, as a ledger read would be required before each write.

@martinsumner (Owner Author)

When the recalc mechanism was originally set aside, this was in part driven by a desire for leveled to remain independent of Riak.

Since then, the need to merge Riak logic into the leveled database was deemed to be unavoidable, and so the leveled_head module was introduced - which allowed Riak logic to be merged, but also potentially other logic to be user-defined for other applications.

Implementing recalc may well be considered a natural progression from this change - it would simply require another function (for diffing index specs) in the leveled_head module.
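A hypothetical sketch of the shape such a function could take - a diff of index entries between the previous and new versions of an object. The module name, function name and data shapes here are assumptions, not the actual leveled_head API:

```erlang
-module(diff_indexspecs_sketch).
-export([diff_index_specs/2]).

%% Index entries represented as {IndexField, IndexTerm} pairs extracted
%% from the object (or its metadata).
-spec diff_index_specs([{binary(), binary()}], [{binary(), binary()}]) ->
        [{add | remove, binary(), binary()}].
diff_index_specs(OldIndexes, NewIndexes) ->
    OldSet = ordsets:from_list(OldIndexes),
    NewSet = ordsets:from_list(NewIndexes),
    Adds = [{add, F, T} || {F, T} <- ordsets:subtract(NewSet, OldSet)],
    Removes = [{remove, F, T} || {F, T} <- ordsets:subtract(OldSet, NewSet)],
    Adds ++ Removes.
```

With a diff of this kind, compaction could recalculate the index changes from the current object state rather than retaining the full KeyChanges history.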

@martinsumner (Owner Author)

#310
