Prioritize compaction of entry logs with the lowest amount of remaining usable data #3389

Open
dlg99 opened this issue Jul 6, 2022 · 0 comments


dlg99 commented Jul 6, 2022

FEATURE REQUEST

  1. Please describe the feature you are requesting.

Prioritize compaction to free up more space faster.

  2. Indicate the importance of this issue to you (blocker, must-have, should-have, nice-to-have).

must-have

  3. Provide any additional detail on your proposed use case for this feature.

Looking at GarbageCollectorThread:

doCompactEntryLogs() iterates over entry logs in whatever natural order they happen to be in, picks the first one with usage below the threshold, and starts compacting it.
For major compaction, this means we can start with an entry log at 80% utilization instead of, e.g., one at 10%.
This can easily be fixed by building a PriorityQueue of entry logs ordered by lowest utilization (meta.getUsage()), so that more space is freed sooner.
Building the queue should not take much time and can be combined with doGcEntryLogs(), which already iterates over all entries in entryLogMetaMap; memory-wise it should be fine too.
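
A minimal sketch of the ordering idea, not the actual GarbageCollectorThread code. `LogMeta` and `CompactionOrderSketch` are hypothetical stand-ins invented for illustration; only the idea of ordering by `getUsage()` comes from the description above, and the strict `<` threshold check is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for an entry log's metadata; not the real BookKeeper class.
final class LogMeta {
    final long logId;
    final long totalSize;      // bytes originally written to the log
    final long remainingSize;  // bytes still referenced by live ledgers

    LogMeta(long logId, long totalSize, long remainingSize) {
        this.logId = logId;
        this.totalSize = totalSize;
        this.remainingSize = remainingSize;
    }

    // Fraction of the log that is still live data, in [0.0, 1.0].
    double getUsage() {
        return totalSize <= 0 ? 0.0 : (double) remainingSize / totalSize;
    }
}

final class CompactionOrderSketch {
    // Returns the ids of logs below the usage threshold, lowest usage first.
    static List<Long> compactionOrder(Iterable<LogMeta> logs, double threshold) {
        PriorityQueue<LogMeta> queue =
                new PriorityQueue<>(Comparator.comparingDouble(LogMeta::getUsage));
        for (LogMeta meta : logs) {
            if (meta.getUsage() < threshold) { // assumption: strict comparison
                queue.add(meta);
            }
        }
        List<Long> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().logId);
        }
        return order;
    }

    public static void main(String[] args) {
        List<LogMeta> logs = Arrays.asList(
                new LogMeta(1, 100, 80),   // 80% usage
                new LogMeta(2, 100, 10),   // 10% usage
                new LogMeta(3, 100, 95),   // above threshold, skipped
                new LogMeta(4, 100, 40));  // 40% usage
        System.out.println(compactionOrder(logs, 0.9)); // prints [2, 4, 1]
    }
}
```

With iteration in insertion order, the 80% log would be compacted first; with the queue, the 10% log comes first.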

hangc0276 pushed a commit that referenced this issue Jul 22, 2022
…nt of remaining usable data (#3390)

Descriptions of the changes in this PR:


### Motivation

Prioritize compaction to free up more space faster.

### Changes

doCompactEntryLogs() iterates over entry logs in whatever natural order they happen to be, picks the first with usage below thresholds and starts compacting.

Added a priority queue of entry logs so that the ones with the most compactable space are picked first. This also helps when the time for compaction is limited (via majorCompactionMaxTimeMillis / minorCompactionMaxTimeMillis): instead of spending time rewriting files with more data, we pick the files with the least amount of data first.
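
To illustrate why the ordering matters under a time limit, here is a rough, deterministic sketch. It is not the PR's code: `EntryLogInfo`, `BudgetedCompactionSketch`, and the "rewrite budget" (bytes of live data rewritten, used as a proxy for majorCompactionMaxTimeMillis / minorCompactionMaxTimeMillis) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical entry log description; not a BookKeeper class.
final class EntryLogInfo {
    final long totalSize; // bytes in the log file
    final long liveSize;  // bytes that must be rewritten during compaction

    EntryLogInfo(long totalSize, long liveSize) {
        this.totalSize = totalSize;
        this.liveSize = liveSize;
    }

    double usage() {
        return totalSize <= 0 ? 0.0 : (double) liveSize / totalSize;
    }
}

final class BudgetedCompactionSketch {
    // Compacts logs in the given order; the budget is checked before each log,
    // mimicking a time limit that is checked before starting the next file.
    static long reclaimed(Iterable<EntryLogInfo> ordered, long rewriteBudget) {
        long spent = 0;
        long freed = 0;
        for (EntryLogInfo log : ordered) {
            if (spent >= rewriteBudget) {
                break;
            }
            spent += log.liveSize;                 // cost: rewriting live data
            freed += log.totalSize - log.liveSize; // gain: dead bytes released
        }
        return freed;
    }

    // Same budget, but logs are taken lowest usage first.
    static long reclaimedLowestUsageFirst(List<EntryLogInfo> logs, long rewriteBudget) {
        PriorityQueue<EntryLogInfo> queue =
                new PriorityQueue<>(Comparator.comparingDouble(EntryLogInfo::usage));
        queue.addAll(logs);
        List<EntryLogInfo> ordered = new ArrayList<>();
        while (!queue.isEmpty()) {
            ordered.add(queue.poll());
        }
        return reclaimed(ordered, rewriteBudget);
    }

    public static void main(String[] args) {
        List<EntryLogInfo> logs = Arrays.asList(
                new EntryLogInfo(100, 80),
                new EntryLogInfo(100, 10),
                new EntryLogInfo(100, 40));
        System.out.println(reclaimed(logs, 50));                 // natural order
        System.out.println(reclaimedLowestUsageFirst(logs, 50)); // prioritized
    }
}
```

In this toy setup with a budget of 50, the natural order spends the whole budget on the 80% log and frees 20 bytes, while the prioritized order frees 150 bytes before the budget runs out.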

Master Issue: #3389
zymap pushed a commit that referenced this issue Aug 2, 2022
…nt of remaining usable data (#3390)

(cherry picked from commit 1825677)
dlg99 added a commit to datastax/bookkeeper that referenced this issue Nov 19, 2022
…nt of remaining usable data (apache#3390)

(cherry picked from commit 1825677)
(cherry picked from commit 063cc8b)
dlg99 added a commit to dlg99/bookkeeper that referenced this issue Mar 14, 2023
…nt of remaining usable data (apache#3390)

(cherry picked from commit 1825677)
dlg99 added a commit to dlg99/bookkeeper that referenced this issue Mar 16, 2023
…nt of remaining usable data (apache#3390)

(cherry picked from commit 1825677)
dlg99 added a commit to datastax/bookkeeper that referenced this issue Mar 16, 2023
#6)

* [Issue 3389] Prioritize compaction of entry logs with the lowest amount of remaining usable data (apache#3390)

(cherry picked from commit 1825677)

* checkstyle in random files

* flaky test
Ghatage pushed a commit to sijie/bookkeeper that referenced this issue Jul 12, 2024
…nt of remaining usable data (apache#3390)
