-
Notifications
You must be signed in to change notification settings - Fork 902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ISSUE 3220: Autorecovery does not process underreplicated empty ledgers #3239
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the fix (in LedgerChecker) makes sense to me.
I left some comments.
I am happy to see that we are re-enabling a test that was lost because it was too flaky
@@ -383,6 +383,7 @@ public void testEmptyLedgerLosesQuorumEventually() throws Exception { | |||
LOG.info("Killing last bookie, {}, in ensemble {}", replicaToKill, | |||
lh.getLedgerMetadata().getAllEnsembles().get(0L)); | |||
killBookie(replicaToKill); | |||
startNewBookie(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we changing this existing test ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior has changed now.
basically autorecovery used to ignore failed bookie for empty ledgers if it is not in the writeset for entryId 0 when WQ < ES.
e.g. ensemble = (bl1, bk2, bk3), WQ = 2, ledger is empty.
bk3 is down, it is not in the writeset for entryId=0.
Auditor would add it to underreplicated, autorecovery would skip it and simply remove from underreplicated, go back to the start of the sentence.
This is nice until you end up with an empty ledger like that. or a few thousands of them.
@@ -130,11 +144,16 @@ public void testDecommissionForLedgersWithMultipleSegmentsAndNotWriteClosed() th | |||
lh4.addEntry(j, "data".getBytes()); | |||
} | |||
|
|||
// avoiding autorecovery fencing the ledger | |||
servers.forEach(srv -> srv.stopAutoRecovery()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure that "stopAutoRecovery" waits for autoRecovery to be totally stopped, maybe there is still some task running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it does wait.
I reduced openLedgerRereplicationGracePeriod, thus autorecovery may fence the ledger and now the test has to stop/start it to avoid flakiness in this case (add to the ledger to force ensemble change)
@@ -88,7 +98,7 @@ public void testDecommissionBookie() throws Exception { | |||
*/ | |||
bkAdmin.decommissionBookie(BookieImpl.getBookieId(killedBookieConf)); | |||
bkAdmin.triggerAudit(); | |||
Thread.sleep(500); | |||
Thread.sleep(5000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of the scope of this PR, but in the future it would be better to wait for a specific condition, in order to reduce flakyness
setAutoRecoveryEnabled(true); | ||
} | ||
|
||
@FlakyTest("https://github.com/apache/bookkeeper/issues/502") | ||
@Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so basically we were not running this test anymore.
great to see this running again
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerChecker.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
Awesome work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
(cherry picked from commit eadbdd4)
(cherry picked from commit eadbdd4)
(cherry picked from commit eadbdd4)
Descriptions of the changes in this PR:
Motivation
Currently decomm malfunctions on an empty ledger:
Autorecovery ends up removing such ledger from the list of underreplicated ledgers but does not update the ensemble leaving the bookie being decommed there and Auditor will add these ledger back on the next run.
In greater details:
Currently the ReplicationWorker/LedgerChecker end up skipping fragment where ES > WQ and failed bookie is not in the writeset (or a ledger is empty).
E.g. with ES = 3, WQ = 2
Ensemble: (bk1, bk2, bk3)
bk3 is dead.
if ledger is empty (and e.g. closed) it is added to the list of underreplicated ledgers but autorecovery skips it (no failed fragments, dead bookie remains in the metadata) and simply removes it from UR ledgers (until next Auditor's run adds it back).
This becomes a problem when a few bookies with large number of empty ledgers get decommed.
Every run of the Auditor ends up in increase of znodes (added UR ledgers) and a lot of busy work for the autorecovery (overall increased load on zookeeper).
Changes
Added test for the scenario I encountered.
Tests expanded with post-autorecovery validation of ledger ensembles to make sure the decomemd bookie no longer used there.
Ledger checker now does not skip fragments with firstEntryId == -1 but checks if the bookie is actually available.
This changes current behavior.
The fragments for an empty ledger will end up "rereplicated" (metadata updated).
Master Issue: #3220