ISSUE 3220: Autorecovery does not process underreplicated empty ledgers #3239

dlg99 · 2022-04-22T04:32:53Z

Descriptions of the changes in this PR:

Motivation

Currently, decommissioning malfunctions on an empty ledger:

Autorecovery ends up removing such a ledger from the list of underreplicated ledgers but does not update the ensemble, so the bookie being decommissioned stays there and the Auditor adds the ledger back on its next run.

In greater detail:

Currently the ReplicationWorker/LedgerChecker ends up skipping a fragment when ES > WQ (ensemble size greater than write quorum) and the failed bookie is not in the writeset, or when the ledger is empty.

E.g. with ES = 3, WQ = 2
Ensemble: (bk1, bk2, bk3)
bk3 is dead.
If the ledger is empty (and e.g. closed), it is added to the list of underreplicated ledgers, but autorecovery skips it (no failed fragments, so the dead bookie remains in the metadata) and simply removes it from the UR ledgers, until the next Auditor run adds it back.
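
The ES = 3, WQ = 2 example can be sketched as follows, assuming round-robin striping where an entry is written to WQ consecutive ensemble positions starting at entryId % ES. The class and method names are hypothetical and only illustrate why bk3 falls outside the writeset for entry 0; this is not BookKeeper's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names are not part of BookKeeper's API.
class WriteSetSketch {

    // Returns the ensemble indices that would hold the given entry,
    // assuming round-robin striping over the ensemble.
    static List<Integer> writeSet(long entryId, int ensembleSize, int writeQuorum) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < writeQuorum; i++) {
            indices.add((int) ((entryId + i) % ensembleSize));
        }
        return indices;
    }

    public static void main(String[] args) {
        List<String> ensemble = List.of("bk1", "bk2", "bk3");
        int deadIndex = ensemble.indexOf("bk3");

        // With ES = 3 and WQ = 2, entry 0 maps to indices 0 and 1 (bk1, bk2).
        List<Integer> ws = writeSet(0L, ensemble.size(), 2);
        System.out.println("writeset for entry 0: " + ws);
        System.out.println("dead bookie in writeset: " + ws.contains(deadIndex));
    }
}
```

Because a check driven only by the writeset of entry 0 never looks at index 2, the dead bookie goes unnoticed for an empty ledger.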

This becomes a problem when a few bookies with a large number of empty ledgers get decommissioned.
Every Auditor run increases the number of znodes (added UR ledgers) and creates a lot of busy work for autorecovery, increasing the overall load on ZooKeeper.

Changes

Added a test for the scenario I encountered.
Expanded the tests with post-autorecovery validation of ledger ensembles to make sure the decommissioned bookie is no longer used there.

The LedgerChecker no longer skips fragments with firstEntryId == -1; instead, it checks whether the bookie is actually available.

This changes the current behavior.
The fragments of an empty ledger will end up "rereplicated" (metadata updated).
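
A minimal sketch of the changed decision for an empty fragment, under stated assumptions: bookies are plain strings, availability is a set lookup, and every name here is hypothetical. The real LedgerChecker works with ledger fragments and asynchronous bookie checks, which this does not reproduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the actual LedgerChecker logic.
class EmptyFragmentCheckSketch {

    // Marker for a fragment that holds no entries.
    static final long NO_ENTRIES = -1L;

    // Returns the ensemble members that are not currently available.
    static List<String> unavailableBookies(List<String> ensemble, Set<String> available) {
        List<String> missing = new ArrayList<>();
        for (String bookie : ensemble) {
            if (!available.contains(bookie)) {
                missing.add(bookie);
            }
        }
        return missing;
    }

    // Previously an empty fragment was skipped outright. In this sketch it is
    // flagged for repair when any bookie of its ensemble is unavailable.
    static boolean emptyFragmentNeedsRepair(long firstEntryId, List<String> ensemble,
                                            Set<String> available) {
        if (firstEntryId != NO_ENTRIES) {
            throw new IllegalArgumentException("sketch only covers empty fragments");
        }
        return !unavailableBookies(ensemble, available).isEmpty();
    }

    public static void main(String[] args) {
        List<String> ensemble = List.of("bk1", "bk2", "bk3");
        Set<String> available = Set.of("bk1", "bk2"); // bk3 is dead
        System.out.println(emptyFragmentNeedsRepair(NO_ENTRIES, ensemble, available));
    }
}
```

For an empty ledger there is no data to copy, so "repair" here amounts to replacing the unavailable bookie in the ledger metadata.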

Master Issue: #3220

eolivelli

The fix (in LedgerChecker) makes sense to me.

I left some comments.
I am happy to see that we are re-enabling a test that was lost because it was too flaky.

eolivelli · 2022-04-22T06:24:48Z

bookkeeper-server/src/test/java/org/apache/bookkeeper/replication/BookieAutoRecoveryTest.java

@@ -383,6 +383,7 @@ public void testEmptyLedgerLosesQuorumEventually() throws Exception {
        LOG.info("Killing last bookie, {}, in ensemble {}", replicaToKill,
                 lh.getLedgerMetadata().getAllEnsembles().get(0L));
        killBookie(replicaToKill);
+        startNewBookie();


Why are we changing this existing test?

The behavior has changed now.
Basically, autorecovery used to ignore a failed bookie for empty ledgers if it was not in the writeset for entryId 0 when WQ < ES.
E.g. ensemble = (bk1, bk2, bk3), WQ = 2, ledger is empty.
bk3 is down; it is not in the writeset for entryId=0.

The Auditor would add the ledger to underreplicated, autorecovery would skip it and simply remove it from underreplicated, and then the cycle would start over.
This is fine until you end up with an empty ledger like that, or a few thousand of them.

eolivelli · 2022-04-22T06:26:01Z

bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieDecommissionTest.java

@@ -130,11 +144,16 @@ public void testDecommissionForLedgersWithMultipleSegmentsAndNotWriteClosed() th
            lh4.addEntry(j, "data".getBytes());
        }

+        // avoiding autorecovery fencing the ledger
+        servers.forEach(srv -> srv.stopAutoRecovery());


I am not sure that "stopAutoRecovery" waits for autoRecovery to be totally stopped, maybe there is still some task running?

It does wait.
I reduced openLedgerRereplicationGracePeriod, so autorecovery may fence the ledger; the test now has to stop/start autorecovery to avoid flakiness in this case (it adds to the ledger to force an ensemble change).

eolivelli · 2022-04-22T06:26:49Z

bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieDecommissionTest.java

@@ -88,7 +98,7 @@ public void testDecommissionBookie() throws Exception {
         */
        bkAdmin.decommissionBookie(BookieImpl.getBookieId(killedBookieConf));
        bkAdmin.triggerAudit();
-        Thread.sleep(500);
+        Thread.sleep(5000);


Out of the scope of this PR, but in the future it would be better to wait for a specific condition in order to reduce flakiness.
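
One way to wait for a condition instead of sleeping for a fixed time could look like the sketch below. The helper name `waitFor` and its parameters are hypothetical, not an existing BookKeeper test utility; the condition passed in would be whatever the test needs to observe.

```java
import java.util.function.BooleanSupplier;

// Hypothetical polling helper; not part of BookKeeper's test utilities.
class WaitForSketch {

    // Polls the condition every pollMs milliseconds until it holds or
    // timeoutMs elapses. Returns true if the condition was observed.
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (System.nanoTime() - deadline < 0) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        // One last check at the deadline.
        return condition.getAsBoolean();
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition that becomes true after roughly 100 ms.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start >= 100, 5000, 10);
        System.out.println("condition met: " + ok);
    }
}
```

A test using such a helper returns as soon as the condition holds, so a generous timeout does not slow down the passing case the way a longer fixed sleep does.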

eolivelli · 2022-04-22T06:27:26Z

bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieDecommissionTest.java

        setAutoRecoveryEnabled(true);
    }

    @FlakyTest("https://github.com/apache/bookkeeper/issues/502")
+    @Test


So basically we were not running this test anymore.

Great to see this running again.

bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerChecker.java

eolivelli

Lgtm

Awesome work

nicoloboschi

LGTM

(cherry picked from commit eadbdd4)

dlg99 marked this pull request as draft April 22, 2022 04:32

eolivelli requested review from ivankelly, jiazhai, jvrao, merlimat and reddycharan April 22, 2022 06:21

eolivelli reviewed Apr 22, 2022


dlg99 requested a review from Ghatage April 25, 2022 18:54

dlg99 marked this pull request as ready for review April 25, 2022 19:06

eolivelli approved these changes Apr 25, 2022


nicoloboschi approved these changes Apr 26, 2022


Autorecovery to rereplicate empty ledgers

58f20e6

dlg99 force-pushed the decomm_stuck branch from 8c5b03c to 58f20e6 Compare April 27, 2022 17:05

eolivelli requested a review from rdhabalia April 28, 2022 06:32

eolivelli merged commit eadbdd4 into apache:master Jun 21, 2022

hangc0276 assigned dlg99 Jul 25, 2022

hangc0276 added type/bug area/autorecovery release/4.14.6 release/4.15.1 labels Jul 25, 2022

hangc0276 added this to the 4.16.0 milestone Jul 25, 2022

zymap pushed a commit that referenced this pull request Aug 1, 2022

Autorecovery to rereplicate empty ledgers (#3239)

b715cf7

(cherry picked from commit eadbdd4)

zymap added the cherry-picked/branch-4.15 label Aug 1, 2022

zhaohaidao mentioned this pull request Sep 7, 2022

Missing ledgers when inducing network packet loss in Bookkeeper 4.15 #3466

Closed

hangc0276 pushed a commit to hangc0276/bookkeeper that referenced this pull request Nov 5, 2022

Autorecovery to rereplicate empty ledgers (apache#3239)

94665e9

(cherry picked from commit eadbdd4)

hangc0276 pushed a commit to hangc0276/bookkeeper that referenced this pull request Nov 7, 2022

Autorecovery to rereplicate empty ledgers (apache#3239)

6bad86e

(cherry picked from commit eadbdd4)

hangc0276 added the cherry-picked/branch-4.14 label Nov 7, 2022

dlg99 mentioned this pull request Dec 6, 2022

Bookie decommission blocked due to OPEN state empty ledgers #3692

Closed

This was referenced Feb 1, 2023

Serious Performance problem caused by #3239 #3759

Closed

Fix serious Performance problem of getLastEntryInLedgerInternal #3769

Closed

Ghatage pushed a commit to sijie/bookkeeper that referenced this pull request Jul 12, 2024

Autorecovery to rereplicate empty ledgers (apache#3239)

ca9a38a
