Missing ledgers when inducing network packet loss in Bookkeeper 4.15 #3466
A couple of questions:
another question:
Thanks @eolivelli, reposting the answer from Slack here for better visibility: https://pravega-io.slack.com/archives/C0151LSM46L/p1662115749173509
The cluster is not in that state anymore, but if the Bookkeeper Client gets a
I don't know if the fact that this request queries
In this experiment, the configuration is
The Bookkeeper service is configured to keep 4 Bookies.
Pravega rolls over ledgers when they reach their maximum size (1GB by default) or when there is some container recovery (due to fencing). So, this particular ledger was created but contained 0 entries at the moment the issue happened. It would have contained many more entries if the issue had not appeared.
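To make the rollover behavior concrete, here is a rough sketch; the class name, ensemble/quorum values and password are illustrative assumptions, not Pravega's actual implementation:

```java
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerHandle;

class LedgerRolloverSketch {
    private static final long MAX_LEDGER_SIZE_BYTES = 1024L * 1024 * 1024; // 1 GB default

    private final BookKeeper bk;
    private LedgerHandle current;

    LedgerRolloverSketch(BookKeeper bk, LedgerHandle initial) {
        this.bk = bk;
        this.current = initial;
    }

    // Roll over to a new ledger when the current one reaches the size limit
    // or when the container is recovered (fencing forces a new ledger).
    void maybeRollover(boolean containerRecovered) throws Exception {
        if (containerRecovered || current.getLength() >= MAX_LEDGER_SIZE_BYTES) {
            current.close();
            // A freshly created ledger has 0 entries, which is the state of
            // the ledger discussed in this issue at the time of the failure.
            current = bk.createLedger(2, 2, 2,
                    BookKeeper.DigestType.CRC32, "password".getBytes());
        }
    }
}
```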
Could be. According to this log, the ledger should have been created using 2 Bookies (a small sketch of checking the ensemble size from the ledger metadata follows below).
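A minimal sketch (assumed ZooKeeper address, digest type and password; not the code from this deployment) of reading the ensemble and quorum sizes from the ledger metadata:

```java
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerHandle;

public class EnsembleSizeSketch {
    public static void main(String[] args) throws Exception {
        long ledgerId = Long.parseLong(args[0]);
        BookKeeper bk = new BookKeeper("zookeeper:2181"); // assumed ZK address
        // openLedgerNoRecovery inspects the ledger without fencing it.
        LedgerHandle lh = bk.openLedgerNoRecovery(ledgerId,
                BookKeeper.DigestType.CRC32, "password".getBytes());
        System.out.println("ensembleSize=" + lh.getLedgerMetadata().getEnsembleSize()
                + " writeQuorum=" + lh.getLedgerMetadata().getWriteQuorumSize()
                + " ackQuorum=" + lh.getLedgerMetadata().getAckQuorumSize());
        lh.close();
        bk.close();
    }
}
```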
Not sure if it has something to do with this PR: ISSUE 3220 - Autorecovery does not process underreplicated empty ledgers
We found the same issue on the Pulsar side. apache/pulsar#17520
@eolivelli @zhaohaidao @hangc0276 We have done another set of failure injection tests to narrow down the issue. These are the results:
With this, I think we can narrow the regression down to the Bookkeeper client in Bookkeeper 4.15 only. Hopefully this helps us shortlist the set of PRs that could be related to this problem.
I have done some testing on the master branch. After re-reading the logs in this issue, it seems that:
I did a "git diff" between |
@RaulGracia do you have "auto-recovery" running in your cluster?
@zhaohaidao your idea is interesting. The fact that the client tries to read from entry 0 even if the ledger is empty means that something CLOSED the ledger and set the LAC to 0 (and not -1). I am saying that LAC should be at least 0 because of these logs:
It is also interesting that the Writer tried to write entry 0, but it wasn't able to even send the write request.
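To make the LAC reasoning above concrete, here is a minimal sketch (assumed ZooKeeper address, digest type and password) of inspecting a ledger's Last Add Confirmed with the BookKeeper client:

```java
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerHandle;

public class CheckLacSketch {
    public static void main(String[] args) throws Exception {
        long ledgerId = Long.parseLong(args[0]);
        BookKeeper bk = new BookKeeper("zookeeper:2181"); // assumed ZK address
        // openLedgerNoRecovery inspects the ledger without fencing it.
        LedgerHandle lh = bk.openLedgerNoRecovery(ledgerId,
                BookKeeper.DigestType.CRC32, "password".getBytes());
        // LAC == -1: no entry was ever confirmed (truly empty ledger).
        // LAC == 0: entry 0 was confirmed, i.e. something closed the ledger
        // with at least one acknowledged entry.
        System.out.println("ledger " + ledgerId
                + " LAC=" + lh.getLastAddConfirmed()
                + " length=" + lh.getLength());
        lh.close();
        bk.close();
    }
}
```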
@RaulGracia
There are two things I noticed:
It also looks like, in Pravega, the ledger handle is closed again, so the error happens again (closed in step 2 and closed again in step 3); a small sketch of this double-close pattern follows below.
I saw previously that you are using 4.14.1, and I found a change related to this that was released in 4.14.3. Could you please also run the test against 4.14.1 and 4.14.3?
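A hedged sketch of the double close mentioned above (hypothetical helper, not Pravega's actual code): guarding the handle so that close is only attempted once avoids replaying the close error:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.bookkeeper.client.LedgerHandle;

class CloseOnceSketch {
    private final LedgerHandle lh;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    CloseOnceSketch(LedgerHandle lh) {
        this.lh = lh;
    }

    // Only the first caller actually closes the handle; later calls are no-ops,
    // so a close error is not replayed by a second close attempt.
    void closeOnce() throws Exception {
        if (closed.compareAndSet(false, true)) {
            lh.close();
        }
    }
}
```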
@zymap thanks a lot for your analysis. Yes, you are right, we are actually using 4.14.2, as the last commit we have from mainline is . And we do not have PR #2794 in the version of Bookkeeper that works well with our fault injection tests. Maybe we could try another test with the latest 4.14 commit, to see if the same issue is reproduced.
@RaulGracia is there any chance that the lossy network caused some kind of split brain or data inconsistency on the ZK side?
@dlg99 thanks Andrey. While this situation can indeed happen (especially when inducing failures at the network level), my understanding is that it would happen with any version of Bookkeeper, right? However, our observation is that when executing this kind of test, only Pravega using the Bookkeeper 4.15 client gets into this situation where ledgers that are supposed to be there are not; when using Bookkeeper 4.14.2, the issue does not reproduce. Note that we have done this test many times with Bookkeeper 4.14.2, while the few times we executed it with the Bookkeeper 4.15 client we always got into that situation. We are currently executing the same test using the Bookkeeper 4.14.3 client, to validate the hypothesis of @zymap #3466 (comment)
@RaulGracia My thought process was along the lines of: the updated ZK client may make the problem easier to reproduce if it is ZK-related.
FYI @zymap @eolivelli @dlg99 we have performed the same test again with the Bookkeeper 4.14.3 client as suggested by @zymap, but we have not been able to reproduce the issue this time. So, at the moment, with the evidence we have, we cannot say that #2794 is the root cause of the problem.
@RaulGracia I am trying to wrap my head around how #2794 could result in a missing ledger handle. One thing off with the PR is that it calls the blocking lh.close() from the callback running on the main threadpool. AFAICT there are no deadlocks, because whatever else happens runs on the ZK side, and ZK can be quite slow because of the injected network errors. Could the lost ledger handle be caused by the way Pravega handles errors in ledger open?
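For illustration only (this is not the code from #2794, and the surrounding names are hypothetical), the shape of the concern is roughly the following: a blocking close inside a completion callback ties up the callback thread while the metadata write completes, whereas closeAsync() keeps that thread free:

```java
import java.util.concurrent.CompletableFuture;
import org.apache.bookkeeper.client.LedgerHandle;

class CallbackCloseSketch {

    // Problematic shape: blocking close() inside a completion callback.
    static void blockingClose(CompletableFuture<Void> writeFuture, LedgerHandle lh) {
        writeFuture.whenComplete((result, error) -> {
            if (error != null) {
                try {
                    // Blocks the callback thread until the ledger metadata is
                    // updated, which can take long when ZK is slow or lossy.
                    lh.close();
                } catch (Exception e) {
                    // ignored in this sketch
                }
            }
        });
    }

    // Non-blocking alternative: closeAsync() returns a CompletableFuture and
    // keeps the callback thread free.
    static void nonBlockingClose(CompletableFuture<Void> writeFuture, LedgerHandle lh) {
        writeFuture.whenComplete((result, error) -> {
            if (error != null) {
                lh.closeAsync().whenComplete((ignored, closeError) -> {
                    // handle/log the close failure asynchronously
                });
            }
        });
    }
}
```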
Thanks a lot @dlg99, I can certainly review #3513. But after the latest experiments we did with Bookkeeper 4.14.3, which includes #2794, we couldn't reproduce the issue, so I'm not sure that that PR is contributing to the issue described here. In any case, that is independent from the value that #3513 brings by itself.
The point is that the very same Pravega code has no issues using the Bookkeeper 4.14.2 client and injecting network failures, and we consistently get issues like the one described here when upgrading to the Bookkeeper 4.15 client. For this reason, I thought that the problem may be related to some change in the Bookkeeper client within . I think that, to discard that the issue is present in the whole
I am working on this. The docker images contain a build of Pravega compatible with BK 4.15 (thanks to @RaulGracia), but with the BK client compiled at different points in the git history of 4.15.0. This way we will see which commit broke the compatibility. It will take a lot of time, because those system tests take at least 24h before failing.
To wrap up this issue: We have done fault injection tests of Pravega and several Bookkeeper client versions and we have not been able to reproduce the original issue (see details here). We have also upgraded Pravega |
Hi all, I think we could have another reproduction of this problem, this time using Bookkeeper 4.14.2 client/server (therefore, the issue reported here may be present in other versions of Bookkeeper, not only in Bookkeeper 4.15). This is the information I have been able to collect so far (the missing ledger this time is
Once the issue was reported, I tried to dig a little more with the Pravega and Bookkeeper CLIs (a programmatic sketch of a similar check with the client API follows below):
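As a complement to the CLIs, a minimal sketch (assumed ZooKeeper address, digest type and password) of checking from the BookKeeper client whether the ledger's metadata still exists, without fencing it:

```java
import org.apache.bookkeeper.client.BKException;
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerHandle;

public class LedgerExistsSketch {
    public static void main(String[] args) throws Exception {
        long ledgerId = Long.parseLong(args[0]);
        BookKeeper bk = new BookKeeper("zookeeper:2181"); // assumed ZK address
        try {
            LedgerHandle lh = bk.openLedgerNoRecovery(ledgerId,
                    BookKeeper.DigestType.CRC32, "password".getBytes());
            System.out.println("ledger " + ledgerId + " exists, LAC="
                    + lh.getLastAddConfirmed());
            lh.close();
        } catch (BKException e) {
            // For a missing ledger this is typically a "no such ledger" code.
            System.out.println("ledger " + ledgerId + " could not be opened: "
                    + BKException.getMessage(e.getCode()));
        } finally {
            bk.close();
        }
    }
}
```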
How inducing delays at the network level in the pods leads, very rarely, to this situation is something that I still don't know. I'm also attaching the relevant log lines from Pravega Segment Store for ledger
BUG REPORT
Describe the bug
During our tests of Pravega (https://cncf.pravega.io/), we have detected a regression after upgrading to Bookkeeper 4.15 (pravega/pravega#6676). The regression is consistently reproducible when doing a fault injection test (30% network packet loss). Note that this kind of test consistently passes using Bookkeeper 4.14, which is why we believe this is a regression or a change in the behavior of the new Bookkeeper release.
A reproduction of this issue goes as follows (the missing ledger is 3246, and the relevant events happen around 2022-07-20T14:48:11):
In my understanding, from the Pravega point of view ledger 3246 is created and not explicitly deleted by Pravega itself, which means that there has been data loss, most likely induced by some corner case exposed via network packet loss. Under normal conditions, where there is no induced packet loss, things work fine. But that does not mean that, in a real cluster, a similar network failure could not happen and the same issue could not be reproduced. Again, we have been doing fault injection tests with Bookkeeper 4.14 and probably previous versions as well (e.g., Bookkeeper 4.11), and this is the first time we encounter this behavior.

To Reproduce
Steps to reproduce the behavior:
No Such Ledger Exception in some containers using Bookkeeper 4.15 (with Bookkeeper 4.14, the system is able to work under such conditions)

Expected behavior
Bookkeeper 4.15 should offer a similar behavior in how ledgers are managed compared to Bookkeeper 4.14 (and previous versions).
Screenshots
n/a
Additional context
At the moment, there are 2 changes that could be related to this issue (but I cannot confirm that they actually cause the problem):