Bookie Handle not available #267
Comments
This error is printed when the client fails to connect to a particular bookie for reading or writing. If the bookie process is crashing or restarting, seeing it in the broker logs is expected.
These are probably related to the bookie auto-replication (it's very noisy in the logs). Are you running that in the same process as the bookies?
Are these from data ledgers that were not supposed to be deleted? Can you grep for the ledgerId in the broker/bookie logs to check when (and possibly why) it was deleted?
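For anyone following along, the grep suggested above can be sketched in Python so that a match on ledger 44413 does not also hit 144413 or 444130. This is only an illustration: the function name `find_ledger_mentions` and the log paths are hypothetical, and it assumes the logs are plain text files readable from one machine.

```python
import re
import sys


def find_ledger_mentions(log_paths, ledger_id):
    """Yield (path, line_number, line) for each log line mentioning ledger_id.

    The ID must not be surrounded by other digits, so searching for 44413
    does not match 144413 or 444130.
    """
    pattern = re.compile(rf"(?<!\d){int(ledger_id)}(?!\d)")
    for path in log_paths:
        # errors="replace" keeps the scan going over stray non-UTF-8 bytes.
        with open(path, errors="replace") as fh:
            for line_number, line in enumerate(fh, 1):
                if pattern.search(line):
                    yield str(path), line_number, line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical usage: python find_ledger.py 141134 broker.log bookie.log
    if len(sys.argv) >= 3:
        for path, line_number, line in find_ledger_mentions(sys.argv[2:], sys.argv[1]):
            print(f"{path}:{line_number}: {line}")
```

Sorting the matches by their timestamps should show the last time the ledger was mentioned before the read errors started, which is where a deletion would be logged if one happened.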
Yes, we're running the auto-recovery in the same process as the bookies. I had to disable it, because several bookies wouldn't stop logging that exception and the CPU went to 100%. Checking the logs, I couldn't find a reason for the deletion of the ledger. Maybe it was supposed to be deleted, but in that case why was it trying to replicate non-existent ledgers?
@merlimat None of our bookies have crashed in the past 18 hours and we are still getting these messages:
After a lot of digging, we found that we had accidentally deleted the servers that contained those ledgers. The whole ensemble was removed with no chance of replication.
I don't know if it's related to #258, but since this afternoon, after some of our bookies crashed unexpectedly, we can't consume messages from a specific partition of a topic. We get this error on the brokers:
ERROR - [BookKeeperClientWorker-17-1:PersistentDispatcherMultipleConsumers@316] - [persistent://fury/global/apicoremisc_listing_sort__listing_sort_api/apicoremisc_listing_sort__listing_sort_api-partition-7 / apicoremisc_listing_sort_saas_consumer] Error reading entries at 141134:25599 : Bookie handle is not available,
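To collect the affected ledgers from many such broker lines, the position can be pulled out with a small parser. This is a sketch under one assumption: that the pair after "Error reading entries at" is `ledgerId:entryId`, which is how the broker appears to print positions here. The function name `parse_read_error` is made up for illustration.

```python
import re

# Matches the "<ledgerId>:<entryId>" pair in the broker error shown above.
# Assumption: the first number is the ledger ID, the second the entry ID.
POSITION_RE = re.compile(r"Error reading entries at (\d+):(\d+)")


def parse_read_error(line):
    """Return (ledger_id, entry_id) from a broker read-error line, or None."""
    match = POSITION_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Running this over the broker log and collecting the distinct ledger IDs gives the list of ledgers whose metadata is worth checking.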
And lots of these on bookies:
2017-03-02 00:13:13,429 - ERROR - [BookKeeperClientWorker-22-1:LedgerFragmentReplicator$2@252] - BK error reading ledger entry: 44413
org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException
    at org.apache.bookkeeper.client.BKException.create(BKException.java:62)
    at org.apache.bookkeeper.client.LedgerFragmentReplicator$2.readComplete(LedgerFragmentReplicator.java:253)
    at org.apache.bookkeeper.client.PendingReadOp.submitCallback(PendingReadOp.java:430)
    at org.apache.bookkeeper.client.PendingReadOp.access$000(PendingReadOp.java:59)
    at org.apache.bookkeeper.client.PendingReadOp$LedgerEntryRequest.sendNextRead(PendingReadOp.java:171)
    at org.apache.bookkeeper.client.PendingReadOp$LedgerEntryRequest.logErrorAndReattemptRead(PendingReadOp.java:227)
    at org.apache.bookkeeper.client.PendingReadOp.readEntryComplete(PendingReadOp.java:380)
    at org.apache.bookkeeper.proto.BookieClient$2$1.safeRun(BookieClient.java:312)
    at org.apache.bookkeeper.util.SafeRunnable.run(SafeRunnable.java:31)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    at java.lang.Thread.run(Thread.java:745)
When I try to get the metadata of the ledger mentioned in the log using the bookkeeper shell, I get a not found error. It looks like a few ledgers disappeared for some reason.
Any help on finding the root cause of this issue will be much appreciated. If you need more information please let me know.