
Bookie Handle not available #267

Closed
estebangarcia opened this issue Mar 2, 2017 · 4 comments

Comments

@estebangarcia

I don't know if it's related to #258, but since this afternoon, after some of our bookies crashed unexpectedly, we can't consume messages from a specific partition of a topic. We get this error on the brokers:

ERROR - [BookKeeperClientWorker-17-1:PersistentDispatcherMultipleConsumers@316] - [persistent://fury/global/apicoremisc_listing_sort__listing_sort_api/apicoremisc_listing_sort__listing_sort_api-partition-7 / apicoremisc_listing_sort_saas_consumer] Error reading entries at 141134:25599 : Bookie handle is not available,

And lots of these on bookies:

2017-03-02 00:13:13,429 - ERROR - [BookKeeperClientWorker-22-1:LedgerFragmentReplicator$2@252] - BK error reading ledger entry: 44413
org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException
    at org.apache.bookkeeper.client.BKException.create(BKException.java:62)
    at org.apache.bookkeeper.client.LedgerFragmentReplicator$2.readComplete(LedgerFragmentReplicator.java:253)
    at org.apache.bookkeeper.client.PendingReadOp.submitCallback(PendingReadOp.java:430)
    at org.apache.bookkeeper.client.PendingReadOp.access$000(PendingReadOp.java:59)
    at org.apache.bookkeeper.client.PendingReadOp$LedgerEntryRequest.sendNextRead(PendingReadOp.java:171)
    at org.apache.bookkeeper.client.PendingReadOp$LedgerEntryRequest.logErrorAndReattemptRead(PendingReadOp.java:227)
    at org.apache.bookkeeper.client.PendingReadOp.readEntryComplete(PendingReadOp.java:380)
    at org.apache.bookkeeper.proto.BookieClient$2$1.safeRun(BookieClient.java:312)
    at org.apache.bookkeeper.util.SafeRunnable.run(SafeRunnable.java:31)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    at java.lang.Thread.run(Thread.java:745)

When I try to get the metadata of the ledger mentioned in the log using the BookKeeper shell, I get a "not found" error. It looks like a few ledgers disappeared for some reason.
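To pull the ledger IDs out of broker errors like the one above before looking them up, here is a minimal Python sketch. It assumes the position in `Error reading entries at 141134:25599` is printed as `ledgerId:entryId`; the helper names `extract_positions` and `failing_ledgers` are hypothetical and not part of Pulsar or BookKeeper.

```python
import re
from typing import List, Set, Tuple

# Assumes broker positions are logged as "<ledgerId>:<entryId>".
POSITION_RE = re.compile(r"Error reading entries at (\d+):(\d+)")


def extract_positions(log_text: str) -> List[Tuple[int, int]]:
    """Return every (ledger_id, entry_id) pair from 'Error reading entries' lines."""
    return [(int(ledger), int(entry)) for ledger, entry in POSITION_RE.findall(log_text)]


def failing_ledgers(log_text: str) -> Set[int]:
    """Distinct ledger IDs the dispatcher failed to read from."""
    return {ledger for ledger, _ in extract_positions(log_text)}


if __name__ == "__main__":
    # Excerpts of the two broker errors quoted in this thread.
    sample = (
        "Error reading entries at 141134:25599 : Bookie handle is not available,\n"
        "Error reading entries at 141134:0 : Bookie handle is not available,\n"
    )
    print(extract_positions(sample))  # [(141134, 25599), (141134, 0)]
    print(failing_ledgers(sample))    # {141134}
```

Collapsing to a set matters because the dispatcher keeps retrying, so the same ledger shows up many times in the broker logs.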

Any help on finding the root cause of this issue will be much appreciated. If you need more information please let me know.

@merlimat
Contributor

merlimat commented Mar 2, 2017

Bookie handle is not available

This error is logged when the client fails to connect to a particular bookie for reading or writing. If the bookie process is crashing or restarting, it's expected to see it in the broker logs.

And lots of these on bookies:

2017-03-02 00:13:13,429 - ERROR - [BookKeeperClientWorker-22-1:LedgerFragmentReplicator$2@252] - BK error reading ledger entry: 44413 org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException at org.apache.bookkeeper.client.BKException.create(BKException.java:62) at or

These are probably related to bookie auto-replication (it's very noisy in the logs). Are you running that in the same process as the bookies?

When I try to get the metadata of the ledger mentioned in the log using the BookKeeper shell, I get a "not found" error. It looks like a few ledgers disappeared for some reason.

Are these data ledgers that were not supposed to be deleted? Can you grep for the ledger ID in the broker/bookie logs to check when (and possibly why) it was deleted?
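A rough way to do that grep across several log files, sketched in Python: it yields every line that mentions the ledger ID as a whole number, together with the line's leading timestamp when there is one. The function names are hypothetical, the timestamp pattern assumes the `2017-03-02 00:13:13,429` format seen in the bookie log above, and a bare-number match can also hit entry IDs or unrelated counters, so the results still need a manual read.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Timestamp prefix as it appears in the bookie log, e.g. "2017-03-02 00:13:13,429".
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def lines_mentioning_ledger(
    lines: Iterable[str], ledger_id: int
) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (timestamp or None, line) for each line containing ledger_id as a whole number."""
    # Word boundaries keep 14113 from matching inside 141134.
    pattern = re.compile(rf"\b{ledger_id}\b")
    for line in lines:
        if pattern.search(line):
            match = TIMESTAMP_RE.match(line)
            yield (match.group(1) if match else None, line.rstrip("\n"))


def scan_files(paths: Iterable[str], ledger_id: int) -> None:
    """Print path, timestamp ('-' if none) and line for every match in the given files."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as log:
            for ts, line in lines_mentioning_ledger(log, ledger_id):
                print(f"{path}\t{ts or '-'}\t{line}")
```

For example, `scan_files(["broker.log", "bookkeeper.log"], 141134)`, where the file names are placeholders for wherever your broker and bookie logs live. The earliest timestamped hit is a starting point for working out when the ledger went away.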

@estebangarcia
Author

estebangarcia commented Mar 2, 2017

Yes, we're running auto-recovery in the same process as the bookies. I had to disable it because several bookies wouldn't stop logging that exception and CPU usage reached 100%.

Checking the logs, I couldn't find a reason for the ledger's deletion. Maybe it was supposed to be deleted, but in that case, why was it trying to replicate non-existent ledgers?

@estebangarcia
Author

@merlimat None of our bookies have crashed in the past 18 hours, and we are still getting these messages:

ERROR - [BookKeeperClientWorker-17-1:PersistentDispatcherMultipleConsumers@316] - [persistent://fury/global/apicoremisc_listing_sort__listing_sort_api/apicoremisc_listing_sort__listing_sort_api-partition-7 / apicoremisc_listing_sort_saas_consumer] Error reading entries at 141134:0 : Bookie handle is not available, Read Type Normal - Retrying to read in 63.609 seconds

@estebangarcia
Author

After a lot of digging, we found that we had accidentally deleted the servers that contained those ledgers. The whole ensemble was removed, with no chance of replication.
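Why losing the whole ensemble rules out recovery can be illustrated with a small Python sketch. It assumes round-robin striping, where entry `e` of a ledger is written to ensemble positions `(e + i) % ensemble_size` for `i` below the write quorum (as BookKeeper's round-robin distribution schedule does), assumes the write quorum is no larger than the ensemble, and ignores ensemble changes during the ledger's lifetime. The bookie names and function names are hypothetical. An entry can only be re-replicated if at least one bookie in its write set survives.

```python
from typing import Iterable, List


def write_set(entry_id: int, ensemble: List[str], write_quorum: int) -> List[str]:
    """Bookies holding entry_id under round-robin striping (assumed placement)."""
    n = len(ensemble)
    return [ensemble[(entry_id + i) % n] for i in range(write_quorum)]


def unrecoverable_entries(
    last_entry_id: int, ensemble: List[str], write_quorum: int, lost_bookies: Iterable[str]
) -> List[int]:
    """Entries 0..last_entry_id whose every replica lived on a lost bookie."""
    lost = set(lost_bookies)
    return [
        e
        for e in range(last_entry_id + 1)
        if all(b in lost for b in write_set(e, ensemble, write_quorum))
    ]


if __name__ == "__main__":
    ensemble = ["bk1", "bk2", "bk3"]  # hypothetical bookie names
    print(unrecoverable_entries(5, ensemble, 2, {"bk1"}))         # []
    print(unrecoverable_entries(5, ensemble, 2, {"bk1", "bk2"}))  # [0, 3]
    print(unrecoverable_entries(5, ensemble, 2, set(ensemble)))   # [0, 1, 2, 3, 4, 5]
```

With a write quorum of 2, losing a single bookie still leaves a surviving copy of every entry, so auto-recovery has something to copy from. Once the whole ensemble is gone, every entry's write set is lost and there is no source left to replicate.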

sijie added a commit to sijie/pulsar that referenced this issue Mar 4, 2018
* Use `distributedlog-core-shaded` in pulsar worker

* revert to db ledger storage

* Include netty-all

* Fix serviceUrl for functions cli
hangc0276 pushed a commit to hangc0276/pulsar that referenced this issue May 26, 2021
fixes apache#266 
`topics` in `KafkaTopicManager` caches `PersistentTopic` via `brokerService.getTopic`, which is unnecessary because `PersistentTopic` is already cached in `brokerService.getTopic`. We should remove it to avoid getting a `null` topic.
dlg99 pushed a commit to dlg99/pulsar that referenced this issue May 23, 2024

2 participants