Skip to content
This repository has been archived by the owner on Apr 1, 2024. It is now read-only.

ISSUE-8625: ZooKeeperDataCache TimeoutException #1712

Open
sijie opened this issue Nov 19, 2020 · 0 comments
Open

ISSUE-8625: ZooKeeperDataCache TimeoutException #1712

sijie opened this issue Nov 19, 2020 · 0 comments

Comments

@sijie
Copy link
Member

sijie commented Nov 19, 2020

Original Issue: apache#8625


Describe the bug

when we grant permission, exception occurs on the broker:

[ForkJoinPool.commonPool-worker-1] ERROR org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://xxx/xxx/xxxx-partition-1] Policies update failed org.apache.pulsar.broker.service.BrokerServiceException$ServerMetadataException: java.util.concurrent.TimeoutException, scheduled retry in 60 seconds
java.util.concurrent.CompletionException: org.apache.pulsar.broker.service.BrokerServiceException$ServerMetadataException: java.util.concurrent.TimeoutException
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:661) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:683) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2010) ~[?:1.8.0_252]
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.checkReplicationAndRetryOnFailure(PersistentTopic.java:1027) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.onPoliciesUpdate(PersistentTopic.java:1844) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
	at org.apache.pulsar.broker.service.BrokerService.lambda$null$59(BrokerService.java:1537) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
	at java.util.Optional.ifPresent(Optional.java:159) ~[?:1.8.0_252]
	at org.apache.pulsar.broker.service.BrokerService.lambda$null$60(BrokerService.java:1537) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
	at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:683) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2010) ~[?:1.8.0_252]
	at org.apache.pulsar.broker.service.BrokerService.lambda$onUpdate$61(BrokerService.java:1532) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:388) ~[org.apache.pulsar-pulsar-common-2.6.2.jar:2.6.2]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:160) ~[org.apache.pulsar-pulsar-common-2.6.2.jar:2.6.2]
	at org.apache.pulsar.broker.service.BrokerService.onUpdate(BrokerService.java:1528) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
	at org.apache.pulsar.broker.service.BrokerService.onUpdate(BrokerService.java:171) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
	at org.apache.pulsar.zookeeper.ZooKeeperDataCache.lambda$4(ZooKeeperDataCache.java:138) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
	at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) ~[?:1.8.0_252]
	at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$18(ZooKeeperCache.java:385) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
	at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) ~[?:1.8.0_252]
	at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$14(ZooKeeperCache.java:365) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
	at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) [?:1.8.0_252]
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:1.8.0_252]
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [?:1.8.0_252]
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [?:1.8.0_252]
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) [?:1.8.0_252]
Caused by: org.apache.pulsar.broker.service.BrokerServiceException$ServerMetadataException: java.util.concurrent.TimeoutException
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.checkReplication(PersistentTopic.java:1081) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
	... 29 more
Caused by: java.util.concurrent.TimeoutException
	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) ~[?:1.8.0_252]
	at org.apache.pulsar.zookeeper.ZooKeeperDataCache.get(ZooKeeperDataCache.java:97) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.checkReplication(PersistentTopic.java:1077) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
	... 29 more
WARN  org.apache.pulsar.broker.service.Producer - [xxx] Get unexpected error while autorizing [persistent://xx/xxx/xxx-partition-0]  null
java.util.concurrent.TimeoutException: null
	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) ~[?:1.8.0_252]
	at org.apache.pulsar.broker.authorization.AuthorizationService.canProduce(AuthorizationService.java:226) ~[org.apache.pulsar-pulsar-broker-common-2.6.2.jar:2.6.2]
	at org.apache.pulsar.broker.service.Producer.checkPermissions(Producer.java:593) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$onPoliciesUpdate$62(PersistentTopic.java:1830) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
	at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707) ~[?:1.8.0_252]
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.onPoliciesUpdate(PersistentTopic.java:1829) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
	at org.apache.pulsar.broker.service.BrokerService.lambda$null$59(BrokerService.java:1537) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
	at java.util.Optional.ifPresent(Optional.java:159) ~[?:1.8.0_252]
	at org.apache.pulsar.broker.service.BrokerService.lambda$null$60(BrokerService.java:1537) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
	at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:683) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2010) ~[?:1.8.0_252]
	at org.apache.pulsar.broker.service.BrokerService.lambda$onUpdate$61(BrokerService.java:1532) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:388) ~[org.apache.pulsar-pulsar-common-2.6.2.jar:2.6.2]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:160) ~[org.apache.pulsar-pulsar-common-2.6.2.jar:2.6.2]
	at org.apache.pulsar.broker.service.BrokerService.onUpdate(BrokerService.java:1528) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
	at org.apache.pulsar.broker.service.BrokerService.onUpdate(BrokerService.java:171) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
	at org.apache.pulsar.zookeeper.ZooKeeperDataCache.lambda$4(ZooKeeperDataCache.java:138) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
	at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) ~[?:1.8.0_252]
	at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$18(ZooKeeperCache.java:385) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
	at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) ~[?:1.8.0_252]
	at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$14(ZooKeeperCache.java:365) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
	at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) [?:1.8.0_252]
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:1.8.0_252]
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [?:1.8.0_252]
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [?:1.8.0_252]
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) [?:1.8.0_252]

and all producer and consumer also throw exceptions:

4:48:10,988 WARN  BinaryProtoLookupService - [persistent://xxx/xx/xxx] failed to get Partitioned metadata : org.apache.pulsar.client.api.PulsarClientException: Disconnected from server at test-pulsar-broker.pulsar.svc.cluster.local/172.16.83.46:6650
java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException$LookupException: org.apache.pulsar.client.api.PulsarClientException: Disconnected from server at test-pulsar-broker.pulsar.svc.cluster.local/172.16.83.46:6650
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_212]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_212]
	at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:647) ~[?:1.8.0_212]
	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_212]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) [?:1.8.0_212]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) [?:1.8.0_212]
	at org.apache.pulsar.client.impl.ClientCnx.handleLookupResponse(ClientCnx.java:497) [maxwell.jar:?]
	at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:138) [maxwell.jar:?]
	at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [maxwell.jar:?]
	at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [maxwell.jar:?]
	at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [maxwell.jar:?]
	at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321) [maxwell.jar:?]
	at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295) [maxwell.jar:?]
	at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [maxwell.jar:?]
	at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [maxwell.jar:?]
	at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [maxwell.jar:?]
	at org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [maxwell.jar:?]
	at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [maxwell.jar:?]
	at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [maxwell.jar:?]
	at org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [maxwell.jar:?]
	at org.apache.pulsar.shade.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792) [maxwell.jar:?]
	at org.apache.pulsar.shade.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475) [maxwell.jar:?]
	at org.apache.pulsar.shade.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) [maxwell.jar:?]
	at org.apache.pulsar.shade.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [maxwell.jar:?]
	at org.apache.pulsar.shade.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [maxwell.jar:?]
	at org.apache.pulsar.shade.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [maxwell.jar:?]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
Caused by: org.apache.pulsar.client.api.PulsarClientException$LookupException: org.apache.pulsar.client.api.PulsarClientException: Disconnected from server at test-pulsar-broker.pulsar.svc.cluster.local/172.16.83.46:6650
	at org.apache.pulsar.client.impl.ClientCnx.getPulsarClientException(ClientCnx.java:972) ~[maxwell.jar:?]
	at org.apache.pulsar.client.impl.ClientCnx.handleLookupResponse(ClientCnx.java:498) ~[maxwell.jar:?]
	... 20 more

To Reproduce
Steps to reproduce the behavior:

  1. Create a standard pulsar cluster in k8s
  2. Create 20 partitioned topics, and set the number of partitions to 2
  3. Randomly grant permissions to these topics
  4. See error

Expected behavior
No error

Additional context
we try #8304 ,but it doesn't work

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

1 participant