-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the dead lock when using hasMessageAvailableAsync and readNextAsync #11183
Fix the dead lock when using hasMessageAvailableAsync and readNextAsync #11183
Conversation
if (e == null && has) { | ||
CompletableFuture<Message<Schemas.PersonOne>> future = reader.readNextAsync(); | ||
// Make sure the future completed | ||
Awaitility.await().pollInterval(1, TimeUnit.MILLISECONDS).untilAsserted(future::isDone); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why need to wait for the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to simulate the race condition here, the problem happens only the future is already completed, so that will continue using the IO thread to execute the callback.
Great work! This is very difficult to troubleshoot, thanks penghui
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great catch !
LGTM
The issue will happens after satisfying the following conditions: 1. The messages are added to the incoming queue before reading messages 2. The result future of the readNextAsync been complete before call future.whenComplete by users, This won't always appear. After that, since we are using the IO thread to call the callback of the hasMessageAvailableAsync, so the IO thread will process the message.getValue(). Then might get a deadlock as followings: ``` jdk.internal.misc.Unsafe.park(boolean, long) Unsafe.java (native) java.util.concurrent.locks.LockSupport.park(Object) LockSupport.java:194 java.util.concurrent.CompletableFuture$Signaller.block() CompletableFuture.java:1796 java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool$ManagedBlocker) ForkJoinPool.java:3128 java.util.concurrent.CompletableFuture.waitingGet(boolean) CompletableFuture.java:1823 java.util.concurrent.CompletableFuture.get() CompletableFuture.java:1998 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader.getSchemaInfoByVersion(byte[]) AbstractMultiVersionReader.java:115 org.apache.pulsar.client.impl.schema.reader.MultiVersionAvroReader.loadReader(BytesSchemaVersion) MultiVersionAvroReader.java:47 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader$1.load(BytesSchemaVersion) AbstractMultiVersionReader.java:52 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader$1.load(Object) AbstractMultiVersionReader.java:49 com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(Object, CacheLoader) LocalCache.java:3529 com.google.common.cache.LocalCache$Segment.loadSync(Object, int, LocalCache$LoadingValueReference, CacheLoader) LocalCache.java:2278 com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(Object, int, CacheLoader) LocalCache.java:2155 com.google.common.cache.LocalCache$Segment.get(Object, int, CacheLoader) LocalCache.java:2045 com.google.common.cache.LocalCache.get(Object, CacheLoader) LocalCache.java:3951 com.google.common.cache.LocalCache.getOrLoad(Object) LocalCache.java:3974 com.google.common.cache.LocalCache$LocalLoadingCache.get(Object) LocalCache.java:4935 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader.read(byte[], byte[]) AbstractMultiVersionReader.java:86 org.apache.pulsar.client.impl.schema.AbstractStructSchema.decode(byte[], byte[]) AbstractStructSchema.java:60 org.apache.pulsar.client.impl.MessageImpl.getValue() MessageImpl.java:301 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.refreshTopicPoliciesCache(Message) SystemTopicBasedTopicPoliciesService.java:302 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.lambda$initPolicesCache$7(SystemTopicClient$Reader, Throwable, CompletableFuture, Message, Throwable) SystemTopicBasedTopicPoliciesService.java:254 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService$$Lambda$817.accept(Object, Object) java.util.concurrent.CompletableFuture.uniWhenComplete(Object, BiConsumer, CompletableFuture$UniWhenComplete) CompletableFuture.java:859 java.util.concurrent.CompletableFuture.uniWhenCompleteStage(Executor, BiConsumer) CompletableFuture.java:883 java.util.concurrent.CompletableFuture.whenComplete(BiConsumer) CompletableFuture.java:2251 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.lambda$initPolicesCache$10(SystemTopicClient$Reader, CompletableFuture, Boolean, Throwable) SystemTopicBasedTopicPoliciesService.java:246 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService$$Lambda$725.accept(Object, Object) java.util.concurrent.CompletableFuture.uniWhenComplete(Object, BiConsumer, CompletableFuture$UniWhenComplete) CompletableFuture.java:859 java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(int) CompletableFuture.java:837 java.util.concurrent.CompletableFuture.postComplete() CompletableFuture.java:506 java.util.concurrent.CompletableFuture.complete(Object) CompletableFuture.java:2073 org.apache.pulsar.client.impl.ConsumerImpl.lambda$hasMessageAvailableAsync$48(CompletableFuture, MessageId) ConsumerImpl.java:2075 org.apache.pulsar.client.impl.ConsumerImpl$$Lambda$723.accept(Object) java.util.concurrent.CompletableFuture$UniAccept.tryFire(int) CompletableFuture.java:714 java.util.concurrent.CompletableFuture.postComplete() CompletableFuture.java:506 java.util.concurrent.CompletableFuture.complete(Object) CompletableFuture.java:2073 org.apache.pulsar.client.impl.ConsumerImpl.lambda$internalGetLastMessageIdAsync$53(CompletableFuture, PulsarApi$CommandGetLastMessageIdResponse) ConsumerImpl.java:2187 org.apache.pulsar.client.impl.ConsumerImpl$$Lambda$720.accept(Object) java.util.concurrent.CompletableFuture$UniAccept.tryFire(int) CompletableFuture.java:714 java.util.concurrent.CompletableFuture.postComplete() CompletableFuture.java:506 java.util.concurrent.CompletableFuture.complete(Object) CompletableFuture.java:2073 org.apache.pulsar.client.impl.ClientCnx.handleGetLastMessageIdSuccess(PulsarApi$CommandGetLastMessageIdResponse) ClientCnx.java:468 org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(ChannelHandlerContext, Object) PulsarDecoder.java:342 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Object) AbstractChannelHandlerContext.java:379 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext, Object) AbstractChannelHandlerContext.java:365 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Object) AbstractChannelHandlerContext.java:357 io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ChannelHandlerContext, CodecOutputList, int) ByteToMessageDecoder.java:324 io.netty.handler.codec.ByteToMessageDecoder.channelRead(ChannelHandlerContext, Object) ByteToMessageDecoder.java:296 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Object) AbstractChannelHandlerContext.java:379 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext, Object) AbstractChannelHandlerContext.java:365 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Object) AbstractChannelHandlerContext.java:357 io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(ChannelHandlerContext, Object) DefaultChannelPipeline.java:1410 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Object) AbstractChannelHandlerContext.java:379 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext, Object) AbstractChannelHandlerContext.java:365 io.netty.channel.DefaultChannelPipeline.fireChannelRead(Object) DefaultChannelPipeline.java:919 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read() AbstractNioByteChannel.java:166 io.netty.channel.nio.NioEventLoop.processSelectedKey(SelectionKey, AbstractNioChannel) NioEventLoop.java:719 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized() NioEventLoop.java:655 io.netty.channel.nio.NioEventLoop.processSelectedKeys() NioEventLoop.java:581 io.netty.channel.nio.NioEventLoop.run() NioEventLoop.java:493 io.netty.util.concurrent.SingleThreadEventExecutor$4.run() SingleThreadEventExecutor.java:989 io.netty.util.internal.ThreadExecutorMap$2.run() ThreadExecutorMap.java:74 io.netty.util.concurrent.FastThreadLocalRunnable.run() FastThreadLocalRunnable.java:30 java.lang.Thread.run() Thread.java:834 ``` Since we are introduced the internal thread pool for handling the client internal executions. So the fix is using the internal thread to process the callback of the hasMessageAvailableAsync
2ec404c
to
6d562f3
Compare
Great catch! The deadlock it hard to debug. |
…nc (#11183) The issue will happens after satisfying the following conditions: 1. The messages are added to the incoming queue before reading messages 2. The result future of the readNextAsync been complete before call future.whenComplete by users, This won't always appear. After that, since we are using the IO thread to call the callback of the hasMessageAvailableAsync, so the IO thread will process the message.getValue(). Then might get a deadlock as followings: ``` java.util.concurrent.CompletableFuture.get() CompletableFuture.java:1998 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader.getSchemaInfoByVersion(byte[]) AbstractMultiVersionReader.java:115 org.apache.pulsar.client.impl.schema.reader.MultiVersionAvroReader.loadReader(BytesSchemaVersion) MultiVersionAvroReader.java:47 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader$1.load(BytesSchemaVersion) AbstractMultiVersionReader.java:52 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader$1.load(Object) AbstractMultiVersionReader.java:49 com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(Object, CacheLoader) LocalCache.java:3529 com.google.common.cache.LocalCache$Segment.loadSync(Object, int, LocalCache$LoadingValueReference, CacheLoader) LocalCache.java:2278 com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(Object, int, CacheLoader) LocalCache.java:2155 com.google.common.cache.LocalCache$Segment.get(Object, int, CacheLoader) LocalCache.java:2045 com.google.common.cache.LocalCache.get(Object, CacheLoader) LocalCache.java:3951 com.google.common.cache.LocalCache.getOrLoad(Object) LocalCache.java:3974 com.google.common.cache.LocalCache$LocalLoadingCache.get(Object) LocalCache.java:4935 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader.read(byte[], byte[]) AbstractMultiVersionReader.java:86 org.apache.pulsar.client.impl.schema.AbstractStructSchema.decode(byte[], byte[]) AbstractStructSchema.java:60 org.apache.pulsar.client.impl.MessageImpl.getValue() MessageImpl.java:301 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.refreshTopicPoliciesCache(Message) SystemTopicBasedTopicPoliciesService.java:302 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.lambda$initPolicesCache$7(SystemTopicClient$Reader, Throwable, CompletableFuture, Message, Throwable) SystemTopicBasedTopicPoliciesService.java:254 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService$$Lambda$817.accept(Object, Object) java.util.concurrent.CompletableFuture.uniWhenComplete(Object, BiConsumer, CompletableFuture$UniWhenComplete) CompletableFuture.java:859 java.util.concurrent.CompletableFuture.uniWhenCompleteStage(Executor, BiConsumer) CompletableFuture.java:883 java.util.concurrent.CompletableFuture.whenComplete(BiConsumer) CompletableFuture.java:2251 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.lambda$initPolicesCache$10(SystemTopicClient$Reader, CompletableFuture, Boolean, Throwable) SystemTopicBasedTopicPoliciesService.java:246 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService$$Lambda$725.accept(Object, Object) org.apache.pulsar.client.impl.ClientCnx.handleGetLastMessageIdSuccess(PulsarApi$CommandGetLastMessageIdResponse) ClientCnx.java:468 ``` Since we are introduced the internal thread pool for handling the client internal executions. So the fix is using the internal thread to process the callback of the hasMessageAvailableAsync (cherry picked from commit ed42007)
…nc (#11183) The issue will happens after satisfying the following conditions: 1. The messages are added to the incoming queue before reading messages 2. The result future of the readNextAsync been complete before call future.whenComplete by users, This won't always appear. After that, since we are using the IO thread to call the callback of the hasMessageAvailableAsync, so the IO thread will process the message.getValue(). Then might get a deadlock as followings: ``` java.util.concurrent.CompletableFuture.get() CompletableFuture.java:1998 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader.getSchemaInfoByVersion(byte[]) AbstractMultiVersionReader.java:115 org.apache.pulsar.client.impl.schema.reader.MultiVersionAvroReader.loadReader(BytesSchemaVersion) MultiVersionAvroReader.java:47 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader$1.load(BytesSchemaVersion) AbstractMultiVersionReader.java:52 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader$1.load(Object) AbstractMultiVersionReader.java:49 com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(Object, CacheLoader) LocalCache.java:3529 com.google.common.cache.LocalCache$Segment.loadSync(Object, int, LocalCache$LoadingValueReference, CacheLoader) LocalCache.java:2278 com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(Object, int, CacheLoader) LocalCache.java:2155 com.google.common.cache.LocalCache$Segment.get(Object, int, CacheLoader) LocalCache.java:2045 com.google.common.cache.LocalCache.get(Object, CacheLoader) LocalCache.java:3951 com.google.common.cache.LocalCache.getOrLoad(Object) LocalCache.java:3974 com.google.common.cache.LocalCache$LocalLoadingCache.get(Object) LocalCache.java:4935 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader.read(byte[], byte[]) AbstractMultiVersionReader.java:86 org.apache.pulsar.client.impl.schema.AbstractStructSchema.decode(byte[], byte[]) AbstractStructSchema.java:60 org.apache.pulsar.client.impl.MessageImpl.getValue() MessageImpl.java:301 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.refreshTopicPoliciesCache(Message) SystemTopicBasedTopicPoliciesService.java:302 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.lambda$initPolicesCache$7(SystemTopicClient$Reader, Throwable, CompletableFuture, Message, Throwable) SystemTopicBasedTopicPoliciesService.java:254 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService$$Lambda$817.accept(Object, Object) java.util.concurrent.CompletableFuture.uniWhenComplete(Object, BiConsumer, CompletableFuture$UniWhenComplete) CompletableFuture.java:859 java.util.concurrent.CompletableFuture.uniWhenCompleteStage(Executor, BiConsumer) CompletableFuture.java:883 java.util.concurrent.CompletableFuture.whenComplete(BiConsumer) CompletableFuture.java:2251 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.lambda$initPolicesCache$10(SystemTopicClient$Reader, CompletableFuture, Boolean, Throwable) SystemTopicBasedTopicPoliciesService.java:246 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService$$Lambda$725.accept(Object, Object) org.apache.pulsar.client.impl.ClientCnx.handleGetLastMessageIdSuccess(PulsarApi$CommandGetLastMessageIdResponse) ClientCnx.java:468 ``` Since we are introduced the internal thread pool for handling the client internal executions. So the fix is using the internal thread to process the callback of the hasMessageAvailableAsync (cherry picked from commit ed42007)
…nc (apache#11183) The issue will happens after satisfying the following conditions: 1. The messages are added to the incoming queue before reading messages 2. The result future of the readNextAsync been complete before call future.whenComplete by users, This won't always appear. After that, since we are using the IO thread to call the callback of the hasMessageAvailableAsync, so the IO thread will process the message.getValue(). Then might get a deadlock as followings: ``` java.util.concurrent.CompletableFuture.get() CompletableFuture.java:1998 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader.getSchemaInfoByVersion(byte[]) AbstractMultiVersionReader.java:115 org.apache.pulsar.client.impl.schema.reader.MultiVersionAvroReader.loadReader(BytesSchemaVersion) MultiVersionAvroReader.java:47 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader$1.load(BytesSchemaVersion) AbstractMultiVersionReader.java:52 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader$1.load(Object) AbstractMultiVersionReader.java:49 com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(Object, CacheLoader) LocalCache.java:3529 com.google.common.cache.LocalCache$Segment.loadSync(Object, int, LocalCache$LoadingValueReference, CacheLoader) LocalCache.java:2278 com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(Object, int, CacheLoader) LocalCache.java:2155 com.google.common.cache.LocalCache$Segment.get(Object, int, CacheLoader) LocalCache.java:2045 com.google.common.cache.LocalCache.get(Object, CacheLoader) LocalCache.java:3951 com.google.common.cache.LocalCache.getOrLoad(Object) LocalCache.java:3974 com.google.common.cache.LocalCache$LocalLoadingCache.get(Object) LocalCache.java:4935 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader.read(byte[], byte[]) AbstractMultiVersionReader.java:86 org.apache.pulsar.client.impl.schema.AbstractStructSchema.decode(byte[], byte[]) AbstractStructSchema.java:60 org.apache.pulsar.client.impl.MessageImpl.getValue() MessageImpl.java:301 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.refreshTopicPoliciesCache(Message) SystemTopicBasedTopicPoliciesService.java:302 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.lambda$initPolicesCache$7(SystemTopicClient$Reader, Throwable, CompletableFuture, Message, Throwable) SystemTopicBasedTopicPoliciesService.java:254 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService$$Lambda$817.accept(Object, Object) java.util.concurrent.CompletableFuture.uniWhenComplete(Object, BiConsumer, CompletableFuture$UniWhenComplete) CompletableFuture.java:859 java.util.concurrent.CompletableFuture.uniWhenCompleteStage(Executor, BiConsumer) CompletableFuture.java:883 java.util.concurrent.CompletableFuture.whenComplete(BiConsumer) CompletableFuture.java:2251 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.lambda$initPolicesCache$10(SystemTopicClient$Reader, CompletableFuture, Boolean, Throwable) SystemTopicBasedTopicPoliciesService.java:246 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService$$Lambda$725.accept(Object, Object) org.apache.pulsar.client.impl.ClientCnx.handleGetLastMessageIdSuccess(PulsarApi$CommandGetLastMessageIdResponse) ClientCnx.java:468 ``` Since we are introduced the internal thread pool for handling the client internal executions. So the fix is using the internal thread to process the callback of the hasMessageAvailableAsync (cherry picked from commit ed42007)
…nc (apache#11183) The issue will happens after satisfying the following conditions: 1. The messages are added to the incoming queue before reading messages 2. The result future of the readNextAsync been complete before call future.whenComplete by users, This won't always appear. After that, since we are using the IO thread to call the callback of the hasMessageAvailableAsync, so the IO thread will process the message.getValue(). Then might get a deadlock as followings: ``` java.util.concurrent.CompletableFuture.get() CompletableFuture.java:1998 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader.getSchemaInfoByVersion(byte[]) AbstractMultiVersionReader.java:115 org.apache.pulsar.client.impl.schema.reader.MultiVersionAvroReader.loadReader(BytesSchemaVersion) MultiVersionAvroReader.java:47 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader$1.load(BytesSchemaVersion) AbstractMultiVersionReader.java:52 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader$1.load(Object) AbstractMultiVersionReader.java:49 com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(Object, CacheLoader) LocalCache.java:3529 com.google.common.cache.LocalCache$Segment.loadSync(Object, int, LocalCache$LoadingValueReference, CacheLoader) LocalCache.java:2278 com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(Object, int, CacheLoader) LocalCache.java:2155 com.google.common.cache.LocalCache$Segment.get(Object, int, CacheLoader) LocalCache.java:2045 com.google.common.cache.LocalCache.get(Object, CacheLoader) LocalCache.java:3951 com.google.common.cache.LocalCache.getOrLoad(Object) LocalCache.java:3974 com.google.common.cache.LocalCache$LocalLoadingCache.get(Object) LocalCache.java:4935 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader.read(byte[], byte[]) AbstractMultiVersionReader.java:86 org.apache.pulsar.client.impl.schema.AbstractStructSchema.decode(byte[], byte[]) AbstractStructSchema.java:60 org.apache.pulsar.client.impl.MessageImpl.getValue() MessageImpl.java:301 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.refreshTopicPoliciesCache(Message) SystemTopicBasedTopicPoliciesService.java:302 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.lambda$initPolicesCache$7(SystemTopicClient$Reader, Throwable, CompletableFuture, Message, Throwable) SystemTopicBasedTopicPoliciesService.java:254 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService$$Lambda$817.accept(Object, Object) java.util.concurrent.CompletableFuture.uniWhenComplete(Object, BiConsumer, CompletableFuture$UniWhenComplete) CompletableFuture.java:859 java.util.concurrent.CompletableFuture.uniWhenCompleteStage(Executor, BiConsumer) CompletableFuture.java:883 java.util.concurrent.CompletableFuture.whenComplete(BiConsumer) CompletableFuture.java:2251 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.lambda$initPolicesCache$10(SystemTopicClient$Reader, CompletableFuture, Boolean, Throwable) SystemTopicBasedTopicPoliciesService.java:246 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService$$Lambda$725.accept(Object, Object) org.apache.pulsar.client.impl.ClientCnx.handleGetLastMessageIdSuccess(PulsarApi$CommandGetLastMessageIdResponse) ClientCnx.java:468 ``` Since we are introduced the internal thread pool for handling the client internal executions. So the fix is using the internal thread to process the callback of the hasMessageAvailableAsync (cherry picked from commit ed42007)
…nc (apache#11183) The issue will happens after satisfying the following conditions: 1. The messages are added to the incoming queue before reading messages 2. The result future of the readNextAsync been complete before call future.whenComplete by users, This won't always appear. After that, since we are using the IO thread to call the callback of the hasMessageAvailableAsync, so the IO thread will process the message.getValue(). Then might get a deadlock as followings: ``` java.util.concurrent.CompletableFuture.get() CompletableFuture.java:1998 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader.getSchemaInfoByVersion(byte[]) AbstractMultiVersionReader.java:115 org.apache.pulsar.client.impl.schema.reader.MultiVersionAvroReader.loadReader(BytesSchemaVersion) MultiVersionAvroReader.java:47 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader$1.load(BytesSchemaVersion) AbstractMultiVersionReader.java:52 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader$1.load(Object) AbstractMultiVersionReader.java:49 com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(Object, CacheLoader) LocalCache.java:3529 com.google.common.cache.LocalCache$Segment.loadSync(Object, int, LocalCache$LoadingValueReference, CacheLoader) LocalCache.java:2278 com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(Object, int, CacheLoader) LocalCache.java:2155 com.google.common.cache.LocalCache$Segment.get(Object, int, CacheLoader) LocalCache.java:2045 com.google.common.cache.LocalCache.get(Object, CacheLoader) LocalCache.java:3951 com.google.common.cache.LocalCache.getOrLoad(Object) LocalCache.java:3974 com.google.common.cache.LocalCache$LocalLoadingCache.get(Object) LocalCache.java:4935 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader.read(byte[], byte[]) AbstractMultiVersionReader.java:86 org.apache.pulsar.client.impl.schema.AbstractStructSchema.decode(byte[], byte[]) AbstractStructSchema.java:60 org.apache.pulsar.client.impl.MessageImpl.getValue() MessageImpl.java:301 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.refreshTopicPoliciesCache(Message) SystemTopicBasedTopicPoliciesService.java:302 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.lambda$initPolicesCache$7(SystemTopicClient$Reader, Throwable, CompletableFuture, Message, Throwable) SystemTopicBasedTopicPoliciesService.java:254 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService$$Lambda$817.accept(Object, Object) java.util.concurrent.CompletableFuture.uniWhenComplete(Object, BiConsumer, CompletableFuture$UniWhenComplete) CompletableFuture.java:859 java.util.concurrent.CompletableFuture.uniWhenCompleteStage(Executor, BiConsumer) CompletableFuture.java:883 java.util.concurrent.CompletableFuture.whenComplete(BiConsumer) CompletableFuture.java:2251 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.lambda$initPolicesCache$10(SystemTopicClient$Reader, CompletableFuture, Boolean, Throwable) SystemTopicBasedTopicPoliciesService.java:246 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService$$Lambda$725.accept(Object, Object) org.apache.pulsar.client.impl.ClientCnx.handleGetLastMessageIdSuccess(PulsarApi$CommandGetLastMessageIdResponse) ClientCnx.java:468 ``` Since we are introduced the internal thread pool for handling the client internal executions. So the fix is using the internal thread to process the callback of the hasMessageAvailableAsync (cherry picked from commit ed42007)
…nc (apache#11183) The issue will happens after satisfying the following conditions: 1. The messages are added to the incoming queue before reading messages 2. The result future of the readNextAsync been complete before call future.whenComplete by users, This won't always appear. After that, since we are using the IO thread to call the callback of the hasMessageAvailableAsync, so the IO thread will process the message.getValue(). Then might get a deadlock as followings: ``` java.util.concurrent.CompletableFuture.get() CompletableFuture.java:1998 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader.getSchemaInfoByVersion(byte[]) AbstractMultiVersionReader.java:115 org.apache.pulsar.client.impl.schema.reader.MultiVersionAvroReader.loadReader(BytesSchemaVersion) MultiVersionAvroReader.java:47 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader$1.load(BytesSchemaVersion) AbstractMultiVersionReader.java:52 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader$1.load(Object) AbstractMultiVersionReader.java:49 com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(Object, CacheLoader) LocalCache.java:3529 com.google.common.cache.LocalCache$Segment.loadSync(Object, int, LocalCache$LoadingValueReference, CacheLoader) LocalCache.java:2278 com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(Object, int, CacheLoader) LocalCache.java:2155 com.google.common.cache.LocalCache$Segment.get(Object, int, CacheLoader) LocalCache.java:2045 com.google.common.cache.LocalCache.get(Object, CacheLoader) LocalCache.java:3951 com.google.common.cache.LocalCache.getOrLoad(Object) LocalCache.java:3974 com.google.common.cache.LocalCache$LocalLoadingCache.get(Object) LocalCache.java:4935 org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader.read(byte[], byte[]) AbstractMultiVersionReader.java:86 org.apache.pulsar.client.impl.schema.AbstractStructSchema.decode(byte[], byte[]) AbstractStructSchema.java:60 org.apache.pulsar.client.impl.MessageImpl.getValue() MessageImpl.java:301 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.refreshTopicPoliciesCache(Message) SystemTopicBasedTopicPoliciesService.java:302 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.lambda$initPolicesCache$7(SystemTopicClient$Reader, Throwable, CompletableFuture, Message, Throwable) SystemTopicBasedTopicPoliciesService.java:254 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService$$Lambda$817.accept(Object, Object) java.util.concurrent.CompletableFuture.uniWhenComplete(Object, BiConsumer, CompletableFuture$UniWhenComplete) CompletableFuture.java:859 java.util.concurrent.CompletableFuture.uniWhenCompleteStage(Executor, BiConsumer) CompletableFuture.java:883 java.util.concurrent.CompletableFuture.whenComplete(BiConsumer) CompletableFuture.java:2251 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService.lambda$initPolicesCache$10(SystemTopicClient$Reader, CompletableFuture, Boolean, Throwable) SystemTopicBasedTopicPoliciesService.java:246 org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService$$Lambda$725.accept(Object, Object) org.apache.pulsar.client.impl.ClientCnx.handleGetLastMessageIdSuccess(PulsarApi$CommandGetLastMessageIdResponse) ClientCnx.java:468 ``` Since we are introduced the internal thread pool for handling the client internal executions. So the fix is using the internal thread to process the callback of the hasMessageAvailableAsync
The issue will happens after satisfying the following conditions:
This won't always appear.
After that, since we are using the IO thread to call the callback of the hasMessageAvailableAsync,
so the IO thread will process the message.getValue(). Then might get a deadlock as followings:
Since we are introduced the internal thread pool for handling the client internal executions.
So the fix is using the internal thread to process the callback of the hasMessageAvailableAsync