Timeout while producing message #173

Closed · sschepens opened this issue Jan 25, 2017 · 4 comments
Labels: deprecated/question

@sschepens (Contributor)

Last night we experienced a couple of timeouts (1s) while producing messages to the broker, and we found that at that exact time the broker triggered a ledger close. Could these two events be related?

Producer logs:

[log_time:08:49:16.219] [thread:pulsar-timer-7-1] [level:INFO ] [logger:ProducerImpl] - [persistent://fury/global/bf8dee5d854a4267a4b8ba7546d7d5d5-mediations-tasks/bf8dee5d854a4267a4b8ba7546d7d5d5-mediations-tasks-partition-5] [default-cluster1-4-3532] Message send timed out. Failing 1 messages

java.util.concurrent.CompletionException: com.yahoo.pulsar.client.api.PulsarClientException$TimeoutException: Could not send message to broker within given timeout
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
	at com.yahoo.pulsar.client.impl.ProducerImpl$1.sendComplete(ProducerImpl.java:152)
	at com.yahoo.pulsar.client.impl.ProducerImpl.lambda$failPendingMessages$6(ProducerImpl.java:932)
	at java.lang.Iterable.forEach(Iterable.java:75)
	at com.yahoo.pulsar.client.impl.ProducerImpl.failPendingMessages(ProducerImpl.java:927)
	at com.yahoo.pulsar.client.impl.ProducerImpl.lambda$failPendingMessages$7(ProducerImpl.java:950)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:418)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:312)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
	at java.lang.Thread.run(Thread.java:745)
Caused by: com.yahoo.pulsar.client.api.PulsarClientException$TimeoutException: Could not send message to broker within given timeout
	at com.yahoo.pulsar.client.impl.ProducerImpl.run(ProducerImpl.java:904)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:588)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:662)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:385)
	... 2 more

Broker logs:

2017-01-25 08:49:15,142 - INFO  - [bookkeeper-ml-workers-36-1:OpAddEntry@155] - [fury/global/bf8dee5d854a4267a4b8ba7546d7d5d5-mediations-tasks/persistent/bf8dee5d854a4267a4b8ba7546d7d5d5-mediations-tasks-partition-5] Closing ledger 6999 for being full

2017-01-25 08:50:19,939 - INFO  - [main-EventThread:ManagedLedgerImpl@937] - [fury/global/bf8dee5d854a4267a4b8ba7546d7d5d5-mediations-tasks/persistent/bf8dee5d854a4267a4b8ba7546d7d5d5-mediations-tasks-partition-5] Created new ledger 7740

What's also curious is that there's a one-minute gap between the ledger-close and new-ledger-open events.
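
In the meantime, one client-side mitigation would be to widen the producer send timeout so an in-flight message can ride out a slow rollover instead of failing. A minimal sketch against the com.yahoo.pulsar client API from the stack trace above; the service URL, topic, and the 30s value are placeholders, not a recommendation:

```java
import java.util.concurrent.TimeUnit;

import com.yahoo.pulsar.client.api.Producer;
import com.yahoo.pulsar.client.api.ProducerConfiguration;
import com.yahoo.pulsar.client.api.PulsarClient;

public class SendTimeoutSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.create("pulsar://broker.example.com:6650");

        // Our producers currently use a 1s send timeout, which is what the
        // HashedWheelTimer in the trace above is firing on. Widening it lets
        // pending messages survive the ledger close/open gap.
        ProducerConfiguration conf = new ProducerConfiguration();
        conf.setSendTimeout(30, TimeUnit.SECONDS);

        Producer producer = client.createProducer(
                "persistent://fury/global/sample-ns/sample-topic", conf);

        producer.send("payload".getBytes());

        producer.close();
        client.close();
    }
}
```

This only masks the symptom, of course; the question of why the rollover stalls for a minute still stands.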

@sschepens (Contributor · Author)

@merlimat any idea about this?
I realized brokers use a couple of thread pools to execute tasks such as ledger rolling and other housekeeping synchronously. Could this be related? With many topics and consumers, and many rolling tasks, could those thread pools be introducing some waiting?
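
For anyone digging into the rollover cadence itself, the relevant knobs live in broker.conf. The names and defaults below are taken from the Apache Pulsar broker.conf as I understand it, so treat this as a pointer rather than a verified tuning:

```properties
# A ledger is closed "for being full" once it reaches this many entries
managedLedgerMaxEntriesPerLedger=50000

# Lower and upper bounds on how often a rollover may be triggered
managedLedgerMinLedgerRolloverTimeMinutes=10
managedLedgerMaxLedgerRolloverTimeMinutes=240

# Sizes of the managed-ledger worker/scheduler pools speculated about above
managedLedgerNumWorkerThreads=8
managedLedgerNumSchedulerThreads=8
```

None of these explain a one-minute gap on their own, but they determine how often the close/open path is exercised.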

@ivankelly (Contributor)

@sschepens This could well be down to ZooKeeper being overloaded. Have you experienced this issue again?

ivankelly added the triage/week-35 and deprecated/question labels on Aug 30, 2018
@ivankelly (Contributor)

Closing since there's been no update in over 18 months.

@pushkarsawant

Hi,

We are experiencing the same issue with 2.6.0: intermittent timeouts while producing messages, logged as WARNs.

In the broker logs there are no errors, only INFO messages about closing the existing ledger and opening a new one. In our case the close and open messages are about 2 minutes apart.
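
A stopgap on the client side would be to widen the send timeout so messages stay pending through the close/open gap. A minimal sketch against the 2.6.0 Java client; the service URL, topic, and the 60s value are placeholders (0 would disable the timeout entirely):

```java
import java.util.concurrent.TimeUnit;

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class WidenSendTimeoutSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://broker.example.com:6650")
                .build();

        // The WARN we see is the client-side send timeout firing; the broker
        // itself recovers once the new ledger is created.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/sample-topic")
                .sendTimeout(60, TimeUnit.SECONDS)
                .create();

        producer.send("payload".getBytes());

        producer.close();
        client.close();
    }
}
```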
