[Bug] Brokers spuriously error in response to the PRODUCER command for newly added partitions to a topic #23451
Comments
@dwang-qm Pulsar 3.1.x is not supported. Please re-test with a supported version; that is currently Pulsar broker 3.0.7 or 3.3.2. Thanks!
That's the version in our production environment. It would take a significant investment of hours to set up an environment to reproduce this in, since this appears to be a race condition. My observations on the cause of the bug are valid for the current code.
Thanks, @dwang-qm. That's understandable. In any case, thanks for reporting the issue, even if it's for a version that the OSS project doesn't support. In the OSS project, we ask reporters to at least try to test the most recent released versions to see if the problem is already fixed; in many cases problems are fixed in supported versions. Isolating the issue will be a useful contribution to the project, and once an issue is isolated, the fix is usually very easy to implement.
Good observations @dwang-qm!
Thanks for the response! When Pulsar 4.0 is released, we will upgrade to it and see if we can observe the issue. In the meantime, we're going to update the client, since this looks similar to the issue filed against the client that I referenced. I did see many retries in the logs, albeit against the same broker within a short period of time, and they all failed. Unfortunately, the application code is not in control of any retries necessary due to "eventual consistency," since the client tries to create the new producer automatically after it periodically polls and detects that new partitions have been added.

If you don't mind answering: I noticed that the reason the client was trying to talk to the broker to create the producer in the first place is that it got the broker (in my case pulsar-broker-35) from a LOOKUP it did earlier. That's confusing, because it means the client thinks that broker owns the topic (in my case named xxxx-partition-1), but wouldn't that broker have to actively claim the topic somehow, in which case it should have had the correct metadata to see that there was now more than one partition? xxxx is originally a partitioned topic with only one partition (so with only xxxx-partition-0 as a valid topic name), but after the topic metadata update, the C++ client looked up the new partitions and was told to connect to xxxx-partition-1 on pulsar-broker-35. Yet pulsar-broker-35 thinks there's only one partition in the topic. How could it, when it seems to have claimed xxxx-partition-1?
The default lookup logic is in the NamespaceService.findBrokerServiceUrl and searchForCandidateBroker methods:

pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java, lines 473 to 539 (commit 9012422)
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java, lines 573 to 712 (commit 9012422)
When the client makes a lookup request (CommandLookupTopic in the protocol) with the authoritative flag set, the target broker will attempt to claim ownership of the topic without doing further lookups. That's why the solution tolerates eventual consistency. Each lookup can get redirected multiple times until it hits the target broker. The target broker also processes the lookup request, and when it is authoritative, it attempts to claim ownership.

The partitioned topic metadata is a completely separate concern in Pulsar. The client needs that information for subscribing to multi-partitioned topics. The broker itself doesn't need that information to serve topics, since an individual partition of a multi-partitioned topic behaves completely independently. I hope this explanation makes sense.
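For a rough picture of the flow described above, here is a minimal, self-contained sketch of the redirect-until-authoritative lookup loop. LookupResponse, sendLookup, resolveOwner, and the broker URLs are made-up stand-ins for illustration, not the actual Pulsar client or broker APIs:

```java
// Hypothetical, simplified model of the lookup flow described above.
final class LookupSketch {

    // A lookup response is either a redirect to another broker or a
    // "connect here" answer from the broker that owns (or just claimed) the topic.
    record LookupResponse(boolean redirect, boolean authoritative, String brokerServiceUrl) {}

    // Stand-in for sending CommandLookupTopic to a broker. Here we pretend the
    // seed broker redirects once to the candidate owner, which then accepts.
    static LookupResponse sendLookup(String brokerUrl, String topic, boolean authoritative) {
        if (!authoritative) {
            return new LookupResponse(true, true, "pulsar://candidate-owner:6650");
        }
        return new LookupResponse(false, true, brokerUrl);
    }

    static String resolveOwner(String topic) {
        String brokerUrl = "pulsar://seed-broker:6650";
        boolean authoritative = false; // first request is non-authoritative
        while (true) {
            LookupResponse r = sendLookup(brokerUrl, topic, authoritative);
            if (r.redirect()) {
                // Carry the authoritative hint forward so the target broker
                // claims ownership instead of searching for another candidate.
                brokerUrl = r.brokerServiceUrl();
                authoritative = r.authoritative();
                continue;
            }
            return brokerUrl; // this broker owns the topic
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveOwner("persistent://tenant/ns/xxxx-partition-1"));
    }
}
```

The point of the sketch is that the authoritative hint travels with the redirect, so the final broker claims ownership directly rather than consulting the partitioned topic metadata.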
Thank you for the explanation and source code snippets! After reviewing them, I think I understand. Suppose there's a multi-partition topic called xxxx and a new partition xxxx-partition-1 is added. A broker can claim ownership of xxxx-partition-1 through an authoritative lookup without ever consulting the partitioned topic metadata. Then, when the client sends the PRODUCER command, that broker checks the partition count against its stale metadata view, decides the partition name is invalid, and returns the error. Do you think this could happen?

I think just changing the metadata read at pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java, line 1091 (commit 9d2606d), to a refreshed (synced) read would fix it.

The issue with just retrying is that I have observed retries against the same broker (necessarily, since it owns the topic) over the course of seconds, and they still fail because the broker remains unsynced with ZooKeeper. It does eventually succeed, but seconds of delay cause an extreme build-up in our ingestion pipeline and things start failing. Thank you again for your continued attention and responses!
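To make the suspected failure mode concrete, here is a small, hypothetical illustration (not the actual broker code) of how a stale cached partition count could turn a valid partition name into an "Illegal topic partition name" style error:

```java
// Hypothetical illustration: the broker owns xxxx-partition-1 after an
// authoritative lookup, but its un-refreshed metadata still says 1 partition.
final class StalePartitionCheckSketch {

    // Pretend ZooKeeper already holds 2 partitions, but this broker's cache is stale.
    static int cachedPartitions = 1;

    static void handleProducer(String topic) {
        int idx = partitionIndex(topic); // e.g. 1 for "xxxx-partition-1"
        if (idx >= cachedPartitions) {
            // The broker owns this partition, yet its stale metadata view
            // makes the partition name look invalid.
            throw new IllegalArgumentException("Illegal topic partition name " + topic);
        }
        System.out.println("Producer created on " + topic);
    }

    static int partitionIndex(String topic) {
        int p = topic.lastIndexOf("-partition-");
        return Integer.parseInt(topic.substring(p + "-partition-".length()));
    }

    public static void main(String[] args) {
        try {
            handleProducer("xxxx-partition-1"); // fails until the cached count catches up
        } catch (IllegalArgumentException e) {
            System.out.println("Broker rejects producer: " + e.getMessage());
        }
    }
}
```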
@dwang-qm exactly. Well done!
@dwang-qm one more clarification: is the main negative impact the broker logs? Does the creation of the producer succeed from the client's perspective?
Search before asking
Read release policy
Version
Broker: Pulsar 3.1.2
Client: Pulsar v3.4.2
Using Zookeeper as the metadata store.
Minimal reproduce step
What did you expect to see?
Producers successfully created.
What did you see instead?
In the broker logs, the "Illegal topic partition name" error message.
Anything else?
I believe the issue is that when the broker responds to a PRODUCER command, it calls ServerCnx::handleProducer, which calls BrokerService::getOrCreateTopic, which calls BrokerService::getTopic, which calls BrokerService::fetchPartitionedTopicMetadataAsync(TopicName topicName), which calls BrokerService::fetchPartitionedTopicMetadataAsync(TopicName topicName, boolean refreshCacheAndGet) with refreshCacheAndGet set to false. This means that NamespaceResources::getPartitionedTopicMetadataAsync is always called with refresh set to false, so getAsync is called on NamespaceResources rather than refreshAndGetAsync, and sync is never called on ZooKeeper before performing the read.

According to the ZooKeeper Programmer's Guide (https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html): "ZooKeeper does not guarantee that at every instance in time, two different clients will have identical views of ZooKeeper data. Due to factors like network delays, one client may perform an update before another client gets notified of the change. Consider the scenario of two clients, A and B. If client A sets the value of a znode /a from 0 to 1, then tells client B to read /a, client B may read the old value of 0, depending on which server it is connected to. If it is important that Client A and Client B read the same value, Client B should call the sync() method from the ZooKeeper API method before it performs its read."
This seems to indicate that without doing the sync, the broker can get an out-of-date picture of how many partitions the topic has, resulting in spurious errors. I believe that reading the partition metadata when handling the PRODUCER command should use authoritative reads (calling sync before performing the ZooKeeper reads).
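A sketch of what such an authoritative (refreshed) read could look like, reusing the method names mentioned above but with a simplified stand-in metadata cache rather than the actual Pulsar classes:

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;

// Sketch of the proposed change: pass refreshCacheAndGet=true so the metadata
// cache syncs against ZooKeeper before the read. MetadataCache and
// PartitionedTopicMetadata here are simplified stand-ins, not Pulsar's classes.
final class RefreshedMetadataReadSketch {

    record PartitionedTopicMetadata(int partitions) {}

    interface MetadataCache {
        CompletableFuture<Optional<PartitionedTopicMetadata>> getAsync(String path);
        // Forces a ZooKeeper sync() before reading, at the cost of an extra round trip.
        CompletableFuture<Optional<PartitionedTopicMetadata>> refreshAndGetAsync(String path);
    }

    static CompletableFuture<Optional<PartitionedTopicMetadata>> fetchPartitionedTopicMetadataAsync(
            MetadataCache cache, String path, boolean refreshCacheAndGet) {
        // Today the PRODUCER path effectively uses refreshCacheAndGet=false; the
        // suggestion is to use true here so a freshly added partition is visible.
        return refreshCacheAndGet ? cache.refreshAndGetAsync(path) : cache.getAsync(path);
    }

    public static void main(String[] args) {
        String path = "/admin/partitioned-topics/tenant/ns/persistent/xxxx"; // placeholder path
        MetadataCache cache = new MetadataCache() {
            final int zkPartitions = 2; // ZooKeeper already holds the updated count...
            int cachedPartitions = 1;   // ...but this broker's cache is stale.

            public CompletableFuture<Optional<PartitionedTopicMetadata>> getAsync(String p) {
                return CompletableFuture.completedFuture(
                        Optional.of(new PartitionedTopicMetadata(cachedPartitions)));
            }

            public CompletableFuture<Optional<PartitionedTopicMetadata>> refreshAndGetAsync(String p) {
                cachedPartitions = zkPartitions; // pretend sync() + read caught us up
                return getAsync(p);
            }
        };
        System.out.println(fetchPartitionedTopicMetadataAsync(cache, path, false).join()); // stale: 1
        System.out.println(fetchPartitionedTopicMetadataAsync(cache, path, true).join());  // fresh: 2
    }
}
```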
This bug filed against pulsar-client-cpp may be related: apache/pulsar-client-cpp#319. The pulsar-client-cpp developers seem to have mitigated it by adding retries, but the broker-side failure should still not happen.

Are you willing to submit a PR?