Support multiple zookeeper quorum to store cluster-management-configuration and ledger-metadata separately #196
Conversation
@@ -906,6 +924,14 @@ public void setZooKeeperSessionTimeoutMillis(long zooKeeperSessionTimeoutMillis)
        this.zooKeeperSessionTimeoutMillis = zooKeeperSessionTimeoutMillis;
    }

    public long getDataZooKeeperSessionTimeoutMillis() {
        return dataZooKeeperSessionTimeoutMillis;
Shouldn't we return zooKeeperSessionTimeout if dataZookeeperServers is empty?
Actually, we are initializing zooKeeperSessionTimeoutMillis=30000 and dataZooKeeperSessionTimeoutMillis=60000 as default values.
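For clarity, here is a minimal sketch of the fallback the reviewer is asking about; this is not the code in this PR, and the empty-check on dataZookeeperServers is an assumption:

```java
// Sketch only: fall back to the cluster-management ZK session timeout when no
// separate data ZK quorum is configured. Field names mirror the diff above; the
// fallback itself is the reviewer's suggestion, not necessarily this PR's behavior.
public long getDataZooKeeperSessionTimeoutMillis() {
    if (dataZookeeperServers == null || dataZookeeperServers.isEmpty()) {
        return zooKeeperSessionTimeoutMillis;
    }
    return dataZooKeeperSessionTimeoutMillis;
}
```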
@@ -53,7 +54,11 @@

    @Parameter(names = { "-zk",
            "--zookeeper" }, description = "Local ZooKeeper quorum connection string", required = true)
-   private String zookeeper;
+   private String localZookeeper;
Let us not use zookeeper and localZookeeper interchangeably. ServiceConfiguration refers to zookeeper. Let us stick to one naming convention.
Sure. Updated it to zookeeper.
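For illustration, a JCommander-style sketch of how the two connection strings could sit side by side after the rename; the -dzk/--data-zookeeper flag and the dataZookeeper field are assumptions, not necessarily what this PR adds:

```java
// Existing local quorum flag, named "zookeeper" per the review above.
@Parameter(names = { "-zk",
        "--zookeeper" }, description = "Local ZooKeeper quorum connection string", required = true)
private String zookeeper;

// Hypothetical optional flag for the separate data/ledger-metadata quorum.
@Parameter(names = { "-dzk",
        "--data-zookeeper" }, description = "ZooKeeper quorum connection string for ledger metadata")
private String dataZookeeper;
```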
👍 Can you check the build?
@rdhabalia I'll get to review this one soon...
The bigger question I have is how to take advantage of this, meaning which changes would be needed to "survive" a quorum loss on the data ZK and how to make sure that everything is back in order in every condition. :) Anyway, is this still a big concern, given that apache/zookeeper#157 has been merged and will be available in zk-3.4.10?
Here, the problem we are trying to fix is preventing a cold restart of the broker caused by dataZK's large snapshot size:
Therefore, if we have a separate dataZK and clusterManagementZK, then dataZk will not trigger a broker restart, and the pulsar-broker can still survive without dataZk.
While dataZk is down, all clients fail on publish/consume and on producer/consumer creation, so there is no data loss. However, when dataZk comes back, that's the only signal we have that everything is back in order, and monitoring such as SLA-monitoring can help confirm the broker's stability.
I think that ZK patch will help reduce leader-election time, prevent zk-session timeouts at the broker, and avoid broker restarts, which definitely helps with this problem statement. But I think it's always good to protect the broker against zk-session timeouts.
True, but that's not going to change much: you still cannot survive a quorum loss on clusterManagementZK, and significant changes are needed in order to continue operating when losing the dataZk quorum.
The tricky part is how to come back and have everything in sync between the broker and the ZK quorum, e.g. version numbers for transactions that were caught in flight during the quorum loss.
The point there is that, by making the leader election not dependent on the snapshot size, the chances of losing one or the other ZK quorum are not that different. The only difference at that point would be the memory footprint of the quorum, and (apart from configuring the JVM heap accordingly) the easiest way to improve the conditions there would be to use binary protobuf for the z-nodes' content. Even just doing that for ML and cursors (and, at least initially, not for BK ledgers) would considerably reduce the ZK data set size. I would expect the same info to take 1/10th of the size in binary vs. text format.
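As a rough, self-contained illustration of the text-vs-binary point (the field names here are made up and this is not Pulsar's or BookKeeper's actual z-node format; real ML/cursor z-nodes carry far more text, so the ratio there would be larger):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.nio.charset.StandardCharsets;

public class ZNodeSizeSketch {
    public static void main(String[] args) throws Exception {
        long ledgerId = 123456789L, entryId = 987654321L;

        // Text (JSON-like) encoding: field names and formatting are stored as characters.
        String text = "{\"cursorsLedgerId\":" + ledgerId + ",\"markDeleteLedgerId\":" + ledgerId
                + ",\"markDeleteEntryId\":" + entryId + "}";
        byte[] textBytes = text.getBytes(StandardCharsets.UTF_8);

        // Binary encoding: the same three values as fixed-width longs.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeLong(ledgerId);
        out.writeLong(ledgerId);
        out.writeLong(entryId);
        out.flush();

        System.out.println("text bytes:   " + textBytes.length); // 90 bytes here
        System.out.println("binary bytes: " + buf.size());       // 24 bytes here
    }
}
```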
Ok, a few more points on it:
I think our main concern is that a larger footprint causes instability (longer GC) and quorum loss in zk, which impacts the broker's cold restart and has a recursive negative impact on both zk and the broker. So, clusterManagementZK will just store the ephemeral nodes such as bundle-ownership and load-balancer information, which creates the smallest snapshot size, and that is the main way we prevent zk quorum loss.
This should be fixed by reading the latest zk-data and refreshing the version when the broker sees a bad version after the system comes back.
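A minimal sketch of that "read the latest data and refresh the version" recovery, assuming the plain ZooKeeper client; the znode path and payload handling are placeholders, not Pulsar's actual recovery logic:

```java
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class BadVersionRecoverySketch {
    // On a bad-version error, re-read the znode to refresh the expected version and retry.
    static void writeWithVersionRefresh(ZooKeeper zk, String path, byte[] payload) throws Exception {
        while (true) {
            Stat stat = new Stat();
            zk.getData(path, false, stat);                    // read the latest data and version
            try {
                zk.setData(path, payload, stat.getVersion()); // write against that version
                return;
            } catch (KeeperException.BadVersionException e) {
                // A concurrent update slipped in between read and write; loop and re-read.
            }
        }
    }
}
```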
Has anyone estimated the impact on snapshot size for workloads people care about, as proposed in #281 (comment)?
Also, it would be good to record worst-case GC pauses in the ZK servers. My suspicion is that, on a correctly configured machine (no swap, heap large enough for the data set), the GC pauses would be an order of magnitude shorter than what is needed to cause a zk server to be dropped from the quorum.
Also, the risk of #281 is way lower compared to the risk involved in transitioning to two zk quorums.
@merlimat, can we revisit this? We could still use this feature when we set up a new cluster.
@rdhabalia are you still working on this pull request? Or can it be closed?
This PR is evidently stale or abandoned. Reopen if this is not so.
@rdhabalia: Thanks for your contribution. For this PR, do we need to update the docs?
Motivation
Pulsar uses a single zookeeper ensemble to store both cluster-management and ledger information. When the number of topics hosted by Pulsar grows significantly (> 1 million), the zookeeper data footprint and the number of requests sent to zookeeper increase, especially during a cold restart, and this sometimes affects the broker's availability.
Since the data needed to keep the cluster up and running (clusterManagementConfiguration) is relatively small compared to the ledger information, using a separate zookeeper ensemble to store this information will increase the availability of the cluster.
Modifications
Added an optional dataZookeeperServers setting to the service configuration to introduce a separate dataZk, so that ledger metadata is stored separately in the provided dataZk.
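A hedged sketch of what using the new setting might look like; setDataZookeeperServers follows this PR's description, while the exact class, package, and setter names may differ from the actual code:

```java
// Package/class name assumed; in older releases this class lived under com.yahoo.pulsar.
import org.apache.pulsar.broker.ServiceConfiguration;

public class DualQuorumConfigSketch {
    public static void main(String[] args) {
        ServiceConfiguration conf = new ServiceConfiguration();
        // Existing setting: the quorum holding cluster-management state.
        conf.setZookeeperServers("cm-zk1:2181,cm-zk2:2181,cm-zk3:2181");
        // New optional setting from this PR: a separate quorum for ledger metadata.
        conf.setDataZookeeperServers("data-zk1:2181,data-zk2:2181,data-zk3:2181");
    }
}
```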
Result