
Problems when ZK Leader shuts down #308

Closed
sschepens opened this issue Mar 22, 2017 · 4 comments
Labels
deprecated/question Questions should happened in GitHub Discussions

Comments

@sschepens
Contributor

@merlimat we had to replace our current local ZK leader in a cluster and this seems to cause a LOT of issues in the cluster.
Brokers seem to have shut down all at the same time, leaving the cluster unable to handle traffic until restarted.
Also, a lot of consumers seem to have been reset to a previous moment in time, generating a huge amount of backlog.

We see these logs before the broker apparently shut down:

March 22nd 2017, 15:31:43.156	2017-03-22 18:31:43,155 - INFO  - [main-SendThread(ip-10-64-102-223.ec2.internal:2181):ClientCnxn$SendThread@1158] - Unable to read additional data from server sessionid 0x35a384d476e093b, likely server has closed socket, closing socket connection and attempting reconnect
March 22nd 2017, 15:31:43.258	2017-03-22 18:31:43,258 - INFO  - [main-EventThread:ZooKeeperDataCache@131] - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:1 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.258	2017-03-22 18:31:43,258 - WARN  - [main-EventThread:LeaderElectionService$1@111] - Got something wrong on watch: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.258	2017-03-22 18:31:43,258 - WARN  - [main-EventThread:LeaderElectionService$1@92] - Type of the event is [None] and path is [null]
March 22nd 2017, 15:31:43.259	2017-03-22 18:31:43,258 - INFO  - [main-EventThread:ZooKeeperDataCache@131] - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:2 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.259	2017-03-22 18:31:43,258 - INFO  - [main-EventThread:ZooKeeperDataCache@131] - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:2 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.259	2017-03-22 18:31:43,258 - INFO  - [main-EventThread:ZooKeeperDataCache@131] - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:2 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.259	2017-03-22 18:31:43,258 - INFO  - [main-EventThread:ZooKeeperDataCache@131] - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:1 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.259	2017-03-22 18:31:43,259 - INFO  - [main-EventThread:ZooKeeperDataCache@131] - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:2 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.259	2017-03-22 18:31:43,259 - INFO  - [main-EventThread:ZooKeeperSessionWatcher@87] - Received zookeeper notification, eventType=None, eventState=Disconnected
March 22nd 2017, 15:31:43.259	2017-03-22 18:31:43,259 - INFO  - [main-EventThread:ZooKeeperCache@346] - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:2 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.587	2017-03-22 18:31:43,586 - INFO  - [main-SendThread(ip-10-64-102-117.ec2.internal:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server ip-10-64-102-117.ec2.internal/10.64.102.117:2181. Will not attempt to authenticate using SASL (unknown error)

A couple of questions:
1 - Why do brokers shut down on ZK leader disconnection?
2 - Why could this affect the backlog of consumers? We're running with a branch that persists individualDeletedMessages.

@saandrews
Contributor

saandrews commented Mar 23, 2017 via email

@merlimat
Contributor

> we had to replace our current local ZK leader in a cluster and this seems to cause a LOT of issues in the cluster.

Restarting the leader should not cause a ZK quorum loss, provided the other 2 (or 4) servers are fine. Can you check the ZK server logs from the time the leader was taken out? There should be some lines that tell how long it took to elect a new leader.

If the leader election takes (overall) more than the session timeout configured in the brokers, the brokers will restart themselves.
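The rule above can be sketched in a few lines. This is an illustrative simulation (the function name and logic are mine, not Pulsar's actual code): a broker only survives a leader election if it can re-establish its ZK session before the session timeout elapses.

```python
# Hypothetical sketch of the survive-or-restart decision described above.
def broker_survives(election_ms: int, session_timeout_ms: int) -> bool:
    """True if the broker can reconnect before its ZK session expires."""
    return election_ms < session_timeout_ms

# With a 30 s session timeout, a 10 s election is survivable,
# but an election stalled past 30 s forces a broker restart.
assert broker_survives(10_000, 30_000)
assert not broker_survives(45_000, 30_000)
```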

The question here is why the leader election should take so much time. Can you share the hardware specs and ZK config?

Another important factor is the snapshot size, since leader election depends on it. If the metadata kept in ZK grows to hundreds of MB, it can start slowing down the leader election. Internally, we had a patch to fix that; it has been merged upstream and will be released in ZK 3.4.10 (https://issues.apache.org/jira/browse/ZOOKEEPER-2678).

If leader election timing is the cause in this case, increasing the session timeout in the brokers can let them "survive" and reconnect to the quorum before the session expires, avoiding the restart.
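The session-timeout knob mentioned above lives in the broker configuration. A hedged example follows; the exact key name has varied across Pulsar versions (newer releases moved to a metadata-store setting), so verify against your own broker.conf:

```properties
# Hypothetical broker.conf excerpt -- verify the key name for your version.
# Older brokers expose the ZooKeeper session timeout roughly like this:
zooKeeperSessionTimeoutMs=30000
# Raising it (e.g. to 60000) gives brokers more headroom to survive a slow
# ZK leader election, at the cost of slower detection of real failures.
```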

> 1 - Why do brokers shut down on ZK leader disconnection?

The main reason is to preserve the ownership of the topics. If I cannot ensure I have a lock on a particular set of topics, then some other broker will take them over. The second reason is the difficulty of reconciling the metadata state between what the broker has in memory and what is in ZK; the two can diverge during a session loss because of timeouts and other problems.
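The ownership argument can be illustrated with a small simulation. Assumptions are mine (the class and method names are illustrative, not Pulsar's actual code): topic ownership is held via ephemeral ZooKeeper znodes tied to a broker's session, so when the session expires, ZK deletes the locks and another broker may legitimately claim the topics.

```python
# Illustrative model of ephemeral-znode-based topic ownership.
class OwnershipTable:
    def __init__(self):
        self._locks = {}  # topic -> owning session id

    def acquire(self, topic: str, session_id: int) -> bool:
        # Mimics creating an ephemeral znode: fails if already owned.
        if topic in self._locks:
            return False
        self._locks[topic] = session_id
        return True

    def expire_session(self, session_id: int) -> None:
        # Mimics ZK deleting all ephemeral nodes of an expired session.
        self._locks = {t: s for t, s in self._locks.items() if s != session_id}

table = OwnershipTable()
assert table.acquire("persistent://tenant/ns/topic-1", session_id=1)
# A second broker cannot grab an owned topic.
assert not table.acquire("persistent://tenant/ns/topic-1", session_id=2)
table.expire_session(1)
# After expiry the lock is gone, so another broker takes over; this is
# why a broker that lost its session cannot keep serving the topic.
assert table.acquire("persistent://tenant/ns/topic-1", session_id=2)
```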

We have already been discussing improving this behavior by adding a "degraded" mode in which brokers with no ZK session can continue sending and delivering messages without attempting any metadata operations (no new topics, no ledger rollovers, etc.). The design is still up in the air, but it's an area we will surely improve.

> 2 - Why could this affect the backlog of consumers? We're running with a branch that persists individualDeletedMessages.

Not sure about this one. One possibility is that after the restart, the cursor recovery failed, which triggers a rollback. The logic there is that if we cannot recover a cursor due to a permanent error, and thus don't know its position, we roll it back to the oldest available message to avoid data loss.
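The rollback fallback described above can be sketched as follows. This is a hedged illustration (names and values are mine, not Pulsar's actual cursor-recovery code): when the stored position is unrecoverable, resetting to the earliest retained message trades a re-delivered backlog for guaranteed no data loss.

```python
# Illustrative sketch of the cursor-recovery fallback described above.
def recover_cursor(stored_position: int, recovery_ok: bool,
                   oldest_available: int) -> int:
    if recovery_ok:
        return stored_position
    # Position is unknown after a permanent error: roll back to the
    # oldest retained message rather than risk skipping data.
    return oldest_available

# Normal recovery keeps the cursor where it was.
assert recover_cursor(1500, True, 100) == 1500
# A failed recovery resets to message 100, producing a large backlog --
# consistent with what the reporter observed after the restart.
assert recover_cursor(1500, False, 100) == 100
```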

Can you share the logs in the topic loading phase after the restart?

@sijie sijie added deprecated/question Questions should happened in GitHub Discussions triage/week-34 labels Aug 22, 2018
@sijie
Member

sijie commented May 19, 2019

Closing this issue due to inactivity. Please reopen it if there is more we should look into.

@snowcrumble
Contributor

> The question here is why the leader election should take so much time. Can you share the hardware specs and ZK config?

My ZK election time is about 10 seconds; is that reasonable?

zk-0 is the new follower, zk-1 is the old leader (which was on the crashed node), and zk-2 is the new leader.

Below are the logs:
zk-0.log

04:51:47.726 [QuorumPeer[myid=1]/0.0.0.0:2181] INFO  org.apache.zookeeper.server.quorum.QuorumPeer - FOLLOWING
04:51:47.727 [QuorumPeer[myid=1]/0.0.0.0:2181] INFO  org.apache.zookeeper.server.ZooKeeperServer - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir data/zookeeper/version-2 snapdir data/zookeeper/version-2
04:51:47.727 [QuorumPeer[myid=1]/0.0.0.0:2181] INFO  org.apache.zookeeper.server.quorum.Learner - FOLLOWING - LEADER ELECTION TOOK - 6405
04:51:47.728 [QuorumPeer[myid=1]/0.0.0.0:2181] INFO  org.apache.zookeeper.server.quorum.QuorumPeer - Resolved hostname: zk-2.zookeeper.pulsar.svc.cluster.local to address: zk-2.zookeeper.pulsar.svc.cluster.local/10.30.2.2
04:51:47.740 [QuorumPeer[myid=1]/0.0.0.0:2181] INFO  org.apache.zookeeper.server.quorum.Learner - Getting a diff from the leader 0x100024c1c

zk-2.log

04:51:47.727 [QuorumPeer[myid=3]/0.0.0.0:2181] INFO  org.apache.zookeeper.server.quorum.QuorumPeer - LEADING
04:51:47.727 [QuorumPeer[myid=3]/0.0.0.0:2181] INFO  org.apache.zookeeper.server.ZooKeeperServer - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir data/zookeeper/version-2 snapdir data/zookeeper/version-2
04:51:47.727 [QuorumPeer[myid=3]/0.0.0.0:2181] INFO  org.apache.zookeeper.server.quorum.Leader - LEADING - LEADER ELECTION TOOK - 10421
04:51:47.734 [LearnerHandler-/10.30.2.141:43982] INFO  org.apache.zookeeper.server.quorum.LearnerHandler - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@6fe50a68
