Problems when ZK Leader shuts down #308
Comments
Did you check how long it took ZooKeeper to elect the new leader? Most likely it took longer than the brokers' session timeout, so the brokers lost their sessions and shut themselves down. Did you notice whether the followers were lagging behind, or whether their snapshots were corrupt/incomplete, before you took down the leader?
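(For reference, one way to pull the election time out of the ZK server logs is to look for the "LEADER ELECTION TOOK - <ms>" line that the quorum peers print after an election finishes. A minimal sketch, assuming that log format, which can vary slightly between ZooKeeper versions:)

```python
# Minimal sketch: scan a ZooKeeper server log for the post-election summary line.
# Assumes the "LEADING/FOLLOWING - LEADER ELECTION TOOK - <ms>" message; the exact
# wording can differ between ZooKeeper versions.
import re
import sys

ELECTION = re.compile(r"(LEADING|FOLLOWING) - LEADER ELECTION TOOK - (\d+)")

def election_times(log_path):
    with open(log_path) as log:
        for line in log:
            match = ELECTION.search(line)
            if match:
                yield match.group(1), int(match.group(2))

if __name__ == "__main__":
    for role, millis in election_times(sys.argv[1]):
        print(f"{role}: election took {millis} ms")
```

Running this against each server's log and comparing the reported times with the brokers' session timeout would show whether the election was the trigger.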
…On Wed, Mar 22, 2017 at 12:46 PM, sschepens ***@***.***> wrote:
@merlimat <https://github.com/merlimat> we had to replace our current
local ZK leader in a cluster and this seems to cause a LOT of issues in the
cluster.
Brokers seem to have shut down all at the same time, leaving the cluster
unable to handle traffic until restarted.
Also, a lot of consumers seem to have been reset to a previous moment in
time, generating a huge amount of backlog.
We see these logs before the broker apparently shut down:
March 22nd 2017, 15:31:43.156 2017-03-22 18:31:43,155 - INFO - ***@***.*** - Unable to read additional data from server sessionid 0x35a384d476e093b, likely server has closed socket, closing socket connection and attempting reconnect
March 22nd 2017, 15:31:43.258 2017-03-22 18:31:43,258 - INFO - ***@***.*** - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:1 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.258 2017-03-22 18:31:43,258 - WARN - ***@***.*** - Got something wrong on watch: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.258 2017-03-22 18:31:43,258 - WARN - ***@***.*** - Type of the event is [None] and path is [null]
March 22nd 2017, 15:31:43.259 2017-03-22 18:31:43,258 - INFO - ***@***.*** - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:2 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.259 2017-03-22 18:31:43,258 - INFO - ***@***.*** - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:2 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.259 2017-03-22 18:31:43,258 - INFO - ***@***.*** - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:2 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.259 2017-03-22 18:31:43,258 - INFO - ***@***.*** - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:1 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.259 2017-03-22 18:31:43,259 - INFO - ***@***.*** - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:2 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.259 2017-03-22 18:31:43,259 - INFO - ***@***.*** - Received zookeeper notification, eventType=None, eventState=Disconnected
March 22nd 2017, 15:31:43.259 2017-03-22 18:31:43,259 - INFO - ***@***.*** - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:2 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.587 2017-03-22 18:31:43,586 - INFO - ***@***.*** - Opening socket connection to server ip-10-64-102-117.ec2.internal/10.64.102.117:2181. Will not attempt to authenticate using SASL (unknown error)
A couple of questions:
1 - Why do brokers shut down on ZK Leader disconnection?
2 - Why could this affect the backlog of consumers? We're running with a branch that persists individualDeletedMessages.
Restarting the leader should not cause a ZK quorum loss, provided the other 2 (or 4) servers are fine. Can you check the ZK server logs at the time the leader was taken out? There should be some lines that tell how much time it took to elect a new leader. If the leader election takes (overall) more than the session timeout configured in the brokers, the brokers will restart themselves. The question here is why the leader election should take so much time. Can you share the hardware and ZK config?

Another important factor is the snapshot size; leader election is dependent on that. If the size of the metadata kept in ZK grows to the 100s of MB, it can start slowing down the leader election. Internally, we had a patch to fix that. It has been merged upstream and will be released in ZK 3.4.10 (https://issues.apache.org/jira/browse/ZOOKEEPER-2678). If leader election timing is the cause in this case, increasing the session timeout configured in the brokers should mitigate it.
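For reference, these are roughly the settings involved. The names below are what I'd expect in broker.conf and zoo.cfg (defaults shown), but double-check against your versions since parameter names can differ:

```properties
# broker.conf (Pulsar broker) -- ZK session timeout; if a new leader is not elected
# and the session re-established within this window, the broker restarts itself.
zooKeeperSessionTimeoutMillis=30000

# zoo.cfg (ZooKeeper server) -- timings that bound election and follower sync.
tickTime=2000   # base time unit, in milliseconds
initLimit=10    # ticks a follower may take to connect and sync to a new leader
syncLimit=5     # ticks a follower may fall behind the leader before being dropped
```

The 30000 ms session timeout matches the Timeout:30000 visible in the broker logs pasted above.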
The main reason is to preserve the ownership of the topics. If a broker cannot ensure it holds the lock on a particular set of topics, then some other broker will take them over. The second reason is how to reconcile the metadata state between what the broker has in memory and what is in ZK; that can diverge during a session loss because of timeouts and other problems. We have already been discussing improving this behavior by having a "degraded" mode in which brokers with no ZK session can continue sending and delivering messages without attempting any metadata operations (no new topics, no ledger rollovers...). The design is still up in the air, but it's an area that we surely are going to improve.
Not sure about this one. One possibility is that after the restart, the cursor recovery failed and that triggered a rollback. The logic there is that if we cannot recover a cursor because of a permanent error, and thus don't know its position, we roll it back to the oldest available message to avoid data loss. Can you share the logs from the topic loading phase after the restart?
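(To make that rollback path concrete, here is a minimal illustrative sketch of the decision described above; the names are hypothetical and this is not Pulsar's actual cursor-recovery code:)

```python
# Illustrative sketch only -- hypothetical names, not Pulsar's implementation.
class PermanentRecoveryError(Exception):
    """Stored cursor position cannot be read back (metadata permanently unavailable)."""

def resume_position(read_stored_position, earliest_position, max_retries=3):
    """Decide where a cursor resumes after a broker restart."""
    for _ in range(max_retries):
        try:
            return read_stored_position()   # normal case: use the stored position
        except PermanentRecoveryError:
            # Position unrecoverable: roll back to the oldest available message so
            # nothing is skipped (avoids data loss, at the cost of backlog/re-delivery).
            return earliest_position
        except IOError:
            continue                        # transient error: retry the read
    return earliest_position
```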
Closing this issue due to inactivity. Please reopen it if there are more things we should be looking into.
My ZK leader election time is about 10 seconds, is that reasonable? zk-0 is the new follower; below are the logs:
zk-2.log