ZOOKEEPER-4508: Fix endless-loop in ClientCnxn.SendThread.run when all zk servers down #1847
Another user reported this in ZOOKEEPER-4692. Ping @eolivelli @tisonkun @symat @maoling @cnauroth for review.
…l zk servers down

The observable behavior is that the client never receives an expired event from the watcher. The cause is twofold:
1. `updateLastSendAndHeard` is called on reconnection, so the session never times out.
2. There is no `break` after a session timeout in `ClientCnxn.SendThread.run`.
Hi kezhu, can you explain this bug in detail or give reproduction steps? Thanks. I am confused.
@@ -1192,7 +1192,6 @@ public void run() {
 startConnect(serverAddress);
 // Update now to start the connection timer right after we make a connection attempt
 clientCnxnSocket.updateNow();
-clientCnxnSocket.updateLastSendAndHeard();
Why are you removing `updateLastSendAndHeard`? (here and there)
Semantically, it is because we have not heard (`lastHeard`) anything here and there. If we update `lastHeard` in these two places, then `getIdleRecv` will be reset to 0 on every reconnect, so no `SessionTimeoutException` is ever thrown.
For `lastSend`, I think it does not matter, as it is only used for `ping` in the `CONNECTED` state after a successful `ConnectRequest`, which will `updateLastSend`. I don't see a reason for `updateLastSend` in these two places.
@RabbitDong-on See
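The effect described in this comment can be illustrated with a small simulation. This is only a sketch under stated assumptions: `Timer` is a hypothetical stand-in for the timing fields of `clientCnxnSocket` (it is not the real class), the clock is simulated, and the 30 s timeout and 1 s reconnect interval are made-up values.

```java
public class IdleRecvSketch {
    // Hypothetical stand-in for the timing state kept by the client socket.
    static final class Timer {
        long now;
        long lastHeard;
        void updateNow(long t) { now = t; }
        void updateLastHeard() { lastHeard = now; }
        long getIdleRecv() { return now - lastHeard; }
    }

    /**
     * Simulates reconnect attempts while all servers are down.
     * Returns the simulated time (ms) at which the idle time first reaches
     * the session timeout, or -1 if that never happens within the horizon.
     */
    static long firstExpiry(boolean resetOnReconnect, long sessionTimeoutMs,
                            long reconnectIntervalMs, long horizonMs) {
        Timer t = new Timer();
        for (long clock = 0; clock <= horizonMs; clock += reconnectIntervalMs) {
            t.updateNow(clock);
            if (t.getIdleRecv() >= sessionTimeoutMs) {
                return clock; // the client would raise a session timeout here
            }
            // A reconnect attempt: nothing is heard from any server.
            if (resetOnReconnect) {
                t.updateLastHeard(); // mimics updating lastHeard on every attempt
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstExpiry(true, 30_000, 1_000, 600_000));
        System.out.println(firstExpiry(false, 30_000, 1_000, 600_000));
    }
}
```

With the reset on every attempt, the idle time never exceeds one reconnect interval, so the timeout is never reached; without it, the idle time keeps growing until it hits the timeout.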
After digging into the code history, I doubt whether ZooKeeper ever tried to support session expiration based solely on client-side timing. But it is indeed a problem if the client cannot decide to expire a session on its own when it is unable to contact a server.
@@ -1233,13 +1241,20 @@ public void run() {
to = connectTimeout - clientCnxnSocket.getIdleRecv();
}

if (to <= 0) {
if (expirationTimeout - clientCnxnSocket.getIdleRecv() <= 0) {
THREAD_SAFETY_VIOLATION: Read/Write race. Non-private method `ClientCnxn$SendThread.run()` reads without synchronization from `this.this$0.expirationTimeout`. Potentially races with write in method `ClientCnxn$SendThread.onConnected(...)`.
Reporting because this access may occur on a background thread.
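For readers unfamiliar with this kind of finding, the sketch below shows one common way to make such a field safe if it really were read and written from different threads: declare it `volatile`. This is an illustration only, not the change made in this PR; the class and method shapes are hypothetical, and whether the reported race can actually occur in `ClientCnxn` is not established by the finding itself.

```java
public class ExpirationTimeoutSketch {
    // volatile guarantees that a write from one thread is visible
    // to later reads from any other thread.
    private volatile int expirationTimeout;

    ExpirationTimeoutSketch(int initialTimeoutMs) {
        this.expirationTimeout = initialTimeoutMs;
    }

    // Hypothetical writer, analogous to updating the timeout after a connect.
    void onConnected(int negotiatedTimeoutMs) {
        expirationTimeout = negotiatedTimeoutMs;
    }

    // Hypothetical reader, mirroring the check in the diff above.
    boolean isExpired(long idleRecvMs) {
        return expirationTimeout - idleRecvMs <= 0;
    }

    public static void main(String[] args) throws InterruptedException {
        ExpirationTimeoutSketch s = new ExpirationTimeoutSketch(30_000);
        System.out.println(s.isExpired(10_000));
        Thread writer = new Thread(() -> s.onConnected(5_000));
        writer.start();
        writer.join();
        System.out.println(s.isExpired(10_000));
    }
}
```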
Superseded by #2058, which formally proposes a client-side session expiration timeout.
The observable behavior is that the client never receives an expired event from the watcher.
The cause is twofold:
1. `updateLastSendAndHeard` is called on reconnection, so the session never times out.
2. There is no `break` after a session timeout in `ClientCnxn.SendThread.run`.
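The second cause can be sketched in isolation. The loop below is a simplified, hypothetical model of a send loop, not the actual `ClientCnxn.SendThread.run` code: `IllegalStateException` stands in for the session timeout exception, and the iteration cap exists only so the example terminates.

```java
public class SendLoopSketch {
    /**
     * Runs a simplified send loop and returns how many iterations executed.
     * The session is treated as timed out from iteration expireAtIteration on.
     * maxIterations caps the loop so the "endless" case still returns.
     */
    static int runLoop(boolean breakOnExpiry, int expireAtIteration, int maxIterations) {
        int iterations = 0;
        while (iterations < maxIterations) {
            iterations++;
            try {
                if (iterations >= expireAtIteration) {
                    // Stand-in for detecting a session timeout.
                    throw new IllegalStateException("session timed out");
                }
            } catch (IllegalStateException e) {
                if (breakOnExpiry) {
                    break; // leave the loop so the expiry can be reported
                }
                // Without a break, the loop cleans up and tries to reconnect again.
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println(runLoop(false, 3, 100));
        System.out.println(runLoop(true, 3, 100));
    }
}
```

Without the `break`, the loop keeps retrying until the cap is hit, which models the endless loop named in the PR title; with it, the loop stops on the first detected timeout.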