Handle the new session after session expiry #770
I recall an earlier discussion where we said that when a node's session expires and the node then reconnects, it may need to update the locks it used to hold (if it still has the same tasks) to indicate that it is the new live instance, right? Should that fix be addressed here?
The PR where we discussed this: #747
Look for this comment:
"Quick question, if a Session expiry happens, the _instanceName remains the same? Just wondering if we could have a case where we're trying to release the lock but an expiry + connect happened before we call this, creating a new liveinstance node for this host. Will the task still be releasable?"
I'm not sure whether such a fix is required, but it would be great if you could validate this and explain why or why not.
@vmaheshw, can you please look at this comment and leave a response about whether this is a concern? If it is, please address it. If it is not, please explain why not. I just want to ensure that there are no weird race conditions we need to think about here, even though from what I understand this shouldn't be a problem.
Sorry, I forgot to reply. Yes, if the expiry + connect happens before the instance tries to release the lock, the release fails with the error "Not the owner" (the owner was the previous instance). The task then moves to some other instance, and that instance, while trying to acquire the lock, finds it to be an orphan lock and force-acquires it.
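To make the sequence above easier to follow, here is a minimal in-memory sketch of the behavior as described in this thread. It is not the ZkAdapter code: the class `TaskLockSketch`, its methods, and the instance names are all hypothetical, and it assumes that a lock records the name of the live instance that took it and that the lock outlives that instance's session.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of task locks; NOT the actual ZkAdapter implementation.
class TaskLockSketch {
    // Names of the live instances currently registered in the cluster.
    private final Set<String> liveInstances = new HashSet<>();
    // Task name -> name of the live instance that holds the lock.
    private final Map<String, String> lockOwners = new HashMap<>();

    // A node (re)joins the cluster under a given live instance name.
    void join(String instance) {
        liveInstances.add(instance);
    }

    // Session expiry: the live instance disappears, but (assumption) its locks stay behind.
    void expire(String instance) {
        liveInstances.remove(instance);
    }

    // Acquire succeeds if the lock is free, already ours, or orphaned
    // (its recorded owner is no longer a live instance).
    boolean acquire(String task, String instance) {
        String owner = lockOwners.get(task);
        boolean orphan = owner != null && !liveInstances.contains(owner);
        if (owner == null || owner.equals(instance) || orphan) {
            lockOwners.put(task, instance); // force-acquire in the orphan case
            return true;
        }
        return false;
    }

    // Release is only allowed for the recorded owner.
    void release(String task, String instance) {
        if (!instance.equals(lockOwners.get(task))) {
            throw new IllegalStateException("Not the owner");
        }
        lockOwners.remove(task);
    }
}
```

In this model, a node that held a lock as `host-0001`, expired, and rejoined as `host-0002` cannot release the lock, while any other live instance can still take it over because the recorded owner is gone.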
This is the final change to handle the new session after session expiry. In this change, we re-initialize all the local state, listeners, and event threads, and make the node rejoin the cluster.
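As a rough illustration of the flow the description outlines, the sketch below models a node that drops its local state when the session expires and then rejoins under a new live instance name. Everything here is an assumption made for illustration: `SessionRejoinSketch`, its fields, the listener names, and the naming scheme are hypothetical and do not reflect the actual Coordinator or ZkAdapter code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a node's local state across sessions; NOT the real Coordinator.
class SessionRejoinSketch {
    final List<String> assignedTasks = new ArrayList<>();
    final List<String> listeners = new ArrayList<>();
    String liveInstanceName;
    boolean eventThreadRunning;
    private int sessionCounter;

    // Join the cluster: register a new live instance, listeners, and start event handling.
    void connect(String host) {
        sessionCounter++;
        liveInstanceName = host + "-" + sessionCounter; // assumed naming scheme
        listeners.add("assignment");
        listeners.add("liveInstances");
        eventThreadRunning = true;
    }

    // Session expired: stop event handling and discard everything tied to the old session.
    void onSessionExpired() {
        eventThreadRunning = false;
        listeners.clear();
        assignedTasks.clear();
        liveInstanceName = null;
    }

    // New session: reset defensively, then rejoin as a fresh instance.
    void onNewSession(String host) {
        onSessionExpired();
        connect(host);
    }
}
```

Resetting before reconnecting keeps the handler safe to call even if the expiry callback was missed, at the cost of rebuilding state that may still have been valid.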