-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid taking mutex while calling callback on executor-thread #455
Conversation
} else if (!parent.isEmpty() && !tree.containsKey(parent)) { | ||
cb.processResult(KeeperException.Code.NONODE.intValue(), path, ctx, null); | ||
} else { | ||
tree.put(path, Pair.create(new String(data), 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the tree
is currently an unsynchronized TreeMap
. Do we need to worry about it? I think that was the original need for the mutex.
If we just synchronize across the tree.put()
, would that resolve the deadlock here?
@@ -93,6 +94,7 @@ private void init(ExecutorService executor) { | |||
} else { | |||
this.executor = Executors.newFixedThreadPool(1, new DefaultThreadFactory("mock-zookeeper")); | |||
} | |||
this.callbackExecutor = Executors.newFixedThreadPool(4, new DefaultThreadFactory("mock-zookeeper-callback")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A single thread should be enough for the tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept it more than 1 in case in future if callback which will be executed by this thread again calls mockZk then Single-thread-pool won't be able to proceed further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, maybe a CacheExecutor should work too then
cb.processResult(failReturnCode.intValue(), path, ctx, null); | ||
} else if (stopped) { | ||
cb.processResult(KeeperException.Code.CONNECTIONLOSS.intValue(), path, ctx, null); | ||
} else if (tree.containsKey(path)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here tree is access outside the mutex :)
Maybe we could make it a ConcurrentSkipListMap
. In any case that was the original implementation, later I changed it to use mutexes to resolve race conditions in the unit tests...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then should we can avoid synchronized
methods in most of the places as well. right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that doesn't resolve the race conditions. In multiple place we are checking the tree map and then performing some updates, and that has to be atomic (eg: the z-node version increment)
Is the deadlock caused by calling the zk callbacks with the mutex taken? If so, we could use a lock object so that we can unlock it before triggering the callback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as we can see thread-dump in #434, deadlock is caused by zkCallback with mutext taken.
@rdhabalia I have tried to make a minimal change: replace |
Ok. closing this one then, |
@rdhabalia Actually, your PR is working and mine is not :) |
I think our PR is to prevent deadlock which causes build to be stuck and it seems failure in your PR is due to intermittent test-failure and not due to PR. Restarted build again let's see if it passes this time, else we can merge this one. |
Merged the other PR. re-closing this one |
Motivation
As discussed in #434 it should help to fix #434, #404 and hopefully #237
Modifications
Fix possible dead-lock between MockZooKeeper and
Caffeine.ConcurrentHashMap (from ZKCache)
where 1 thread first takes mutex on MockZk and same thread process callback which eventually hits Caffeine.ConcurrentHashMap and other end 2nd thread first hits Caffeine.ConcurrentHashMap and it tries to get data from MockZk. So, we need to break this deadlock. I will create a PR for MockZk.