Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid taking mutex while calling callback on executor-thread #455

Closed
wants to merge 4 commits into from

Conversation

rdhabalia
Copy link
Contributor

Motivation

As discussed in #434 it should help to fix #434, #404 and hopefully #237

Modifications

Fix possible dead-lock between MockZooKeeper and Caffeine.ConcurrentHashMap (from ZKCache) where 1 thread first takes mutex on MockZk and same thread process callback which eventually hits Caffeine.ConcurrentHashMap and other end 2nd thread first hits Caffeine.ConcurrentHashMap and it tries to get data from MockZk. So, we need to break this deadlock. I will create a PR for MockZk.

@rdhabalia rdhabalia added this to the 1.18 milestone Jun 7, 2017
@rdhabalia rdhabalia self-assigned this Jun 7, 2017
} else if (!parent.isEmpty() && !tree.containsKey(parent)) {
cb.processResult(KeeperException.Code.NONODE.intValue(), path, ctx, null);
} else {
tree.put(path, Pair.create(new String(data), 0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the tree is currently an unsynchronized TreeMap. Do we need to worry about it? I think that was the original need for the mutex.

If we just synchronize across the tree.put(), would that resolve the deadlock here?

@@ -93,6 +94,7 @@ private void init(ExecutorService executor) {
} else {
this.executor = Executors.newFixedThreadPool(1, new DefaultThreadFactory("mock-zookeeper"));
}
this.callbackExecutor = Executors.newFixedThreadPool(4, new DefaultThreadFactory("mock-zookeeper-callback"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single thread should be enough for the tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept it more than 1 in case in future if callback which will be executed by this thread again calls mockZk then Single-thread-pool won't be able to proceed further.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, maybe a CacheExecutor should work too then

cb.processResult(failReturnCode.intValue(), path, ctx, null);
} else if (stopped) {
cb.processResult(KeeperException.Code.CONNECTIONLOSS.intValue(), path, ctx, null);
} else if (tree.containsKey(path)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here tree is access outside the mutex :)

Maybe we could make it a ConcurrentSkipListMap. In any case that was the original implementation, later I changed it to use mutexes to resolve race conditions in the unit tests...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then should we can avoid synchronized methods in most of the places as well. right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that doesn't resolve the race conditions. In multiple place we are checking the tree map and then performing some updates, and that has to be atomic (eg: the z-node version increment)

Is the deadlock caused by calling the zk callbacks with the mutex taken? If so, we could use a lock object so that we can unlock it before triggering the callback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as we can see thread-dump in #434, deadlock is caused by zkCallback with mutext taken.

@merlimat
Copy link
Contributor

merlimat commented Jun 8, 2017

@rdhabalia I have tried to make a minimal change: replace synchronized with ReentrantMutex and release the mutex before triggering callbacks, in order to fix the deadlock without changing the locking logic in that class. Sending as a separate PR.

@rdhabalia
Copy link
Contributor Author

Ok. closing this one then,

@rdhabalia rdhabalia closed this Jun 8, 2017
@merlimat
Copy link
Contributor

merlimat commented Jun 8, 2017

@rdhabalia Actually, your PR is working and mine is not :)
I'd rather merge this one right now.

@merlimat merlimat reopened this Jun 8, 2017
@rdhabalia
Copy link
Contributor Author

I think our PR is to prevent deadlock which causes build to be stuck and it seems failure in your PR is due to intermittent test-failure and not due to PR. Restarted build again let's see if it passes this time, else we can merge this one.

@merlimat
Copy link
Contributor

merlimat commented Jun 9, 2017

Merged the other PR. re-closing this one

@merlimat merlimat closed this Jun 9, 2017
@rdhabalia rdhabalia deleted the mockZk branch September 1, 2017 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Travis build is getting stuck for 10min
2 participants