Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible deadlock when registering extensions and doing normal readWrites #371

Closed
cipherCOM opened this issue Oct 7, 2016 · 5 comments
Closed

Comments

@cipherCOM
Copy link

We're currently experiencing rare deadlocks with YapDatabase. I tried to inspect what internally happens and came to the following reasoning:

A normal readWrite will always go to through the connectionQueue first and then through the internal writeQueue. A comment is placed even directly above that the order matters.

// Order matters.
// First go through the serial connection queue.
// Then go through serial write queue for the database.
// Once we're inside the database writeQueue, we know that we are the only write transaction.
// No other transaction can possibly modify the database except us, even in other connections.
_dispatch_barrier_sync_f_slow + 814,
__44-[YapDatabaseConnection readWriteWithBlock:]_block_invoke + 777, ### 2) writeQueue
_dispatch_client_callout + 8,
_dispatch_barrier_sync_f_slow + 1347,
-[YapDatabaseConnection readWriteWithBlock:] + 367, ### 1) connectionQueue

The issue is that when using extension and registering them the framework internally goes a different way of acquiring locks. First the writeQueue and then the connectionQueue. This causes in my opinion the experienced locking problems if this happens simultaneously.

_dispatch_barrier_sync_f_slow + 814,
-[YapDatabaseConnection registerExtension:withName:] + 653, ### 2) connectionQueue
-[YapDatabase _registerExtension:withName:connection:] + 1958,
__90-[YapDatabase asyncRegisterExtension:withName:connection:completionQueue:completionBlock:]_block_invoke + 82, ### 1) writeQueue

Would that really be a flaw of the framework or should we rather try to completely separate setup steps like registering extensions and accessing the DB?

@chrisballinger
Copy link
Contributor

Have you tried passing your own connection to registerExtension?

On Fri, Oct 7, 2016 at 1:41 AM, cipherCOM notifications@github.com wrote:

We're currently experiencing rare deadlocks with YapDatabase. I tried to
inspect what internally happens and came to the following reasoning:

A normal readWrite will always go to through the connectionQueue first and
then through the internal writeQueue. A comment is placed even directly
above that the order matters.

// Order matters.
// First go through the serial connection queue.
// Then go through serial write queue for the database.
// Once we're inside the database writeQueue, we know that we are the only write transaction.
// No other transaction can possibly modify the database except us, even in other connections.

_dispatch_barrier_sync_f_slow + 814,
__44-[YapDatabaseConnection readWriteWithBlock:]_block_invoke + 777, ### 2) writeQueue
_dispatch_client_callout + 8,
_dispatch_barrier_sync_f_slow + 1347,
-[YapDatabaseConnection readWriteWithBlock:] + 367, ### 1) connectionQueue

The issue is that when using extension and registering them the framework
internally goes a different way of acquiring locks. First the writeQueue
and then the connectionQueue. This causes in my opinion the experienced
locking problems if this happens simultaneously.

_dispatch_barrier_sync_f_slow + 814,
-[YapDatabaseConnection registerExtension:withName:] + 653, ### 2) connectionQueue
-[YapDatabase _registerExtension:withName:connection:] + 1958,
__90-[YapDatabase asyncRegisterExtension:withName:connection:completionQueue:completionBlock:]_block_invoke + 82, ### 1) writeQueue

Would that really be a flaw of the framework or should we rather try to
completely separate setup steps like registering extensions and accessing
the DB?


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#371, or mute the thread
https://github.com/notifications/unsubscribe-auth/AAfqH3tPg8EP4LeZRbZx1h5ywlEaMV_3ks5qxgXRgaJpZM4KQzJe
.

@cipherCOM
Copy link
Author

I inspected the API you proposed and this let me realize, that the connectionQueues I posted above must be different ones since internally for extensions the registrationConnection is used and for read write access our external connection.
Now I can't quite put my finger down to what causes the deadlock anymore. The stack trace states it clearly that internal methods remain at the top for the main thread and one background thread, but the reason is a bit unclear.
I will try to reproduce it once more and report back with a more complete stack trace and some object pointers to more clearly state what is clogging each other.

@cipherCOM
Copy link
Author

Okay got it tracked down. So here is what's happening:

The problem would be that we actually pass our connection to registerExtension like @chrisballinger proposed. As we're registering multiple extensions we can reach this deadlock because we have a few synchronous readWrite metadata operations that can happen on the main thread. So basically while registering extensions it can happen that we want synchronous access to the DB from the main thread. Because the same connection is used the order of lock acquisition between the two methods actually bites us.

main thread:
-[YapDatabaseConnection readWriteWithBlock:]
    dispatch_sync(connectionQueue, ^{
->  dispatch_sync(database->writeQueue, ^{ @autoreleasepool {

(lldb) po connectionQueue
<OS_dispatch_queue: YapDatabaseConnection[0x7fcd81fa63d0] = { xrefcnt = 0x1, refcnt = 0x1, suspend_cnt = 0x1, locked = 0, target = com.apple.root.default-qos.overcommit[0x10eda9dc0], width = 0x0, running = 0x1, barrier = 0 }>

2   libdispatch.dylib                   0x0000000112b97a35 _dispatch_barrier_sync_f_slow + 814,
3   XXXXXXXXX                           0x000000010dcacc49 __44-[YapDatabaseConnection readWriteWithBlock:]_block_invoke + 777,
4   libdispatch.dylib                   0x0000000112bb2614 _dispatch_client_callout + 8,
5   libdispatch.dylib                   0x0000000112b98002 _dispatch_barrier_sync_f_invoke + 365,
6   XXXXXXXXX                           0x000000010dcac8ff -[YapDatabaseConnection readWriteWithBlock:] + 367,
...
39  XXXXXXXXX                           0x000000010d6d1997 main + 151,
background thread:
    dispatch_async(writeQueue, ^{ @autoreleasepool {
->  dispatch_sync(connectionQueue, ^{ @autoreleasepool {

(lldb) po connectionQueue
<OS_dispatch_queue: YapDatabaseConnection[0x7fcd81fa63d0] = { xrefcnt = 0x1, refcnt = 0x1, suspend_cnt = 0x1, locked = 0, target = com.apple.root.default-qos.overcommit[0x10eda9dc0], width = 0x0, running = 0x1, barrier = 0 }>

2   libdispatch.dylib                   0x0000000112b97a35 _dispatch_barrier_sync_f_slow + 814,
3   XXXXXXXXX                           0x000000010dcbfd5d -[YapDatabaseConnection registerExtension:withName:] + 653,
4   XXXXXXXXX                           0x000000010dc726f6 -[YapDatabase _registerExtension:withName:connection:] + 1958,
5   XXXXXXXXX                           0x000000010dc71042 __90-[YapDatabase asyncRegisterExtension:withName:connection:completionQueue:completionBlock:]_block_invoke + 82,
6   libdispatch.dylib                   0x0000000112b93186 _dispatch_call_block_and_release + 12,
7   libdispatch.dylib                   0x0000000112bb2614 _dispatch_client_callout + 8,
8   libdispatch.dylib                   0x0000000112b996a7 _dispatch_queue_drain + 2176,
9   libdispatch.dylib                   0x0000000112b98cc0 _dispatch_queue_invoke + 235,
10  libdispatch.dylib                   0x0000000112b9c3b9 _dispatch_root_queue_drain + 1359,
11  libdispatch.dylib                   0x0000000112b9db17 _dispatch_worker_thread3 + 111,
12  libsystem_pthread.dylib             0x0000000112f204de _pthread_wqthread + 1129,
13  libsystem_pthread.dylib             0x0000000112f1e341 start_wqthread + 13

With this knowledge I'm not passing our connection to the registerExtension methods anymore so the internal registrationConnection is used. Now the deadlock doesn't occur anymore because the only shared queue is the writeQueue.

Nevertheless the question remains: should the registerExtension or readWrite methods be changed to ensure the same order of lock acquisition?

robbiehanson added a commit that referenced this issue Jul 18, 2017
…n registration. This is no longer supported. In its place is a way of passing a connection 'config', which allows one to tweak the memory options of the internal database connection. This was the original motivation for the functionality (performance optimizations). Fixes issue #371
@robbiehanson
Copy link
Contributor

robbiehanson commented Jul 18, 2017

You're spot on with your deadlock assessment. Simply put, passing a custom connection to the registerExtension family of methods is unsafe. And there doesn't appear to be a safe way to fix this particular issue.

At the heart of the matter is the order in which things are expected to occur when starting up the database system. Generally, people will do something like this:

database = [[YapDatabase alloc] init...];
// async register extension #1
// async register extension #2
// ...
// continue with app launch

When using an internal database connection for the registration connection, everything works nicely:

  • the async registration processes are both queued via dispatch_async(writeQueue, ...)
  • this ensures that any write the app attempts to perform will occur AFTER the extension registrations
  • which, in turn, ensures the extensions remain in proper working order

However, if passing a custom connection, the deadlock issue appears. Fixing the deadlock issue would create a separate bug:

  • the async registration processes are both queued via dispatch_async(connectionQueue, ...)
  • this only postpones write attempts performed on the same connection
  • which means, the app could very easily perform writes to the database before all the extensions are up-and-running
  • which could cause all kinds of chaos

So I've chosen to remove the functionality altogether. A custom connection can no longer be passed to the registerExtension family of methods.

However, the original motivation for the feature had to do with performance optimizations. In particular, things like registering/updating a YapDatabaseView could cause a fair amount of thrashing if the objectCacheLimit is set too low. And so, to replace the previous functionality, the methods now allow you to pass an (optional) YapDatabaseConnectionConfig object.

This allows you to pass memory settings, such as objectCacheLimit. Which achieves the original performance motivations.

@cipherCOM
Copy link
Author

Thanks that you responded to the issue and solved it. I will have to check if this new API can be adapted by us (can't quite remember the context now) but I can totally understand that this is the way to go. Oh and kudos for the always detailed answers / documentation by your team. Very much appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants