Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 6160: Deadlock removed #12315

Closed
wants to merge 3 commits into from
Closed

Bug 6160: Deadlock removed #12315

wants to merge 3 commits into from

Conversation

obel1x
Copy link

@obel1x obel1x commented Nov 6, 2018

This is a workaround to avoid Deadlocks and duplicate Keys in UPDATE SELECT- Statement as described in #6160 and maybe the cause for #9305 #12228 #6899 #6187

@@ -110,7 +141,7 @@ public function insertIfNotExist($table, $input, array $compare = null) {
}
$query = substr($query, 0, -5);
$query .= ' HAVING COUNT(*) = 0';

*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you remove the whole comment - that's what we have version control for. 😉

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, thanks for the hint. Lack of me beeing new to github. Will do this if we agree if the changes can be done that way. Currently i don't know someone can do it better.

Copy link
Member

@MorrisJobke MorrisJobke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code itself makes a lot of sense and seem to work as stuff that uses those methods (group creation, app config setting, file upload,...) still work fine. 👍

@MorrisJobke MorrisJobke added the 3. to review Waiting for reviews label Nov 6, 2018
@MorrisJobke MorrisJobke added this to the Nextcloud 15 milestone Nov 6, 2018
@MorrisJobke MorrisJobke mentioned this pull request Nov 6, 2018
29 tasks
@nickvergessen
Copy link
Member

Well this just leaves more time for concurrent queries to break it between the select and the insert. Exactly the reason why the method was created and used in first place.

@rullzer
Copy link
Member

rullzer commented Nov 6, 2018

Can't we just wrap it in an transaction?
Also. Why not use the query builder?

@MorrisJobke
Copy link
Member

@nickvergessen @rullzer What do you think about #6899 (comment) ? If you say that this is okay, I would port it by tomorrow so it will be in beta 1.

@obel1x
Copy link
Author

obel1x commented Nov 7, 2018

yep, this is why it was first meant to be a version for testing and to see if it solves the problem in general.
In theory original used query really looks better, but all dbms seem to have large problems with it, not working as expected and even causing duplicate keys sometimes. Sad, but understandable as dmbs are trimmed for integrity and performance.

As for the aspect that there could be a concurrent access doing it that way: In almost any cases only the select is done, while entry is already there. You don't need lock for this. The update is only done once: when a file is created. I don't now how often the same file (of the same user) is created in the same moment twice, but i currently cannot think about the constellation this happens. You could use transaction + locking for this, but that would mean a lot of overhead for a each select. I would expect performance impacts then. I wouldn't do this, while even in worst case you could get duplicate entries and that's it.

So looking at the changes the second time it even improves performance compared to the original a lot while most time file is already there. In this case my changes are only doing selects (that can be speed up). But for the original Update...Select MySql wants to copy the table in temporary table first - each time says the docs. And this not only if the file is created - but for every time the function is used.

So finally original query bring dbms in trouble causing deadlocks and duplicate values and is inefficient either way each time the function is used. I would definitely change it somehow.
So please feel free to make adjustments as you proposed or just not merge and do it otherwise, but keep in mind that users are more interested in (fast) working programs than good looking code.
Btw @MorrisJobke: as said i really disadvice to solve Deadlocks with retrys - they should not happen at all and should be handled with better coding. If you hide them by retrying you loose the ability to solve the real issue. In general, better exception handling is fine.

At last: I have not used query builder, because i don't know what it is. I used text editor (kwrite). But i am new and willing to learn. Can you point me to some explanations to query builder to use for nc?

Nevermind what i am talking to the things above: Very good work you are doing with NC, keep it up!

@kesselb
Copy link
Contributor

kesselb commented Nov 7, 2018

retrywrapper is a nice feature but will not help in this case. The insert query is not locked by another query its locked by itself. This could explain #6899 (comment). (Not true 😕)

if (\OC::$server->getDatabaseConnection()->insertIfNotExist('*PREFIX*filecache', $values, [
should be replaced with something like this because they unique index is there anywhere and most reports for this deadlocks are from oc_filecache.

try {
// do insert
} catch (UniqueConstraintViolationException $e) {
// ignore when already exist
}

@MorrisJobke MorrisJobke added the bug label Nov 7, 2018
@obel1x
Copy link
Author

obel1x commented Nov 8, 2018

retrywrapper is a nice feature but will not help in this case. The insert query is not locked by another query its locked by itself. This could explain #6899 (comment). (Not true confused)

why was the deadlock not caused by itself? for me it seemed so very much as i were the only user that time doing no other things with it.

server/lib/private/Files/Cache/Cache.php

Line 271 in a661f04
if (\OC::$server->getDatabaseConnection()->insertIfNotExist('PREFIXfilecache', $values, [
should be replaced with something like this because they unique index is there anywhere and most reports for this deadlocks are from oc_filecache.

try {
// do insert
} catch (UniqueConstraintViolationException $e) {
// ignore when already exist
}

I saw the function beeing used also in other context, not sure whether there is always an index on the table. Even if so, i wouldn't let the dbms run into execptions first if i could prevent them.

So how do we proceed? As for now i have not seen any negative test results/aspects further, but at least three Issues beeing confirmed to be solved by it and i guess around 10 Issues will be solved further. I would rate this beeing mayor, as users are quiet upset seeing that many errors. All those errors are gone with that merge.

Would you agree merging it if i remove the comments as @MorrisJobke suggested?

@kesselb
Copy link
Contributor

kesselb commented Nov 8, 2018

Would you agree merging it if i remove the comments as @MorrisJobke suggested?

Sure. I'm not against your pr but I think its the wrong way 🙈

@MorrisJobke
Copy link
Member

Well this just leaves more time for concurrent queries to break it between the select and the insert. Exactly the reason why the method was created and used in first place.

I can fully understand this one, because now the time is higher, because it needs to go back to PHP instead of being just a sub select, which has less time, but still the option to cause this. I guess we need to catch an exception here. There is just no further way to eliminate this except in an atomic SQL function, which is not present in all DBs.

So the approach would be to keep this method how it is currently, because theoretically it makes things worse:

  • before: INSERT with SUB-SELECT
  • after: SELECT in PHP and then fire an INSERT

Or do I miss something here? I don't know why it makes something better, but the obvious approach is to additionally to this method call also wrap the method call into a try catch with the exception that is expected (most likely a UniqueConstraintException), but unfortunately you cannot know if this is the case for all queries or should we just do this by default inside the function?

@MorrisJobke
Copy link
Member

I saw the function beeing used also in other context, not sure whether there is always an index on the table. Even if so, i wouldn't let the dbms run into execptions first if i could prevent them.

That is what I wanted to describe above - there is no guarantee (except a single SQL statement itself which is currently not available on all supported platforms) to do this. So triggering and catching the exception is unfortunately the only way to go for now.

@MorrisJobke
Copy link
Member

There are exact 26 usages (over all apps + core - apps == all apps in our app store + the ones I could find on GitHub/GitLab):

bildschirmfoto 2018-11-08 um 10 41 27

@MorrisJobke
Copy link
Member

I guess if we focus on the stuff in apps/ and lib/ we covered most of the critical cases IMO.

@MorrisJobke MorrisJobke mentioned this pull request Nov 8, 2018
24 tasks
@nickvergessen
Copy link
Member

So the approach would be to keep this method how it is currently, because theoretically it makes things worse:

that is exactly my thought too

Removed old Stuff
@obel1x
Copy link
Author

obel1x commented Nov 8, 2018

year, don't get me wrong, i really understand the bad feeling that in theory it worse and thats why i would really also prefer better ways. But as for now i did not see the other approaches beeing better. Again i can only say that causing exeptions for me is really a wrong way imho. And the theoretical timeslot between updates is really very much theory. Think of when this will happen and how often you will see it - same file to be created at the same time from different users? And what would happen then? An exception 👍 Or am i missing anything?

So in practical for me it has even speed up NC while the complexity of update..select at dmbs level to find out, that file is there in 99% is high dbms load.

Well, its up to you to decide. At my work i need to do those nasty decisions every day ;)
I now did the changes. Let me know if i can do anything more.
Hf, Daniel

@MorrisJobke
Copy link
Member

MorrisJobke commented Nov 8, 2018

And the theoretical timeslot between updates is really very much theory. Think of when this will happen and how often you will see it - same file to be created at the same time from different users? And what would happen then? An exception 👍 Or am i missing anything?

It is actually not that theoretical - just look into the linked tickets. And if we are talking about short time slots, then look into this bug here: owncloud/core#24052

That one took me a long time to debug with on a high volume instance. It was basically this:

if($this->isCached($cacheKey)) {
    # here the cached values TTL ended and it got not cached anymore
    $this->getCached($cacheKey); # 💥 
}

Keep in mind that the TTL (time to live) was IIRC 12h. And this one happened on the instance quite often (a handful of times per day). The time between those two entries is far less than a fraction of a millisecond and it hit on time for a 12h timeout to hit this exact spot. Never say: "it is super unlikely" - at some point it will bite you no matter how unlikely it is. So better be safe here and catch the exception to be really bullet proof and not only theoretically safe. 😉

@MorrisJobke MorrisJobke self-assigned this Nov 8, 2018
MorrisJobke added a commit that referenced this pull request Nov 9, 2018
* caused by a concurrect insert that happens between the INSERT and it's sub-SELECT which was there to actually avoid it within insertIfNotExists - sub selects are not atomic and allow this
* see also #12315

Avoids an error that has this stack trace:

```
Doctrine\DBAL\Exception\UniqueConstraintViolationException

An exception occurred while executing 'INSERT INTO "oc_file_locks" ("key","lock","ttl") SELECT ?,?,? FROM "oc_file_locks" WHERE "key" = ? HAVING COUNT(*) = 0' with params ["files/737d52477f1fb583a5bfe5eb33e820da", 1, 1524066628, "files/737d52477f1fb583a5bfe5eb33e820da"]:

SQLSTATE[23505]: Unique violation: 7 ERROR:  duplicate key value violates unique constraint "lock_key_index"
DETAIL:  Key (key)=(files/737d52477f1fb583a5bfe5eb33e820da) already exists.

 #0 3rdparty/doctrine/dbal/lib/Doctrine/DBAL/DBALException.php(128): Doctrine\DBAL\Driver\AbstractPostgreSQLDriver->convertException('An exception oc...', Object(Doctrine\DBAL\Driver\PDOException))
 #1 3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Connection.php(1015): Doctrine\DBAL\DBALException::driverExceptionDuringQuery(Object(Doctrine\DBAL\Driver\PDOPgSql\Driver), Object(Doctrine\DBAL\Driver\PDOException), 'INSERT INTO "oc...', Array)
 #2 lib/private/DB/Connection.php(213): Doctrine\DBAL\Connection->executeUpdate('INSERT INTO "oc...', Array, Array)
 #3 lib/private/DB/Adapter.php(114): OC\DB\Connection->executeUpdate('INSERT INTO "oc...', Array)\n#4 lib/private/DB/Connection.php(251): OC\DB\Adapter->insertIfNotExist('*PREFIX*file_lo...', Array, Array)\n#5 lib/private/Lock/DBLockingProvider.php(118): OC\DB\Connection->insertIfNotExist('*PREFIX*file_lo...', Array, Array)\n#6 lib/private/Lock/DBLockingProvider.php(163): OC\Lock\DBLockingProvider->initLockField('files/737d52477...', 1)
 #7 lib/private/Files/Storage/Common.php(704): OC\Lock\DBLockingProvider->acquireLock('files/737d52477...', 1)
 #8 lib/private/Files/View.php(1931): OC\Files\Storage\Common->acquireLock('files/Documents', 1, Object(OC\Lock\DBLockingProvider))
 #9 lib/private/Files/View.php(2041): OC\Files\View->lockPath('/*******/files/...', 1, false)
 #10 lib/private/Files/Node/Node.php(360): OC\Files\View->lockFile('/*******/files/...', 1)
 #11 apps/files_sharing/lib/Controller/ShareAPIController.php(928): OC\Files\Node\Node->lock(1)
 #12 apps/files_sharing/lib/Controller/ShareAPIController.php(589): OCA\Files_Sharing\Controller\ShareAPIController->lock(Object(OC\Files\Node\Folder))
 #13 [internal function]: OCA\Files_Sharing\Controller\ShareAPIController->getShares('true', 'false', 'false', Object(OC\Files\Node\Folder), 'false')
 #14 lib/private/AppFramework/Http/Dispatcher.php(160): call_user_func_array(Array, Array)
 #15 lib/private/AppFramework/Http/Dispatcher.php(90): OC\AppFramework\Http\Dispatcher->executeController(Object(OCA\Files_Sharing\Controller\ShareAPIController), 'getShares')
 #16 lib/private/AppFramework/App.php(114): OC\AppFramework\Http\Dispatcher->dispatch(Object(OCA\Files_Sharing\Controller\ShareAPIController), 'getShares')
 #17 lib/private/AppFramework/Routing/RouteActionHandler.php(47): OC\AppFramework\App::main('OCA\\Files_Shari...', 'getShares', Object(OC\AppFramework\DependencyInjection\DIContainer), Array)
 #18 [internal function]: OC\AppFramework\Routing\RouteActionHandler->__invoke(Array)
 #19 lib/private/Route/Router.php(299): call_user_func(Object(OC\AppFramework\Routing\RouteActionHandler), Array)
 #20 ocs/v1.php(78): OC\Route\Router->match('/ocsapp/apps/fi...')
 #21 ocs/v2.php(23): require_once('/usr/share/weba...')
 #22 {main}
````

Signed-off-by: Morris Jobke <hey@morrisjobke.de>
@MorrisJobke
Copy link
Member

I started with the catching approach - first one is the issue in the locking: #12366

@MorrisJobke
Copy link
Member

MorrisJobke commented Nov 9, 2018

List to have a look at:

MorrisJobke added a commit that referenced this pull request Nov 9, 2018
* caused by a concurrect insert that happens between the INSERT and it's sub-SELECT which was there to actually avoid it within insertIfNotExists - sub selects are not atomic and allow this
* see also #12315

Avoids an error that has this stack trace:

```
Doctrine\DBAL\Exception\UniqueConstraintViolationException: An exception occurred while executing 'INSERT INTO "oc_filecache" ("mimepart","mimetype","mtime","size","etag","storage_mtime","permissions","checksum","path_hash","path","parent","name","storage") SELECT ?,?,?,?,?,?,?,?,?,?,?,?,? FROM "oc_filecache" WHERE "storage" = ? AND "path_hash" = ? HAVING COUNT(*) = 0' with params [1, 2, 1502911047, -1, "59949a47cabae", 1502911047, 31, "", "fb66dca5f27af6f15c1d1d81e6f8d28b", "files_trashbin", 1742774, "files_trashbin", 136, 136, "fb66dca5f27af6f15c1d1d81e6f8d28b"]:

SQLSTATE[23505]: Unique violation: 7 ERROR: duplicate key value violates unique constraint "fs_storage_path_hash"
DETAIL: Key (storage, path_hash)=(136, fb66dca5f27af6f15c1d1d81e6f8d28b) already exists.

3rdparty/doctrine/dbal/lib/Doctrine/DBAL/DBALException.php - line 128: Doctrine\DBAL\Driver\AbstractPostgreSQLDriver->convertException('An exception oc...', Object(Doctrine\DBAL\Driver\PDOException))
3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Connection.php - line 1015: Doctrine\DBAL\DBALException driverExceptionDuringQuery(Object(Doctrine\DBAL\Driver\PDOPgSql\Driver), Object(Doctrine\DBAL\Driver\PDOException), 'INSERT INTO "oc...', Array)
lib/private/DB/Connection.php - line 213: Doctrine\DBAL\Connection->executeUpdate('INSERT INTO "oc...', Array, Array)
lib/private/DB/Adapter.php - line 114: OC\DB\Connection->executeUpdate('INSERT INTO "oc...', Array)
lib/private/DB/Connection.php - line 251: OC\DB\Adapter->insertIfNotExist('*PREFIX*filecac...', Array, Array)
lib/private/Files/Cache/Cache.php - line 273: OC\DB\Connection->insertIfNotExist('*PREFIX*filecac...', Array, Array)
lib/private/Files/Cache/Cache.php - line 230: OC\Files\Cache\Cache->insert('files_trashbin', Array)
lib/private/Files/Cache/Scanner.php - line 279: OC\Files\Cache\Cache->put('files_trashbin', Array)
lib/private/Files/Cache/Scanner.php - line 216: OC\Files\Cache\Scanner->addToCache('files_trashbin', Array, -1)
lib/private/Files/Cache/Scanner.php - line 175: OC\Files\Cache\Scanner->scanFile('files_trashbin')
lib/private/Files/Cache/Scanner.php - line 322: OC\Files\Cache\Scanner->scanFile('files_trashbin/...', 3, -1, NULL, false)
lib/private/Files/Cache/Updater.php - line 124: OC\Files\Cache\Scanner->scan('files_trashbin/...', false, 3, false)
lib/private/Files/View.php - line 321: OC\Files\Cache\Updater->update('files_trashbin/...', 1502911047)
lib/private/Files/View.php - line 1151: OC\Files\View->writeUpdate(Object(OCA\Files_Trashbin\Storage), 'files_trashbin/...')
lib/private/Files/View.php - line 269: OC\Files\View->basicOperation('mkdir', '/files_trashbin...', Array)
apps/files_trashbin/lib/Trashbin.php - line 154: OC\Files\View->mkdir('files_trashbin/...')
apps/files_trashbin/lib/Trashbin.php - line 225: OCA\Files_Trashbin\Trashbin setUpTrash('klas')
apps/files_trashbin/lib/Storage.php - line 247: OCA\Files_Trashbin\Trashbin move2trash('Photos', false)
apps/files_trashbin/lib/Storage.php - line 189: OCA\Files_Trashbin\Storage->doDelete('files/Photos', 'rmdir')
lib/private/Files/View.php - line 1136: OCA\Files_Trashbin\Storage->rmdir('files/Photos')
lib/private/Files/View.php - line 348: OC\Files\View->basicOperation('rmdir', '/Photos', Array)
apps/dav/lib/Connector/Sabre/Directory.php - line 303: OC\Files\View->rmdir('/Photos')
3rdparty/sabre/dav/lib/DAV/Tree.php - line 179: OCA\DAV\Connector\Sabre\Directory->delete()
3rdparty/sabre/dav/lib/DAV/CorePlugin.php - line 287: Sabre\DAV\Tree->delete('Photos')
[internal function] Sabre\DAV\CorePlugin->httpDelete(Object(Sabre\HTTP\Request), Object(Sabre\HTTP\Response))
3rdparty/sabre/event/lib/EventEmitterTrait.php - line 105: call_user_func_array(Array, Array)
3rdparty/sabre/dav/lib/DAV/Server.php - line 479: Sabre\Event\EventEmitter->emit('method DELETE', Array)
3rdparty/sabre/dav/lib/DAV/Server.php - line 254: Sabre\DAV\Server->invokeMethod(Object(Sabre\HTTP\Request), Object(Sabre\HTTP\Response))
apps/dav/appinfo/v1/webdav.php - line 71: Sabre\DAV\Server->exec()
remote.php - line 162: require_once('/var/www/nextcl...')
{main}
```

Signed-off-by: Morris Jobke <hey@morrisjobke.de>
@MorrisJobke
Copy link
Member

I moved this list to #12369

@MorrisJobke
Copy link
Member

Let's go for #12371. This is basically what @blizzz @nickvergessen and @danielkesselberg suggested as well.

@MorrisJobke MorrisJobke closed this Nov 9, 2018
@MorrisJobke
Copy link
Member

@obel1x Thanks for your contribution anyways. It was a good start and helped me to understand the underlying problem. 👍

You could also check out the other issues - maybe some that are suited for people that are not too familiar with the code base: https://github.com/nextcloud/server/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22

@MorrisJobke MorrisJobke removed this from the Nextcloud 15 milestone Nov 9, 2018
@MorrisJobke MorrisJobke removed their assignment Nov 9, 2018
@MorrisJobke
Copy link
Member

@nickvergessen @blizzz It's possible that we still need this to remove the deadlocks that could also occur, because we have a sub query. 🤔

@MorrisJobke MorrisJobke self-assigned this Nov 9, 2018
@MorrisJobke MorrisJobke reopened this Nov 9, 2018
@MorrisJobke
Copy link
Member

Two big ones are addressed in #12411 (file cache) and #12413 (file locks)

@obel1x obel1x deleted the obel1x-6160 branch November 23, 2018 07:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3. to review Waiting for reviews bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants