Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Regression] Client crashes as soon as sync dir becomes unavailable or when sync dir is not available on startup #6049

Closed
adroste opened this issue Sep 23, 2017 · 31 comments
Assignees
Labels
macOS p3-medium Normal priority ReadyToTest QA, please validate the fix/enhancement type:bug
Milestone

Comments

@adroste
Copy link

adroste commented Sep 23, 2017

Expected behaviour

If storage becomes unavailable, the client should notice and pause sync.

Actual behaviour

The client crashes.

Steps to reproduce

  1. Sync to folder /x
  2. Unmount /x
  3. Observe ownCloud crashing

Client configuration

Client version: 2.3.3 (8242)
Operating system: macOS Sierra 10.12.6
OS language: English
Installation path of client: Default

@SamuAlfageme SamuAlfageme changed the title Client crashes as soon as storage becomes unavailable [Regression][SQlite] - Client crashes as soon as storage becomes unavailable Sep 25, 2017
@SamuAlfageme SamuAlfageme added this to the 2.4.0 milestone Sep 25, 2017
@guruz
Copy link
Contributor

guruz commented Sep 25, 2017

@michaelstingl @SamuAlfageme If enough people report this, it might mandate a 2.3.4.. let's see.

@SamuAlfageme
Copy link
Contributor

SamuAlfageme commented Sep 25, 2017

Reproduced & confirmed the regression. (e.g. on OS X):

$ sudo bindfs ~/ownCloud /Volumes/OC
$ diskutil unmountDisk force /Volumes/OC

I think there's some kind of delay on the crash-reporter (last update I see was 18h ago)

@valentijnscholten
Copy link

I am seeing similar behaviour on Windows 10 Pro x64 (more similar to #6050 actually).
When the user runs the owncloud client and doesn't have write access to the owncloud folder, the client won't start.

Actually it will crash before generating any logging. Or it will just hang. And then crash silently.

Also the crash reporter doesn't work in this case, it will crash as well. So no crash reports

Workaround is to run as administrator. Or to assign write permissions to the stack folder.

It makes sense the client won't work without write permissions, but it should inform the user about the missing permissions.

@adroste
Copy link
Author

adroste commented Sep 25, 2017

@valentijnscholten This is a little different from my problem. I don't have problem with permissions, but with non existing paths (e.g. external hard drive not mounted) causing the client to crash.

@valentijnscholten
Copy link

valentijnscholten commented Sep 26, 2017

@progmem64 yes, my issue is more similar to #6050 (folder being readonly), but that is closed and points to this issue for follow up.

@ckamm
Copy link
Contributor

ckamm commented Sep 26, 2017

@SamuAlfageme @guruz I can't reproduce on linux. You can't unmount the filesystem with the sync folder while the client is open because it keeps the journal file openend. If I put it on an USB stick and plug it out without unmounting, nothing bad happens. I get Error: "unable to open database file" for "/media/kamm/E9B9-AE9E/test/._sync_dba920715f43.db" and the sync fails with something like Unable to find sync dir

@ckamm ckamm assigned guruz and unassigned ckamm Sep 26, 2017
@ckamm ckamm changed the title [Regression][SQlite] - Client crashes as soon as storage becomes unavailable [Regression][OSX] - Client crashes as soon as storage becomes unavailable Sep 26, 2017
@SamuAlfageme
Copy link
Contributor

@ckamm hmmm.. will try to reproduce on leenux (report in #6050 (comment) was over readonly-FS in Fedora and I assumed the same reason was behind it).

@guruz guruz changed the title [Regression][OSX] - Client crashes as soon as storage becomes unavailable [Regression][macOS][Windows] Client crashes as soon as sync dir becomes unavailable or when sync dir is not available on startup Sep 28, 2017
@SamuAlfageme SamuAlfageme mentioned this issue Oct 2, 2017
9 tasks
@jturcotte
Copy link
Member

I'm trying the force unmount thing on macOS, and I'm getting #3046, which is not a regression (it not really fixed and still appears in our crash reports).
Force unmounting is not a common scenario, and I can't get it to crash if I unmount normally.

So the regression part could be specific to Windows.

@jturcotte
Copy link
Member

Just saw that the bug report was created for macOS so this can't really be specific to Windows. I've been trying to reproduce it with current master though, not 2.3.3.

@SamuAlfageme
Copy link
Contributor

@jturcotte @ckamm then I was wrong to assume #6050 and this one were the same. 😕

@guruz guruz mentioned this issue Oct 2, 2017
1 task
@jturcotte
Copy link
Member

It's also possible that one needs a bit of luck in order to reproduce it, on any of the involved OSes. Getting the OS to unmount the disk at the right moment is quite difficult. I tried a few times on macOS and Windows, but could only get it to crash once out of my 10 tries (and it was #3046, so it's either a different issue, or it's not a regression).

Having a crash report would help figuring out what is happening. @progmem64 Could take note of the time you submit one or many of the crash reports, so that we can fish them out of the crash reporter?

@ogoffart
Copy link
Contributor

On the crash reporter, i see crash because of the

        ENFORCE(_errId == SQLITE_OK, "Error when closing DB");

in SqlDatabase::close .

Maybe we should remove this ENFORCE and let the code recover. (I thin most of the code should be able to recover)
I don't know if this is the actual reason for this particular crash tough.

@ckamm
Copy link
Contributor

ckamm commented Oct 17, 2017

@ogoffart Was the stacktrace at all interesting?

Yes, it seems this could be an ASSERT instead. It doesn't fix the bug, but there's a reasonable chance of things working anyway.

@ogoffart
Copy link
Contributor

https://sentry.io/owncloud/desktop-win-and-mac/issues/370119057/

QtCore in qt_message_fatal
QtCore in QMessageLogger::fatal
libocsync.2.4.0.dylib in OCC::SqlDatabase::close
libocsync.2.4.0.dylib in OCC::SyncJournalDb::sqlFail
libocsync.2.4.0.dylib in OCC::SyncJournalDb::checkConnect
libocsync.2.4.0.dylib in OCC::SyncJournalDb::getSelectiveSyncList
owncloud in OCC::AccountSettings::refreshSelectiveSyncStatus
owncloud in OCC::AccountSettings::AccountSettings
owncloud in OCC::SettingsDialogMac::accountAdded
owncloud in OCC::SettingsDialogMac::SettingsDialogMac

As I said, i am not sure it is indeed the backtrace relative to this one crash.

@guruz
Copy link
Contributor

guruz commented Oct 17, 2017

@ogoffart doesn't look so much like it if it's in accountAdded?
@SamuAlfageme Do you still have the verbose client log?
@progmem64 How were you unmounting? Just by clicking the eject button in Finder? Or unmount in console? Or diskutil eject?

@ogoffart
Copy link
Contributor

ogoffart commented Oct 17, 2017

(@guruz: accountAdded is called on startup when the settingsdialog is created)

@ckamm
Copy link
Contributor

ckamm commented Nov 10, 2017

It might be that #6129 helps with this problem. @progmem64 would you be up for testing the 2.4 nightly?

@guruz guruz modified the milestones: 2.5.0, 2.4.2-maybe Feb 12, 2018
@SamuAlfageme
Copy link
Contributor

Can confirm it still reproduces on 2.4.1 (build 9367)

@guruz
Copy link
Contributor

guruz commented Mar 19, 2018

@SamuAlfageme How did you unmount?

@SamuAlfageme
Copy link
Contributor

@guruz followed the same procedure as in #6049 (comment)

@felixboehm felixboehm added p3-medium Normal priority and removed sev2-high labels May 3, 2018
@guruz
Copy link
Contributor

guruz commented May 14, 2018

I get this on my machine with 2.4 and an idle sync:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=10, address=0x10cdf6020)
  * frame #0: 0x00007fff5a874f6d libsystem_platform.dylib`_platform_memmove$VARIANT$Haswell + 77
    frame #1: 0x00000001083b8d8e libocsync.0.dylib`walIndexTryHdr + 78
    frame #2: 0x00000001083b89b1 libocsync.0.dylib`walIndexReadHdr + 81
    frame #3: 0x00000001083bfd00 libocsync.0.dylib`sqlite3WalCheckpoint + 288
    frame #4: 0x00000001083bfa61 libocsync.0.dylib`sqlite3WalClose + 193
    frame #5: 0x00000001083b4f34 libocsync.0.dylib`sqlite3PagerClose + 180
    frame #6: 0x00000001083c89ed libocsync.0.dylib`sqlite3BtreeClose + 189
    frame #7: 0x0000000108399d6f libocsync.0.dylib`sqlite3LeaveMutexAndCloseZombie + 159
    frame #8: 0x00000001083a59c4 libocsync.0.dylib`sqlite3Close + 276
    frame #9: 0x00000001083a58a7 libocsync.0.dylib`sqlite3_close + 23
    frame #10: 0x000000010834b34b libocsync.0.dylib`OCC::SqlDatabase::close() + 59
    frame #11: 0x0000000108358038 libocsync.0.dylib`OCC::SyncJournalDb::close() + 984
    frame #12: 0x0000000107413281 libowncloudsync.0.dylib`OCC::SyncEngine::finalize(bool) + 129
    frame #13: 0x00000001074139f8 libowncloudsync.0.dylib`OCC::SyncEngine::startSync() + 1352
    frame #14: 0x0000000107453ba4 libowncloudsync.0.dylib`OCC::SyncEngine::qt_static_metacall(QObject*, Q

This looks to me the same as https://sentry.io/owncloud/desktop-win-and-mac/issues/402884797/ mentioned by @ckamm (Why is it marked as 'resolved' in sentry?)

With master branch (newer sqlite3 version) I don't get the crash so easily, I however get it while downloading a file with a different back trace:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=10, address=0x10ffdd000)
  * frame #0: 0x00007fff5a870bc1 libsystem_platform.dylib`_platform_memcmp + 33
    frame #1: 0x000000010970affe libocsync.0.dylib`sqlite3WalBeginWriteTransaction + 126
    frame #2: 0x0000000109709db2 libocsync.0.dylib`sqlite3PagerBegin + 226
    frame #3: 0x00000001096e06ae libocsync.0.dylib`sqlite3BtreeBeginTrans + 606
    frame #4: 0x0000000109718fdd libocsync.0.dylib`sqlite3VdbeExec + 16205
    frame #5: 0x00000001096e4076 libocsync.0.dylib`sqlite3Step + 518
    frame #6: 0x00000001096e3cdd libocsync.0.dylib`sqlite3_step + 125
    frame #7: 0x000000010967f3d4 libocsync.0.dylib`OCC::SqlQuery::exec() + 564
    frame #8: 0x0000000109692779 libocsync.0.dylib`OCC::SyncJournalDb::setDownloadInfo(QString const&, OCC::SyncJournalDb::DownloadInfo const&) + 921
    frame #9: 0x0000000108deb856 libowncloudsync.0.dylib`OCC::PropagateDownloadFile::slotGetFinished() + 1222
    frame #10: 0x0000000108df13dc libowncloudsync.0.dylib`QtPrivate::FunctorCall<QtPrivate::IndexesList<>, QtPrivate::List<>, void, void (OCC::PropagateDownloadFile::*)()>::call(void (OCC::PropagateDownloadFile::*)(), OCC::PropagateDownloadFile*, void**) + 140
    frame #11: 0x0000000108df1343 libowncloudsync.0.dylib`void QtPrivate::FunctionPointer<void (OCC::PropagateDownloadFile::*)()>::call<QtPrivate::List<>, void>(void (OCC::PropagateDownloadFile::*)(), OCC::PropagateDownloadFile*, void**) + 99
    frame #12: 0x0000000108df1276 libowncloudsync.0.dylib`QtPrivate::QSlotObject<void (OCC::PropagateDownloadFile::*)(), QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase*, QObject*, void**, bool*) + 166
    frame #13: 0x000000010a019f7b QtCore`QMetaObject::activate(QObject*, int, int, void**) + 2347
    frame #14: 0x0000000108e61f02 libowncloudsync.0.dylib`OCC::GETFileJob::finishedSignal() + 34
    frame #15: 0x0000000108e66b6a libowncloudsync.0.dylib`OCC::GETFileJob::finished() + 154

BTW
I also get something like

05-14 20:15:40:652 [ warning default ]:	QIODevice::write (QFile, "/Volumes/SC256MB/syncfolder/.owncloudsync.log"): device not open

or

05-14 20:35:13:312 [ warning sync.networkjob.get ]:	Error while writing to file -1 8192 "Input/output error"

With JOURNAL_MODE=delete I manage to get a a crash only in beginning of sync when creating the tables:

05-14 20:41:33:538 [ warning sync.database ]:	ERROR committing to the database:  "cannot commit - no transaction is active"
05-14 20:41:33:538 [ debug sync.database ]	[ OCC::SyncJournalDb::startTransaction ]:	Database Transaction is running, not starting another one!
05-14 20:41:33:538 [ debug sync.database.sql ]	[ OCC::SqlQuery::exec ]:	SQL exec "CREATE INDEX IF NOT EXISTS blacklist_index ON blacklist(path collate nocase);"
05-14 20:41:33:538 [ warning sync.database.sql ]:	Sqlite exec statement error: 10 "disk I/O error" in "CREATE INDEX IF NOT EXISTS blacklist_index ON blacklist(path collate nocase);"
05-14 20:41:33:538 [ warning sync.database.sql ]:	IOERR extended errcode:  266
05-14 20:41:33:538 [ warning sync.database.sql ]:	IOERR system errno:  5
05-14 20:41:33:538 [ warning sync.database ]:	ERROR committing to the database:  "cannot commit - no transaction is active"
05-14 20:41:33:538 [ warning sync.database ]:	SQL Error "updateErrorBlacklistTableStructure: create index blacklit" "disk I/O error"
05-14 20:41:33:539 [ critical default ]:	ASSERT: "false" in file /Users/guruz/woboq/owncloud/client/mirall/src/common/syncjournaldb.cpp, line 268
05-14 20:41:33:539 [ warning sync.database ]:	Failed to update the database structure!
05-14 20:41:33:539 [ info sync.database ]:	Forcing remote re-discovery by deleting folder Etags
05-14 20:41:33:539 [ warning sync.database.sql ]:	Sqlite prepare statement error: "out of memory" in "UPDATE metadata SET md5='_invalid_' WHERE type=2;"
05-14 20:41:33:539 [ fatal default ]:	ENFORCE: "allow_failure" in file /Users/guruz/woboq/owncloud/client/mirall/src/common/ownsql.cpp, line 269 with message: SQLITE Prepare error

Then while the download it doesn't crash, it just errors 👍 :

05-14 20:45:23:350 [ warning sync.database.sql ]:	Sqlite exec statement error: 10 "disk I/O error" in "INSERT OR REPLACE INTO blacklist (path, lastTryEtag, lastTryModtime, retrycount, errorstring, lastTryTime, ignoreDuration, renameTarget, errorCategory) VALUES ( ?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)"
05-14 20:45:23:350 [ warning sync.database.sql ]:	IOERR extended errcode:  266
05-14 20:45:23:350 [ warning sync.database.sql ]:	IOERR system errno:  5
05-14 20:45:23:350 [ warning sync.propagator ]:	Could not complete propagation of "default.lay" by OCC::PropagateDownloadFile(0x7fdeee5b9e70) with status 2 and error: "Input/output error"

I'll add a commit that checks for "/Volumes" in sync directory and sets JOURNAL_MODE=delete in code. This should help with most cases..

@guruz
Copy link
Contributor

guruz commented May 14, 2018

PR for macOS in #6526

Do we need something for Windows?
https://sentry.io/owncloud/desktop-win-and-mac/issues/426903988/
https://sentry.io/owncloud/desktop-win-and-mac/issues/130205177/
@ogoffart @ckamm If those are really the same..

@ckamm
Copy link
Contributor

ckamm commented May 15, 2018

Hmm. It's really unfortunate we can make sqlite crash like that. Might a strategically placed file.exists() before interacting with the db help? We could check whether _dbFile exists in checkConnect, instead of relying on _db.isOpen().

ckamm added a commit that referenced this issue May 16, 2018
With WAL mode sqlite seems to occasionally crash when the
underlying filesystem goes away.
ckamm added a commit that referenced this issue May 17, 2018
With WAL mode sqlite seems to occasionally crash when the
underlying filesystem goes away.
ckamm added a commit that referenced this issue May 17, 2018
With WAL mode sqlite seems to occasionally crash when the
underlying filesystem goes away.

(cherry picked from commit b1224cf)
ckamm pushed a commit that referenced this issue May 17, 2018
@ckamm ckamm added the ReadyToTest QA, please validate the fix/enhancement label May 17, 2018
ckamm pushed a commit that referenced this issue May 17, 2018
@SamuAlfageme
Copy link
Contributor

I've been stressing out the client with forced unmounts both on win and macOS at different points in the sync cycle and I don't seem to be able to make it crash. Closing here as resolved 🎉

TheOneRing added a commit that referenced this issue Jul 14, 2022
In context of #6049 a check for the existance of the db file was introduced.
That check is performed before every db access... so serveral 100 times per second.
TheOneRing added a commit that referenced this issue Jul 14, 2022
In context of #6049 a check for the existance of the db file was introduced.
That check is performed before every db access... so serveral 100 times per second.
TheOneRing added a commit that referenced this issue Jul 14, 2022
In context of #6049 a check for the existance of the db file was introduced.
That check is performed before every db access... so several 100 times per second.
TheOneRing added a commit that referenced this issue Jul 22, 2022
In context of #6049 a check for the existance of the db file was introduced.
That check is performed before every db access... so several 100 times per second.
TheOneRing added a commit that referenced this issue Jul 26, 2022
In context of #6049 a check for the existance of the db file was introduced.
That check is performed before every db access... so several 100 times per second.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
macOS p3-medium Normal priority ReadyToTest QA, please validate the fix/enhancement type:bug
Projects
None yet
Development

No branches or pull requests

8 participants