[BUG] #2135

Open
sharadnair opened this issue Jan 11, 2022 · 8 comments
@sharadnair

Version
5.0.10

We are using LiteDB in a highly automated environment, collecting information from hardware and storing it in LiteDB. This is a multithreaded application with multiple threads frequently opening, reading/writing, and closing LiteDB connections. The database is opened in "Shared" mode and we use locks in our code for thread safety. For the most part everything works perfectly, but occasionally database access gets locked and the application hangs; there is no option apart from restarting the application.
After reviewing the dump files, we traced the issue to the mutex.WaitOne() call within the LiteDB implementation.

[Screenshot: the mutex.WaitOne() call in the LiteDB source]

For some reason this call never returns and the database gets locked. Restarting the application releases the lock and the program is able to continue.

This is the call stack we get:
System.Threading.WaitHandle.WaitOneNative
System.Threading.WaitHandle.InternalWaitOne
System.Threading.WaitHandle.WaitOne
System.Threading.WaitHandle.WaitOne
LiteDB.SharedEngine.OpenDatabase
LiteDB.SharedEngine.Insert
LiteDB.LiteCollection<System.__Canon>.Insert
Emydex.DCI.Service.Common.Repository.DebugLogRepository.Store

The frustrating part is that it works 98% of the time, which makes it difficult to identify the cause of the block, and with so much activity in production it is nearly impossible to reproduce the same behaviour in our test lab.

Sending the logs and information to see if anyone can spot or provide some help in identifying why the database could be locked occasionally.

We would prefer to use "direct" mode, but since this is a heavily multithreaded application, direct mode throws a "File is in use" exception. Is the solution to use a shared (singleton) instance of the database instead of opening and closing it each time?

2022-10-1--13-23-07DiagnosticToolDump.txt

@sharadnair sharadnair added the bug label Jan 11, 2022
@dethknite

dethknite commented Jan 12, 2022

As a workaround, perhaps you could add a queuing mechanism for the DB writes. I do something similar and then write the queued records with bulk insert commands (more efficient and faster, since it is a single access to the DB). It could even happen only every few seconds: just often enough to eliminate deadlock scenarios without being as I/O intensive (wasteful I/O).

Doing so would effectively leave one thread writing to the DB, with all the other threads queuing up records.

// bulkRecords is a List of the queued records
col.InsertBulk(bulkRecords);
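
The queuing idea above can be sketched roughly as follows. This is a minimal illustration, not code from the thread or from LiteDB: `BatchingWriter<T>`, its `flush` delegate, and the `maxBatch` parameter are hypothetical names. The delegate stands in for whatever performs the actual write, for instance a call such as `col.InsertBulk(records)`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: any number of threads enqueue, one worker flushes batches.
public sealed class BatchingWriter<T> : IDisposable
{
    private readonly BlockingCollection<T> _queue = new BlockingCollection<T>();
    private readonly Action<IReadOnlyList<T>> _flush; // assumed to wrap the real DB write
    private readonly int _maxBatch;
    private readonly Task _worker;

    public BatchingWriter(Action<IReadOnlyList<T>> flush, int maxBatch = 500)
    {
        _flush = flush ?? throw new ArgumentNullException(nameof(flush));
        _maxBatch = maxBatch;
        _worker = Task.Factory.StartNew(Drain, TaskCreationOptions.LongRunning);
    }

    // Safe to call from any thread; it never touches the database itself.
    public void Enqueue(T item) => _queue.Add(item);

    private void Drain()
    {
        var batch = new List<T>(_maxBatch);
        // Blocks until an item arrives; the loop ends after CompleteAdding() once the queue is empty.
        foreach (var item in _queue.GetConsumingEnumerable())
        {
            batch.Add(item);
            // Top up the batch with items that are already waiting, without blocking.
            while (batch.Count < _maxBatch && _queue.TryTake(out var next))
            {
                batch.Add(next);
            }
            _flush(batch.ToArray()); // the only place a write happens
            batch.Clear();
        }
    }

    public void Dispose()
    {
        _queue.CompleteAdding(); // let the worker drain what is left, then stop
        _worker.Wait();          // rethrows if the flush delegate failed
        _queue.Dispose();
    }
}
```

With this shape, producer threads only call `Enqueue`, and the database is touched from a single worker thread, which matches the "one thread writing, all others queuing" arrangement described above. A timer-based flush every few seconds would be an equally reasonable variation.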

@sharadnair
Author

Thanks for the quick reply. The information is already in memory and the writes to the database happen on a separate timer thread, but the issue still occurs. To narrow it down, we recently split the event logs into a separate database and collection. That database is now accessed by only a single thread, yet we still intermittently get the locked-database issue, which is very surprising. We currently open and close the database each time the background timer thread writes the information, so the next step is to use the singleton pattern and leave the database open to see if that gets around the issue.
We find it very strange that even with a single database and one thread accessing it we still hit the issue with mutex.WaitOne.
Is there any harm in opening the database at the start of the application and reading/writing without opening and closing it for each operation?

@dethknite

dethknite commented Jan 12, 2022

Sounds like you already have it set up to do bulk inserts, since it is on a single thread and all in memory. As for opening the DB once at the beginning: you can do this. For me, though, to reduce locking issues, having a new connection for each function I call (add record, add bulk records, upgrade db, validating db, query xxx, get first record, get last record, get record count, deleting records, deleting old records quickly, truncating, rebuilding, etc.) seemed ideal, as I could specify the kind of connection (read shared, read/write shared, etc.), and I want everything to play nicely together.

TBH, it is hard to tell where the deadlock is occurring from the call stack posted. It obviously bubbles up to the opening of a LiteDB connection (LiteDB.SharedEngine.OpenDatabase), but following it up, it is the mutex wait that is locking things. WaitOne is notorious for being used incorrectly and blocking the main thread. I still encounter a threading issue now and then that makes it into a release, and I don't find out for a month until a user submits an error. Troubleshooting deadlocks is a pain. I always find this resource helpful for reviewing logic and potential deadlocks: http://blog.stephencleary.com/2012/07/dont-block-on-async-code.html

@dethknite

dethknite commented Jan 12, 2022

I just had an additional thought (since it sounds like you have a single thread opening a LiteDB connection). How are you opening the connection? I would recommend wrapping it in a using statement to ensure it is closed and disposed on completion, such as:

using (var db = new LiteDatabase(connReadWriteShared))
{
    // Do what you want to, such as get a collection
    var col = db.GetCollection("collName");
}

@sharadnair
Author

Currently, we are opening and closing the database each time:
```
public int Delete(BsonExpression query)
{
    lock (_lockObj)
    {
        using (var db = _db.OpenDatabase())
        {
            var col = db.GetCollection(_collectionName);

            return col.Count() > 0 ? col.DeleteMany(query) : 0;
        }
    }
}

public LiteDatabase OpenDatabase()
{
    var connectionString = $"Filename={ConnectionString}";

    if (_isShared)
    {
        connectionString += ";Connection = shared";
    }

    return new LiteDatabase(connectionString);
}
```
Our application is installed as a Windows Service that interacts with hardware to collect information and then writes it to the collection. For the most part all the code works fine, but occasionally (a couple of times a day) we end up with the database being locked.
It is difficult to identify the root cause because of the level of automation we are dealing with.

@dethknite

I would not use locking around opening the LiteDB connection (LiteDB has locking built in). Perhaps just surround the DB connection routines with a try/catch and, on failure, wait 500 ms and retry, with a maximum number of attempts after which it writes to the log.
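
The try/catch, wait, and retry idea could look something like the sketch below. The `Retry` class, its parameter names, and the `onFailure` callback are hypothetical and are not part of LiteDB. The sketch catches every exception type for brevity; real code would probably narrow that to the exceptions actually seen, such as an `IOException` for a file in use.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Runs the action, retrying after a delay when it throws.
    // The exception from the final attempt is not caught and reaches the caller.
    public static T Execute<T>(
        Func<T> action,
        int maxAttempts = 5,
        int delayMs = 500,
        Action<Exception, int> onFailure = null)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            // The filter lets the last failure propagate instead of swallowing it.
            catch (Exception ex) when (attempt < maxAttempts)
            {
                onFailure?.Invoke(ex, attempt); // e.g. write a warning to the log
                Thread.Sleep(delayMs);
            }
        }
    }
}
```

A caller would wrap the database routine in the delegate, for example `Retry.Execute(() => repository.Delete(query))` with a hypothetical `repository`, and log the exception that escapes after the last attempt. Note that a retry like this only helps when the failure surfaces as an exception; it cannot recover from a `WaitOne()` call that never returns.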

I noted you are using lock() { } (which just uses Monitor.Enter/Exit and locks within a single application's AppDomain) together with Mutex locks, which are used for inter-process locking. From the sound of it, you have a service and perhaps multiple processes running, so you are using the Mutex so that all the processes and the service work together without deadlocking. Though locking at both levels is fine, I would recommend sticking with mutex locking if you are at the process level, as that should handle everything, and dropping the extra lock removes potential deadlocks within a process caused by double locking.

What makes me think this way is the first screenshot attached, where the lock(mutex) occurs. Within that lock, code paths are followed based on whether anything is on the stack. If anything gets stuck within that lock { }, the mutex stays locked and all other processes or code trying to act on that mutex are deadlocked as well.
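
For application code that takes a named mutex itself, one way to avoid an unbounded wait is to pass a timeout to `WaitOne` and always release in a `finally` block. The sketch below is an illustration under that assumption only; it does not describe how LiteDB acquires its own mutex, and `MutexGuard`, `TryRun`, and the mutex name passed in are hypothetical.

```csharp
using System;
using System.Threading;

public static class MutexGuard
{
    // Runs body while holding the named (cross-process) mutex.
    // Returns false if the mutex could not be acquired within the timeout.
    public static bool TryRun(string mutexName, TimeSpan timeout, Action body)
    {
        if (body == null) throw new ArgumentNullException(nameof(body));

        using (var mutex = new Mutex(false, mutexName))
        {
            bool acquired;
            try
            {
                acquired = mutex.WaitOne(timeout); // bounded wait instead of WaitOne()
            }
            catch (AbandonedMutexException)
            {
                // A previous owner exited without releasing; this thread now owns the mutex.
                acquired = true;
            }

            if (!acquired)
            {
                return false; // caller can log, retry, or give up instead of hanging
            }

            try
            {
                body();
                return true;
            }
            finally
            {
                // Must run on the same thread that acquired the mutex.
                mutex.ReleaseMutex();
            }
        }
    }
}
```

Because a `Mutex` has thread affinity, the body should be synchronous; awaiting inside it could resume on a different thread and make `ReleaseMutex` throw. A `false` return gives the application a chance to record which operation timed out, which may be easier to diagnose than a hung process.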

@sharadnair
Author

We finally managed to identify the issue :)
We were storing the data in memory for performance reasons, and a separate thread was performing a bulk insert in the background. It turns out the bulk inserts were causing the mutex lock.
We were opening the connection to LiteDB, iterating through a foreach loop, and calling Insert within the loop. We assumed that creating a new instance of LiteDatabase, as follows, would open the database once for the whole loop:

using (var db = new LiteDatabase(connectionString))
{
    foreach (var entry in entries)
    {
        // This call opens and closes the database, and this occasionally locks up
        db.Insert(entry);
    }
}

Only after digging into the source code did we notice that every call to Insert was opening and closing the database, which brought the mutex lock into play. Since this bulk insert was happening in a loop, depending on the speed of the processor it ended up locking itself. We have now put a thread sleep with each call to Insert, and this seems to have resolved our issue.
We hope to take this learning and implement a central and standard way of managing this in our framework.
Thanks for all the help, it was very useful.

@dethknite

Glad to hear you solved it. From what you posted here, one optimization that keeps it to a single DB insert is to store the entries in memory as a List of entries. Then you can simply insert them all in one call with a bulk insert.

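List<entry> lstEntries = new List<entry>();

public void AddBulk(List<entry> lstEntries)
{
    using (var db = new LiteDatabase(connectionString))
    {
        // "collName" is a placeholder collection name, as in the earlier example
        var col = db.GetCollection<entry>("collName");
        col.InsertBulk(lstEntries);
    }
}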
