Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock in SqliteQueueDatabase when calling stop() #2874

Closed
travnick opened this issue Apr 19, 2024 · 12 comments
Closed

Deadlock in SqliteQueueDatabase when calling stop() #2874

travnick opened this issue Apr 19, 2024 · 12 comments

Comments

@travnick
Copy link

SqliteQueueDatabase.stop() method acquires the lock (e.g. in main thread):

    def stop(self):
        logger.debug('environment stop requested.')
        with self._lock:

and then waits for writer to join self._writer.join()

In another place Writer.run() the connection from SqliteQueueDatabase is fetched:

    def run(self):
        conn = self.database.connection()

connection() performs connect() if required:

    def connection(self):
        if self.is_closed():
            self.connect()

Finally, we hit the lock again (in Writer thread):

    def connect(self, reuse_if_open=False):
        with self._lock:

So if stop() was called while the connection is closed, then we may hit the deadlock.

And so it is, since self._state is Thread Local data - which is initialized at first access by a new thread:

    def is_closed(self):
        return self._state.closed

What's more, because of thread local data each thread creates its own connection (when connection() is called) what looks like being against the documentation and design of SqliteQueueDatabase:
SqliteQueueDatabase is designed to simplify things by sending all write queries through a single, long-lived connection.

After a short review, this code is a source of bugs:

  • thread local ConnectionState (thread safe by design), is coupled with RLock(),
  • global ConnectionState (per Database instance) is not thread safe and not guarded at all.
        if thread_safe:
            self._state = _ConnectionLocal()
            self._lock = threading.RLock()
        else:
            self._state = _ConnectionState()
            self._lock = _NoopLock()
@coleifer
Copy link
Owner

Thanks looks like #2179 is partly to blame.

coleifer added a commit that referenced this issue Apr 19, 2024
Also avoid deadlocks in `SqliteQueueDatabase`. Refs #2874.
@coleifer
Copy link
Owner

What's more, because of thread local data each thread creates its own connection (when connection() is called) what looks like being against the documentation and design of SqliteQueueDatabase:
SqliteQueueDatabase is designed to simplify things by sending all write queries through a single, long-lived connection.

This is correct - we send all writes through the queue. Reads occur in separately and synchronously in whatever thread/connection executed them. WAL-mode allows this.

I believe the issue can be resolved by using a separate Lock instance for the start, stop, etc methods on the SqliteQueueDatabase.

See e6483e2

@travnick
Copy link
Author

I've put few comments on this commit.

What exactly does the Database._lock is about to protect in these Database methods?

    def connect(self, reuse_if_open=False):
        with self._lock:
    def close(self):
        with self._lock:

coleifer added a commit that referenced this issue Apr 19, 2024
@coleifer
Copy link
Owner

Thank you, I removed the unnecessary locking on the pause/unpause - you're absolutely right.

The idea with the other methods (connect and close) was to avoid potential issues w/driver code and also since we're looking at and updating some attributes on the database instance itself.

@travnick
Copy link
Author

I mean which fields exactly? I suppose it may be simplified, or should be improved - depending on the class design requirements.

@coleifer
Copy link
Owner

Shit maybe I don't need it anymore, it's realy just the stuff about setting the server version, more or less.

Let's try it: 6711884

@travnick
Copy link
Author

Since thread safety is quite hard thing to have it done properly, I suggest providing documentation about thread safety considerations for the whole class, and for each method separately (if different).

Then it will be possible to verify, if the code matches the design :) Right now I'm not sure what the aim is.

@coleifer
Copy link
Owner

It's quite simple. Since the database is a singleton instance and shared by threads (in multithreaded applications), the default is to maintain a connection per thread and any associated state (for us this is just the stuff stored in the self._state threadlocal). If you wish to share a connection among threads or are always running single-threaded you can initialize the database with thread_safe=False but there's no harm in keeping it as the default (True). That's really all there is to it.

@travnick
Copy link
Author

travnick commented Apr 22, 2024

Not thread safe code ⚠️ :

  • _set_server_version() - used by connect
  • init()
  • get_sql_context() - static variable context_class; Context.__call__() (and the whole class?) is not thread safe

It's hard to analyse the whole, since the implementation inheritance is used, so derived classes may break the base class thread safety.

At the very end - or a beginning of SqliteQueueDatabase, I can see that SQLite already isolates separate connections, so since enabling thread_safe=True makes the connection for each thread separately it looks like there is no need for SqliteQueueDatabase at all. Am I right?

@coleifer
Copy link
Owner

Set server version is probably fine to leave as-is, same with init(). Context is also fine since a new instance is created whenever a sql context is required.

As for SqliteQueueDatabase I think you're misunderstanding the purpose. Sqlite only allows 1 write connection - there are no concurrent writers. So if you have a multithreaded application using Sqlite, one way to get around this is to pass all writes through a dedicated writer connection.

@travnick
Copy link
Author

Ok, didn't know about that limitation. They write all around something like WAL provides more concurrency as readers do not block writers and a writer does not block readers. Reading and writing can proceed concurrently., but only in few places, that in fact only one writing thread is allowed until BEGIN CONCURRENT is used.

Oh, right, return self.context_class(**context) is creating a new object, since context_class is a type alias. Thought it's __call__ of Context object.

If init() is designed to be called only in one responsible thread, then it's fine - a documentation note would be nice :)

I see now that server_version of Database is newer set to 0 (_set_server_version is never called, only implementations in derived classes are). In SQLite it's static class variable too, but in MySql and Postgresql looks like writing and accessing it by multiple threads makes a data hazard.

coleifer added a commit that referenced this issue Apr 22, 2024
@coleifer
Copy link
Owner

Yes, that's what I mentioned about the connect and close. I've brought the locking back around those methods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants