
Detect connection pool deadlocks #63

Closed
grosch opened this issue Mar 16, 2020 · 23 comments · Fixed by #67
grosch commented Mar 16, 2020

Something like this will hang the client, and in the db log we get

LOG: unexpected EOF on client connection with an open transaction

Client.find(id, on: req.db).flatMap { client in
    // Validate we want to make updates
    req.db.transaction { db in
        // Do stuff
    }
}

I don't really want to create a transaction if I don't have to. In the case where I discovered this I'm doing multiple queries against multiple tables, and only under certain conditions do I then create a transaction and update multiple different tables.

tanner0101 (Member) commented Mar 17, 2020

Nesting a transaction should work as you have it there. Which db driver are you using?

grosch (Author) commented Mar 17, 2020

Fluent

grosch (Author) commented Mar 17, 2020

Sigh. And by Fluent I mean PostgreSQL

grosch (Author) commented Mar 18, 2020

This is all it takes to cause a deadlock in the app:

func badMonkey(req: Request) throws -> EventLoopFuture<HTTPStatus> {
    req.db.transaction { db in
        Point.query(on: req.db).all().transform(to: .noContent)
    }
}

Notice how I used req.db inside the transaction.

tanner0101 (Member) commented Mar 19, 2020

Hmm, that's different from the original example. It makes sense that it would not work, since you should only use the db supplied to the closure while in a transaction. Try refactoring like this:

func badMonkey(req: Request) throws -> EventLoopFuture<HTTPStatus> {
    let points = Point.query(on: req.db).all()
    return req.db.transaction { db in
        points...
    }.transform(to: .noContent)
}

Or this:

func badMonkey(req: Request) throws -> EventLoopFuture<HTTPStatus> {
    req.db.transaction { db in
        Point.query(on: db).all().transform(to: .noContent)
    }
}

To clarify, for this to work you need two connections:

req.db.transaction { db in
    Point.query(on: req.db).all().transform(to: .noContent)
}

One in a transaction and one normal connection. These must be separate connections since once a connection is in a transaction all queries on that connection will be a part of the transaction.

This is deadlocking immediately since FluentPostgresDriver currently defaults to 1 connection max per pool: https://github.com/vapor/fluent-postgres-driver/blob/master/Sources/FluentPostgresDriver/FluentPostgresConfiguration.swift#L22

Increasing that number should stop this from always deadlocking, but I'm not sure you can avoid deadlocks here. Say you up the number to 2 max connections per event loop. Two requests come in at the same time to badMonkey and start the transaction. Now these two requests have consumed two connections and will be deadlocked waiting for another connection. To avoid deadlocks you must never have a pooled resource require requesting another pooled resource in order to release.
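To make the scenario concrete, here is a hypothetical trace (assuming maxConnections: 2 and two concurrent requests to badMonkey on the same event loop; this is an illustration, not driver output):

```swift
// Hypothetical trace with maxConnections: 2:
//
// Request A: req.db.transaction { ... }   -> takes connection 1
// Request B: req.db.transaction { ... }   -> takes connection 2
// Request A: Point.query(on: req.db)      -> waits for a free connection
// Request B: Point.query(on: req.db)      -> waits for a free connection
//
// Neither transaction closure can complete (and so release its connection)
// until its inner query gets a connection, so both wait forever.
```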

Fluent takes care of this if you use db instead of req.db while inside the transaction. Ideally we could provide some sort of error if we detect a deadlock is possible, but I'm not sure how that would be implemented.

grosch (Author) commented Mar 19, 2020

Ideally we could provide some sort of error if we detect a deadlock is possible, but I'm not sure how that would be implemented.

Exactly. If you have some way to flag that it happened, awesome; otherwise close this. I know I'm supposed to always use the db provided by the transaction, but we run into trouble when updating a large method: we realize "Oh, I need a transaction here," wrap it, change most req.db references to db, but miss one.

tanner0101 (Member) commented
I'll leave this open as a reminder to try and find a way to detect deadlocks.

@tanner0101 tanner0101 changed the title Can't use transactions in a nested query Detect deadlocks May 8, 2020
@tanner0101 tanner0101 transferred this issue from vapor/fluent-kit May 8, 2020
@tanner0101 tanner0101 changed the title Detect deadlocks Detect connection pool deadlocks May 8, 2020
MrMage (Contributor) commented May 8, 2020 via email

tanner0101 (Member) commented May 18, 2020

You can create a deadlock by doing something like:

let pool = EventLoopConnectionPool(..., maxConnections: 1)
pool.withConnection { a in
    pool.withConnection { b in
        pool.eventLoop.makeSucceededFuture(())
    }
}

Since the pool only has one connection and we're waiting for another connection from the pool before we can return the first connection, this code can never finish.

Fluent queries, when run on req.db, do pool.withConnection internally. So nesting req.db usage inside a transaction closure (which is inside a pool.withConnection closure) is similar to the above example and will require at least 2 connections.
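Expanded into pool terms, the nesting looks roughly like this (a conceptual sketch only; the actual FluentPostgresDriver internals differ):

```swift
// Conceptual sketch; not the real driver code.
// req.db.transaction { db in Point.query(on: req.db).all() } behaves like:
pool.withConnection { txConn in        // connection 1, held for the transaction
    // Using req.db inside the closure requests a second pooled connection:
    pool.withConnection { queryConn in // blocked until connection 1 is returned
        queryConn.query("SELECT * FROM points")
    }
}
// With maxConnections: 1 the inner request can never be satisfied, because
// connection 1 is only returned after the inner future completes.
```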

MrMage (Contributor) commented May 19, 2020

Since the pool only has one connection and we're waiting for another connection from the pool before we can return the first connection, this code can never finish.

Would that also apply if the pool has 2 or more connections? Wouldn't the second call to pool.withConnection just request a second connection from the pool in that case? (Personally, I run most of my pools with ~5 connections and drop "stale" connections from time to time when they have not been used for a while.)

tanner0101 (Member) commented May 19, 2020

Yes, if the pool had a max of two connections, then that code would not be a problem.

To generalize, if you have a pool with n max connections and you request n + 1 connections from the pool, you will deadlock. In my example n = 1 (maxConnections: 1) and the code requires 2 connections from the pool at once to complete. Thus deadlock.

If you have n = 2 (maxConnections: 2), then you could create a deadlock by doing:

let pool = EventLoopConnectionPool(..., maxConnections: 2)
pool.withConnection { first in
    pool.withConnection { second in
        pool.withConnection { third in
            // we have three connections at the same time
            // this is impossible since maxConnections = 2
            print(first, second, third)
            ...
        }
    }
}

MrMage (Contributor) commented May 20, 2020

Right, that is the way I would have expected it to work. As I said, my personal "solution" is to just make sure my connection pool is large enough...

tanner0101 (Member) commented
Yup that's a good solution. I do think it would be helpful if the connection pool could detect deadlocks to assist in debugging issues around a connection pool being too small. But I'm not sure that will be possible to implement.

MrMage (Contributor) commented May 21, 2020

I do think it would be helpful if the connection pool could detect deadlocks to assist in debugging issues around a connection pool being too small. But I'm not sure that will be possible to implement.

Agreed, to both the usefulness and (sadly) the feasibility.

MrLotU (Member) commented May 22, 2020

From taking a quick peek at the code, wouldn't simply keeping a count of (available connections - requested connections) do the trick? That way, if that number reaches 0 and yet another connection is requested, you can fail the future, correct?

Though I might be oversimplifying it here :)

MrMage (Contributor) commented May 22, 2020

From taking a quick peek at the code, wouldn't simply keeping a count of (available connections - requested connections) do the trick? That way, if that number reaches 0 and yet another connection is requested, you can fail the future, correct?

Yes, but then any requests received while all e.g. 5 connections are momentarily in use would simply fail. So we would start to randomly fail requests any time all connections are "taken", rather than simply waiting until they become available again.

However, I'd say that having a timeout for this waiting could be a good idea. That way, "deadlocked" requests could be released after, say, 10 seconds. Not ideal, but better than locking indefinitely, I guess.
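An acquisition timeout could be sketched along these lines (hypothetical names: requestConnection(promise:), releaseConnection(_:), and PoolTimeoutError are assumptions, not the real API):

```swift
import NIO

// Hypothetical sketch of a pool acquisition timeout; the pool method names
// and error type here are assumed, not the actual AsyncKit API.
func withConnection<T>(
    timeout: TimeAmount,
    _ closure: @escaping (Connection) -> EventLoopFuture<T>
) -> EventLoopFuture<T> {
    let promise = self.eventLoop.makePromise(of: Connection.self)
    // Fail the waiter if no connection becomes available in time.
    let timeoutTask = self.eventLoop.scheduleTask(in: timeout) {
        promise.fail(PoolTimeoutError.connectionRequestTimedOut)
    }
    self.requestConnection(promise: promise)
    return promise.futureResult.flatMap { conn in
        timeoutTask.cancel()
        return closure(conn).always { _ in self.releaseConnection(conn) }
    }
}
```

A real implementation would also need to handle the race where a connection becomes available just after the timeout fires (the promise must not be completed twice, and the late connection must go back to the pool).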

MrLotU (Member) commented May 22, 2020

Yes, but then any requests received while all e.g. 5 connections are momentarily in use would simply fail.

Very good point, totally overlooked that. In that case a timeout does sound better than deadlocking. This would at least free up the connections after x amount of time, instead of eventually deadlocking everything because there are no more connections that can be freed.

mxcl commented Jul 3, 2020

I feel it should not be possible to deadlock Vapor apps if I or my developers accidentally use rq.db rather than db, a common and easy mistake to make; usually we are using rq.db.

I wonder if it’s possible that once a transaction is started the value of db on rq could be changed by Vapor.

I understand some mitigation has been done in #67, but I'd rather see Vapor make design choices that prevent these kinds of scenarios in the first place. To err is human.

tanner0101 (Member) commented Jul 14, 2020

It's a difficult balancing act. Vapor should try to prevent common errors but it should also not get in the way of the developer. If we silently swap out req.db for example, that would prevent you from being able to make any requests outside of the transaction even if you wanted to. I can imagine a few cases where that would be useful.

At least #67 has prevented applications from hanging forever if deadlocks are created. But I agree we can do better.

A few ideas:

"leaky" pooling

I've opened an issue for this here #72. This allows the connection pool to temporarily open new connections beyond its maximum if needed.

detect deadlocks

Maybe there is some way for withConnection closures to share context and detect / prevent impending deadlocks. I'm less sure this is possible, but it's worth looking into.

researching other frameworks

How have other web frameworks dealt with this? Node.js / Java would probably be good places to look.

MrLotU (Member) commented Jul 15, 2020

detect deadlocks

I think with the async nature of our connection pool this is going to be a tricky one, mostly because requesting an n + 1th connection doesn't always mean a deadlock. It could also mean you'd get the n + 1th connection once the 1st is released.
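For example, on a pool with maxConnections: 1, sequential acquisition completes fine while nested acquisition deadlocks (a sketch; the query calls are illustrative):

```swift
// Sequential: the first connection is released before the second is requested.
pool.withConnection { a in
    a.query("SELECT 1")
}.flatMap { _ in
    pool.withConnection { b in    // gets the (now free) connection
        b.query("SELECT 2")
    }
}

// Nested: the second request arrives while the first connection is held.
pool.withConnection { a in
    pool.withConnection { b in    // waits forever with maxConnections: 1
        b.query("SELECT 1")
    }
}
```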

researching other frameworks

Did a quick peek into Python's Peewee ORM, and they either block for a certain time, block indefinitely, or raise an error when requesting n + 1 connections.

I don't think always raising when n + 1 connections are requested is a good idea (see above). The timeout measures are something we have in place too (except the option to block indefinitely, though you could set the timeout to 24 hours or something similarly large).


Also looked at Node's Sequelize ORM, and they accept timeouts for acquiring a new connection, a min and max.

Full options from Sequelize:

options.pool.max: Maximum number of connections in pool
options.pool.min: Minimum number of connections in pool
options.pool.idle: The maximum time, in milliseconds, that a connection can be idle before being released.
options.pool.acquire: The maximum time, in milliseconds, that the pool will try to get a connection before throwing an error.
options.pool.evict: The time interval, in milliseconds, after which sequelize-pool will remove idle connections.
options.pool.validate: A function that validates a connection. Called with client. The default function checks that client is an object, and that its state is not disconnected.
options.pool.maxUses: The number of times a connection can be used before discarding it for a replacement, used for eventual cluster rebalancing.

For Java I found the Hibernate ORM and the following documentation on connection pooling.

The documentation mentions two options here:
c3p0 provides a min and max size, as well as a timeout for getting a connection.
Proxool accepts a max and minimum connection count.


Overall it seems none of the ORMs fully prevent deadlocks; they only apply timeouts once you are in one, and I think this is the approach we should follow too.

mxcl commented Jul 15, 2020

I don’t agree that this is an acceptable compromise, but this is not my project, you have your own list of priorities that you adhere to.

In our app I wrote a wrapper that prevents this from happening and am ok with that solution.

tanner0101 (Member) commented
@mxcl is the wrapper in your app following a similar approach to replacing req.db? Also interested for your thoughts on #72 and if that seems like something that would benefit you.

mxcl commented Jul 15, 2020

We’re using https://github.com/candor/sublimate, which is the “synchronous” solution I was talking about ~6 months ago. Inside it, the db is swapped out once a transaction begins.

Overall I aim for reliability above all, which is why the set of trade-offs for me ends up there, but I certainly understand every library and framework has different base policies that affect everything above them. We’re all experienced in releasing such tooling here, and I have much respect for this project.
