
Detect connection pool deadlocks #63

Closed
grosch opened this issue Mar 16, 2020 · 23 comments · Fixed by #67
grosch commented Mar 16, 2020

Something like this will hang the client, and in the db log we get

LOG: unexpected EOF on client connection with an open transaction

Client.find(id, on: req.db).flatMap { client in
    // Validate we want to make updates
    req.db.transaction { db in
        // Do stuff
    }
}

I don't really want to create a transaction if I don't have to. In the case where I discovered this I'm doing multiple queries against multiple tables, and only under certain conditions do I then create a transaction and update multiple different tables.

tanner0101 (Member) commented Mar 17, 2020

Nesting a transaction should work as you have it there. Which db driver are you using?

grosch (Author) commented Mar 17, 2020

Fluent

grosch (Author) commented Mar 17, 2020

Sigh. And by Fluent I mean PostgreSQL

grosch (Author) commented Mar 18, 2020

This is all it takes to cause a deadlock in the app:

func badMonkey(req: Request) throws -> EventLoopFuture<HTTPStatus> {
    req.db.transaction { db in
        Point.query(on: req.db).all().transform(to: .noContent)
    }
}

Notice how I used req.db inside the transaction.

tanner0101 (Member) commented Mar 19, 2020

Hmm, that's different from the original example. It makes sense that it would not work, since you should only use the db supplied to the closure while in a transaction. Try refactoring like this:

func badMonkey(req: Request) throws -> EventLoopFuture<HTTPStatus> {
    let points = Point.query(on: req.db).all()
    return req.db.transaction { db in
        points...
    }.transform(to: .noContent)
}

Or this:

func badMonkey(req: Request) throws -> EventLoopFuture<HTTPStatus> {
    req.db.transaction { db in
        Point.query(on: db).all().transform(to: .noContent)
    }
}

To clarify, for this to work you need two connections:

req.db.transaction { db in
    Point.query(on: req.db).all().transform(to: .noContent)
}

One in a transaction and one normal connection. These must be separate connections since once a connection is in a transaction all queries on that connection will be a part of the transaction.

This is deadlocking immediately since FluentPostgresDriver currently defaults to 1 connection max per pool: https://github.com/vapor/fluent-postgres-driver/blob/master/Sources/FluentPostgresDriver/FluentPostgresConfiguration.swift#L22

Increasing that number should stop this from always deadlocking, but I'm not sure you can avoid deadlocks here. Say you up the number to 2 max connections per event loop. Two requests come in at the same time to badMonkey and start the transaction. Now these two requests have consumed two connections and will be deadlocked waiting for another connection. To avoid deadlocks you must never have a pooled resource require requesting another pooled resource in order to release.
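To make the scenario concrete, here is a hypothetical trace (assuming maxConnections: 2 and two concurrent requests to badMonkey on the same event loop; this is an illustration, not driver output):

```swift
// Hypothetical trace with maxConnections: 2:
//
// Request A: req.db.transaction { ... }   -> takes connection 1
// Request B: req.db.transaction { ... }   -> takes connection 2
// Request A: Point.query(on: req.db)      -> waits for a free connection
// Request B: Point.query(on: req.db)      -> waits for a free connection
//
// Neither transaction closure can complete (and so release its connection)
// until its inner query gets a connection, so both wait forever.
```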

Fluent takes care of this if you use db instead of req.db while inside the transaction. Ideally we could provide some sort of error if we detect a deadlock is possible, but I'm not sure how that would be implemented.

grosch (Author) commented Mar 19, 2020

Ideally we could provide some sort of error if we detect a deadlock is possible, but I'm not sure how that would be implemented.

Exactly. If you have some way to flag that it happened, awesome; otherwise close this. I know I'm supposed to always use the db provided by the transaction, but we run into trouble when updating a large method: we realize "Oh, I need a transaction here," wrap it, change most req.db references to db, but miss one.

tanner0101 (Member) commented
I'll leave this open as a reminder to try and find a way to detect deadlocks.

@tanner0101 tanner0101 changed the title Can't use transactions in a nested query Detect deadlocks May 8, 2020
@tanner0101 tanner0101 transferred this issue from vapor/fluent-kit May 8, 2020
@tanner0101 tanner0101 changed the title Detect deadlocks Detect connection pool deadlocks May 8, 2020
MrMage (Contributor) commented May 8, 2020 via email

tanner0101 (Member) commented May 18, 2020

You can create a deadlock by doing something like:

let pool = EventLoopConnectionPool(..., maxConnections: 1)
pool.withConnection { a in
    pool.withConnection { b in
        pool.eventLoop.makeSucceededFuture(())
    }
}

Since the pool only has one connection and we're waiting for another connection from the pool before we can return the first connection, this code can never finish.

Fluent queries, when run on req.db, do pool.withConnection internally. So nesting req.db usage inside a transaction closure (which is inside a pool.withConnection closure) is similar to the above example and will require at least 2 connections.
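Expanded into pool terms, the nesting looks roughly like this (a conceptual sketch only; the actual FluentPostgresDriver internals differ):

```swift
// Conceptual sketch; not the real driver code.
// req.db.transaction { db in Point.query(on: req.db).all() } behaves like:
pool.withConnection { txConn in        // connection 1, held for the transaction
    // Using req.db inside the closure requests a second pooled connection:
    pool.withConnection { queryConn in // blocked until connection 1 is returned
        queryConn.query("SELECT * FROM points")
    }
}
// With maxConnections: 1 the inner request can never be satisfied, because
// connection 1 is only returned after the inner future completes.
```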

MrMage (Contributor) commented May 19, 2020

Since the pool only has one connection and we're waiting for another connection from the pool before we can return the first connection, this code can never finish.

Would that also apply if the pool has 2 or more connections? Wouldn't the second call to pool.withConnection just request a second connection from the pool in that case? (Personally, I run most of my pools with ~5 connections and drop "stale" connections from time to time when they have not been used for a while.)

tanner0101 (Member) commented May 19, 2020

Yes, if the pool had a max of two connections, then that code would not be a problem.

To generalize, if you have a pool with n max connections and you request n + 1 connections from the pool, you will deadlock. In my example n = 1 (maxConnections: 1) and the code requires 2 connections from the pool at once to complete. Thus deadlock.

If you have n = 2 (maxConnections: 2), then you could create a deadlock by doing:

let pool = EventLoopConnectionPool(..., maxConnections: 2)
pool.withConnection { first in
    pool.withConnection { second in
        pool.withConnection { third in
            // we have three connections at the same time
            // this is impossible since maxConnections = 2
            print(first, second, third)
            ...
        }
    }
}

MrMage (Contributor) commented May 20, 2020

Right, that is the way I would have expected it to work. As I said, my personal "solution" is to just make sure my connection pool is large enough...

tanner0101 (Member) commented
Yup that's a good solution. I do think it would be helpful if the connection pool could detect deadlocks to assist in debugging issues around a connection pool being too small. But I'm not sure that will be possible to implement.

MrMage (Contributor) commented May 21, 2020

I do think it would be helpful if the connection pool could detect deadlocks to assist in debugging issues around a connection pool being too small. But I'm not sure that will be possible to implement.

Agreed, to both the usefulness and (sadly) the feasibility.

MrLotU (Member) commented May 22, 2020

From taking a quick peek at the code, wouldn't simply keeping a count of (available connections - requested connections) do the trick? That way, if that number reaches 0 and yet another connection is requested, you can fail the future, correct?

Though I might be oversimplifying it here :)

MrMage (Contributor) commented May 22, 2020

From taking a quick peek at the code, wouldn't simply keeping a count of (available connections - requested connections) do the trick? That way, if that number reaches 0 and yet another connection is requested, you can fail the future, correct?

Yes, but then any requests received while all e.g. 5 connections are momentarily in use would simply fail. So we would start to randomly fail requests any time all connections are "taken", rather than simply waiting until they become available again.

However, I'd say that having a timeout for this waiting could be a good idea. That way, "deadlocked" requests could be released after, say, 10 seconds. Not ideal, but better than locking indefinitely, I guess.
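An acquisition timeout could be sketched along these lines (hypothetical names: requestConnection(promise:), releaseConnection(_:), and PoolTimeoutError are assumptions, not the real API):

```swift
import NIO

// Hypothetical sketch of a pool acquisition timeout; the pool method names
// and error type here are assumed, not the actual AsyncKit API.
func withConnection<T>(
    timeout: TimeAmount,
    _ closure: @escaping (Connection) -> EventLoopFuture<T>
) -> EventLoopFuture<T> {
    let promise = self.eventLoop.makePromise(of: Connection.self)
    // Fail the waiter if no connection becomes available in time.
    let timeoutTask = self.eventLoop.scheduleTask(in: timeout) {
        promise.fail(PoolTimeoutError.connectionRequestTimedOut)
    }
    self.requestConnection(promise: promise)
    return promise.futureResult.flatMap { conn in
        timeoutTask.cancel()
        return closure(conn).always { _ in self.releaseConnection(conn) }
    }
}
```

A real implementation would also need to handle the race where a connection becomes available just after the timeout fires (the promise must not be completed twice, and the late connection must go back to the pool).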

MrLotU (Member) commented May 22, 2020

Yes, but then any requests received while all e.g. 5 connections are momentarily in use would simply fail.

Very good point, totally overlooked that. In that case a timeout does sound better than deadlocking. This would at least free up the connections after x amount of time, instead of eventually deadlocking everything because there are no more connections that can be freed.

mxcl commented Jul 3, 2020

I feel it should not be possible to deadlock Vapor apps if I or my developers accidentally use rq.db rather than db, a common and easy mistake to make; usually we are using rq.db.

I wonder if it’s possible that once a transaction is started the value of db on rq could be changed by Vapor.

I understand some mitigation has been done in #67, but I'd rather see Vapor make design choices that prevent these kinds of scenarios in the first place. To err is human.

tanner0101 (Member) commented Jul 14, 2020

It's a difficult balancing act. Vapor should try to prevent common errors but it should also not get in the way of the developer. If we silently swap out req.db for example, that would prevent you from being able to make any requests outside of the transaction even if you wanted to. I can imagine a few cases where that would be useful.

At least #67 has prevented applications from hanging forever if deadlocks are created. But I agree we can do better.

A few ideas:

"leaky" pooling

I've opened an issue for this here #72. This allows the connection pool to temporarily open new connections beyond its maximum if needed.

detect deadlocks

Maybe there is some way for withConnection closures to share context and detect / prevent impending deadlocks. I'm less sure this is possible, but it's worth looking into.

researching other frameworks

How have other web frameworks dealt with this? Node.js / Java would probably be good places to look.

MrLotU (Member) commented Jul 15, 2020

detect deadlocks

I think with the async nature of our connection pool this is going to be a tricky one, mostly because requesting an n + 1th connection doesn't always mean a deadlock. It could also mean you'd get the n + 1th connection once the 1st is released.
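For example, on a pool with maxConnections: 1, sequential acquisition completes fine while nested acquisition deadlocks (a sketch; the query calls are illustrative):

```swift
// Sequential: the first connection is released before the second is requested.
pool.withConnection { a in
    a.query("SELECT 1")
}.flatMap { _ in
    pool.withConnection { b in    // gets the (now free) connection
        b.query("SELECT 2")
    }
}

// Nested: the second request arrives while the first connection is held.
pool.withConnection { a in
    pool.withConnection { b in    // waits forever with maxConnections: 1
        b.query("SELECT 1")
    }
}
```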

researching other frameworks

Did a quick peek into Python's Peewee ORM, and they either block for a certain time, block indefinitely, or raise an error when requesting n + 1 connections.

I don't think always raising when n + 1 connections are requested is a good idea (see above). The timeout measures are something we have in place too (except the option to block indefinitely, though you could set the timeout to 24 hours or something similarly large).


Also looked at Node's Sequelize ORM, and they accept timeouts for acquiring a new connection, a min and max.

Full options from Sequelize:

options.pool.max: Maximum number of connections in pool
options.pool.min: Minimum number of connections in pool
options.pool.idle: The maximum time, in milliseconds, that a connection can be idle before being released.
options.pool.acquire: The maximum time, in milliseconds, that the pool will try to get a connection before throwing an error.
options.pool.evict: The time interval, in milliseconds, after which sequelize-pool will remove idle connections.
options.pool.validate: A function that validates a connection. Called with client. The default function checks that client is an object, and that its state is not disconnected.
options.pool.maxUses: The number of times a connection can be used before discarding it for a replacement, used for eventual cluster rebalancing.

For Java I found the Hibernate ORM and the following documentation on connection pooling.

The documentation mentions two options here:
c3p0 provides a min and max size, as well as a timeout for getting a connection.
Proxool accepts a max and minimum connection count.


Overall it seems none of the ORMs fully prevent deadlocks; they only apply timeouts once you are in one, and I think this is the approach we should follow too.

mxcl commented Jul 15, 2020

I don’t agree that this is an acceptable compromise, but this is not my project, you have your own list of priorities that you adhere to.

In our app I wrote a wrapper that prevents this from happening and am ok with that solution.

tanner0101 (Member) commented
@mxcl is the wrapper in your app following a similar approach to replacing req.db? Also interested for your thoughts on #72 and if that seems like something that would benefit you.

mxcl commented Jul 15, 2020

We’re using https://github.com/candor/sublimate, which is the “synchronous” solution I was talking about ~6 months ago. Inside it, the db is swapped out once a transaction begins.

Overall I aim for reliability above all, which is why the set of trade-offs for me ends up there, but I certainly understand every library and framework has different base policies that affect everything above them. We’re all experienced in releasing such tooling here, and I have much respect for this project.
