-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
XXX: Hack statement ID to be random instead of sequential #1740
Conversation
Use random statement IDs instead of sequence to avoid conflicts (almost 100% of the time). This obviously is not the solution to the problem. It is only the proof of concept to verify that this is the root cause of the problem.
a74e03f
to
5543374
Compare
Serious question: why use PgBouncer? It doesn't seem to offer anything special over using a connection pool in the application. I suppose it makes it easier to pool connections from multiple applications, and it lets you pretend to have more than the maximum by multiplexing requests, but it just seems like a really hacky way to do it in general. In turn you seem to give up one really powerful feature: prepared statements, along with a bunch of other nice-to-have features like advisory locks and LISTEN/NOTIFY. I think the ideal solution to me is for us to somehow detect when we're talking to PgBouncer and just stop caching prepared statements. But that's not quite enough as the Statement pooling mode we're kind of out of luck though. |
@mareq did you see my above comment? |
@abonander Yes, sorry, I got side-tracked. You are making a valid point there. We have managed to dodge the issue by essentially throwing more money at the problem, so no PgBouncer is no longer needed. Still need to investigate our root cause and based on that we will need to come up with the resolution. So I do not know yet, if that will include PgBouncer or not. In any case, thank you for your help 👍. If you do not mind keeping this open for the time being, I will come back once I have something to share. Otherwise please reject this PR and I will eventually open a new one if and when needed. |
@mareq Thank you! I may use this hack too.
In my case I have no choice. My application needs to query NEAR Explorer DB (https://github.com/near/near-indexer-for-explorer) and it looks like they use PgBouncer. This causes quite amount of undesired errors in my app in production :/ |
I think the best advice I can give if you have no choice but to use PgBouncer is to avoid prepared statements entirely. If you invoke methods of Executor directly with query strings, it will not use prepared statements. You can't use bind arguments this way, sadly. That's inherently tied to the prepared statement interface. Unfortunately, the way PgBouncer's transaction pooling mode is designed, I don't think there's anything we can do from the client-side to ever guarantee with any level of confidence that prepared statements will work correctly. The one exception would be to always begin a transaction before using the prepared statement interface via In that case, we should provide some way to ensure that prepared statement IDs don't collide, but I don't know if I like just generating random IDs and hoping for the best. It seems inelegant, makes the IDs on average longer than they need to be, doesn't help with debugging, and has a nonzero chance to still suffer collisions (especially in a VM or container environment where Since Postgres returns a specific error code if the client submits a duplicate prepared statement ID, I'm thinking we should use that to detect this condition and set Otherwise, I would also recommend joining the discussion on the PgBouncer side via the following issue, and see if it's possible to convince them to properly support prepared statements: pgbouncer/pgbouncer#695 |
@abonander Thank you! Using the method from |
Closing this since we're unlikely to accept it as-is. If anyone wants to take another stab at it, I'd like to see the approach I described in #1740 (comment) |
I don't think that's true. Postgres has the extended protocol, with support for anonymous prepared statements which just sends a If you use pg_bench, it has a Adding prepared statements support to poolers isn't straight forward. When multiplexing N clients with M servers, the pooler has to keep track of which client prepared which statement on which server, and since clients don't randomize their prepared statement names (and they are not guaranteed to be unique), I don't think there is really a way to know if a prepared statement has been prepared already or not. |
That's exactly how one uses Postgres at scale. It's not uncommon for enterprises to launch thousands of containers to serve traffic or to have hundreds of microservices connecting to the same database. We've ran Postgres with over 200,000 clients connecting to PgBouncer and multiplexing probably 10-15 server connections. Postgres is known for not being able to have many concurrent connections open and a pooler is the only way presently to use Postgres at scale. Coming from a Rust world, I can see how one could run a large business from just a few servers, but in other languages that require a bit more compute and have global locks on the runtime, like Python, Ruby, and others, the natural way to parallelize computation is processes, not threads, so many connections from many different instances of the app need to be open to the database to serve traffic. Even in apps that can easily multithread, a connection cannot be shared between threads, since it's stateful. So, a connection pool with # connections = # of threads is usually required. Multiply this by 100s of instances of the same app running in a highly available at scale deployment and you'll run out of Postgres connections very quickly without using a pooler. |
@levkk this is great postgresml/pgcat#474 - I'm surprised that no one mentioned it here :) for anyone else who loves sqlx and wants to use "external" connection pooler then consider using pgcat over pgbouncer. The cool part is that PgCat recognizes SQLx :) https://postgresml.org/blog/making-postgres-30-percent-faster-in-production
|
Hi, we are trying to use PgBouncer, but there is (a known? related issue: #67 (comment)) problem with the duplicate IDs of prepared statements used by distinct instances sharing the database connections.
One nasty hacky way to get around the problem (statistically working most of the time) is using random statement IDs instead of the sequence. This obviously is not the real solution. It is only the proof of concept to verify that this is the root cause.
So, I am not really looking to merge this PR. Instead, I wanted to ask you guys what are your thoughts on this. What would you see as the best approach to resolve the problem?