
Improve max_lifetime handling #3051

Closed
mirek26 opened this issue Feb 11, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@mirek26 (Contributor) commented Feb 11, 2024

Is your feature request related to a problem? Please describe.

We have a latency-sensitive system and we're seeing regular spikes in query latency. After some digging, I believe there are two interconnected issues:

  1. Establishing a new database connection is slow (which I'm trying to improve here and here)
  2. The connection pool does not reap old connections well, leading to a wave of new connections every max_lifetime

A somewhat related issue with similar symptoms is #2854. All connections are replaced after max_lifetime, and the replacement happens during acquire, so we observe spikes in query latency due to (1).

The issue with (2) is that the maintenance task only looks at (size - min_size) connections. This means that if size == min_size, it does nothing. We run with a relatively high min_size to ensure we always have a good number of available connections, but after learning this, I suspect it may actually be counterproductive.
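To make the arithmetic concrete, here is a minimal sketch of the reaping behavior described above (a simplified model, not the actual sqlx source):

    // Simplified model: the background reaper only considers connections in
    // excess of min_connections, so a pool that sits at its minimum size never
    // reaps anything in the background and all lifetime checks land on acquire.
    fn reapable_connections(current_size: usize, min_connections: usize) -> usize {
        current_size.saturating_sub(min_connections)
    }

    fn main() {
        assert_eq!(reapable_connections(20, 5), 15); // 15 candidates for background reaping
        assert_eq!(reapable_connections(20, 20), 0); // size == min_size: nothing is ever reaped
    }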

Describe the solution you'd like

The pool should replace connections in the background; it should be extremely rare for a connection to be opened at query time. Even if opening a connection takes only 20ms, that is too much to pay on the query path - we're writing Rust so my expectations are high ;-)

@abonander had a related comment in #2848:

The logic of acquire() should be changed to race between acquiring a connection from the idle queue and opening a new connection.

💯. This would be amazing, but I don't think it would fully solve my issue: if all connections in the pool were opened around the same time, the next connection we acquire from the idle queue is likely to be past max_lifetime as well.
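For illustration, here is a toy of the racing idea (not sqlx internals; the sleeps are hypothetical stand-ins for waiting on the idle queue and for connection setup, and it needs tokio with the rt, macros, and time features):

    use std::time::Duration;
    use tokio::time::sleep;

    // Stand-in for waiting on the idle queue.
    async fn checkout_idle() -> &'static str {
        sleep(Duration::from_millis(5)).await;
        "idle connection"
    }

    // Stand-in for opening a brand-new connection (slow path, ~20 ms).
    async fn open_new() -> &'static str {
        sleep(Duration::from_millis(20)).await;
        "fresh connection"
    }

    #[tokio::main]
    async fn main() {
        // Race the two futures; whichever finishes first wins and the loser is dropped.
        let winner = tokio::select! {
            c = checkout_idle() => c,
            c = open_new() => c,
        };
        println!("acquired: {winner}");
    }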

My preferred combination of changes would be:

  1. Don't check max_lifetime on acquire. Instead, check it on release in return_to_pool. Users who want the check on acquire are free to do it manually in before_acquire (see the sketch after this list).
  2. Improve the replacement logic: go through the full pool one by one; if a connection is old or idle, close it and run min_connections_maintenance immediately.
  3. The suggestion above: acquire() should race between acquiring a connection from the idle queue and opening a new connection.
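
As mentioned in (1), here is a sketch of what the manual escape hatch could look like, assuming sqlx 0.7's PoolOptions hooks and a PostgreSQL pool; the connection string and the 30-minute cutoff are placeholders:

    use std::time::Duration;
    use sqlx::postgres::PgPoolOptions;

    #[tokio::main]
    async fn main() -> Result<(), sqlx::Error> {
        let pool = PgPoolOptions::new()
            // Disable the built-in lifetime check on acquire...
            .max_lifetime(None)
            // ...and instead reject any connection older than 30 minutes when it is
            // handed out; the pool closes it and tries another (or opens a new one).
            .before_acquire(|_conn, meta| Box::pin(async move {
                Ok(meta.age < Duration::from_secs(30 * 60))
            }))
            .connect("postgres://localhost/example")
            .await?;

        sqlx::query("SELECT 1").execute(&pool).await?;
        Ok(())
    }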

I'm open to other solutions and happy to contribute changes once we agree on a path forward.

@mirek26 mirek26 added the enhancement New feature or request label Feb 11, 2024
@mirek26 (Contributor, Author) commented Feb 12, 2024

After writing this, I figured out that we can simulate proposal 1 with:

     // Disable the built-in lifetime check that runs on acquire...
     .max_lifetime(None)
     // ...and instead drop connections older than 30 minutes when they are returned to the pool.
     .after_release(|_conn, meta| Box::pin(async move { Ok(meta.age.as_secs() < 30 * 60) }))

We don't get the benefit of background max_lifetime checking, but we run enough queries that connection age is checked often enough. This chart shows the p99 latency of different queries after rolling out the change:
[Screenshot: chart of p99 query latency, captured 2024-02-12]

Basically: synchronous reconnections at query time became very rare.

I still think this change should be made in sqlx itself; it seems like a much better default.

@abonander (Collaborator) commented Mar 19, 2024

Closed by #3065
