Intermittent db connection error with aws rds #702

Open
badoet opened this issue Jan 23, 2018 · 5 comments

Comments

@badoet

badoet commented Jan 23, 2018

We get this error intermittently in our Docker containers:
read tcp <ip address>:39954-><ip address>:5432: read: connection reset by peer

The only special thing is that we use Postgres on AWS RDS.
Not sure if this affects anything. Is anyone facing the same issue?
I'm still using this commit: e42267488fe361b9dc034be7a6bffef5b195bceb
and go1.9.2, using only the database/sql library with github.com/lib/pq.

Once we get this error, we have to delete the pod and hope the new container does not run into the issue again. This is quite unstable, and I'm pretty sure we are not the only ones who will hit it.

@chadweimer

I'm experiencing the identical issue. I'm also running in docker, but rather than AWS RDS, my app is connecting to a postgres container on the same overlay network.

I'm currently on commit 90697d6. This seems like a relatively new problem (I didn't notice the problem while on commit b77235e).

@cristiangraz

@chadweimer I'm seeing this exact same issue, and oddly I'm on the same commit as you now and was on the same previous commit as you before these errors started popping up.

Current: 90697d6.
Previous: b77235e

For us it also started right after we updated: we suddenly began seeing errors. We use glide to manage dependencies and initially thought this was related to another set of dependencies being updated at the same time, but we have now narrowed it down to the database connection (based on the IP address in the error).

We are running in docker (Alpine linux 3.7) on Go 1.10.1 but connecting to a database on GCP.

Here's the list of changes between those two versions:
b77235e...90697d6

What Go version and base container are you using?

@cristiangraz

cristiangraz commented Jul 10, 2018

@chadweimer I think the issue may be this commit: 6e2a335

When ErrBadConn is returned, my understanding is that the query is automatically retried, so the same problem could occur without surfacing an error because it was handled transparently. I think what's happening is that since that commit, the error is returned to the caller instead, causing the queries to fail.

In our case it's happening on idle connections. Are you seeing the same? Where is your application running? I'm going to try SetConnMaxLifetime this week. I've done some research and it looks like GCP closes idle connections after 10 minutes, and I've reached out to Compose support, who say they've seen the same issue on AWS after ~5 minutes.

@badoet
Author

badoet commented Jul 17, 2018

We use Go 1.9.2 and prebuild the Go binary,
then deploy with the kihamo/scratch-ca-certs image.
We have not encountered this issue again for a while now.
It seems to be an intermittent error that is very hard to reproduce and catch.
I think a further step we can take to catch this error automatically is to point our livenessProbe at a path in our Go service that runs a simple SQL query as the liveness test. After 3 failures, Kubernetes can automatically restart the Go pod.

@Lekensteyn

#1013 (lib/pq >= v1.9.0) should have addressed the issue where a dead connection is stuck forever. An analysis of the current situation can be found in #835 (comment)

I think that this issue can be closed?


4 participants