influxdb output leaks connections #1058
Comments
Thanks for the report. I don't think there's anything wrong with the client; this is what's causing the bug: https://github.com/influxdata/telegraf/blob/master/plugins/outputs/influxdb/influxdb.go#L198
It's not 100% clear to me why your OS is keeping so many dead TCP connections alive, though. Can you give me the output of these commands?
Of course, you should close the client before clearing the array. But the Close method of the http client is just empty.
server running influxdb:
server running telegraf:
You're right, I don't think changing the close method will actually do anything to help this issue. But I'm still not quite sure I follow the logic on reading the response object. When querying influx there is only a single json object that gets returned (it begins with
From what I can tell, the problem is that Telegraf creates a new http client by calling Connect() every time there is a write failure. The idea was that if the database goes away, Telegraf could recreate it on restart (see #836).
The problem in this case is that influxdb never goes away, so all the existing tcp connections are still valid, and creating new http clients just results in more and more tcp connections. So by "fixing" #836 I unintentionally created a situation where this could happen.
EDIT: I am going to change the implementation so that it checks the error message for "database not found" and only then tries to recreate the database. It will no longer create new http clients. cc @jchauncey
@sparrc After digging a lot I just realized that too. Implementing the http client's Close method so that it calls CloseIdleConnections on the Transport should solve this problem. I'm sorry that I only guessed at the cause this morning; that may have been misleading.
If users properly call client.Close(), then this will make sure that established tcp connections don't continually grow when creating new http clients. This fixes the case where users are creating new http clients on top of existing _valid_ connections. This was encountered in Telegraf when we were recreating our http clients after getting write failures that were unrelated to the actual connection being severed (such as typos in the retention policy, see influxdata/telegraf#1058).
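A rough sketch of what such a Close method could look like, assuming the client keeps a reference to the `*http.Transport` it was built with. The `client` type and `newClient` constructor here are hypothetical stand-ins, not the actual influxdb client code; `CloseIdleConnections` is the standard `net/http` method mentioned above.

```go
package main

import (
	"net/http"
)

// client is a hypothetical wrapper that remembers its transport so that
// Close can release idle keep-alive connections.
type client struct {
	httpClient *http.Client
	transport  *http.Transport
}

// newClient builds an http.Client on top of a dedicated transport.
func newClient() *client {
	tr := &http.Transport{}
	return &client{
		httpClient: &http.Client{Transport: tr},
		transport:  tr,
	}
}

// Close drops idle connections held by the transport. Connections that are
// still serving a request are not interrupted.
func (c *client) Close() error {
	c.transport.CloseIdleConnections()
	return nil
}

func main() {
	c := newClient()
	// Callers still have to call Close before discarding the client.
	defer c.Close()
}
```

This only helps if callers actually invoke Close before dropping a client; it does not stop new clients from being created in the first place.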
No problem, thanks for the report! I've submitted a PR to influxdb to fix that, and a separate PR to telegraf. I think it's probably best to leave the http clients open and only recreate the db if that is the specific error encountered.
closes #1058 closes #1059 also see influxdata/influxdb#6425
I put the wrong retention policy in the config file yesterday, and every write has failed since then. This morning I found about 5k tcp connections between telegraf and influxdb (port 8086).
The issue may be caused by the Query function in influxdb/client/v2/client.go, which uses a JSON decoder to read the response body. The decoder just reads the first value, and if there is still some data in the response stream the connection won't be closed or reused.