
client: prevent leak of file descriptors with multiple clients #434

Merged
merged 1 commit into fsouza:master on Dec 17, 2015

Conversation

orivej
Contributor

orivej commented Dec 14, 2015

Each HTTP client keeps up to 2 idle connections alive, so the number of clients created over the lifetime of an application must stay bounded to prevent exhaustion of file descriptors. The options to fix this are (a sketch of the leak pattern follows the list):

  • do not keep connections open (as done in this patch)
  • keep one global client
  • keep a global pool of clients or transports
  • export and document a method to close connections that calls (*http.Transport).CloseIdleConnections()
  • document that users can only instantiate a small number of clients during the lifetime of an application
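A minimal sketch of the leak pattern and of option 4's explicit close (the URL and loop count are illustrative only):

```go
package main

import (
	"io"
	"io/ioutil"
	"net/http"
)

func main() {
	for i := 0; i < 10000; i++ {
		// A fresh client per iteration: its transport keeps idle
		// connections (and their file descriptors) alive after use.
		c := &http.Client{Transport: &http.Transport{}}
		resp, err := c.Get("http://127.0.0.1:8080/")
		if err != nil {
			continue
		}
		io.Copy(ioutil.Discard, resp.Body)
		resp.Body.Close()

		// Option 4: release idle connections before discarding the
		// client; without this line, descriptors accumulate until the
		// process hits its FD limit.
		c.Transport.(*http.Transport).CloseIdleConnections()
	}
}
```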

@orivej
Contributor Author

orivej commented Dec 14, 2015

Related to #404.

@fsouza
Owner

fsouza commented Dec 15, 2015

@orivej I don't think that we should apply this change. I'm probably leaning towards option 4, with the understanding that people will probably not read the documentation, but will instead catch the leak with profiling/debugging and conclude that they should not create too many instances of the go-dockerclient Client.

Keeping fewer instances of the Client is preferred. Would you be interested in adding this to the documentation of the library?

@orivej
Contributor Author

orivej commented Dec 15, 2015

I thought of another option: registering the CloseIdleConnections() of new transports with runtime.SetFinalizer. This would save existing users of go-dockerclient the trouble of adding a Close after NewClient.
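A minimal sketch of that idea, with a hypothetical constructor name (this is the single-indirection form; the rest of the thread explains why the PR ended up using double indirection instead):

```go
package main

import (
	"net/http"
	"runtime"
)

// newTransport sketches the idea: when the library allocates its own
// transport, register CloseIdleConnections to run once the transport
// is garbage collected.
func newTransport() *http.Transport {
	t := &http.Transport{}
	runtime.SetFinalizer(t, func(t *http.Transport) {
		t.CloseIdleConnections()
	})
	return t
}
```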

@fsouza
Owner

fsouza commented Dec 15, 2015

@orivej seems like a better idea, but there's an important point: we should do that only when we allocate the client ourselves; we should not call CloseIdleConnections if the user code has replaced the HTTPClient.

One approach is keeping the cleanhttp.DefaultClient() result in an unexported field in the struct, alongside the exported field, and calling CloseIdleConnections through the unexported reference if it's non-nil, leaving cleanup up to the user code in case the HTTPClient has been customized. What do you think?
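A sketch of the shape being described, with illustrative field names (not the library's actual API):

```go
package main

import "net/http"

type Client struct {
	// HTTPClient is exported and may be replaced by user code.
	HTTPClient *http.Client

	// defaultClient is set only when the library allocated the client
	// itself; nil means the user supplied their own.
	defaultClient *http.Client
}

func (c *Client) closeIdleConnections() {
	if c.defaultClient == nil {
		return // user-supplied client: leave cleanup to user code
	}
	if t, ok := c.defaultClient.Transport.(*http.Transport); ok {
		t.CloseIdleConnections()
	}
}
```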

@orivej
Contributor Author

orivej commented Dec 15, 2015

I moved the original proposal to https://github.com/orivej/go-dockerclient/tree/fix-fd-leak-v1 and added CloseIdleConnections as the HTTP transport finalizer. Since it finalizes just the transport and not the docker.Client, there is no concern about custom HTTPClients.

In one of my applications, which used to exhaust the default limit of 1024 file descriptors after the merge of #399, finalizers manage to keep the number of outstanding connections under 60, so the new proposal will likely keep applications that relied on the old API working [1]. However, one might still prefer to close connections explicitly. What do you think of orivej@643597a on top of the current proposal?

[1] In my opinion #399 breaks the API, since it considerably changes the behaviour of the old New methods for applications that create and discard clients as needed.

@jefferai
Contributor

@fsouza If this is as big a problem as @orivej suggests, then the right place to put in finalizers is in cleanhttp. The patch he's suggesting here still uses cleanhttp and just adds finalizers to what its methods return, but it might as well be fixed centrally (which was kind of the point of cleanhttp in the first place).

What I'm not clear on at this point is what is causing the problem in the first place. Shouldn't the normal behavior of http.Client, when being GCd, be to close idle connections?

IOW, this feels like working around a Golang bug -- which is not at all a problem, but should potentially be raised upstream.

@jefferai
Contributor

I've done some more digging -- the connection pool in Go is implemented by http.Transport. Unfortunately this is where a lot of the issues that cleanhttp is used to solve come from -- TLS, keepalives, compression, proxy settings, etc. The point of using new transports is so that different libraries aren't stomping over each others' settings via a shared default client -- something that was causing real, hard-to-diagnose issues.
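The kind of stomping being described, sketched under the assumption that two libraries share the process-wide defaults:

```go
package main

import (
	"crypto/tls"
	"net/http"
)

// Library A tweaks the shared default transport...
func initLibraryA() {
	http.DefaultTransport.(*http.Transport).TLSClientConfig =
		&tls.Config{InsecureSkipVerify: true}
}

// ...and library B, which uses http.DefaultClient via http.Get,
// silently inherits that setting for all of its requests.
func libraryBFetch(url string) (*http.Response, error) {
	return http.Get(url)
}
```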

@orivej In the Consul PR you said that finalizers weren't a great option, but then you used them in this PR. Is there a reason you think they aren't? Otherwise, I'm happy to put finalizers on the transports returned from cleanhttp, which will solve this problem for everyone in one go.

@jefferai
Contributor

Here's a playground example with the functions pulled from cleanhttp, for anyone curious: http://play.golang.org/p/XDneIRst_T

@orivej
Contributor Author

orivej commented Dec 17, 2015

In this example I never see the finalizers get called until the program runs out of FDs: http://play.golang.org/p/EFyiff-O9B (it will not run on Play due to the time restriction, but you can run it locally, presumably after executing ulimit -n 1024, or ulimit -n 10000 with the iteration count increased accordingly). This is why this pull request used double indirection in the finalizer: after reading "The argument x must be a pointer to an object allocated <> by taking the address of a composite literal" I thought that single indirection is not supposed to trigger finalizers at all. But now I think that double indirection may actually finalize the pointer itself. My actual server that uses go-dockerclient ran out of FDs when I tried to add finalizers with single indirection, but did not with double indirection as in this pull request.
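The double-indirection form, as a minimal sketch (the helper name is illustrative):

```go
package main

import (
	"net/http"
	"runtime"
)

// setTransportFinalizer attaches the finalizer to a pointer *to* the
// transport. Nothing else references that outer pointer, so it becomes
// unreachable as soon as this function returns, and in the experiments
// above the GC then ran the finalizer promptly.
func setTransportFinalizer(t *http.Transport) {
	runtime.SetFinalizer(&t, func(t **http.Transport) {
		(*t).CloseIdleConnections()
	})
}
```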

@orivej
Contributor Author

orivej commented Dec 17, 2015

> The point of using new transports is so that different libraries aren't stomping over each others' settings via a shared default client -- something that was causing real, hard-to-diagnose issues.

This is why I think cleanhttp solves this problem by helping instantiate one (or a small set of) private instances of http.Client or http.Transport per-library (as in hashicorp/consul#1517).

A reimplementation of http.Transport that shares idle connections between its instances would be a generic solution and is probably worth evaluating (and it /is/ something that should be done in the standard library if it is really good). I think I'll open a discussion on golang-nuts.

@jefferai
Contributor

Hi @orivej,

It certainly doesn't need to use double indirection; the various places in Go itself that use SetFinalizer (for example, https://golang.org/src/net/fd_unix.go and https://golang.org/src/os/exec/exec.go) all do so on objects using single indirection. I think you are right about double indirection simply finalizing the pointer.

I'm not sure why that fixes the problem, but it does -- I could reproduce it here. At http://play.golang.org/p/ufC7P8pDog you can see a modified version of your code where I simply moved the finalizer into the section of code that belongs to cleanhttp, and you can verify that it works. Moving the client := makeClient() call above the for loop also behaves as expected (that is, idle connections are no longer closed on each loop iteration).

I'd be very interested in knowing why the double indirection makes this difference, but if you agree that this should fix the problem, I'll make this change in cleanhttp and then the various downstream projects can just pull in new versions for their next release.

@jefferai
Contributor

> A reimplementation of http.Transport that shares idle connections between its instances would be a generic solution and is probably worth evaluating (and it /is/ something that should be done in the standard library if it is really good). I think I'll open a discussion on golang-nuts.

I think the problem here is that if the transports have different configurations (TLS, proxy, keepalives, etc.), reuse of the idle connections can be subject to different behavior than a client might expect. I think the better overall solution would be to have Transport kill its own idle connections when it's GC'd. I wouldn't be surprised if DefaultTransport never even does that, since it's around for the lifetime of the program anyway.

@orivej
Contributor Author

orivej commented Dec 17, 2015

> At http://play.golang.org/p/ufC7P8pDog you can see a modified version of your code where I simply moved the finalizer into the section of code that belongs to cleanhttp, and you can verify that it works.

I see that you added the second indirection to the finalizer, without which it still does not work.

@jefferai
Contributor

@orivej Yep. I've been looking through some of the documentation on the GC in 1.5 to see if I can spot (without digging into code) why this behaves this way.

> The argument x must be a pointer to an object allocated by calling new or by taking the address of a composite literal.

I do think that either the documentation is correct and the usage in the Go stdlib is faulty, or the documentation is incorrect and the usage in the Go stdlib is correct, because I agree with you -- that text does seem to imply that it must be a pointer to a pointer. The thing is, even if it isn't, the GC will eventually run the finalizer, but it behaves more like the documentation suggests:

> The finalizer for x is scheduled to run at some arbitrary time after x becomes unreachable. There is no guarantee that finalizers will run before a program exits, so typically they are useful only for releasing non-memory resources associated with an object during a long-running program.

Whereas when using double indirection it runs now.

I think this is definitely worth a post on go-nuts or maybe even a GitHub issue regarding the discrepancy between the documentation and Go's implementation in its stdlib -- it seems like one or the other is clearly wrong.

I was starting to read https://blog.golang.org/go15gc and have a suspicion that it has something to do with tweaks to the algorithm. When using double indirection, you're creating a second object on the heap (the pointer to the pointer) which, outside of the SetFinalizer call, is never referenced again. I can imagine (again, without looking at the GC code) some speed optimization in the GC or even the runtime where it avoids GCing objects created in a particular function and/or for loop while in the middle of that loop; IOW, it doesn't drop all knowledge of the pointer until some later time, with the objects being marked grey as a result. Since the pointer-to-a-pointer is created in a separate function, it doesn't benefit from this optimization, is marked white, and is reaped instantly.

This all said, do we have agreement about the way forward here? SetFinalizer in cleanhttp using double indirection? Or does it make people too nervous? :-)

@orivej
Contributor Author

orivej commented Dec 17, 2015

When I moved the server out of the process in the last paste, removed the call to runtime.GC from the loop, and increased the iteration count, I observed that outstanding connections were typically closed when their number reached 150 and never exceeded 300, so I would say that finalizers with double indirection are an acceptable generic solution.

Would docker.unixClient use cleanhttp.DefaultTransport() instead of &http.Transport, or would it set its own finalizer?

@jefferai
Contributor

@orivej Sounds good. I'll make the changes in cleanhttp now.

Regarding docker.unixClient, I guess I'd be worried about any meddling from e.g. proxies, which probably shouldn't kick in for unix:// but may. One possibility would be to expose the finalizer setting as a function in cleanhttp acting on an arbitrary *http.Transport -- then the existing code that creates its own transport can be kept, and it can call that function rather than implementing its own finalizer. I'll test that and ensure it works fine, then report back.
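A sketch of that shape for the unix-socket case, reusing the hypothetical setTransportFinalizer helper from the earlier sketch (not cleanhttp's actual exported name):

```go
package main

import (
	"net"
	"net/http"
)

// newUnixClient keeps the library's own transport construction and
// hands the finished transport to the finalizer-setting helper.
func newUnixClient(socketPath string) *http.Client {
	t := &http.Transport{
		Dial: func(network, addr string) (net.Conn, error) {
			return net.Dial("unix", socketPath)
		},
	}
	setTransportFinalizer(t) // hypothetical cleanhttp helper
	return &http.Client{Transport: t}
}
```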

@jefferai
Contributor

Tested -- works fine. I'll put in a PR to cleanhttp and poke @orivej for review.

@orivej
Contributor Author

orivej commented Dec 17, 2015

I thought of another potential problem to keep in mind. In hashicorp/consul#688 (comment), @samprakos reported that a server consumed all FDs because clients instantiated in a loop kept them open. It looks like finalizers will fully protect the client side, but servers may still need extra FDs to hold the additional connections from the same number of clients, due to client transports that have not yet been finalized. Custom solutions like hashicorp/consul#1517 may still be useful (but hopefully not necessary).

@slackpad

@orivej I think we are ok - @samprakos was running a process on the servers using the Consul API client so the finalizers would help here (the servers aren't a special case). You are right that if people point a bunch of Consul API clients at a small set of servers then they can cause problems, but the vast majority of Consul users will be pointing their Consul API clients at localhost to hit their local agent. The agent -> server RPC communication in Consul uses connection pools and multiplexed TCP streams (by way of yamux) to be as efficient as possible with server fds, though it is common to have to bump up the fd limits on Consul servers in large clusters.

@orivej
Contributor Author

orivej commented Dec 17, 2015

I moved my second proposal for go-dockerclient to https://github.com/orivej/go-dockerclient/tree/fix-fd-leak-v2, and updated this one with the current cleanhttp. I confirmed that this edition fixes my original issue.

@jefferai
Contributor

Great. I'll merge the PR.

@jefferai
Contributor

@orivej @fsouza I just merged in another change which simplifies things slightly; I verified that this still works in our shared test function from Playground.

go-cleanhttp should be good to go.

@orivej
Contributor Author

orivej commented Dec 17, 2015

Confirmed, and updated PR.

@fsouza
Owner

fsouza commented Dec 17, 2015

@orivej @jefferai thank you guys for discussing and coming to a nice solution involving both projects. I was kinda afk today. I will go ahead and merge this pull request now.

fsouza merged commit 9d655b3 into fsouza:master on Dec 17, 2015
@jefferai
Contributor

@fsouza No problem. I just wanted to fix this upstream if possible, because it's kind of the exact point of cleanhttp -- solving these problems in one place.

Thanks to @orivej for bringing attention to this issue and figuring out the fix!

@orivej
Contributor Author

orivej commented Mar 10, 2016

@fsouza, JFYI, upstream cleanhttp decided to disable keep-alives for the default transport: hashicorp/go-cleanhttp#4
Consul switched by default to the non-finalized pooled transport, reintroduced in cleanhttp under a new name: hashicorp/consul#1825
They claim better performance than the same transport with a finalizer, but I do not understand their reasoning.
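For reference, a sketch of what the new cleanhttp default amounts to (not cleanhttp's actual code):

```go
package main

import "net/http"

// newNoKeepAliveClient: with keep-alives disabled, each connection is
// closed after its request, so a discarded client cannot strand idle
// file descriptors, at the cost of a fresh connection (and handshake)
// per request.
func newNoKeepAliveClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{DisableKeepAlives: true},
	}
}
```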

@slackpad

@orivej thanks for pinging this one. The change we made in Consul isn't for better performance vs. the finalizer; it's better than the new default behavior of cleanhttp, which, without the finalizer and with keep-alives disabled, would have made the Consul API client open a new connection for each request.

@orivej
Copy link
Contributor Author

orivej commented Mar 10, 2016

@slackpad, do you know why cleanhttp made that change? If it was not for Consul, then why did you not reintroduce transport finalizers in Consul?

@jefferai
Copy link
Contributor

@orivej The finalizers did not work in all cases. We saw actual issues where GC wasn't triggering, presumably because enough memory was being freed up that it just didn't need to run. So it often worked, but (in not-easily-reproducible fashion) it did not always work.

It didn't seem like a good idea to periodically force GC. The way it is now (forcing keep-alives off) is demonstrably safe, but it does have a speed cost.

We may add options for the pooled version to call the idle connection cleanup function on a user-defined timer (and potentially also add the finalizer back in for connections that weren't cleaned up during the timed event). However, generally speaking, the dev is aware of the kind of workload they'll be running -- lots of connections stood up and torn down, or existing connections reused over and over -- and so can make a reasoned choice about which one to use.
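A hypothetical sketch of that timer option (the function and its signature are mine, not an actual cleanhttp API):

```go
package main

import (
	"net/http"
	"time"
)

// reapIdle periodically closes a long-lived transport's idle
// connections on a user-defined interval, instead of relying on
// GC-driven finalizers.
func reapIdle(t *http.Transport, every time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			t.CloseIdleConnections()
		case <-stop:
			return
		}
	}
}
```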
