Goroutine leak #854
It looks like we are leaking goroutines. The number of goroutines keeps growing until the process is killed.
As I understand it, we are leaking HTTP connections. I tried
I think we might be leaking goroutines from keep-alive connections in the http client's Transport struct we use to query Elasticsearch. That client, along with its transport, gets recreated at each reconciliation. I did some experiments (nice way to spend time in the plane 😄):

```go
package main

import (
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"net/http/httptest"
	"runtime"
	"time"
)

func main() {
	// test http server
	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "Hello world")
	}))
	// simulate reconciliation loop
	var memStats runtime.MemStats
	for {
		// create a new client and a new transport at every iteration
		transport := &http.Transport{
			// DisableKeepAlives: true,
		}
		c := &http.Client{Transport: transport}
		resp, _ := c.Get(server.URL)
		io.Copy(ioutil.Discard, resp.Body)
		resp.Body.Close()
		// close idle keep-alive connections
		// transport.CloseIdleConnections()
		runtime.ReadMemStats(&memStats)
		log.Printf("%d goroutines, %d heapalloc", runtime.NumGoroutine(), memStats.HeapAlloc)
		time.Sleep(10 * time.Millisecond)
	}
}
```

Output:
Notice how both the number of goroutines and the heap size increase over time. Now, here is the output of the same code, but this time with the idle keep-alive connections explicitly closed (the commented-out `transport.CloseIdleConnections()` call above):
The number of goroutines stays constant over time (4-5); the heap size still increases, but is regularly reset (the GC doing its job). It looks like if we don't explicitly close idle keep-alive connections, the http client's transport is not terminated and garbage collected as we would expect, and keeps long-lived connections around forever, effectively increasing goroutines and heap over time. Some options to fix that:
Generally it seems the Go http client and transport are meant to be thread-safe and optimized for reuse. I think we should aim for 3. in the long term. Since we also have Observers that persist a connection to the ES cluster, I think we could encapsulate a cluster's client that we keep around to be reused by both observers and reconciliations, for a given cluster. Note that I'm not sure this is the exact (only?) problem we have; I did not test these settings in the operator yet. Edit: I just noticed @pebrc's last comment about how disabling keep-alive connections does not help, so that's probably not the problem :/ Did you disable it through
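(Not from the original thread: a minimal sketch of what option 3. might look like, assuming a hypothetical `clientCache` type keyed by cluster; the operator's real ES client and naming will differ.)

```go
// Hypothetical sketch of option 3: keep one HTTP client per Elasticsearch
// cluster and reuse it across reconciliations and observers instead of
// recreating a transport at every reconciliation.
package esclient

import (
	"net/http"
	"sync"
)

type clientCache struct {
	mu      sync.Mutex
	clients map[string]*http.Client // keyed by cluster namespace/name
}

func newClientCache() *clientCache {
	return &clientCache{clients: map[string]*http.Client{}}
}

// forCluster returns the cached client for the given cluster, creating it on first use.
func (c *clientCache) forCluster(key string) *http.Client {
	c.mu.Lock()
	defer c.mu.Unlock()
	if client, ok := c.clients[key]; ok {
		return client
	}
	client := &http.Client{Transport: &http.Transport{}}
	c.clients[key] = client
	return client
}

// forget closes idle connections and drops the client, e.g. when the cluster is deleted.
func (c *clientCache) forget(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if client, ok := c.clients[key]; ok {
		if t, ok := client.Transport.(*http.Transport); ok {
			t.CloseIdleConnections()
		}
		delete(c.clients, key)
	}
}
```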
I did more testing, in the operator codebase this time. See this debugging branch: master...sebgl:debug-memleak. In that branch I output the current number of goroutines and heap size every 100 ms. I changed the implementation of both the ES client and the process manager client to close leftover idle connections from the transport once the client is no longer required. I now get a stable number of goroutines! 🎉 I think we've spotted the problem :)
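(Again not from the thread: the pattern described above, closing leftover idle connections when a short-lived client is discarded, could look roughly like this; `Client`, `NewClient`, and `Close` are illustrative names, not the operator's actual API.)

```go
// Illustrative sketch of the fix: close leftover idle keep-alive connections
// once a short-lived client is no longer needed, so the transport's connection
// goroutines can exit and the transport can be garbage collected.
package esclient

import "net/http"

type Client struct {
	http      *http.Client
	transport *http.Transport
}

func NewClient() *Client {
	t := &http.Transport{}
	return &Client{http: &http.Client{Transport: t}, transport: t}
}

// Close terminates idle keep-alive connections held by the transport.
func (c *Client) Close() {
	c.transport.CloseIdleConnections()
}
```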
Nice work @sebgl! (and yes I used
When deploying multiple Elasticsearch clusters (here 5), I am seeing the operator process being killed repeatedly because it exceeds its memory limit.