This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Goroutine leak in fluxd? #1602

Closed
brantb opened this issue Dec 14, 2018 · 7 comments

Comments

@brantb
Contributor

brantb commented Dec 14, 2018

We've observed fluxd's memory usage slowly increasing until it reaches the memory limit (arbitrarily set to 300Mi by us) and is OOMKilled by Kubernetes:

[image: graph of fluxd memory usage climbing until OOMKilled at the 300Mi limit]

With @ncabatoff's guidance I used the profiler to dump the goroutines and found several thousand. Here's a gist with the output of lsof and netstat plus the goroutine list. (10.244.3.20 is the IP address assigned to the flux-memcached pod)

We also have a lot of logs like the following:

ts=2018-12-14T15:11:47.467587321Z caller=warming.go:192 component=warmer canonical_name=mcr.microsoft.com/k8s/metrics/adapter auth={map[]} err="requesting tags: json: cannot unmarshal array into Go value of type struct { Tags []string \"json:\\\"tags\\\"\" }"
ts=2018-12-14T15:11:51.219396002Z caller=warming.go:192 component=warmer canonical_name=mcr.microsoft.com/k8s/aad-pod-identity/mic auth={map[]} err="requesting tags: json: cannot unmarshal array into Go value of type struct { Tags []string \"json:\\\"tags\\\"\" }"
ts=2018-12-14T15:11:54.583460254Z caller=warming.go:192 component=warmer canonical_name=iqsandbox.azurecr.io/gameday/quackserver auth={map[]} err="requesting tags: Get https://iqsandbox.azurecr.io/v2/gameday/quackserver/tags/list: unauthorized: authentication required"
ts=2018-12-14T15:11:55.710904979Z caller=warming.go:192 component=warmer canonical_name=mcr.microsoft.com/k8s/aad-pod-identity/nmi auth={map[]} err="requesting tags: json: cannot unmarshal array into Go value of type struct { Tags []string \"json:\\\"tags\\\"\" }"

iqsandbox.azurecr.io is a container registry which doesn't have a pull secret present in the namespace Flux is running in. Anecdotally, this issue seems to have started (or gotten worse) around the time we started running containers using images from mcr.microsoft.com, so maybe something in that code path is the culprit?

I'll be happy to provide any additional diagnostic info you need. Thanks again to @ncabatoff for walking me through this so far (I don't have any experience with the golang toolchain and his help was invaluable).

@ncabatoff
Contributor

One extra detail: we think the memory increase is due to the goroutines because @brantb also produced a pprof heap SVG, which shows the heap accounting for only ~80 of the 262Mi used.

@brantb
Contributor Author

brantb commented Dec 14, 2018

Whoops, I forgot to include that. Here's the heap graph.

@2opremio
Contributor

@brantb left some more info on Slack today https://weave-community.slack.com/archives/C4U5ATZ9S/p1547737322675900

@2opremio
Contributor

2opremio commented Jan 17, 2019

@brantb I am not 100% sure that #1672 will solve the problem, but it will surely help.

Interestingly, the number of authentication-error log lines from the warmer roughly matches the number of descriptors leaked:

# grep component=warmer "flux-log.txt"  | grep unauthorized | wc -l
    3791
# cat /proc/$FLUXPID/net/sockstat
sockets: used 3808
TCP: inuse 16 orphan 0 tw 27 alloc 3046 mem 2750
UDP: inuse 0 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

@brantb
Contributor Author

brantb commented Jan 29, 2019

We've been running master-2441121d for over a week now and the memory profile looks much healthier:

[image: graph of fluxd memory usage holding steady after the upgrade]

I'm going to call this one a win. Thanks again, @2opremio & @ncabatoff!

@brantb brantb closed this as completed Jan 29, 2019
@davidkarlsen
Contributor

@brantb nice UI - what is it?

@brantb
Contributor Author

brantb commented Jan 29, 2019

@davidkarlsen That's Azure Monitor, for their managed Kubernetes service (AKS). 😄
