[Filebeat] Memory leak associated with failure to setup harvesters #6797

Closed

jeremydonahue opened this issue Apr 6, 2018 · 3 comments

@jeremydonahue

Filebeat has a memory leak that is exposed by repeated failures to set up harvesters (e.g. permission denied). This ticket is the result of the original discussion on the community support site.

From what I can tell, this comes down to channel.SubOutlet being created for the new harvester but never cleaned up when h.Setup() returns an error. I believe this is because the cleanup of the outlet normally happens somewhere inside harvester.Run(), which is never reached when setup fails. Another possibility is that the new harvester is added to a state or registry structure somewhere and forgotten, so the garbage collector can't reap it.
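
To illustrate the suspected pattern, here is a minimal, hypothetical sketch (the names subOutlet, newSubOutlet, and startHarvester are invented; this is not the actual Beats code): a wrapper that starts a forwarding goroutine in its constructor leaks that goroutine, plus the channels it holds, every time setup fails before anything calls Close().

// Minimal sketch of the suspected leak pattern. This is NOT the Beats
// implementation; subOutlet, newSubOutlet, and startHarvester are
// hypothetical names used only for illustration.
package main

import (
	"errors"
	"fmt"
	"runtime"
)

type subOutlet struct {
	ch   chan string
	done chan struct{}
}

func newSubOutlet() *subOutlet {
	s := &subOutlet{
		ch:   make(chan string),
		done: make(chan struct{}),
	}
	// The forwarding goroutine only exits when s.ch is closed.
	go func() {
		for range s.ch {
			// forward events to the parent outlet
		}
	}()
	return s
}

func (s *subOutlet) Close() { close(s.ch) }

// startHarvester mimics the harvester lifecycle: the sub-outlet is
// created up front, but only the Run() path would ever close it.
func startHarvester(setup func() error) error {
	out := newSubOutlet()
	if err := setup(); err != nil {
		// BUG: returning here without out.Close() leaks the
		// forwarding goroutine and the channels it references.
		return err
	}
	defer out.Close()
	// ... Run() would consume out.ch here ...
	return nil
}

func main() {
	permissionDenied := func() error {
		return errors.New("open /var/log/auth.log: permission denied")
	}
	for i := 0; i < 1000; i++ {
		_ = startHarvester(permissionDenied)
	}
	// Roughly one leaked goroutine per failed setup.
	fmt.Println("goroutines:", runtime.NumGoroutine())
}

Each failed call leaves one goroutine behind, which would be consistent with runtime.malg and the SubOutlet channels dominating the heap profile below.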

Here's some output from pprof. I've also attached the pprof file here.

33159.58kB of 33159.58kB total (  100%)
Dropped 308 nodes (cum <= 165.80kB)
Showing top 10 nodes out of 51 (cum >= 512.04kB)
      flat  flat%   sum%        cum   cum%
19975.92kB 60.24% 60.24% 19975.92kB 60.24%  runtime.malg
 7680.66kB 23.16% 83.40%  7680.66kB 23.16%  github.com/elastic/beats/filebeat/channel.SubOutlet
 2048.19kB  6.18% 89.58%  2048.19kB  6.18%  github.com/elastic/beats/filebeat/prospector/log.NewHarvester
 1357.91kB  4.10% 93.68%  1357.91kB  4.10%  runtime.allgadd
 1024.08kB  3.09% 96.76%  1024.08kB  3.09%  runtime.acquireSudog
  544.67kB  1.64% 98.41%   544.67kB  1.64%  github.com/elastic/beats/libbeat/publisher/queue/memqueue.NewBroker
  528.17kB  1.59%   100%   528.17kB  1.59%  regexp.(*bitState).reset
         0     0%   100%   528.17kB  1.59%  github.com/elastic/beats/filebeat/beater.(*Filebeat).Run
         0     0%   100%   512.04kB  1.54%  github.com/elastic/beats/filebeat/channel.CloseOnSignal.func1
         0     0%   100%   512.04kB  1.54%  github.com/elastic/beats/filebeat/channel.SubOutlet.func1
(pprof) list SubOutlet
Total: 32.38MB
ROUTINE ======================== github.com/elastic/beats/filebeat/channel.SubOutlet in /home/jeremy/src/go/src/github.com/elastic/beats/filebeat/channel/util.go
    7.50MB     7.50MB (flat, cum) 23.16% of Total
         .          .     15:// SubOutlet create a sub-outlet, which can be closed individually, without closing the
         .          .     16:// underlying outlet.
         .          .     17:func SubOutlet(out Outleter) Outleter {
         .          .     18:	s := &subOutlet{
         .          .     19:		isOpen: atomic.MakeBool(true),
       1MB        1MB     20:		done:   make(chan struct{}),
       2MB        2MB     21:		ch:     make(chan *util.Data),
    4.50MB     4.50MB     22:		res:    make(chan bool, 1),
         .          .     23:	}
         .          .     24:
         .          .     25:	go func() {
         .          .     26:		for event := range s.ch {
         .          .     27:			s.res <- out.OnEvent(event) 

[Attached image: pprof call graph (profile001)]

Config (abridged):

filebeat.modules:
  - module: system

# yes, we know fqdn is in the `beats` field
fields_under_root: true
fields:
  source_host: {{ fqdn }}

filebeat.shutdown_timeout: 10s
filebeat.registry_flush: 30s

output.kafka:
  hosts: ["kafka-01..."]

  topic: {{ topic }}

  required_acks: 1
  compression: snappy
  client_id: '{{ client_id }}'
  keep_alive: 10m

  partition.round_robin:
    reachable_only: true

logging.level: info

Note: in the above scenario, /var/log/auth.log is the file that Filebeat can't access. We can, of course, fix the permissions to alleviate the symptom, but that doesn't fix the underlying leak.

  • Version: Filebeat 6.2.2
  • Operating System: Ubuntu 14.04. Kernel: 4.4.0-111-generic
  • Steps to Reproduce: Run Filebeat with the above config. Make sure at least one file found by the prospector generates a permission-denied error when it is opened.
  • Notes: This is a slow leak, which makes it harder to test. I think lowering scan_frequency from its default (e.g. to 1s) will make it leak memory faster; one way to watch the growth is sketched after this list.
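
One way to watch the growth without waiting hours is sketched below. It assumes Filebeat was started with its pprof HTTP listener enabled (e.g. filebeat -httpprof 127.0.0.1:6060, if that flag is available in your version) and that the address in the code matches; both are assumptions to adjust for your setup.

// Hypothetical helper to watch Filebeat's goroutine count over time.
// The pprof address is an assumption; adjust it to your setup.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"time"
)

// goroutineCount returns the first line of the debug=1 goroutine
// profile, which reads "goroutine profile: total <N>".
func goroutineCount(addr string) (string, error) {
	resp, err := http.Get("http://" + addr + "/debug/pprof/goroutine?debug=1")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return bufio.NewReader(resp.Body).ReadString('\n')
}

func main() {
	for {
		line, err := goroutineCount("127.0.0.1:6060")
		if err != nil {
			fmt.Println("error:", err)
		} else {
			// The line already ends with a newline.
			fmt.Print(time.Now().Format(time.RFC3339), " ", line)
		}
		time.Sleep(30 * time.Second)
	}
}

A steadily climbing total after each scan_frequency tick would point at leaked harvester goroutines.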

Please let me know if there are more details I can provide, and I'm happy to test any potential solutions when we narrow down the problem.

Thanks,
Jeremy

@ruflin added the bug and Filebeat labels on Apr 9, 2018
@adriansr assigned and then unassigned adriansr on Apr 9, 2018
adriansr added a commit to adriansr/beats that referenced this issue Apr 13, 2018
This patch slightly reorganizes how the log harvester works, so that
sub-outlets are only created when the harvester is ready to use them
(inside Run()), instead of being created in the constructor.

This prevents a memory leak caused by some internal goroutines not
stopping if the harvester Setup() fails, for example when files cannot
be read.

Fixes elastic#6797
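
In rough terms, the pattern this patch describes looks like the hypothetical sketch below (invented names, not the actual diff from the referenced pull request): the sub-outlet, and the goroutine it starts, only come into existence inside Run(), so a failed Setup() has nothing to leak.

// Sketch of the fixed lifecycle described above (hypothetical names,
// not the actual code from the referenced pull request).
package main

import (
	"errors"
	"fmt"
)

type outlet struct{ ch chan string }

func newOutlet() *outlet {
	o := &outlet{ch: make(chan string)}
	go func() {
		for range o.ch {
			// forward events to the parent outlet
		}
	}()
	return o
}

func (o *outlet) Close() { close(o.ch) }

type harvester struct {
	newOutlet func() *outlet // outlet factory instead of a live outlet
}

// Setup only opens/validates the file; it allocates nothing long-lived,
// so a permission-denied error here leaks nothing.
func (h *harvester) Setup() error {
	// Simulate the failure from the issue.
	return errors.New("open /var/log/auth.log: permission denied")
}

// Run creates the sub-outlet only once the harvester actually runs,
// and a deferred Close() guarantees its goroutine stops on exit.
func (h *harvester) Run() error {
	out := h.newOutlet()
	defer out.Close()
	// ... read lines and publish them via out ...
	return nil
}

func main() {
	h := &harvester{newOutlet: newOutlet}
	if err := h.Setup(); err != nil {
		// Setup failed: no sub-outlet was created yet, so no
		// goroutine or channel is leaked.
		fmt.Println("setup failed:", err)
		return
	}
	_ = h.Run()
}

Tying the sub-outlet's lifetime to Run() and a deferred Close() is what removes the leak window.
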
urso pushed a commit that referenced this issue Apr 13, 2018 (same patch as above)
@andrewkroh
Member

Fixed in #6829 (for master).

adriansr added two commits to adriansr/beats that referenced this issue on May 18, 2018 (same patch as above)
ph pushed a commit that referenced this issue on May 24, 2018 (same patch as above)
@jo3rg

jo3rg commented Jun 13, 2018

@andrewkroh is this already fixed in any stable release?

@ph
Contributor

ph commented Jun 13, 2018

This will be fixed in the 6.3.0 release and in the upcoming 6.2.5, which should happen any day now.

mpfz0r added a commit to Graylog2/collector-sidecar that referenced this issue Sep 21, 2018
Our bundled filebeat had a memory leak (elastic/beats#6797).

Fixes #283

While here:
 The "-configtest" option has been deprecated since Beats 6.0;
 add a version switch to avoid the warning.
mariussturm pushed a commit to Graylog2/collector-sidecar that referenced this issue Oct 2, 2018 (same patch as above)
leweafan pushed a commit to leweafan/beats that referenced this issue Apr 28, 2023 (same patch as above)