Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excessive disk usage - filebeat #673

Closed
markunsworth opened this issue Jan 8, 2016 · 12 comments
Closed

Excessive disk usage - filebeat #673

markunsworth opened this issue Jan 8, 2016 · 12 comments
Labels
Filebeat Filebeat

Comments

@markunsworth
Copy link

Since installing filebeat (version 1.0.0) we noticed that servers with high log volumes were running out of disk space, looking at the logfiles they weren't using up more than 10% of the total available space on the disk. The majority of logfiles are rotated nightly using logrotate.

When we restart the filebeat service the disk usage drops by 70%.

Here's our filebeat.yml config:

filebeat:
  prospectors:    
    -      
      paths:
        - /var/log/syslog    
      type: log
  registry_file: /var/lib/filebeat/registry
  config_dir:

output:
  logstash:
    enabled: true    
    hosts: ["logstash0:5555", "logstash1:5555"]    
    loadbalance: false

shipper:  
  name: my_server_name
  tags: [some_tag_value]
@markunsworth markunsworth changed the title Excessive disk usage Excessive disk usage - filebeat Jan 8, 2016
@ruflin
Copy link
Contributor

ruflin commented Jan 8, 2016

I assume that this related to the open files. Filebeat by default keeps the files open for 24 hours based on the ignore_older config. If a file is deleted but filebeat still has the file handler open, the file disappears but still takes up space on your disk. I assume you delete files quite frequently. There are two options to tackle this problem: Either lower ignore_older to the maximum time file updates could still happen or use "force_close_files" which directly closes a file when it is rotated. I recommend the first option as the second can lead to missing events on file rotation.

For reference the links to the docs:

@andrewkroh andrewkroh added the Filebeat Filebeat label Jan 8, 2016
@markunsworth
Copy link
Author

Not that frequently.

The only input for filebeat is the file /var/log/syslog which is copytruncated by logrotate once per day.

As the file is never deleted, could it be that it's holding onto the older data from before truncation occurred?

@markunsworth
Copy link
Author

Checking the openfiles for the filebeat process (pre restart) shows this:

$ sudo ls -la /proc/1097/fd/
total 0
dr-x------ 2 root root  0 Jan  8 18:58 .
dr-xr-xr-x 9 root root  0 Dec 19 09:28 ..
lrwx------ 1 root root 64 Jan  8 18:59 0 -> socket:[8665]
l-wx------ 1 root root 64 Jan  8 18:59 1 -> pipe:[8660]
lrwx------ 1 root root 64 Jan  8 18:59 10 -> socket:[8671]
lrwx------ 1 root root 64 Jan  8 18:59 11 -> socket:[8672]
lr-x------ 1 root root 64 Jan  8 18:59 12 -> /var/log/syslog.2 (deleted)
lr-x------ 1 root root 64 Jan  8 18:59 13 -> /var/log/syslog.2 (deleted)
lr-x------ 1 root root 64 Jan  8 18:59 14 -> /var/log/syslog.2 (deleted)
lr-x------ 1 root root 64 Jan  8 18:59 15 -> /var/log/syslog.2 (deleted)
lr-x------ 1 root root 64 Jan  8 18:59 16 -> /var/log/syslog.2 (deleted)
lr-x------ 1 root root 64 Jan  8 18:59 17 -> /var/log/syslog.2 (deleted)
lr-x------ 1 root root 64 Jan  8 18:59 18 -> /var/log/syslog.2 (deleted)
lr-x------ 1 root root 64 Jan  8 18:59 19 -> /var/log/syslog.2 (deleted)
l-wx------ 1 root root 64 Jan  8 18:59 2 -> pipe:[8660]
lr-x------ 1 root root 64 Jan  8 18:59 20 -> /var/log/syslog.2 (deleted)
lr-x------ 1 root root 64 Jan  8 18:59 21 -> /var/log/syslog.2 (deleted)
lr-x------ 1 root root 64 Jan  8 18:59 22 -> /var/log/syslog.2 (deleted)
lr-x------ 1 root root 64 Jan  8 18:59 23 -> /var/log/syslog.2 (deleted)
lr-x------ 1 root root 64 Jan  8 18:59 24 -> /var/log/syslog.2 (deleted)
lr-x------ 1 root root 64 Jan  8 18:59 25 -> /var/log/syslog.2 (deleted)
lr-x------ 1 root root 64 Jan  8 18:59 26 -> /var/log/syslog.2 (deleted)
lr-x------ 1 root root 64 Jan  8 18:59 27 -> /var/log/syslog.2 (deleted)
lr-x------ 1 root root 64 Jan  8 18:59 28 -> /var/log/syslog.2 (deleted)
lr-x------ 1 root root 64 Jan  8 18:59 29 -> /var/log/syslog.2 (deleted)
lrwx------ 1 root root 64 Jan  8 18:59 3 -> anon_inode:[eventpoll]
lr-x------ 1 root root 64 Jan  8 18:59 30 -> /var/log/syslog.1
lrwx------ 1 root root 64 Jan  8 18:59 31 -> socket:[47415580]
l-wx------ 1 root root 64 Jan  8 18:59 4 -> pipe:[8660]
lrwx------ 1 root root 64 Jan  8 18:59 5 -> socket:[8666]
lrwx------ 1 root root 64 Jan  8 18:59 6 -> socket:[8667]
lrwx------ 1 root root 64 Jan  8 18:59 7 -> socket:[8668]
lrwx------ 1 root root 64 Jan  8 18:59 8 -> socket:[8669]
lrwx------ 1 root root 64 Jan  8 18:59 9 -> socket:[8670]

Then after the process is restarted:

$ sudo ls -la /proc/19397/fd
total 0
dr-x------ 2 root root  0 Jan  8 19:05 .
dr-xr-xr-x 9 root root  0 Jan  8 19:04 ..
lrwx------ 1 root root 64 Jan  8 19:05 0 -> socket:[48122897]
l-wx------ 1 root root 64 Jan  8 19:05 1 -> pipe:[48121922]
lrwx------ 1 root root 64 Jan  8 19:05 10 -> socket:[48122903]
lrwx------ 1 root root 64 Jan  8 19:05 11 -> socket:[48122904]
lrwx------ 1 root root 64 Jan  8 19:05 12 -> socket:[48122906]
lr-x------ 1 root root 64 Jan  8 19:05 13 -> /var/log/syslog
l-wx------ 1 root root 64 Jan  8 19:05 2 -> pipe:[48121922]
lrwx------ 1 root root 64 Jan  8 19:05 3 -> anon_inode:[eventpoll]
l-wx------ 1 root root 64 Jan  8 19:05 4 -> pipe:[48121922]
lrwx------ 1 root root 64 Jan  8 19:05 5 -> socket:[48122898]
lrwx------ 1 root root 64 Jan  8 19:05 6 -> socket:[48122899]
lrwx------ 1 root root 64 Jan  8 19:05 7 -> socket:[48122900]
lrwx------ 1 root root 64 Jan  8 19:05 8 -> socket:[48122901]
lrwx------ 1 root root 64 Jan  8 19:05 9 -> socket:[48122902

@ruflin
Copy link
Contributor

ruflin commented Jan 8, 2016

So the file /var/log/syslog.2 is deleted every 24 hours, correct? And you use the default ignore_older config?

Could you update to filebeat 1.0.1 to see if the issue persists or even going to the snapshot of 1.1? We fixed there one issue related to closing files. Even though I think it is not to the case that we have here, it would be good to know if the never version resolves it.

Could your share part of your log file written by filebeat? It would be good to see what filebeat thinks about the "last modified date" of the files deleted.

@markunsworth
Copy link
Author

I think my issue was more down to a misconfiguration of logrotate. We hadn't actually specified copytruncate so I think the files were just being rotated. I have modified the logrotate configuration to actually state copytruncate which means the file descriptor for /var/log/syslog no longer changes each day.

However, it does seems odd that filebeat would hold onto so many deleted files especially as they would have been deleted more than 24 hours ago.

@ruflin
Copy link
Contributor

ruflin commented Jan 14, 2016

@markunsworth Yes, filebeat should not keep the file open if they were not modified. 1.0.1 should have here some improvements and more to come: #718

Let me know if you still have the problem with your changes you made and version 1.0.1. In case the problem is solved, please close the issue.

@mohamabid
Copy link

@ruflin I am running into the same issue. I have tried both 1.1.0 release and Feb 9th nightly release with close_older flag. Filebeat does not release the old files.

I have also tried to test with force_close_files: true with no success, I was hoping force close would release the files on rotation.

[root]# ls -l /proc/30458/fd
total 0
lr-x------ 1 root root 64 Feb 11 01:01 0 -> /dev/urandom
l-wx------ 1 root root 64 Feb 11 01:01 1 -> pipe:[1642605]
lr-x------ 1 root root 64 Feb 11 01:03 10 -> /mnt/ephemeral0/TOPOLOGY-3-1455070740-worker-6700.log
lr-x------ 1 root root 64 Feb 11 01:03 11 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log.200 (deleted)
lr-x------ 1 root root 64 Feb 11 01:01 12 -> /mnt/ephemeral0/storm/TOPOLOGY-3-1455070740-worker-6701.log.9 (deleted)
lr-x------ 1 root root 64 Feb 11 01:03 13 -> /mnt/ephemeral0/storm/TOPOLOGY-3-1455070740-worker-6701.log.9 (deleted)
lr-x------ 1 root root 64 Feb 11 01:03 14 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log.200 (deleted)
lr-x------ 1 root root 64 Feb 11 01:03 16 -> /mnt/ephemeral0/storm/TOPOLOGY-3-1455070740-worker-6701.log
lr-x------ 1 root root 64 Feb 11 01:03 17 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log
l-wx------ 1 root root 64 Feb 11 01:01 2 -> pipe:[1642605]
lr-x------ 1 root root 64 Feb 11 01:01 20 -> /mnt/ephemeral0/storm/supervisor.log
lr-x------ 1 root root 64 Feb 11 01:01 23 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log.200 (deleted)
lrwx------ 1 root root 64 Feb 11 01:01 3 -> /var/log/filebeat/mybeat.log
lr-x------ 1 root root 64 Feb 11 01:01 36 -> /mnt/ephemeral0/storm/TOPOLOGY-3-1455070740-worker-6701.log.9 (deleted)
lr-x------ 1 root root 64 Feb 11 01:01 37 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log.200 (deleted)
l-wx------ 1 root root 64 Feb 11 01:01 4 -> pipe:[1642605]
lrwx------ 1 root root 64 Feb 11 01:01 5 -> socket:[1642612]
lr-x------ 1 root root 64 Feb 11 01:03 6 -> /mnt/ephemeral0/storm/TOPOLOGY-3-1455070740-worker-6700.log.err
lrwx------ 1 root root 64 Feb 11 01:01 7 -> [eventpoll]
lr-x------ 1 root root 64 Feb 11 01:01 8 -> /mnt/ephemeral0/storm/TOPOLOGY-3-1455070740-worker-6701.log.9 (deleted)
lr-x------ 1 root root 64 Feb 11 01:01 9 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log.200 (deleted)

################################ More than 5 minutes later ############################
[root]# ls -l /proc/30458/fd
total 0
lr-x------ 1 root root 64 Feb 11 01:08 0 -> /dev/urandom
l-wx------ 1 root root 64 Feb 11 01:08 1 -> pipe:[1642605]
lr-x------ 1 root root 64 Feb 11 01:03 10 -> /mnt/ephemeral0/storm/TOPOLOGY-3-1455070740-worker-6701.log.9 (deleted)
lr-x------ 1 root root 64 Feb 11 01:03 11 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log.200 (deleted)
lr-x------ 1 root root 64 Feb 11 01:08 12 -> /mnt/ephemeral0/storm/TOPOLOGY-3-1455070740-worker-6701.log.9 (deleted)
lr-x------ 1 root root 64 Feb 11 01:03 13 -> /mnt/ephemeral0/storm/TOPOLOGY-3-1455070740-worker-6701.log.9 (deleted)
lr-x------ 1 root root 64 Feb 11 01:03 14 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log.200 (deleted)
lr-x------ 1 root root 64 Feb 11 01:08 15 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log.200 (deleted)
lr-x------ 1 root root 64 Feb 11 01:03 16 -> /mnt/ephemeral0/storm/TOPOLOGY-3-1455070740-worker-6701.log.9 (deleted)
lr-x------ 1 root root 64 Feb 11 01:03 17 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log.200 (deleted)
lr-x------ 1 root root 64 Feb 11 01:08 18 -> /mnt/ephemeral0/storm/TOPOLOGY-3-1455070740-worker-6701.log.9 (deleted)
lr-x------ 1 root root 64 Feb 11 01:08 19 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log.200 (deleted)
l-wx------ 1 root root 64 Feb 11 01:08 2 -> pipe:[1642605]
lr-x------ 1 root root 64 Feb 11 01:08 20 -> /mnt/ephemeral0/storm/supervisor.log
lr-x------ 1 root root 64 Feb 11 01:08 21 -> /mnt/ephemeral0/storm/TOPOLOGY-3-1455070740-worker-6701.log.9 (deleted)
lr-x------ 1 root root 64 Feb 11 01:08 22 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log.200 (deleted)
lr-x------ 1 root root 64 Feb 11 01:08 23 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log.200 (deleted)
lr-x------ 1 root root 64 Feb 11 01:08 24 -> /mnt/ephemeral0/storm/TOPOLOGY-3-1455070740-worker-6701.log.9 (deleted)
lr-x------ 1 root root 64 Feb 11 01:08 25 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log.200 (deleted)
lr-x------ 1 root root 64 Feb 11 01:08 26 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log.200 (deleted)
lr-x------ 1 root root 64 Feb 11 01:08 27 -> /mnt/ephemeral0/storm/TOPOLOGY-3-1455070740-worker-6701.log
lr-x------ 1 root root 64 Feb 11 01:08 28 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log.200 (deleted)
lr-x------ 1 root root 64 Feb 11 01:08 29 -> /mnt/ephemeral0/storm/TOPOLOGY-3-1455070740-worker-6701.log.9 (deleted)
lrwx------ 1 root root 64 Feb 11 01:08 3 -> /var/log/filebeat/mybeat.log
lr-x------ 1 root root 64 Feb 11 01:08 30 -> /mnt/ephemeral0/storm/TOPOLOGY-3-1455070740-worker-6701.log.9 (deleted)
lr-x------ 1 root root 64 Feb 11 01:08 31 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log
lr-x------ 1 root root 64 Feb 11 01:08 36 -> /mnt/ephemeral0/storm/TOPOLOGY-3-1455070740-worker-6701.log.9 (deleted)
lr-x------ 1 root root 64 Feb 11 01:08 37 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log.200 (deleted)
l-wx------ 1 root root 64 Feb 11 01:08 4 -> pipe:[1642605]
lrwx------ 1 root root 64 Feb 11 01:08 5 -> socket:[1642612]
lr-x------ 1 root root 64 Feb 11 01:03 6 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log.200 (deleted)
lrwx------ 1 root root 64 Feb 11 01:08 7 -> [eventpoll]
lr-x------ 1 root root 64 Feb 11 01:08 8 -> /mnt/ephemeral0/storm/TOPOLOGY-3-1455070740-worker-6701.log.9 (deleted)
lr-x------ 1 root root 64 Feb 11 01:08 9 -> /mnt/ephemeral0/storm/tracer/tracer.TOPOLOGY-3-1455070740-worker-6701.log.200 (deleted)
[root]# date
Thu Feb 11 01:09:20 UTC 2016

My Config:

prospectors:
-
paths:
- /var/log/storm/.log
- /var/log/storm/tracer/tracer
.log
- /var/log/storm/.log.err
- /var/log/storm/
.log.out
exclude_files: [".gz$"]
input_type: log
ignore_older: 5m
close_older: 5m
force_close_files: true
#scan_frequency: 10s
#harvester_buffer_size: 16384
#max_bytes: 10485760
multiline:
pattern: ^[0-9]
negate: true
match: after
max_lines: 100

filebeat-holdingon.txt

@ruflin
Copy link
Contributor

ruflin commented Feb 11, 2016

This could be related to the issue here: #922

@markunsworth
Copy link
Author

Since changing our logrotate configs and upgrading to filebeat 1.0.1 we're no longer getting the same fileusage problems and we don't see filebeat holding onto deleted file handles.

However, I think that is more likely down to the change in the logrotate functionality as we weren't using copytruncate previously so by default the syslog file would just be renamed

 syslog -> syslog.1 -> syslog.2 -> deleted

meaning filebeat was holding onto the original handle as the file was renamed, but also picking up the newly created syslog file as that was the file in the prospector paths.

@mohamabid - I would suggest making sure your logrotate scripts are set up so that files aren't moved/renamed to fix this quickly

@ruflin
Copy link
Contributor

ruflin commented Feb 11, 2016

@markunsworth Interesting note. It "should" work in both cases. In our system tests we always use the renaming of files and creating a new one. If I understand you right that is what didn't work before?

Now also version 1.1 is available.

@markunsworth
Copy link
Author

It held onto the renamed ones but didn't release them when they were eventually deleted

@ruflin
Copy link
Contributor

ruflin commented Apr 27, 2016

We did several improvements on the file handler closing. 1.2.2 is now out. Closing this issue. Please reopen if this happens again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Filebeat Filebeat
Projects
None yet
Development

No branches or pull requests

4 participants