Fluentd fails to cleanup buffer metadata #2593
Comments
Hmm... I'm not sure why this happens, because fluentd just deletes the metadata file after the chunk flush:
fluentd doesn't truncate the metadata file, so a 0-size metadata file is a curious situation.
Given that there are no error logs, I assume this problem happens outside of fluentd.
Thanks, that is helpful. Any migration tips for moving to file_single without losing logs? Assuming that it wouldn't know what to do with the existing buffer files if I did a hard cutover.
I guess maybe leave both configurations in place for a while, but have the old one match on nothing?
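The "leave both configurations in place" idea could look roughly like this (an untested sketch; the paths, tag patterns, and abbreviated s3 options are illustrative, not from this issue). The new output buffers with `file_single` in a fresh directory, while the old output is kept on a pattern that matches no live events, so that its existing `buf_file` chunks can still be resumed and flushed at startup:

```
# New output: file_single buffer in a fresh directory
<match kubernetes.**>
  @type s3
  # ... s3 options ...
  <buffer tag>
    @type file_single
    path /var/log/fluentd-buffers/s3-single
  </buffer>
</match>

# Old output: kept only so existing buf_file chunks are resumed and
# drained at startup; the tag pattern should match no live events
<match never.matches.anything>
  @type s3
  # ... same s3 options as before ...
  <buffer tag>
    @type file
    path /var/log/fluentd-buffers/s3
  </buffer>
</match>
```

Once the old buffer directory is empty, the second `<match>` block can be removed.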
I did find a number of warnings like these from around the time the issue happened. Is it possible that fluentd might create meta files and then leave them there empty if the chunk fails to be written to disk?
Yep :) Filled up most of the disk, then created a pod that logs a ton of junk data to keep the disk full. After doing so, about 8000 empty meta files were generated within about a minute. Tried out td-agent 1.7.0 as well, same behavior. Here's a log where it created a meta file that has no data in it (wasn't seeing any logs about them until I removed
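The failure mode suggested in the comments above (meta file created first, chunk data write fails on a full disk, empty meta file left behind) can be sketched like this. This is an illustrative toy, not fluentd's actual code; `flush_chunk` and the `fail` flag are invented for the demonstration:

```python
import os
import tempfile

def flush_chunk(dir_path, chunk_id, fail=False):
    # Toy buffer flush: the .meta file is created before the chunk
    # data is written. fail=True simulates ENOSPC (disk full)
    # happening during the flush.
    meta_path = os.path.join(dir_path, chunk_id + ".meta")
    open(meta_path, "wb").close()  # empty metadata file appears first
    if fail:
        raise OSError(28, "No space left on device")
    with open(meta_path, "wb") as f:  # only filled in on success
        f.write(b"chunk metadata")

buf_dir = tempfile.mkdtemp()
try:
    flush_chunk(buf_dir, "buffer.b58f", fail=True)
except OSError:
    pass
# The orphaned zero-byte metadata file is left behind:
print(os.path.getsize(os.path.join(buf_dir, "buffer.b58f.meta")))  # 0
```

If nothing ever revisits such orphans, each failed flush adds one more empty file, which matches the ~8000 files per minute observed while the disk stayed full.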
And the actual file:
Thank you for your hard work digging into the root cause! Your information was very helpful to us.
Great! Thanks for fixing that up so quickly!
Problem: Fluentd will start creating hundreds of thousands of 0 byte buffer log files when the partition it is on runs out of space. fluent/fluentd#2593 Solution: Upgrade gems version Issue: rancher/rancher#22689
Check the CONTRIBUTING guideline first; here is the list to help us investigate the problem.
Describe the bug
I have fluentd tailing kubernetes logs and sending them to elasticsearch and s3. Both outputs are configured to use file buffers in order to avoid the loss of logs if something happens to the fluentd pod. On one cluster in particular, the s3 file buffer has been filling up with a huge number of empty buffer metadata files (all zero bytes), to the point that it uses up all the inodes on the volume. There is no associated log buffer file, just the metadata. Some of these empty metadata files were months old. I had to go manually clean up all the zero-byte metadata files in order to restore cluster functionality. Looking for some insight into why these blank metadata files are created, and why they are never cleaned up by fluentd.
In the past, we were seeing issues with fluentd throughput, and the log buffers would get backed up and fill up all the disk space, so I do have it configured to aggressively create and flush chunks.
${archive_path} is configured from kubernetes metadata like so:
${'container-logs/' + record['kubernetes']['namespace_name'] + '/' + record['kubernetes']['container_name'] + '/' + record['kubernetes']['pod_name'] + '/'}
That, along with the 60s chunk interval, can make for a lot of chunks when many pods are redeployed, but I still would not expect to see over a million chunks, and it doesn't explain why these empty metadata files are left in place, seemingly forever. Also asked a question about this here, but it does seem like it's a different issue than what the original question was about, so I figured I'd create a new one.
To Reproduce
Still working on reproducing this behavior in a dev cluster. Best I can tell, lots of pod redeploys and nightly cronjob pods.
Expected behavior
Empty metadata files should be cleaned up by fluentd.
Your Environment
fluentd --version
or td-agent --version
td-agent 1.3.3
cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1967.6.0
VERSION_ID=1967.6.0
BUILD_ID=2019-02-12-2138
PRETTY_NAME="Container Linux by CoreOS 1967.6.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
uname -r
4.14.96-coreos-r1
If you hit the problem with older fluentd version, try latest version first.
Your Configuration
Your Error Log
Couldn't find any error logs, even with debug mode on and the ignore_error flag removed.