Fix for fail to remove state where file was deleted and clean_removed was set. #2061

Merged
merged 1 commit into from
Jul 20, 2016

treff7es

We ran Filebeat on tens of files that are rotated every hour and archived (gzipped) after a certain number of hours.
We saw that some log files were no longer harvested after an hour, and a few hours later basically none of the log files were harvested. Based on the debug logs, this was due to inode reuse, which happened quite frequently on our system (Ubuntu Linux, ext4). Basically, when a file gets archived, the next file gets its inode, and this confuses Filebeat.
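To illustrate why inode reuse is a problem, here is a minimal sketch, assuming state is looked up by a device and inode pair. The types `fileID` and `state` and the function `startOffset` are hypothetical names made up for this example; they are not taken from the Filebeat code base.

```go
package main

import "fmt"

// fileID is a hypothetical identity key: device plus inode number.
type fileID struct {
	Device uint64
	Inode  uint64
}

// state is a hypothetical record of how far a file has been read.
type state struct {
	Path   string
	Offset int64
}

// startOffset returns where reading would resume for a file with the given
// identity. A stale entry stored under the same identity is reused as-is.
func startOffset(states map[fileID]state, id fileID) int64 {
	if s, ok := states[id]; ok {
		return s.Offset
	}
	return 0
}

func main() {
	states := map[fileID]state{}

	// The old hourly file was read to the end, then archived and deleted.
	old := fileID{Device: 1, Inode: 42}
	states[old] = state{Path: "app-00000", Offset: 4096}

	// The next file is created and the filesystem hands out the same inode.
	reused := fileID{Device: 1, Inode: 42}
	fmt.Println("offset with stale state:", startOffset(states, reused))

	// Once the stale state is removed, the new file starts from the beginning.
	delete(states, old)
	fmt.Println("offset after cleanup:", startOffset(states, reused))
}
```

As long as the state of the deleted file stays around, the new file inherits its offset, so its content looks as if it had already been read.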
To solve this we set the clean_removed option, which should have fixed our issue, but it did not.
See the debug log output:
2016-07-17T19:59:01+02:00 DBG New state added for /mnt/scribe/jseditor_structured/jseditor_structured-2016-07-17_00000
2016-07-17T19:59:02+02:00 DBG Cleanup state for file as file removed: /mnt/scribe/jseditor_structured/jseditor_structured-2016-07-17_00000
2016-07-17T19:59:03+02:00 DBG Cleanup state for file as file removed: /mnt/scribe/jseditor_structured/jseditor_structured-2016-07-17_00000
2016-07-17T19:59:03+02:00 DBG Cleanup state for file as file removed: /mnt/scribe/jseditor_structured/jseditor_structured-2016-07-17_00000
2016-07-17T19:59:04+02:00 DBG Cleanup state for file as file removed: /mnt/scribe/jseditor_structured/jseditor_structured-2016-07-17_00000
2016-07-17T19:59:05+02:00 DBG Cleanup state for file as file removed: /mnt/scribe/jseditor_structured/jseditor_structured-2016-07-17_00000
2016-07-17T19:59:05+02:00 DBG Cleanup state for file as file removed: /mnt/scribe/jseditor_structured/jseditor_structured-2016-07-17_00000
2016-07-17T19:59:06+02:00 DBG Cleanup state for file as file removed: /mnt/scribe/jseditor_structured/jseditor_structured-2016-07-17_00000
2016-07-17T19:59:07+02:00 DBG Cleanup state for file as file removed: /mnt/scribe/jseditor_structured/jseditor_structured-2016-07-17_00000
2016-07-17T19:59:07+02:00 DBG Cleanup state for file as file removed: /mnt/scribe/jseditor_structured/jseditor_structured-2016-07-17_00000
2016-07-17T19:59:08+02:00 DBG New state added for /mnt/scribe/jseditor_structured/jseditor_structured-2016-07-17_00000

As the logs show, the state for the jseditor_structured log category should have been removed earlier, but instead it was updated again later.

Based on the code, when a file is deleted its TTL is set to zero and an event is sent, after which the state should be removed. In reality this did not happen unless clean_inactive was set:
https://github.com/elastic/beats/blob/master/filebeat/prospector/prospector_log.go#L55

This PR adds an extra check so that states are also cleaned up when clean_removed is set.

With this change, Filebeat no longer stops harvesting files after a few hours in our use case.

@elasticsearch-release

Jenkins standing by to test this. If you aren't a maintainer, you can ignore this comment. Someone with commit access, please review this and clear it for Jenkins to run; then say 'jenkins, test it'.

@tsg tsg added the Filebeat label Jul 19, 2016
@tsg
Contributor

tsg commented Jul 19, 2016

LGTM, waiting also for @ruflin to review this one.

@treff7es just making sure, the above output is from running master, right? AFAIK, clean_removed is not available in any released version as of today.

@tsg tsg added the review label Jul 19, 2016
@treff7es
Author

The build I tested was from master a few days ago.

@ruflin
Contributor

ruflin commented Jul 20, 2016

LGTM. The problem is our tests check if the state is removed from the registry file, and that happens correctly. The prospector initiates all the cleanup but didn't remove the state from memory.

@treff7es Thanks a lot for testing this and directly providing a fix. I will have to add further tests for the new features. Let me know in case you hit any other issues.

@ruflin ruflin merged commit b78cb9c into elastic:master Jul 20, 2016