Duplicate events when filebeat is killed with SIGHUP/SIGINT #2044
Assuming that filebeat cannot wait indefinitely for an ACK (if logstash is blocked), it should wait some "sane" amount (e.g. default 5 seconds, but configurable) to fulfil the mandate of "under normal operations, no data loss or duplication should occur".
The current behaviour is somewhat expected, as the connection is "just" closed. I think the best option would be to introduce a config option like …
We've discussed this a bit in our team meeting, here is a summary. The solution proposed in this ticket (wait for a configurable amount of time before shutting down the publisher and the registrar) would be relatively easy to implement and would help, but it has the following disadvantages:
Because of these issues, we would generally prefer to rely on #1492 to remove duplicates by de-duplication on the Elasticsearch side. However, in the Filebeat case, the de-duplication would only be effective if spooling is also used, because otherwise new UUIDs would be generated when the log lines are reread. Because spooling will come with an obvious cost (disk usage and IO), it will probably also be off by default in Filebeat. So, after all, it makes sense to implement both this ticket and #1492. Also, we realized we need to maintain a docs page explaining all the situations in which the Beats can cause duplicates or losses, similar to the Elasticsearch resiliency page.
@ruflin Thank you — the solution works nicely: with …
Please post all questions and issues on https://discuss.elastic.co/c/beats
before opening a GitHub issue. Your questions will reach a wider audience there,
and if we confirm that there is a bug, then you can open a new issue.
For confirmed bugs, please report:
Killing filebeat with SIGHUP (as the standard `service filebeat restart` does) and then restarting it results in duplicate events. A SIGHUP should be treated as a normal restart, and filebeat should wait for an ACK from logstash before exiting. Attached are sample configurations for logstash and filebeat, and a test script that runs the test multiple times in sequence to reproduce.
test.tar.gz
Related: #2041
Note: for the script to work you must start logstash with the `-r` (reload) option. The line in the script that adds a comment to the logstash configuration causes logstash to reload the configuration and flush the lines. The Logstash file output should flush everything to file after 5 seconds, but does not seem to be honouring this now...