Filebeat losing connection to Kafka #1432

Closed
gossu opened this issue Apr 20, 2016 · 2 comments

Comments

@gossu

gossu commented Apr 20, 2016

I'm using the latest nightly build of Filebeat and Kafka 0.9.0.1, both on the same machine.
My setup: Filebeat reads logs from /var/log/messages and publishes them to Kafka.

Everything works fine for about 30 minutes, and then Filebeat suddenly can no longer communicate with Kafka.

Restarting the filebeat service fixes the issue.

Below are the logs I get at the moment the problem occurs:

2016-04-18T13:20:14+03:00 DBG  output worker: publish 2 events
2016-04-18T13:20:14+03:00 DBG  guaranteed flag is set
2016-04-18T13:20:14+03:00 DBG  publish events with attempts=-1
2016-04-18T13:20:14+03:00 DBG  forwards msg with attempts=-1
2016-04-18T13:20:14+03:00 DBG  message forwarded
2016-04-18T13:20:14+03:00 DBG  events from worker worker queue
2016-04-18T13:20:14+03:00 DBG  publish events
2016-04-18T13:20:14+03:00 WARN producer/broker/[1 859530387616] state change to [closing] because %!s(MISSING)
2016-04-18T13:20:14+03:00 WARN Closed connection to broker [172.24.33.7:9092]
2016-04-18T13:20:14+03:00 DBG  Kafka publish failed with: EOF
2016-04-18T13:20:14+03:00 DBG  Kafka publish failed with: EOF
2016-04-18T13:20:14+03:00 DBG  finished kafka batch
2016-04-18T13:20:14+03:00 DBG  handlePublishEventsResult
2016-04-18T13:20:14+03:00 DBG  handle publish error: EOF
2016-04-18T13:20:14+03:00 INFO Error publishing events (retrying): EOF

After that, I get an endless stream of repeated log lines like these:

2016-04-18T13:20:14+03:00 DBG  forwards msg with attempts=-1
2016-04-18T13:20:14+03:00 DBG  message forwarded
2016-04-18T13:20:14+03:00 DBG  events from retries queue
2016-04-18T13:20:14+03:00 DBG  publish events
2016-04-18T13:20:14+03:00 DBG  Kafka publish failed with: EOF
2016-04-18T13:20:14+03:00 DBG  Kafka publish failed with: EOF
2016-04-18T13:20:14+03:00 DBG  finished kafka batch
2016-04-18T13:20:14+03:00 DBG  handlePublishEventsResult
2016-04-18T13:20:14+03:00 DBG  handle publish error: EOF
2016-04-18T13:20:14+03:00 INFO Error publishing events (retrying): EOF

Observed on CentOS 7.
It might not be easy to reproduce, since for me it only happens at "random" moments.
Previously discussed here: https://discuss.elastic.co/t/filebeat-loses-connection-to-kafka/47660

@urso

urso commented Apr 27, 2016

I've been digging through the sarama lib (the third-party lib used to connect to Kafka). The library is supposed to reconnect automatically on failure.

At this line:

2016-04-18T13:20:14+03:00 WARN producer/broker/[1 859530387616] state change to [closing] because %!s(MISSING)

the broker detected some error (unfortunately the error message is not really helpful), closed the broker connection, and removed the broker from the list of brokers used by the active client. The next time the client tries to push some events, it looks up the known/cached brokerProducer and won't find any, so it looks up the leading broker. At this point the lib will try to reconnect to Kafka.

Since there are no additional error messages saying otherwise, it looks like the reconnect was successful. But I have no idea where the subsequent EOFs are coming from.
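The reconnect path described above can be sketched as a toy model. This is not sarama's actual code: the `conn` and `client` types, the `connFor` and `publish` functions, and the `dials` counter are all hypothetical names chosen for illustration, and the "connection" is a plain struct rather than a real socket. It only shows the pattern of dropping a failed connection from a cache so that the next publish has to look up the leader and connect again.

```go
package main

import (
	"errors"
	"fmt"
)

// conn is a hypothetical stand-in for a broker connection.
type conn struct {
	brokerID int
	broken   bool
}

// send fails with EOF once the connection has been marked broken.
func (c *conn) send(msg string) error {
	if c.broken {
		return errors.New("EOF")
	}
	return nil
}

// client caches one connection per broker ID (hypothetical model).
type client struct {
	leaderID int           // broker currently leading the partition
	cache    map[int]*conn // cached connections, keyed by broker ID
	dials    int           // how many times a new connection was created
}

// connFor returns the cached connection or creates a new one on a cache miss.
func (c *client) connFor(id int) *conn {
	if cn, ok := c.cache[id]; ok {
		return cn
	}
	c.dials++
	cn := &conn{brokerID: id}
	c.cache[id] = cn
	return cn
}

// publish sends via the leader's connection. On failure the connection is
// removed from the cache, so the next publish reconnects.
func (c *client) publish(msg string) error {
	cn := c.connFor(c.leaderID)
	if err := cn.send(msg); err != nil {
		delete(c.cache, cn.brokerID)
		return err
	}
	return nil
}

func main() {
	c := &client{leaderID: 1, cache: map[int]*conn{}}
	fmt.Println(c.publish("a"), c.dials) // first publish creates a connection
	c.cache[1].broken = true             // simulate the broker closing it
	fmt.Println(c.publish("b"), c.dials) // fails and evicts the connection
	fmt.Println(c.publish("c"), c.dials) // cache miss, so a fresh connection
}
```

In this model a single failed publish is followed by a successful one. The logs in this issue show repeated EOFs instead, which is the part that does not fit the expected behaviour.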

How long do you wait before restarting Filebeat? Do you use TLS for the Filebeat -> Kafka connection? Is there anything in the Kafka logs?

@urso

urso commented Apr 28, 2016

Log messages have been fixed with #1516
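
For context on the `%!s(MISSING)` fragment in the WARN line quoted earlier: that is what Go's `fmt` package prints when a format string contains a `%s` verb with no matching argument. The sketch below reproduces just that behaviour. The `render` helper and the format string are made up for the example and are not taken from Filebeat or sarama; how the arguments were actually lost in the logging path is not shown in this thread.

```go
package main

import "fmt"

// render is a hypothetical helper that formats a message from a format
// string and a slice of arguments.
func render(format string, args []interface{}) string {
	return fmt.Sprintf(format, args...)
}

func main() {
	// Illustrative format string with two %s verbs.
	format := "state change to [%s] because %s"

	// Only one argument supplied: the second verb has nothing to print.
	fmt.Println(render(format, []interface{}{"closing"}))
	// prints: state change to [closing] because %!s(MISSING)

	// Both arguments supplied: the reason shows up as expected.
	fmt.Println(render(format, []interface{}{"closing", "EOF"}))
	// prints: state change to [closing] because EOF
}
```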

The problem is related to IBM/sarama#294. A workaround for the hanging output will follow soonish.

@tsg tsg closed this as completed in #1543 May 2, 2016
tsg pushed a commit that referenced this issue May 2, 2016
* Lazily init kafka outputs

Initialize output modes with/without guaranteed retry policy.

* Add tryPushFailed to lb/context