Support sending logs to Kafka, instead of directly to logstash #258

Closed
blysik opened this issue Aug 29, 2014 · 21 comments

Comments

@blysik

blysik commented Aug 29, 2014

It's useful to have a buffer (Redis, RabbitMQ, Kafka) for logstash indexers to pull from. logstash-forwarder should support sending data into Kafka.

@jordansissel
Contributor

The current protocol the forwarder uses allows load sharing and reliable transport. What would Kafka provide?

@jordansissel
Contributor

Similar discussion on the rabbitmq ticket, actually.

@jordansissel
Contributor

Related: #190 (rabbitmq)

@blysik
Author

blysik commented Aug 29, 2014

Aha! So you're suggesting that, instead, I could just point logstash-forwarder at logstash instances that only output to Kafka, have other logstash instances consume from Kafka, and get the same result?

(Some logstash instances may do double duty, I suppose.)

@blysik
Author

blysik commented Aug 29, 2014

Or, I suppose, you're proposing there's no need at all for kafka.

@sybrandy

Not sure if anyone still cares, but the main benefit of Kafka in this scenario is a durable, reliable buffer for cases where your log traffic is exceptionally high. I've never stressed Logstash, but I can imagine cases where the volume of log traffic exceeds what the current Logstash instances can handle. In those cases, I can see the forwarder having to stop and let Logstash catch up. With Kafka, all of the forwarders could keep pushing log messages, and Logstash would process them at its own rate.

I'm not saying that the forwarder needs to support it. After working with Kafka for several years, I felt I could offer a reason why it might be a good idea.

NOTE: I'm not sure exactly how lumberjack works, so I may have mucked up a detail or two. If so, I apologize.
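The "process at its own rate" point comes down to the consumer tracking its own position in an append-only log. Here is a toy bash sketch of that idea. It uses a plain file as the log and a second file as the saved offset. The function name and file layout are hypothetical, and this is an analogy only, not how Kafka stores or serves data:

```bash
#!/bin/bash
# Toy analogue of a consumer offset over an append-only log file.
# Assumptions (hypothetical): producers only append whole lines to LOG,
# and OFFSET_FILE holds the number of lines already processed.

# Usage: consume_batch LOG OFFSET_FILE BATCH
# Prints up to BATCH unread lines, then saves the new offset so the next
# call resumes where this one stopped.
consume_batch() {
  local log=$1 offset_file=$2 batch=$3 offset=0
  if [ -s "$offset_file" ]; then
    offset=$(<"$offset_file")
  fi
  local -a lines
  # tail -n +K starts at line K (1-based), so this skips the first $offset lines.
  mapfile -t lines < <(tail -n +"$((offset + 1))" "$log" | head -n "$batch")
  if (( ${#lines[@]} == 0 )); then
    return 0  # nothing new yet; the offset stays where it was
  fi
  printf '%s\n' "${lines[@]}"
  echo "$((offset + ${#lines[@]}))" > "$offset_file"
}
```

Producers only ever append, so they never wait on the consumer. A slow consumer simply falls behind and catches up on later calls. Kafka adds what the toy lacks, such as replication, partitions, and offsets kept per consumer group.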

@TinLe

TinLe commented Nov 17, 2014

As someone who has tried various combinations of logstash-forwarder and Kafka, I'd like to add my vote: it would be useful if ls-fwder supported Kafka as a transport.

As @sybrandy mentioned, logstash and/or ES instances can get busy and become unable to handle the incoming traffic from ls-fwder. It would be nice to have a buffer plus transport such as Kafka, especially if that infrastructure already exists (and it does in our case).

In our case, ls-fwder overwhelmed our LS + ES, and we ended up writing a small local tool that consumes local logs and pushes them into Kafka for transport. LS consumes the data from Kafka at the other end, at its own pace.

In the end, I had to drop ls-fwder completely.

@driskell
Contributor

The lumberjack input plugin does not currently support the full lumberjack protocol. It is missing partial ACK support, which would allow backoff. As a result, if LS cannot keep up with the logs, LSF loses its connection and resends, making the situation worse. That might be why it was overwhelmed.

I think that, without that problem, a queue is needed mainly for work distribution (dynamically adding LS worker nodes that pull from a queue).

I eventually forked it into Log Courier to rewrite the protocol with backoff, and have had no problems with general workloads since. I do have some logs going into Redis via an LS instance (courier in, Redis out, no filters), and that setup has a dynamic worker pool. That could be an option here.
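The simplest client-side form of backoff is to wait progressively longer between failed sends, rather than reconnecting and resending in a tight loop. Below is a minimal bash sketch of exponential backoff around an arbitrary shipping command. The function name and arguments are hypothetical, and this is not Log Courier's protocol-level (partial ACK) mechanism:

```bash
#!/bin/bash
# Generic exponential backoff: after each failure wait DELAY seconds,
# then double DELAY (capped at MAX_DELAY) before the next attempt.
# Delays are whole seconds because bash arithmetic is integer-only.

# Usage: retry_with_backoff MAX_ATTEMPTS BASE_DELAY MAX_DELAY CMD [ARGS...]
retry_with_backoff() {
  local max_attempts=$1 delay=$2 max_delay=$3
  shift 3
  local attempt=1
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
    if (( delay > max_delay )); then
      delay=$max_delay
    fi
  done
  return 0
}
```

For example, `retry_with_backoff 8 1 60 ./send_batch.sh` (where `send_batch.sh` is a hypothetical one-shot sender) waits 1, 2, 4, ... seconds between attempts, capped at 60.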

@bcwilsondotcom

bcwilsondotcom commented Jul 6, 2015

+1, any updates on this?

@jordansissel
Contributor

Logstash has a Kafka output you can use for this purpose. At this time, the only output lsf supports is Logstash (via the lumberjack protocol).

@rberger

rberger commented Jul 21, 2015

Logstash requires deploying and running a JVM on every client, so it would be a win to have a lightweight client (like logstash-forwarder) as the mechanism for sending to Logstash servers via Kafka.

We have prototyped something using fluentd (http://www.fluentd.org/) as the client, writing to Kafka and being read by the logstash kafka input module. But this requires putting Ruby on all our client servers.

We're using Kafka for everything else. It would be nice to have it transport logs as well, with a lightweight, dependency-free client like logstash-forwarder.

If I had time I would take a stab at it. There are a couple of very nice Kafka libraries for Go:

The basic but full-featured Go client for Kafka
https://github.com/Shopify/sarama

Supports Avro and schema registry, built on top of Sarama:
https://github.com/stealthly/go_kafka_client

@TinLe

TinLe commented Jul 21, 2015

Have you looked at kafkacat? I was using lsf, but that got painful. Since I switched to kafkacat, I have not had any problems.

https://github.com/edenhill/kafkacat

I use kafkacat to read log files and send them to our Kafka infrastructure. It's wrapped in runit, with tail -F to always follow new log files. Simple, and works great.

Tin


@rberger

rberger commented Jul 21, 2015

That’s pretty nice.

Do you need to do anything special so the logstash kafka input module will consume the logs on the topic you publish to with kafkacat?

Could you share an example runit config file that does this? Do you have a config file per log, or do you tail a bunch of logs and let logstash sort it out?

Thanks
Rob
rberger@ibd.com, if you want to continue this privately; not sure if it's appropriate for the issues forum.


@TinLe

TinLe commented Jul 21, 2015

Since my log files are already in JSON format, I did not have to do anything special. I used the default log_event schema.

Yes, one config file per log, as these logs can get fairly large (700GB to 1TB+ per day).

Here is the meat of the kafkaput.sh script. Define the variables and you are good to go.

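```bash
#!/bin/bash
# kafkaput.sh: follow one log file and publish each line to a Kafka topic.
TOPIC="log-event_nginx_json"
PARTITIONS=-1                 # -1: use kafkacat's random partitioner
COMPRESSION="gzip"
#DEBUG="-c 3 -T"              # optional debug flags
DEBUG=
PATH=/export/apps/kafkacat:$PATH
NGINXLOG=/export/apps/nginx/logs/access-json.log
BROKER="kafka-zk-broker.example.com:12345"

# tail -F keeps following the file by name across log rotation.
# ${DEBUG} is left unquoted on purpose so it expands to zero or more flags.
tail -F "${NGINXLOG}" | kafkacat -P -b "${BROKER}" -t "${TOPIC}" \
  -p "${PARTITIONS}" -z "${COMPRESSION}" ${DEBUG}
```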

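As for runit, just tell it to keep the script running; that's it.

```bash
#!/bin/bash
# runit "run" script: merge stderr into stdout, then run kafkaput.sh as app:app.
# runit restarts it whenever it exits.
exec 2>&1
exec chpst -u app:app /export/apps/kafkacat/kafkaput.sh
```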
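With one config per log, the run scripts can be generated instead of written by hand. The following is a sketch that makes two assumptions. First, kafkaput.sh is modified to read NGINXLOG and TOPIC from the environment (for example `NGINXLOG=${NGINXLOG:?}` in place of the hard-coded value). Second, services live under a directory such as /etc/sv. `make_log_service` and the `kafkaput-NAME` naming are hypothetical:

```bash
#!/bin/bash
# Generate one runit service directory per log file.
# Assumes (hypothetically) that kafkaput.sh honours NGINXLOG and TOPIC
# from the environment; chpst passes the environment through unchanged.

# Usage: make_log_service SV_ROOT NAME LOGFILE TOPIC
make_log_service() {
  local sv_root=$1 name=$2 logfile=$3 topic=$4
  local dir="$sv_root/kafkaput-$name"
  mkdir -p "$dir"
  # %q shell-quotes the values so unusual paths survive in the generated script.
  {
    echo '#!/bin/bash'
    echo 'exec 2>&1'
    printf 'export NGINXLOG=%q TOPIC=%q\n' "$logfile" "$topic"
    echo 'exec chpst -u app:app /export/apps/kafkacat/kafkaput.sh'
  } > "$dir/run"
  chmod 755 "$dir/run"
}
```

For example, `make_log_service /etc/sv nginx /export/apps/nginx/logs/access-json.log log-event_nginx_json` creates /etc/sv/kafkaput-nginx/run.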

I used fpm to create a kafkaput rpm with the script and the runit setup as the postinstall. Now I just have to distribute the rpm via our deploy system, and voilà.
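Here is a sketch of what such a postinstall might contain. It assumes a Debian-style runit layout, with service directories in /etc/sv that are enabled by symlinking them into /etc/service. The paths and the `install_kafkaput_service` name are assumptions. The `root` argument exists only so the function can be tried against a scratch directory:

```bash
#!/bin/bash
# Install the runit run script for kafkaput and enable the service.
# ROOT is "" on a real host; a scratch directory when testing.

install_kafkaput_service() {
  local root=$1
  local sv="$root/etc/sv/kafkaput"
  mkdir -p "$sv" "$root/etc/service"
  cat > "$sv/run" <<'EOF'
#!/bin/bash
exec 2>&1
exec chpst -u app:app /export/apps/kafkacat/kafkaput.sh
EOF
  chmod 755 "$sv/run"
  # runsvdir notices the new symlink on its next scan and starts the service.
  # -n replaces an existing link instead of creating one inside its target.
  ln -sfn /etc/sv/kafkaput "$root/etc/service/kafkaput"
}
```

On a real host, the postinstall would call `install_kafkaput_service ""`. fpm can attach such a script to the package with its `--after-install` option.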

Tin


@jonatanblue

+1

@NikolaeVarius

+1

@trixpan

trixpan commented Oct 28, 2015

For those +1ing this thread: I don't think it will happen anytime soon.

Agree or not, Elastic still seems to see LS as the gateway to Kafka when using the ELK stack. With the death of logstash-forwarder, this is stated even more explicitly:

"In particular, we are very likely to reject pull requests that add a new output type (libbeat output for kafka, riemann, etc.). The reason is that maintaining all these outputs would involve a significant effort which is already spent in Logstash. You can use Logstash as a gateway to lots of already supported systems."

https://github.com/elastic/libbeat/blob/master/CONTRIBUTING.md

There are a few options besides kafkacat (mentioned above), in particular:

Mozilla's Heka (just be mindful that Heka's Kafka implementation, which uses Sarama's async producer, is not durable)
https://github.com/mozilla-services/heka/

Some random dude's Kafka version of "lumberjack"
https://github.com/hyqer/kafka-forwarder
The code is largely unmaintained, but it uses Sarama (Shopify's Go Kafka library) for all the heavy lifting, and that library is well maintained.

@tbragin

tbragin commented Mar 13, 2016

For those that may not be following Beats closely, libbeat (and Filebeat) now support a Kafka output in master: elastic/beats#942

@ruflin
Member

ruflin commented Mar 14, 2016

Closing this issue in favor of elastic/beats#943 so we can continue the discussion there and in the related issues.

@ruflin ruflin closed this as completed Mar 14, 2016
@sunilmchaudhari
Hi Tin,
Does kafkacat support connecting to Kafka over SSL?

@TinLe
Copy link

TinLe commented Apr 19, 2020

@sunilmchaudhari It's been years since I last used kafkacat, but last I checked, it does support talking to kafka over SSL.
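For reference, kafkacat forwards `-X property=value` settings to librdkafka, and that is where SSL is configured: `security.protocol=ssl`, plus `ssl.ca.location` for the CA certificate. The sketch below builds the producer arguments from the kafkaput.sh example for an SSL listener. The broker, CA path, and function name are placeholders. It also assumes kafkacat was built against a librdkafka with SSL support:

```bash
#!/bin/bash
# Build kafkacat producer arguments for an SSL listener.
# -X hands each property straight to librdkafka.

# Usage: kafkacat_ssl_args ARRAY_NAME BROKER TOPIC CA_PEM
# Fills the caller's array (bash 4.3+ nameref) with the arguments.
kafkacat_ssl_args() {
  local -n _args=$1
  _args=(-P -b "$2" -t "$3"
         -X security.protocol=ssl
         -X "ssl.ca.location=$4")
}
```

For example, run `kafkacat_ssl_args args "$BROKER" "$TOPIC" /etc/ssl/kafka-ca.pem` and then `tail -F "$NGINXLOG" | kafkacat "${args[@]}"`. If the brokers require client certificates, librdkafka's `ssl.certificate.location` and `ssl.key.location` would be added the same way.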
