Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Change multiline behavior to support cri-o type logs #1316

Closed
lorenmh opened this issue May 7, 2019 · 40 comments
Closed

Feature Request: Change multiline behavior to support cri-o type logs #1316

lorenmh opened this issue May 7, 2019 · 40 comments

Comments

@lorenmh
Copy link

lorenmh commented May 7, 2019

Problem

It appears as if fluentbit currently does not support cri-o type of multiline logs. This is due to the way the multiline code works.

Given a log format of type A which indicates a normal log line, and a log format of type B which indicates a continuation log line, the current multiline behavior is to only support ABBB type multiline logs, but it doesn't support BBBA type multiline logs (to see what I mean, check out example 1 and 2 below).

Cri-o has two logtags, P indicating a partial log line (one that will be continued by another partial log line, or by a 'full log line'), and F indicating a full log line. A P tagged log will be followed by a terminal F tagged log to close out the multiline log. For example, cri-o logs look like:

<timestamp> <stream> P foo
<timestamp> <stream> P bar
<timestamp> <stream> P baz
<timestamp> <stream> F bot

Which should become foo bar baz bot when combined as a multiline log. Current behavior is such that if Parser_Firstline is set to match (?<date>[^ ]+) (?<stream>[^ ]+) P (?<log>.*), then instead of a single log of foo bar baz bot current behavior would be to get three logs, foo, bar and baz bot.

Possible Solution

Instead of a Parser_Firstline and Parser_N, can we have a Parser_Multiline_Start, Parser_Multiline_Continue and Parser_Multiline_End?

Example 1

Given the following logs in a file /some/file which can be captured with the current multiline behavior:

2019/05/07 some
  multiline
  log
2019/05/07 another
  multiline
  log
2019/05/07 non multiline
2019/05/07 non multiline 2

given the following parsers:

[PARSER]
    Name full
    Format regex
    Regex (?<date>\d{4}/\d{2}/\d{2}) (?<log>.*)

[PARSER]
    Name partial
    Format regex
    Regex \s{2}(?<log>.*)

Could be configured as:

[INPUT]
    Name tail
    Path /some/file
    Multiline On
    Parser_Multiline_Start full
    Parser_Multiline_Continue partial
    Parser_Multiline_End partial

Example 2

Given the following crio logs in a file /some/file which can not be captured with current multiline behavior:

2019-05-07T18:57:50.904275087+00:00 stdout P some
2019-05-07T18:57:51.904275087+00:00 stdout P multiline
2019-05-07T18:57:52.904275087+00:00 stdout F log
2019-05-07T18:57:53.904275087+00:00 stdout P another
2019-05-07T18:57:54.904275087+00:00 stdout P multiline
2019-05-07T18:57:55.904275087+00:00 stdout F log
2019-05-07T18:57:56.904275087+00:00 stdout F non multiline
2019-05-07T18:57:57.904275087+00:00 stdout F non multiline 2

given the following parsers:

[PARSER]
    Name full
    Format regex
    Regex (?<date>[^ ]+) (?<stream>[^ ]+) F (?<log>.*)

[PARSER]
    Name partial
    Format regex
    Regex (?<date>[^ ]+) (?<stream>[^ ]+) P (?<log>.*)

Could be configured as:

[INPUT]
    Name tail
    Path /some/file
    Multiline On
    Parser_Multiline_Start partial
    Parser_Multiline_Continue partial
    Parser_Multiline_End full

I believe the logic to use these parameters, Parser_Multiline_Start, Parser_Multiline_Continue, and Parser_Multiline_End should be fairly straightforward. Would this be something that would be acceptable if it appeared in a PR?

Best

@birdy81
Copy link

birdy81 commented Jun 27, 2019

Hi, we also need a solution for partial logs cri-o (containerd). For example it would be great if this solution #852 could be extended.

@edsiper
Copy link
Member

edsiper commented Jun 27, 2019

Definitely, we will work in a solution...

...but let's be honest, how is possible that another tooling is still adding more complexity when generating multiline logs ?

@ealebed
Copy link

ealebed commented Feb 6, 2020

any updates?

@imarko
Copy link

imarko commented Apr 21, 2020

The following is a minimal version of the idea from #852 implemented as a Lua filter. Untested, may be buggy.

local reassemble_state = {}

function reassemble_cri_logs(tag, timestamp, record)
   -- IMPORTANT: reassemble_key must be unique for each parser stream
   -- otherwise entries from different sources will get mixed up.
   -- Either make sure that your parser tags satisfy this or construct
   -- reassemble_key some other way
   local reassemble_key = tag
   -- if partial line, accumulate
   if record.logtag == 'P' then
      reassemble_state[reassemble_key] = reassemble_state[reassemble_key] or "" .. record.message
      return -1, 0, 0
   end
   -- otherwise it's a full line, concatenate with accumulated partial lines if any
   record.message = reassemble_state[reassemble_key] or "" .. record.message
   reassemble_state[reassemble_key] = nil
   return 1, timestamp, record
end

@strantalis
Copy link

We are looking for a solution to handle cri-o multiline logs as well. Is there any way to do this now or is it still a missing feature?

@ealebed
Copy link

ealebed commented Aug 3, 2020

@strantalis You can write logs from application(s) WITHOUT newlines - replacing '\n' for another symbol - collect logs with fluentbit, send messages to ES, and replace "another symbol" back to '\n' in elastic pipeline...

Sounds like a crutch, but it's working - we use this solution for handling multiline java stacktraces - see more

@imarko
Copy link

imarko commented Aug 3, 2020

The Lua filter I described in #1316 (comment) is another approach. Been using something very similar in production for the past few months, works great.

@strantalis
Copy link

Thanks @ealebed & @imarko. It's hard to get teams to log in a consistent format but the lua filter looks like it is working.

@AntonDobshikov
Copy link

For some reasons on my side it works only with braces around
(reassemble_state[reassemble_key] or "")
in two places. this way:

local reassemble_state = {}

function reassemble_cri_logs(tag, timestamp, record)
   -- IMPORTANT: reassemble_key must be unique for each parser stream
   -- otherwise entries from different sources will get mixed up.
   -- Either make sure that your parser tags satisfy this or construct
   -- reassemble_key some other way
   local reassemble_key = tag
   -- if partial line, accumulate
   if record.logtag == 'P' then
      reassemble_state[reassemble_key] = (reassemble_state[reassemble_key] or "") .. record.message
      return -1, 0, 0
   end
   -- otherwise it's a full line, concatenate with accumulated partial lines if any
   record.message = (reassemble_state[reassemble_key] or "") .. record.message
   reassemble_state[reassemble_key] = nil
   return 1, timestamp, record
end```

@perezjasonr
Copy link

am I understanding this right? For the time being, simply using the cri parser when using cri-o/containerd is the correct parser, but it doesn't handle is multiline format well which requires further processing or this lua workaround. Please let me know if Im misunderstanding.

@dennis-benzinger-hybris
Copy link
Contributor

Yes, the CRI parser will split every log line from CRI-O / containerd into time, stream (stdout or stderr), logtag (p for partial or f for full lines) and message (actual log from the container).
After that long log lines from the container are still split into pieces. The Lua script then combines them into one logical, container level log message.

@agup006
Copy link
Member

agup006 commented Feb 11, 2021

am I understanding this right? For the time being, simply using the cri parser when using cri-o/containerd is the correct parser, but it doesn't handle is multiline format well which requires further processing or this lua workaround. Please let me know if Im misunderstanding.

Correct for now, note: we are working on generic multi-line support. Targeted for 1.7.X release, and after the first release

@perezjasonr
Copy link

Ok thank you I will be keeping an eye on this issue then

@perezjasonr
Copy link

perezjasonr commented Feb 12, 2021

in addition to the lua script being added, does /var/lib/docker/containers also need to be changed?
on my nodes that dir doesnt exist but /var/lib/containerd does, as does /var/lib/containers

also if anyone is using the helm chart for the lua script does anyone have an example of just the lua section, are you just putting that block "as is" into the curly brackets?

@Hyper200
Copy link

We have also come across this issue where we have json log messages being put to stdout. Podman option for logging is journald which has its own mutiline max message size issues or k8sfile logging which is CRI-O as far as i can tell.

CRI-O splits the json logging over 4 or 5 lines, it would be nice for fluentd and fluent bit to re-assemable these before being posted to elasticsearch.

@perezjasonr
Copy link

Actually now I'm wondering what fluentbit was doing with /var/lib/docker/containers, was it using it for metadata? with kubernetes tail is referening /var/log/containers/

after changing the parser to be cri,
I'm not sure if we should be changing /var/lib/docker/containers mount to /var/lib/containers or (or something else) or if we should remove that mount entirely, but one thing I do notice is that theres no /var/lib/docker, and yet I see a lot of people leaving that mount there.

@perezjasonr
Copy link

is this what's causing it?

  # max_container_log_line_size is the maximum log line size in bytes for a container.
  # Log line longer than the limit will be split into multiple lines. -1 means no
  # limit.
  max_container_log_line_size = 16384

im wondering if we can set this to -1 for now and it wont split any more? may as well suggest it just in case.
this is for containerd though not sure if cri-o has a similar setting

@timricese
Copy link

is this what's causing it?

  # max_container_log_line_size is the maximum log line size in bytes for a container.
  # Log line longer than the limit will be split into multiple lines. -1 means no
  # limit.
  max_container_log_line_size = 16384

im wondering if we can set this to -1 for now and it wont split any more? may as well suggest it just in case.
this is for containerd though not sure if cri-o has a similar setting

This is a setting only in containerd, right? It doesnt exist in CRI-O at least.

@perezjasonr
Copy link

perezjasonr commented Mar 4, 2021

[editing response: max container log size is "under" key pair streaming section, but its not indented to be in that section, very confusing and strange]

  [plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
    # tls_cert_file is the filepath to the certificate paired with the "tls_key_file"
    tls_cert_file = ""

    # tls_key_file is the filepath to the private key paired with the "tls_cert_file"
    tls_key_file = ""

  # max_container_log_line_size is the maximum log line size in bytes for a container.
  # Log line longer than the limit will be split into multiple lines. -1 means no
  # limit.
  max_container_log_line_size = 16384

I actually tried max_container_log_line_size = -1 to see if it would help, and not sure if anyone else wants to try it and collaborate findings but I no longer see logtag P, every logtag is now F. I did a filter in kibana and the only logtag showing up is F, I get 0 results for P. I hope its safe to assume I'm not getting separated logs with this.

so perhaps for the containerd folks, this is a less intrusive alternative to the custom lua script?

@timricese and yes, I can only confirm that I found that setting in containerd, thats what I'm using.

@t0nik0
Copy link

t0nik0 commented Mar 29, 2021

Is there any news? Lua script is not worked for me.
I use fluent/fluent-bit:1.7.
Fluent-bit sends logs to logstash/gelf for some fields modifications.
My ConfigMap fluent-bit:

kubectl describe -n logging cm fluent-bit-config
Name:         fluent-bit-config
Namespace:    logging
Labels:       k8s-app=fluent-bit
Annotations:  <none>

Data
====
filter-kubernetes.conf:
----
[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://kubernetes.default.svc:443
    Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
    Kube_Tag_Prefix     kube.var.log.containers.
    K8S-Logging.Parser  On
    K8S-Logging.Exclude On
    Annotations         Off

[FILTER]
    Name          nest
    Match         kube.*
    Operation     lift
    Nested_under  kubernetes
    Add_prefix    kubernetes_

[FILTER]
    Name          grep
    Match         kube.*
    Exclude       kubernetes_namespace_name kube-system

[FILTER]
    Name          nest
    Match         kube.*
    Operation     lift
    Nested_under  kubernetes_labels
    Add_prefix    kubernetes_labels_

[FILTER]
    Name                modify
    Match               kube.*
    Remove_wildcard     kubernetes_annotations_

[FILTER]
    Name                modify
    Match               kube.*
    Remove_wildcard     kubernetes_labels_

[FILTER]
    Name    lua
    Match   kube.*
    script  /fluent-bit/etc/merge-crio-multiline.lua                                                                                                                                                                                                                                 
    call    reassemble_cri_logs

fluent-bit.conf:
----
[SERVICE]
    Flush         1
    Log_Level     info
    Daemon        off
    Parsers_File  parsers.conf
    HTTP_Server   On
    HTTP_Listen   0.0.0.0
    HTTP_Port     2020

@INCLUDE input-kubernetes.conf
@INCLUDE filter-kubernetes.conf
@INCLUDE output-gelf.conf

input-kubernetes.conf:
----
[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Parser            crio
    DB                /var/log/flb_kube.db
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On
    Refresh_Interval  10

merge-crio-multiline.lua:
----
local reassemble_state = {}
function reassemble_cri_logs(tag, timestamp, record)
  -- IMPORTANT: reassemble_key must be unique for each parser stream
  -- otherwise entries from different sources will get mixed up.
  -- Either make sure that your parser tags satisfy this or construct
  -- reassemble_key some other way
  local reassemble_key = tag
  -- if partial line, accumulate
  if record.logtag == 'P' then
    reassemble_state[reassemble_key] = (reassemble_state[reassemble_key] or "") .. record.message
    return -1, 0, 0
  end
  -- otherwise it's a full line, concatenate with accumulated partial lines if any
  record.message = (reassemble_state[reassemble_key] or "") .. record.message
  reassemble_state[reassemble_key] = nil
  return 1, timestamp, record
end

output-gelf.conf:
----
[OUTPUT]
    Name                    gelf
    Match                   kube.*
    Host                    <MASKED>
    Port                    12201
    Mode                    udp
    Gelf_Short_Message_Key  message

parsers.conf:
----
[PARSER]
    Name   json
    Format json
    Time_Key time
    Time_Format %d/%b/%Y:%H:%M:%S %z

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On

[PARSER]
    # https://rubular.com/r/M2DugKGda5hxLe
    Name crio
    Format Regex
    Regex ^(?<time>.+)\s(?<stream>stdout|stderr)\s(?<logtag>F|P)\s(?<message>.*)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L%z
    Time_Keep   On

Events:  <none>

@cpflaume
Copy link

I think it would be good to support cri-o type logs. We are running into this issue in our kubernetes cluster.
It would be lovely if this issue would be solved soon.

@edsiper
Copy link
Member

edsiper commented Apr 24, 2021

FYI: new & native multiline support (with docker and cri modes) is coming shortly :)

@t0nik0
Copy link

t0nik0 commented Apr 25, 2021

... is coming shortly :)
Great news!
But sorry my interest... How long the "shortly" lasts? Is there a milestone?
The issue is about 2 years.

@edsiper
Copy link
Member

edsiper commented Apr 25, 2021

~2 weeks max

@gabegorelick
Copy link

#3444 was merged but doesn't appear to be released or documented yet. Is that also coming soon?

@t0nik0
Copy link

t0nik0 commented May 14, 2021

Does release 1.7.5 contain the feature? There is nothing about multiline in RN(

@gabegorelick
Copy link

Does release 1.7.5 contain the feature?

Those commits aren't in that tag, so I think not.

@t0nik0
Copy link

t0nik0 commented May 14, 2021

I tried to look into the code, built the image v1.8 and havn't undestood how to set subj (
It needs some docs.

@ep4sh
Copy link

ep4sh commented Jun 10, 2021

Hi guys, any updates on it?

@cpflaume
Copy link

Considering that the feature resides on then branch "https://github.com/fluent/fluent-bit/tree/multiline-1.8" my guess is, that it will be available with release version 1.8.X

@kien-truong
Copy link

kien-truong commented Jun 19, 2021

I don't know why, but in our cluster, many lines are tagged with F the container log file, eventhough they should have been tagged as P. So the above-mentioned Lua script doesn't work for us. For instance,

2021-06-19T15:28:18.485439385+07:00 stdout F 2021-06-19 15:28:18,485 [main] INFO  o.a.k.c.admin.AdminClientConfig - AdminClientConfig values: 
2021-06-19T15:28:18.48549486+07:00 stdout F     bootstrap.servers = [xxx.kafka:9092]
2021-06-19T15:28:18.485502296+07:00 stdout F    client.dns.lookup = use_all_dns_ips
2021-06-19T15:28:18.485507982+07:00 stdout F    client.id = 
2021-06-19T15:28:18.485513021+07:00 stdout F    connections.max.idle.ms = 300000
2021-06-19T15:28:18.48551861+07:00 stdout F     default.api.timeout.ms = 60000
2021-06-19T15:28:18.485524423+07:00 stdout F    metadata.max.age.ms = 300000
2021-06-19T15:28:18.485529539+07:00 stdout F    metric.reporters = []
2021-06-19T15:28:18.485534754+07:00 stdout F    metrics.num.samples = 2
2021-06-19T15:28:18.485540864+07:00 stdout F    metrics.recording.level = INFO
...

@imarko
Copy link

imarko commented Jun 19, 2021

@kien-truong we've seen this happen with some log types, for example Go raw stack traces where the Go runtime writes out each line in a separate write() syscall. I believe what happens (at least in our case) is that conmon receives these in separate reads and has no way to tell that they belong together so they are written out as separate F lines.

@tolwi
Copy link

tolwi commented Jun 22, 2021

@kien-truong you can use your own record start marker to collect multiline record + lua script to remove cri "prefix":

[INPUT]
...
    Tag                           my_log
    Multiline                     On
    Parser_Firstline              cri_firstline



[FILTER]
    Name                          lua
    Match                         my_log
    script                        cleanup.lua
    call                          cleanup

parsers

[PARSER]
    Name        cri_firstline
    Format      regex
    Regex       ^(?<time>[^ ]+) (?<stream>[^ ]+) (?<log_tag>[^ ]+) (?<log>\d{4}-\d{2}-\d{2}[^\n]+)
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L%z

cleanup.lua

function cleanup(tag, timestamp, record)
  log = record["log"]:gsub("\n[^\n]+ [FP] ", "\n")
  record["log"] = log
  return 1, timestamp, record
end

@z0rc
Copy link

z0rc commented Jul 13, 2021

@gabegorelick
Copy link

Besides built-in multiline parsers, we allow now to register your own multiline parsers in the current parsers.conf files by using the new [MULTILINE_PARSER] section.

Is there documentation on MULTILINE_PARSER somewhere? The new multiline.parser option also doesn't seem to be documented in https://docs.fluentbit.io/manual/pipeline/inputs/tail yet and there's nothing in https://docs.fluentbit.io/manual/pipeline/parsers about this.

@edsiper
Copy link
Member

edsiper commented Jul 20, 2021

(now we ship a built-in "cri" multiline parser)

Multiline Update

As part of Fluent Bit v1.8, we have released a new Multiline core functionality. This new big feature allows you to configure new [MULTILINE_PARSER]s that support multi formats/auto-detection, new multiline mode on Tail plugin, and also on v1.8.2 (to be released on July 20th, 2021) a new Multiline Filter.

For now, you can take at the following documentation resources:

Documentation pages now point to complete config examples that are available on our repository.

Thanks everyone for supporting this!

@edsiper edsiper closed this as completed Jul 20, 2021
@stevehipwell
Copy link

@edsiper could you take a look at fluent/helm-charts#137 to add v1.8 support to the Helm chart? I'm happy to change the kubelet defaults and wait for the v1.8.2 image if that works better.

@Quarky9
Copy link
Contributor

Quarky9 commented Jul 20, 2021

@edsiper
Giving it a try right now with 1.8.1, tail input, "cri" parser.
Unfortunately all "logtag: P" get through ;-(

Was wondering about

"%Y-%m-%dT%H:%M:%S.%L%:z"
if the format string looks correct ? the "%:z" at the end, causing the parser to fail ?

            Name tail
            Path /var/log/containers/*.log
            multiline.parser cri
            Tag cri.*
            Skip_Long_Lines On```

@stevehipwell
Copy link

@Quarky9 I think you might want to wait for the v1.8.2 release later today as that's the version mentioned in the announcement.

@pranavmarla
Copy link

pranavmarla commented Aug 31, 2021

FYI, although this issue is closed, it only appears to be partially solved by the current multiline parser capability -- specifically, although the built-in CRI parser can correctly parse CRI multiline logs, any custom multiline parser that you create will not be able to correctly parse CRI multiline logs. So, if you cannot use the built-in parser for whatever reason, you will not be able to create your own custom parser to use instead.

In other words, as of now, any custom multiline parser that you create seems to suffer from the same regex limitations described in the very first post above:

Given a log format of type A which indicates a normal log line, and a log format of type B which indicates a continuation log line, the current multiline behavior is to only support ABBB type multiline logs, but it doesn't support BBBA type multiline logs

For more details, see #4033.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests