Tail Docker Support detects multilines too eagerly #2585
Comments
Some more context: We removed the |
We see the same behaviour. docker_multiline detects the Parser_Firstline regex as the first line properly. But when we switched to Docker_Mode_Parser, it merges all lines that do not conform to the parser regex, even if no first line is present. As a result, all application logs that do not conform to the parser regex pattern get merged indefinitely until the buffer is flushed, whereas we expect them to be output as separate lines (since no multiline content is expected from them).
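To make the difference concrete, here is a minimal Python sketch of the two merge strategies described in this comment. It is only a model of the behaviour, not Fluent Bit's actual implementation; the `FIRSTLINE` pattern and both function names are hypothetical.

```python
import re

# Hypothetical first-line pattern: a line that starts with an ISO-style date.
FIRSTLINE = re.compile(r"^\d{4}-\d{2}-\d{2}")


def merge_eager(lines):
    """Model of the reported behaviour: every non-matching line is appended
    to the pending record, even if no first line has been seen yet."""
    records, pending = [], None
    for line in lines:
        if FIRSTLINE.match(line):
            if pending is not None:
                records.append(pending)
            pending = line
        elif pending is None:
            pending = line  # starts a record without any first-line match
        else:
            pending += "\n" + line
    if pending is not None:
        records.append(pending)
    return records


def merge_after_first_match(lines):
    """Model of the expected behaviour: only lines that follow a matched
    first line are appended; all other lines are emitted on their own."""
    records, pending = [], None
    for line in lines:
        if FIRSTLINE.match(line):
            if pending is not None:
                records.append(pending)
            pending = line
        elif pending is not None:
            pending += "\n" + line
        else:
            records.append(line)
    if pending is not None:
        records.append(pending)
    return records


access = ["ACCESS a", "ACCESS b", "ACCESS c"]
print(merge_eager(access))              # one merged record
print(merge_after_first_match(access))  # three separate records
```

For input that does start with a matching first line (for example a dated error followed by stack-trace lines), both functions produce the same result; they differ only when no first line has matched yet.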
Here's a minimal example to reproduce the issue with a locally running Fluent Bit:
fluent.conf
parsers.conf
logproducer.sh
|
The output will be similar to this
|
@sibidass @coders-kitchen So you expect multiline concatenation to start with the first valid multiline log? I don't really understand the use case for that, because:
a. if a file contains several logging formats (some multiline and some not), it won't work correctly, as it will start concatenating the non-multiline logs anyway.
The only reason to skip concatenation until the first regex match is when someone uses the multiline feature for logs that are not multiline (or the regex is invalid), which is a rather inappropriate configuration.
Anyway, I can add this safety check if it is really needed.
@coders-kitchen for the given logs:
I would use a regex that catches the given lines properly, e.g. |
@sumo-drosiek The reason I had to go back to the previous Multiline configuration provided by Fluent Bit in our Kubernetes environment is the following.
We have all kinds of application containers running in a cluster, emitting logs in different formats. (I suppose this is the case with most clusters.)
Earlier, with the Multiline configuration, subsequent lines were added to the first line, provided the first line was in the proper datetime format. So when tailing a container file, if the first line is a datetime-formatted log and the second line is not, the second line is identified as part of the first and merged accordingly. If the first line itself is not in datetime format, the logs are in a different format and each line is sent separately.
Since I switched to the new Docker_Mode_Parser, this no longer happens, which means all application logs that do not begin with the datetime format are now merged indefinitely. We could modify the regex to accommodate other formats, but that is not a practical solution, as we have definite release cycles and tons of applications in lots of different formats.
Also, both the previous multiline option and the new Docker multiline mention the following in the documentation:
So it would be good if we could migrate seamlessly from the Fluent Bit multiline facility to Docker multiline. |
By the way, I am currently using a Lua script with Fluent Bit multiline flush to process multilines properly. It works fine, but Lua is really slow compared to native C. I saw a performance jump after switching to Docker_Mode_Parser, but unfortunately I had to roll back because users complained that their logs were getting merged. |
@sibidass so the expected behavior for you is to start concatenating multilines from the first match of the |
@sumo-drosiek I believe concatenating multilines from the first match is good enough. That will make Docker_Mode_Parser fully compatible with multiline mode on, as I could see that this is the current behaviour of Fluent Bit multiline flush. |
@coders-kitchen Will such a solution solve your issue as well? |
Yes, that would perfectly match our expectations as well |
@sibidass @coders-kitchen The fix is included in the latest fluent-bit version: v1.6.0 🎉 |
wow nice. I will test this out. 🧪 |
Bug Report
Describe the bug
We observe that the tail input plugin with Docker support enabled joins log lines too eagerly.
It doesn't respect the given regex for first-line detection.
This leads from time to time to the situation that too many logs are collected and
To Reproduce
Is joined to (stripping out the stream and time information):
ACCESS [2020-09-25T07:47:03.299Z] "GET /unleash/api/admin/metrics/feature-toggles HTTP/1.1" 200 - 0 151 3 2 "10.244.7.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36" "686477e2-fb40-447a-9dc9-21334aa816ca" "chat-api-int.tech-on-air.com:30080" "10.245.208.187:4242"
ACCESS [2020-09-25T07:47:05.296Z] "GET /unleash/api/admin/metrics/feature-toggles HTTP/1.1" 304 - 0 0 5 3 "10.244.7.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36" "e98770d5-836e-48e7-9171-43f74a734a25" "chat-api-int.tech-on-air.com:30080" "10.245.208.187:4242"
ACCESS [2020-09-25T07:47:06.030Z] "GET /unleash/api/client/features HTTP/1.1" 304 - 0 0 5 4 "10.244.8.0" "whatsappdispatcher" "41faa28c-083a-460a-a126-9a9d8e7ed3f4" "chat-api-int.tech-on-air.com:30080" "10.245.208.187:4242"
ACCESS [2020-09-25T07:47:07.443Z] "GET /unleash/api/client/features HTTP/1.1" 304 - 0 0 6 4 "10.244.7.0" "whatsappdispatcher" "c4c38fb5-6553-45e3-a977-eb2cd1ebd9e0" "chat-api-int.tech-on-air.com:30080" "10.245.208.187:4242"
ACCESS [2020-09-25T07:47:07.745Z] "GET /unleash/api/client/features HTTP/1.1" 304 - 0 0 5 3 "10.244.7.0" "whatsappdispatcher" "d6761038-9ab9-4e8f-a558-1491fade3c39" "chat-api-int.tech-on-air.com:30080" "10.245.208.187:4242"
ACCESS [2020-09-25T07:47:08.300Z] "GET /unleash/api/admin/metrics/feature-toggles HTTP/1.1" 304 - 0 0 3 1 "10.244.7.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36" "9b5358aa-7c42-4853-92ac-8f6177f3c19d" "chat-api-int.tech-on-air.com:30080" "10.245.208.187:4242"
ACCESS [2020-09-25T07:47:10.297Z] "GET /unleash/api/admin/metrics/feature-toggles HTTP/1.1" 304 - 0 0 3 1 "10.244.7.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36" "4929e3e5-8146-4832-957f-d6d12cdc3725" "chat-api-int.tech-on-air.com:30080" "10.245.208.187:4242"
ACCESS [2020-09-25T07:47:11.030Z] "GET /unleash/api/client/features HTTP/1.1" 304 - 0 0 5 3 "10.244.8.0" "whatsappdispatcher" "d5da1102-dd20-47cb-8f28-291b1915554e" "chat-api-int.tech-on-air.com:30080" "10.245.208.187:4242"
ACCESS [2020-09-25T07:47:12.443Z] "GET /unleash/api/client/features HTTP/1.1" 304 - 0 0 5 4 "10.244.7.0" "whatsappdispatcher" "6136cb52-56fb-45be-bb4f-31d09e730586" "chat-api-int.tech-on-air.com:30080" "10.245.208.187:4242"
ACCESS [2020-09-25T07:47:12.745Z] "GET /unleash/api/client/features HTTP/1.1" 304 - 0 0 6 4 "10.244.7.0" "whatsappdispatcher" "928911a2-3f3b-4541-b43e-0b09be23c61c" "chat-api-int.tech-on-air.com:30080" "10.245.208.187:4242"
Expected behavior
We expect that the sample logs are not detected as multiline logs.
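As a quick sanity check of that expectation, the sketch below tests a Python spelling of the `docker_multiline` first-line regex against a raw Docker JSON line. It makes several assumptions: the capture group name `log` is a reconstruction (the name is missing in the pasted config), Python writes named groups as `(?P<name>...)` rather than `(?<name>...)`, the helper `docker_json_line` is hypothetical, and the dated sample line is invented for illustration.

```python
import json
import re

# Python spelling of the docker_multiline first-line regex from this report.
# The group name "log" is an assumption (see the note above).
FIRSTLINE = re.compile(r'(?P<log>^\{"log":"\d{4}-\d{2}-\d{2}.*)')


def docker_json_line(message, stream="stdout", time="2020-09-25T07:47:03.299Z"):
    """Hypothetical helper: wrap a message as a compact JSON line with the
    log, stream and time fields, as found in container log files."""
    record = {"log": message + "\n", "stream": stream, "time": time}
    return json.dumps(record, separators=(",", ":"))


# An ACCESS line like the ones in this report (shortened).
access = docker_json_line(
    'ACCESS [2020-09-25T07:47:03.299Z] "GET /unleash/api/client/features HTTP/1.1" 304'
)
# An invented application line that starts with a date.
dated = docker_json_line("2020-09-25 07:47:03 ERROR something failed")

print(bool(FIRSTLINE.match(access)))  # False: not a first line
print(bool(FIRSTLINE.match(dated)))   # True: starts a new record
```

Since none of the ACCESS lines match the first-line pattern, whether they get merged depends entirely on how the plugin treats non-matching lines before any first line has been seen.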
Your Environment
Server type and version:
Operating System and version:
Filters and plugins:
[SERVICE]
Flush 1
Log_Level info
Daemon off
Parsers_File parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/*.log
Exclude_Path /var/log/containers/wawhatsapp-instances*.log
Parser docker
DB /var/log/flb_kube.db
Mem_Buf_Limit 5MB
Ignore_Older 3600
Skip_Long_Lines On
Docker_Mode On
Docker_Mode_Parser docker_multiline
Refresh_Interval 1
[FILTER]
Name kubernetes
Match kube.*
Kube_URL https://kubernetes.default.svc:443
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
Kube_Tag_Prefix kube.var.log.containers.
Merge_Log On
Merge_Log_Key log_processed
K8S-Logging.Parser On
K8S-Logging.Exclude Off
[FILTER]
Name record_modifier
Match *
Record cluster_name CHAT-API
Record log_collector fluentbit
[FILTER]
Name nest
Match kube.*
Operation lift
Nested_under kubernetes
[FILTER]
Name nest
Match kube.*
Operation lift
Nested_under labels
[FILTER]
Name nest
Match kube.*
Operation lift
Nested_under annotations
[PARSER]
Name docker_multiline
Format regex
Regex (?<log>^{"log":"\d{4}-\d{2}-\d{2}.*)
[PARSER]
Name json
Format json
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
[PARSER]
Name syslog
Format regex
Regex ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
Time_Key time
Time_Format %b %d %H:%M:%S
[OUTPUT]
Name gelf
Match kube.*
Host
Port 12201
Mode udp
Gelf_Short_Message_Key log
Additional context
We want to use the multiline merge capabilities of the Fluent Bit tail plugin to ensure that Java stack traces are not present as single messages, but only one