Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kubernetes Pods Logging (escaped nested) JSON to Std Out Isn't Correctly Parsed #691

Closed
therealdwright opened this issue Jul 21, 2018 · 14 comments
Assignees

Comments

@therealdwright
Copy link

Bug Report

Describe the bug

When running a docker container inside a Kubernetes cluster where the container already outputs JSON lines, it is not possible to parse the nested JSON in a way that Elasticsearch will correctly index the key:value pairs. Running the Docker parser over the log using the escaped json decoder correctly parses everything in the log except for the original JSON message.

If this is a misconfiguration on my end, assistance would be greatly appreciated, however I think I have run in to a limitation that if the original log is JSON which results in escaped JSON with a nested escaped JSON object, there appears to be no way to solve this in fluent-bit.

To Reproduce

  • In a Kubernetes or Minikube cluster, run the fluentbit daemonset which can be found here: https://github.com/fluent/fluent-bit-kubernetes-logging (I have set my environment up as a forward to fluentd to send to logz.io but it would yield the same result to send directly to an elasticsearch instance.)
  • Start a container that outputs JSON to std out, I did so by running the command kubectl run echo-chamber --quiet --image=quay.io/therealdwright/echo-chamber:latest
  • Observe the logs of the container by running the command kubectl logs -f deployments/echo-chamber and see the logs that will be output as follows:
{"Message": "JSON Parsing in Kubernetes is fun!"}

Expected behavior

I expect the logs to be parsed in a way that Elasticsearch is able to index the key "Message" in a way that can be queried by Lucene.

Screenshots

screen shot 2018-07-21 at 11 19 04 am

Your Environment

  • Version used: Fluentbit Offical Docker image from fluent/fluent-bit:0.13.4 (as a Daemonset) in a forward configuration passing logs to a fluentd container running version 1.2.3 using the logz.io plugin.
  • Configuration:
    fluent-bit input config:
[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Parser            docker
    DB                /var/log/flb_kube.db

docker parser config:

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
    # Command      |  Decoder | Field   | Optional Action
    # =============|==========|=========|=================
    Decode_Field_As   escaped    log
  • Environment name and version: Kubernetes 1.9.6
  • Server type and version: AWS EC2 Instance
  • Operating System and version: Debian Jessie k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11 (ami-db8546b9)
  • Filters and plugins: fluent-bit tail with Docker parser.

Additional context

I have tried many variants of configuration based on the GitHub issues I could see, I also have looked through the source and it looks like only a single "Decode_Field_As" match can be applied to a single log.

Additionally when running the Merge_JSON_Log on the Kubernetes filter, I see many errors: "could not merge JSON log as requested" in the fluentbit container.

An example raw log sent to logz.io (retrieved looking at the JSON payload) is as follows:

{
  "_index": "logzioCustomerIndex180721_v3",
  "_type": "http-bulk",
  "_id": "AWS6ag4I1yq4qwVVbz95.account-38207",
  "_version": 1,
  "_score": null,
  "_source": {
    "kubernetes": {
      "container_name": "echo-chamber",
      "host": "ip-10-41-24-84.ap-southeast-2.compute.internal",
      "annotations": {
        "kubernetes.io/limit-ranger": "LimitRanger plugin set: cpu request for container echo-chamber"
      },
      "docker_id": "d012be600cdb0f322203da4a4161da42e898cc8ce4a4dd5d79a1cad1a4d80537",
      "pod_id": "89f03ebb-8c82-11e8-b430-0a948d0850cc",
      "pod_name": "echo-chamber-59bbd6495d-4n9x7",
      "namespace_name": "default",
      "labels": {
        "pod-template-hash": "1566820518",
        "run": "echo-chamber"
      }
    },
    "@timestamp": "2018-07-21T01:17:48.000+00:00",
    "log": "{\"Message\": \"JSON Parsing in Kubernetes is fun!\"}\n",
    "stream": "stdout",
    "@log_name": "kube.var.log.containers.echo-chamber-59bbd6495d-4n9x7_default_echo-chamber-d012be600cdb0f322203da4a4161da42e898cc8ce4a4dd5d79a1cad1a4d80537.log",
    "time": "2018-07-21T01:17:48.4295262Z",
    "type": "http-bulk",
    "tags": [
      "_logz_http_bulk_json_8070"
    ]
  },
  "fields": {
    "@timestamp": [
      1532135868000
    ]
  },
  "highlight": {
    "kubernetes.labels.run": [
      "@kibana-highlighted-field@echo-chamber@/kibana-highlighted-field@"
    ]
  },
  "sort": [
    1532135868000
  ]
}
@therealdwright
Copy link
Author

I have tested this with the new 0.13.5 release and found the same issue. Log field in Kibana is displayed as follows for the following configurations:

Config 1:

[PARSER]
    Name              docker
    Format            json
    Time_Key          time
    Time_Format       %Y-%m-%dT%H:%M:%S.%L
    Time_Keep         On
    Decode_Field_As   escaped               log       do_next
    Decode_Field_As   json                  log

In Kibana: log {"Message": "JSON Parsing in Kubernetes is fun!"}
In JSON payload: "log": "{\"Message\": \"JSON Parsing in Kubernetes is fun!\"}\n",

Config 2:

[PARSER]
    Name              docker
    Format            json
    Time_Key          time
    Time_Format       %Y-%m-%dT%H:%M:%S.%L
    Time_Keep         On
    Decode_Field_As   json                  log       do_next
    Decode_Field_As   escaped               log

In Kibana: log {"Message": "JSON Parsing in Kubernetes is fun!"}
In JSON payload: "log": "{\"Message\": \"JSON Parsing in Kubernetes is fun!\"}\n",

Config 3:

[PARSER]
    Name              docker
    Format            json
    Time_Key          time
    Time_Format       %Y-%m-%dT%H:%M:%S.%L
    Time_Keep         On
    Decode_Field_As   json                  log

In Kibana: log {\"Message\": \"JSON Parsing in Kubernetes is fun!\"}\n
In JSON payload: "log": "{\\\"Message\\\": \\\"JSON Parsing in Kubernetes is fun!\\\"}\\n",

I'm continuing to experiment on my end with alternate approaches to find a work around but would greatly appreciate some assistance. I hope I'm not the only person trying to parse nested escaped JSON logs.

@therealdwright therealdwright changed the title Kubernetes Pods Logging (nested) JSON to Std Out Isn't Correctly Parsed Kubernetes Pods Logging (escaped nested) JSON to Std Out Isn't Correctly Parsed Jul 25, 2018
@nikolay
Copy link

nikolay commented Jul 25, 2018

So, it seems that without this getting fixed, using Fluent Bit with Kubernetes is pretty much useless. It's surprising that basic use cases have not been considered and are not part of the tests...

@saravanan30erd
Copy link

@nikolay Yes, you are right. Tried with multiple possibilities but no luck.

@StevenACoffman
Copy link
Contributor

StevenACoffman commented Aug 4, 2018

@nikolay Pull requests are always welcome. However, I have noticed your pattern of feedback is often not particularly helpful or necessary

@nikolay
Copy link

nikolay commented Aug 4, 2018

@StevenACoffman Your ability to do pattern recognition is sub-par.

I no longer program in archaic and unsafe programming languages - I did C/C++ in the 90s. Anyway, in 2018, I may consider doing a Rust-based alternative.

I already work on a pull request to improve the Docker build.

And last, but not least: contribution to Open Source projects is not only in the form of pull requests! My frustration was only about a missing disclaimer - I didn't say "false advertisement"!

@StevenACoffman
Copy link
Contributor

@nikolay Most open source Maintainers appreciate feedback, so I'm suggesting you might try expressing your frustration or disappointment in more constructive ways. Pull requests are only one (highly recommended) possibility. Perhaps just being more careful that your comments are more constructive or polite.

I no longer program in archaic and unsafe programming languages - I did C/C++ in the 90s. Anyway, in 2018, I may consider doing a Rust-based alternative.

  • T Truthful?
  • H Helpful?
  • I Inspiring/Interesting?
  • N Necessary?
  • K Kind?

I enjoy Rust as well! It plays nice with C, and if you extended fluent-bit in some way using it, I'm sure the maintainers would appreciate hearing about it. However, your choice of calling the maintainers choice of C as archaic and unsafe does not pass the T.H.I.N.K. test. I would appreciate it if you refrained from that kind of stuff.

@therealdwright
Copy link
Author

@StevenACoffman I appreciate this model of the T.H.I.N.K. test, I'd like to instate it in our own internal guidelines.

@nikolay
Copy link

nikolay commented Aug 6, 2018

@StevenACoffman My point was that I personally am not going back to C just to make Fluent Bit handle basic use cases. My other point was that the Kubernetes world doesn't need a fancy log shipper - it needs something really basic, which is well-suited for its use cases. For example, the inability to reload configuration, the centralized parsers definition, the inability to have wild cards for parser files, and others are not well-suited for Kubernetes features or limitations. The documentation is also outdated and incomplete. Anyway, I am doing everything to my abilities to help the project, but you cannot expect all people to submit pull requests!

Anyway, can you explain to me how your THINK test applies to your handling of a very simple PR (#692) that's been ignored for almost two weeks, and which obviously fixes a very obvious bug in Fluent Bit?

@edsiper
Copy link
Member

edsiper commented Aug 6, 2018

@nikolay utf-8 decoder patch will be reviewed after 0.14 release.

edsiper added a commit that referenced this issue Aug 6, 2018
JSMN in strict mode cannot be used to validate a JSON string without
filing the tokens, it behavior is unexpected, more details about this
in the following jsmn pull request:

    zserge/jsmn#119 (comment)

this patch remove the flb_pack_json_valid() function.

Signed-off-by: Eduardo Silva <eduardo@treasure-data.com>
edsiper added a commit that referenced this issue Aug 6, 2018
Signed-off-by: Eduardo Silva <eduardo@treasure-data.com>
@edsiper
Copy link
Member

edsiper commented Aug 6, 2018

@therealdwright thanks for providing specific details to reproduce the issue.

the fix is already in place in GIT master and will be backported to 0.13 branch for 0.13.7 release:

with the above fixes the following config works as expected:

[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
    # Command      |  Decoder | Field | Optional Action
    # =============|==================|=================
    Decode_Field_As   escaped    log    do_next
    Decode_Field_As   json       log

more detail about the root cause of the problem with jsmn project read here: zserge/jsmn#119 (comment)

@edsiper edsiper self-assigned this Aug 6, 2018
edsiper added a commit that referenced this issue Aug 6, 2018
JSMN in strict mode cannot be used to validate a JSON string without
filing the tokens, it behavior is unexpected, more details about this
in the following jsmn pull request:

    zserge/jsmn#119 (comment)

this patch remove the flb_pack_json_valid() function.

Signed-off-by: Eduardo Silva <eduardo@treasure-data.com>
edsiper added a commit that referenced this issue Aug 6, 2018
Signed-off-by: Eduardo Silva <eduardo@treasure-data.com>
@nikolay
Copy link

nikolay commented Aug 9, 2018

@edsiper My v0.13.7 setup without #692 does not work. If I patch it with the PR, it starts to.

@xadcoh
Copy link

xadcoh commented Aug 16, 2018

@nikolay I used "default" config, but i changed escaped to escaped_utf8:

[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
    # Command      |  Decoder | Field | Optional Action
    # =============|==================|=================
    Decode_Field_As   escaped_utf8    log    do_next
    Decode_Field_As   json       log

And logs look good to me

@edsiper
Copy link
Member

edsiper commented Aug 21, 2018

are we ok to close this ticket?

@edsiper
Copy link
Member

edsiper commented Aug 21, 2018

#692 is merged, so we should be good. Fixed

@edsiper edsiper closed this as completed Aug 21, 2018
rawahars pushed a commit to rawahars/fluent-bit that referenced this issue Oct 24, 2022
* source: add Github links

Signed-off-by: Patrick Stephens <pat@calyptia.com>

* source: switch to Github

Signed-off-by: Patrick Stephens <pat@calyptia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants