
Add target and remove_field options to decode_json_field #3134

Closed
andrewkroh opened this issue Dec 6, 2016 · 4 comments
Labels
discuss Issue needs further discussion.

Comments

@andrewkroh
Member

I was trying to use the new decode_json_field processor to decode pretty-printed multiline JSON messages (as in #1208). Without a configurable "target" it's hard to use, because the mapping for the message field is a string, but decode_json_field makes message an object.

So I propose we add configuration options for target and remove_field, like the ones Logstash has.
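To make the proposal concrete, here is a minimal Python sketch of how the two options could behave. It is not Filebeat code: the function name `decode_json_field` and its `target`/`remove_field` parameters are hypothetical, modelled loosely on the Logstash options mentioned above. Without a target it reproduces the current in-place behaviour; with a target the source string keeps its type.

```python
import json
from typing import Optional


def decode_json_field(event: dict, field: str, target: Optional[str] = None,
                      remove_field: bool = False) -> dict:
    """Hypothetical semantics for the proposed target/remove_field options.

    - target=None keeps the current behaviour: the decoded value replaces `field`.
    - target="json" (for example) stores the decoded value under that key
      instead, so `field` keeps its string type.
    - remove_field=True drops the source string, but only after a successful
      decode into a different key.
    """
    raw = event.get(field)
    if not isinstance(raw, str):
        return event  # nothing to decode
    try:
        decoded = json.loads(raw)
    except json.JSONDecodeError:
        return event  # leave undecodable text untouched
    out = dict(event)  # shallow copy; do not mutate the caller's event
    out[target or field] = decoded
    if remove_field and target and target != field:
        del out[field]
    return out


event = {"message": '{\n  "hello": "world"\n}', "offset": 23}
print(decode_json_field(event, "message", target="json", remove_field=True))
# {'offset': 23, 'json': {'hello': 'world'}}
```

Only removing the source field when a distinct target was given avoids the case where remove_field would delete the freshly decoded data.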

For example:

filebeat.yml

filebeat.prospectors:
- paths:
    - input.json
  multiline.pattern: '^}'
  multiline.negate: true
  multiline.match: before

output.console.pretty: true

processors:
  - decode_json_fields:
      fields: ['message']

input.json

{
  "hello": "world"
}
{
  "hello": "world",
  "foo":   "bar"
}

output

{
  "@timestamp": "2016-12-06T17:04:00.892Z",
  "beat": {
    "hostname": "macbook13.local",
    "name": "macbook13.local",
    "version": "6.0.0-alpha1"
  },
  "input_type": "log",
  "message": {
    "hello": "world"
  },
  "offset": 23,
  "source": "input.json",
  "type": "log"
}
{
  "@timestamp": "2016-12-06T17:04:00.892Z",
  "beat": {
    "hostname": "macbook13.local",
    "name": "macbook13.local",
    "version": "6.0.0-alpha1"
  },
  "input_type": "log",
  "message": {
    "foo": "bar",
    "hello": "world"
  },
  "offset": 64,
  "source": "input.json",
  "type": "log"
}
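The multiline settings in filebeat.yml above (pattern '^}', negate: true, match: before) mean that lines which do not start with "}" are buffered and prepended to the next line that does, so each pretty-printed object becomes one event. A small Python simulation of that grouping, assuming the input is already split into lines; the function name is made up, and the final flush of leftover lines is a simplified stand-in for Filebeat's multiline timeout.

```python
import re
from typing import Iterable, List


def group_negate_before(lines: Iterable[str], pattern: str) -> List[str]:
    """Simulate multiline.negate: true with multiline.match: before.

    Lines that do NOT match `pattern` are buffered and prepended to the next
    line that DOES match; that matching line closes the event. Lines still
    buffered at the end are emitted as a final event, which is roughly what a
    multiline timeout flush does (assumption, simplified).
    """
    regex = re.compile(pattern)
    events: List[str] = []
    buf: List[str] = []
    for line in lines:
        buf.append(line)
        if regex.search(line):  # a matching line (here: one starting with "}") ends the event
            events.append("\n".join(buf))
            buf = []
    if buf:
        events.append("\n".join(buf))
    return events


input_json = '{\n  "hello": "world"\n}\n{\n  "hello": "world",\n  "foo":   "bar"\n}'
for ev in group_negate_before(input_json.split("\n"), r"^}"):
    print(repr(ev))
```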
@andrewkroh andrewkroh added the discuss Issue needs further discussion. label Dec 6, 2016
@andrewkroh andrewkroh changed the title Add target and remove_field option to decode_json_field Add target and remove_field options to decode_json_field Dec 6, 2016
@ruflin
Member

ruflin commented Dec 7, 2016

I would even consider the current behaviour a bug, as it will probably break as soon as the template is loaded. We must definitely change this.

Instead of letting users decide which namespace the processed data goes under, I suggest we put it under json, as this namespace already exists. That lets us define it in the template up front and prevents most type conflicts. There could still be type conflicts if two different JSON documents have different types for the same field, but I would argue that is more an issue in the log structure.
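A rough Python illustration of both points above. The `TEMPLATE` dict is a hypothetical, heavily simplified stand-in for the real index template (just "message is a string, json is an object"); it shows that decoding in place conflicts with the string mapping while decoding under json does not, and that per-field conflicts between documents under json remain possible.

```python
from typing import Any, Dict, Iterable, List, Set

# Hypothetical, heavily simplified stand-in for the index template:
# `message` is mapped as a string and `json` as an object.
TEMPLATE: Dict[str, type] = {"message": str, "json": dict}


def template_conflicts(event: Dict[str, Any]) -> List[str]:
    """Top-level fields whose value type disagrees with TEMPLATE."""
    return [name for name, expected in TEMPLATE.items()
            if name in event and not isinstance(event[name], expected)]


def cross_document_conflicts(events: Iterable[Dict[str, Any]]) -> Set[str]:
    """Keys under `json` that carry different value types in different events."""
    seen: Dict[str, str] = {}
    clashes: Set[str] = set()
    for event in events:
        for key, value in event.get("json", {}).items():
            kind = type(value).__name__
            if seen.setdefault(key, kind) != kind:
                clashes.add(key)
    return clashes


in_place = {"message": {"hello": "world"}}
namespaced = {"message": '{"hello": "world"}', "json": {"hello": "world"}}
print(template_conflicts(in_place))    # ['message']
print(template_conflicts(namespaced))  # []
print(cross_document_conflicts([{"json": {"id": 1}}, {"json": {"id": "a"}}]))  # {'id'}
```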

If someone wants to move the fields to a different place (and break the template in doing so), this can be done on the Logstash or Ingest side.

If someone wants to remove the message field, this can already be done with the filters we have. I see that it would be a nice shortcut for usability, but I'm hesitant to add the same feature in two places. A user can also use Ingest or LS to remove fields.

In general we should be careful about allowing users to modify the structure of events, as this makes the events incompatible with the template and leads to potential type/namespace conflicts with overwrites, which I prefer not to handle on the Beats side.

@martinscholz83
Contributor

martinscholz83 commented Dec 7, 2016

Hi @andrewkroh, I'm actually testing decoding a JSON message field. Here is my output:

{
  "@timestamp": "2016-12-07T09:15:41.173Z",
  "beat": {
    "hostname": "4201halwsd00001",
    "name": "4201halwsd00001",
    "version": "6.0.0-alpha1"
  },
  "input_type": "log",
  "message": "{\n  \"hello\": \"world\"}",
  "offset": 23,
  "source": "input.json",
  "type": "log"
}

You can see that the message is formatted with \n as the newline, but it doesn't add a line break. Is there something I have missed? I tested with both the console output and the Elasticsearch output.

Update:
Also tested on Ubuntu 16.04 and Windows 10.
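The \n in that output is expected on its own: the console output prints each event as JSON, and a JSON string cannot contain a raw newline, so the encoder escapes it as the two characters "\" and "n". A short illustration with Python's standard json module (not Filebeat's encoder, but the escaping rule is the same JSON rule):

```python
import json

# One multiline event: the message is a single string that contains a real
# newline character between the joined lines.
event = {"message": '{\n  "hello": "world"}', "offset": 23}

# Encoding the event as JSON escapes the newline inside the string value.
encoded = json.dumps(event, indent=2)
print(encoded)

# Decoding the JSON again restores the real newline.
assert json.loads(encoded)["message"] == event["message"]
```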

@martinscholz83
Contributor

@andrewkroh, I take back my comment. I see this comes from the multiline pattern, and the JSON decoder is not able to decode it properly. I assume this is what you meant by

mapping for the message field is a string, but decode_json_field makes message an object

@martinscholz83
Contributor

Like @ruflin, I also think this is a bug. I tested the following:
input.json

{ 
    "hello": "world",
    "test": {
         "hello": "world"
     },
     "foo": "baar"
}
{ 
    "hello": "world",
    "test": {
         "hello": "world"
     },
     "foo": "baar"
}
{ 
    "hello": "world",
    "test": {
         "hello": "world"
     },
     "foo": "baar"
}
{ 
    "hello": "world",
    "test": {
         "hello": "world"
     },
     "foo": "baar"
}

console output

{
  "@timestamp": "2016-12-08T13:22:54.422Z",
  "beat": {
    "hostname": "D-W-D806348",
    "name": "D-W-D806348",
    "version": "6.0.0-alpha1"
  },
  "input_type": "log",
  "message": {
    "foo": "baar",
    "hello": "world",
    "test": {
      "hello": "world"
    }
  },
  "offset": 101,
  "source": "input.json",
  "type": "log"
}
{
  "@timestamp": "2016-12-08T13:22:54.422Z",
  "beat": {
    "hostname": "D-W-D806348",
    "name": "D-W-D806348",
    "version": "6.0.0-alpha1"
  },
  "input_type": "log",
  "message": {
    "foo": "baar",
    "hello": "world",
    "test": {
      "hello": "world"
    }
  },
  "offset": 202,
  "source": "input.json",
  "type": "log"
}
{
  "@timestamp": "2016-12-08T13:22:54.422Z",
  "beat": {
    "hostname": "D-W-D806348",
    "name": "D-W-D806348",
    "version": "6.0.0-alpha1"
  },
  "input_type": "log",
  "message": {
    "foo": "baar",
    "hello": "world",
    "test": {
      "hello": "world"
    }
  },
  "offset": 303,
  "source": "input.json",
  "type": "log"
}
{
  "@timestamp": "2016-12-08T13:22:54.422Z",
  "beat": {
    "hostname": "D-W-D806348",
    "name": "D-W-D806348",
    "version": "6.0.0-alpha1"
  },
  "input_type": "log",
  "message": "{ \n    \"hello\": \"world\",\n    \"test\": {\n         \"hello\": \"world\"\n     },\n     \"foo\": \"baar\"",
  "offset": 401,
  "source": "input.json",
  "type": "log"
}
 

It never decodes the last event produced by the multiline pattern, and if the file contains only one entry, nothing is decoded at all; see my first comment. (Is this another bug?) If I publish to Elasticsearch, the only record that gets indexed is the one with "message": "{ \n    \"hello\"..., because there message is a string, as in the template. So 👍 for putting it under json (where can I find this in the template?) or under a specified target.
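The last event in the console output above is the same text as the others minus its final "}" line, so it is not valid JSON and stays a string. One explanation consistent with the offsets (an assumption, not confirmed in this thread) is that the closing "}" of input.json has no trailing newline, so Filebeat has not read that line yet and the multiline buffer is flushed without it. A minimal Python check of the decoding side; `try_decode` is a hypothetical helper that mimics the "leave it as a string on failure" behaviour the output suggests.

```python
import json
from typing import Any


def try_decode(text: str) -> Any:
    """Return the decoded JSON value, or the original string if decoding fails.

    Mimics what the console output suggests happens with undecodable input:
    the field is left as a plain string (assumption about the processor).
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


complete = ('{ \n    "hello": "world",\n    "test": {\n         "hello": "world"\n'
            '     },\n     "foo": "baar"\n}')
# The last event above is exactly this text without its final "}" line.
truncated = complete.rsplit("\n", 1)[0]

print(try_decode(complete))  # {'hello': 'world', 'test': {'hello': 'world'}, 'foo': 'baar'}
print(try_decode(truncated) == truncated)  # True: left as a string
```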

3 participants