Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add experimental Docker json-file prospector #5402

Merged
merged 1 commit into from
Nov 16, 2017

Conversation

exekias
Copy link
Contributor

@exekias exekias commented Oct 18, 2017

This pull request adds an experimental docker prospector. Docker supports different ways of outputting logs, this prospector implements json-file.

By offering the prospector we abstract the format from the user, so there is no need to manually configure JSON decoding, also we can get timestamp from the JSON file, something that's not possible as of now from existing settings.

prospectors:
   - type: docker
     containers.ids:
       - c3ec7a0bd9640151a768663b7e78c115d5b1a7f87fba572666bacd8065893d41

This would read all logs:

prospectors:
   - type: docker
     containers.ids:
       - '*'

This also abstract specifics of the chosen provider, for this case this would create a file prospector for /var/lib/docker/containers/c3ec7a0bd9640151a768663b7e78c115d5b1a7f87fba572666bacd8065893d41/*.log.

In the future, we can extend this to support other logging drivers/formats, like CRI format, or a plugin implementation. I would add a conf key like mode: json-file once that happens.

TODO:

  • Abstract file paths by adding a container_id parameter.
  • Extract @timestamp from JSON
  • Add tests

@exekias exekias added discuss Issue needs further discussion. enhancement Filebeat Filebeat in progress Pull request is currently in progress. labels Oct 18, 2017
Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going forth and back on this one if we should have config options to support json in json or a prospector type. The reason I'm leaning in the direction of the prospector type as you did here is that it allows us the change the implementation in the future and add more magic without affecting the user config. +1 on this :-)

@exekias Could you add some more details to the PR description on what this prospector does. An example on what the input log data looks like and what it makes out of it would be great.

LogType: {},
RedisType: {},
UdpType: {},
StdinType: {},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kvch Not related to this PR, but as we register the prospectors now I'm curious if we still need this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValidType is still used in log prospector for config validation and opening the right input.
But there is no other check/use of it. The check now happens when a factory is retrieved from registry using GetFactory. So it is not required, but this part needs further refactoring to make ValidType log specific. For the sake of consistency and to make refactoring easier, it is acceptable to add docker to the list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed #5537 to remove this part.

@exekias exekias changed the title [WIP] Add experimental Docker json-file prospector Add experimental Docker json-file prospector Oct 30, 2017
@ruflin
Copy link
Contributor

ruflin commented Nov 7, 2017

Lets move this forward to play around with it but mark it experimental for now. Can you add an experimental log message and add it to the CHANGELOG?

@kwojcicki
Copy link
Contributor

I would like to have filebeat only harvest from currently active docker containers. Was wondering if this would be an appropriate thing to add to this docker prospector or maybe it should be added somewhere else?

LogType = "log"
StdinType = "stdin"
RedisType = "redis"
UdpType = "udp"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const UdpType should be UDPType

@@ -42,3 +42,17 @@ func (msg *Message) AddFields(fields common.MapStr) {
}
msg.Fields.Update(fields)
}

func (msg *Message) RemoveFields(fields ...string) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method Message.RemoveFields should have comment or be unexported
receiver name msg should be consistent with previous receiver name m for Message

@exekias exekias force-pushed the docker-prospector branch 2 times, most recently from 0d6a2ca to cdf12cb Compare November 14, 2017 12:16
@@ -42,3 +43,18 @@ func (msg *Message) AddFields(fields common.MapStr) {
}
msg.Fields.Update(fields)
}

// RemoveFields removes given fields from the message
func (msg *Message) RemoveFields(fields ...string) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

receiver name msg should be consistent with previous receiver name m for Message

@exekias exekias added review and removed discuss Issue needs further discussion. in progress Pull request is currently in progress. labels Nov 14, 2017
@exekias
Copy link
Contributor Author

exekias commented Nov 14, 2017

This should be ready for a second look @ruflin, I implemented timestamp parsing, put message in the right place and added some tests

@exekias
Copy link
Contributor Author

exekias commented Nov 14, 2017

jenkins, retest this please

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left some minor comments.

}

message.AddFields(common.MapStr{
"stream": line.Stream,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you give some details on what values stream will have? Should we add this to a fields.yml?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see below this is stdin for example. If we know this field will exist, probably worth adding as keyword to fields.yml

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed it, thanks!

// Parse timestamp
ts, err := time.Parse(time.RFC3339, line.Timestamp)
if err != nil {
return message, errors.Wrap(err, "parsing docker timestamp")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps worth adding the given timestamp to the error. Probably already part of err?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LogType: {},
RedisType: {},
UdpType: {},
StdinType: {},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed #5537 to remove this part.

}

type config struct {
ContainersIDs []string `config:"containers_ids"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this container.ids and container.path or containers.ids and containers.path?

@@ -83,6 +83,9 @@ type config struct {
MaxBytes int `config:"max_bytes" validate:"min=0,nonzero"`
Multiline *reader.MultilineConfig `config:"multiline"`
JSON *reader.JSONConfig `config:"json"`

// Hidden on purpose, used by the docker prospector:
DockerJSON bool `config:"docker-json"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of hacky but don't have a better solution.

Copy link
Contributor Author

@exekias exekias Nov 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@urso recommended something like this to avoid repeating code or a major refactor, works for me 😇

@@ -515,6 +517,11 @@ func (h *Harvester) newLogFileReader() (reader.Reader, error) {
return nil, err
}

if h.config.DockerJSON {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting that form an implementation point of view it just comes down to adding a reader in the harvester (simplified). Seems like the log harvester is pretty generic and perhaps should be extracted. Prospectors then could configure "readers" on the harvester. (just thinking out loud for later and @urso was also talking about similar stuff in the past).

@exekias
Copy link
Contributor Author

exekias commented Nov 15, 2017

rebased after master changes

@@ -23,6 +23,12 @@
description: >
The content of the line read from the log file.

- name: stream
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we call this log.stream or something similar? log.source? Source is already overloaded.

@weltenwort has some ideas for a better name?

@ruflin ruflin merged commit 4b3f9e3 into elastic:master Nov 16, 2017
@ruflin
Copy link
Contributor

ruflin commented Nov 16, 2017

Merged this one as the field naming discussion we can still do in a follow up PR.

@tsg
Copy link
Contributor

tsg commented Nov 21, 2017

@exekias Please remember to add docs for this one before the 6.1 GA date.

exekias added a commit to exekias/beats that referenced this pull request Nov 29, 2017
ruflin pushed a commit that referenced this pull request Nov 29, 2017
exekias added a commit to exekias/beats that referenced this pull request Nov 30, 2017
This adds docs for elastic#5402
(cherry picked from commit c37cc31)
kvch pushed a commit that referenced this pull request Nov 30, 2017
This adds docs for #5402
(cherry picked from commit c37cc31)
@dedemorton
Copy link
Contributor

Removing needs_docs label because it looks like docs were added in #5752

leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants