Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor input docs #6537

Merged
merged 4 commits into from
Mar 23, 2018
Merged

Conversation

dedemorton
Copy link
Contributor

@dedemorton dedemorton commented Mar 13, 2018

Resolves doc issue: #6375

I still have a quite a few questions, which I've added as comments flagged for REVIEWERS. Please read and respond to questions.

Here's what the TOC will look like when the book is built:

image

Changes:

To do after merging this PR:

@dedemorton
Copy link
Contributor Author

cc @ruflin Didn't want to flag you as a reviewer, but just so you know this is up.

:type: docker

[[docker-input]]
=== Docker input
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... docker input/prospector is available since 6.1. Only found it mentioned here.

As this is a new file, maybe we want to have an extra PR for it?

Maybe we want to have a separate doc per input type?

How about naming the file input-docker.asciidoc. This way, all input related docs will be displayed together in the file browser.

With inputs/modules being mostly self-contained, I wonder if we want to add the docs to the modules/inputs _meta path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@urso

  • re docker prospector for 6.1: Getting the prospector options broken into separate topics has been on my to-do list for awhile. I probably should have done a separate PR for the structural changes to make it easier to backport them. But since I didn't, I'll just have to create a separate PR for 6.1/6.2 that has relevant changes (if we decide to backport). First I want to make sure this gets into 6.3.
  • This PR does add a separate doc for each input.
  • I'll rename the files as you've suggested
  • If we use _meta folders, I'd like to keep it simple and add an include rather than doing a copy to the docs dir. I've asked ph for his input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed that the docker options are covered in 6.1, just not ideally organized. Backporting this work to 6.2 would be a pretty big effort. The reorg isn't straightforward because it involves moving a lot of content around. I'll revisit the decision later if it makes sense, but for now, I'd like to keep this in 6.3 and not backport.


// REVIEWERS: The filebeat.reference.yml file does not show a docker input type
// so I'm not sure if the following config is correct. docker only shows up under
// autodiscover.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@exekias can you have a look?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example configuration is correct 👍

Options that control how Filebeat deals with log messages that span multiple lines. See <<multiline-examples>> for more information about configuring multiline options.
//REVIEWERS: I'm using the same example here that's used in the topic about the
//log input. I wonder if we have a better one that shows a couple of different
//input types?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say it's good enough, as you're mostly setting generic configs. Maybe @exekias @ph have some more examples available?


experimental[]
// REVIWERS: I would like to make the following list generated, if possible. Can
// we reuse the logic that we use for generating the modules list?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be awesome.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...I looked at the logic that we use to create the list of modules for the docs, and I don't like that it recurses the directory where the source files are stored. I'd rather have something that looks at the hierarchy defined by the asciidoc markup (rather than relying on the source files to be in a particular directory).

Our doc tools really should support a macro or something for generating "jump" lists or tables. This is not a new concept. I could do it in FrameMaker 15 years ago. Rather than coming up with a solution just for Beats, I'm going to open a request in the docs repo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created this issue to track the enhancement request: elastic/docs#324

* <<stdin-input>>
* <<redis-input>>
* <<udp-input>>
* <<docker-input>>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about renaming the anchors to filebeat-input-<service>. Also name the files input-<service>.asciidoc. So we have some grouping and reduce chance of naming conflicts.

// I backed out my changes, and I'm looking for alternatives. Maybe we should
// make this description more generic, and move the details about reading files
// to the new topic about the log input. WDYT? I might do that in a separate
// PR so we can get the refactoring merged.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on more generic description. As of now filebeat is about collecting logs only, but given as logs can be anything + we might widen the scope of filebeat in the future, events or documents might be more appropriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I think this whole topic needs a rewrite. I'd like to defer this to another PR. I've created an issue to track the work here: #6572

Filebeat consists of two main components: <<prospector,prospectors>> and <<harvester,harvesters>>. These components work together to tail files and send event data to the output that you specify.
In this topic, you learn about the key building blocks of {beatname_uc} and how they work together. Understanding these concepts will help you make informed decisions about configuring {beatname_uc} for specific use cases.

{beatname_uc} consists of two main components: <<input,inputs>> and <<harvester,harvesters>>. These components work together to tail files and send event data to the output that you specify.
Copy link

@urso urso Mar 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I don't like this documentation file. The point is, the presence of a harvester and a particular harvester implementation is 'implementation specific'. A harvester is basically just a worker, allowing for concurrent collection of logs.

I understand the need of this doc, but in the end, we might have inputs and harvesters/workers documented for each input type.

Different input types might have different strategies for collecting logs. E.g. for stdin there is no real need for an harvester/workers.

We're also about untangling the harvesters, to have per input type worker implementations. The log harvester is overkill for some use-cases, but re-use 'complicates' the logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agreed with @urso, this is implementation specific as more inputs are create in filebeat each input will need some kind of details on the implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, I agree. I'm going to track this work in a separate issue: #6572


Filebeat keeps the state of each file and frequently flushes the state to disk in the registry file. The state is used to remember the last offset a harvester was reading from and to ensure all log lines are sent. If the output, such as Elasticsearch or Logstash, is not reachable, Filebeat keeps track of the last lines sent and will continue reading the files as soon as the output becomes available again. While Filebeat is running, the state information is also kept in memory by each prospector. When Filebeat is restarted, data from the registry file is used to rebuild the state, and Filebeat continues each harvester at the last known position.
{beatname_uc} keeps the state of each file and frequently flushes the state to disk in the registry file. The state is used to remember the last offset a harvester was reading from and to ensure all log lines are sent. If the output, such as Elasticsearch or Logstash, is not reachable, {beatname_uc} keeps track of the last lines sent and will continue reading the files as soon as the output becomes available again. While {beatname_uc} is running, the state information is also kept in memory for each input. When {beatname_uc} is restarted, data from the registry file is used to rebuild the state, and {beatname_uc} continues each harvester at the last known position.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'flush' is the operation. One can change this to:
filebeat keeps the file states in memory and writes a snapshot to the registry file whenever events are acknowledged by the outputs or when the registry flush timer is trigger. The flush timer is only active if some state has been changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@urso I think you're proposing that I replace the entire line with your description, right?


Here's how Filebeat works: When you start Filebeat, it starts one or more prospectors that look in the local paths you've specified for log files. For each log file that the prospector locates, Filebeat starts a harvester. Each harvester reads a single log file for new content and sends the new log data to libbeat, which aggregates the events and sends the aggregated data to the output that you've configured for Filebeat.
// REVIEWERS: Is it still accurate to say "tails the logs" or is that a file
// concept that doesn't make sense for some of the newer inputs that we've added.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kind of depends on the input type. In a 'pull' based input, we will try to 'tail' in a sense of trying to collect new logs only. We also have 'push' based inputs (e.g. syslog or plain UDP/TCP), this would make sense for.

How about 'collect'. Instead of logs we might have events or documents. With introduction of UDP/TCP, we already widened the scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made this description more generic.


// REVIEWERS: Which options should I show in the example config? Incidentally,
// in the reference.yml, password shows up twice in the same namespace. This
// needs to be fixed in the reference.yml file
Copy link

@urso urso Mar 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a ticket for the option being shown twice?

I like only displaying a very minimal config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a ticket here: #6573

The `redis` input supports the following configuration options plus the
<<{type}-input-common-options>> described later.

//REVIEWERS: These descriptions are new, so please review carefully.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Redis as frequently as possible without causing {beatname_uc} to scan too
frequently. Do not set this value to less than `1s`.

The default is `10s`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we want to add a warning here. Redis slow logs are not permanent. We have to connect and query logs. If redis did collect more slow logs then can be buffered within the interval of scan_frequency, logs will be missing.


The network type to use for the Redis connection. The default is `tcp`.

//REVIEWERS: What (if any) other network types are supported here?
Copy link

@urso urso Mar 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... the beats code is not doing validating the network type. And the redis client use does no validation as well.

The stdlib docs say:

Known networks are "tcp", "tcp4" (IPv4-only), "tcp6" (IPv6-only), "udp", "udp4" (IPv4-only), "udp6" (IPv6-only), "ip", "ip4" (IPv4-only), "ip6" (IPv6-only), "unix", "unixgram" and "unixpacket".

The ipX settings do not make sense.

Redis requires TCP -> drop UDP settings.

While redis (theoretical) support unix sockets, I'm not sure about "unixgram" and "unixpacket".

//REVIEWERS: What other options need to be documented here? I don't see host
//and port listed in the reference.yml. Are the host and port separate options?
//What guidance should we provide to help users set these options? Config ex
//above should include this as an eample, too.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ph can you have a look?

----

// REVIEWERS: Are there any stdin-specific config options, or are all the config
// options inherited from libbeat?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right now it supports all harvester settings (mostly the same as are available for log input type). Some settings just don't make sense for stdin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@urso Can you look through the document I created and tell me which options make sense for stdin? I put a question mark for the options I think are applicable. https://docs.google.com/spreadsheets/d/1ixRNLA22LiBAgXGZzDBPxz-7kflaSn-JOQkSIcU-d78/edit#gid=0

// have a realistic example here.

// REVIEWERS: Sounds like I definitely need to add multiline to the docs here.
// which other config options from the harvester are used by the docker input?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docker input should support all the same settings as input, paths would be the only one we don't want to expose here, as internally it's defined by containers.ids


// REVIEWERS: I'm guessing where these settings live in the config
// because it's not clear from the docs or the reference.yml. Would be nice to
// have a realistic example here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A full example could be:

- type: docker
  containers:
    paths: "/var/lib/docker/containers"
    stream: "stdout"
    ids:
      - "*"

This config will read stdout stream from all containers under the default docker containers path

@dedemorton dedemorton force-pushed the refactor_prospectors branch 3 times, most recently from 4e2746c to 46ed769 Compare March 21, 2018 19:33
@dedemorton dedemorton force-pushed the refactor_prospectors branch from 46ed769 to cf69b58 Compare March 22, 2018 00:43
Copy link
Contributor

@ph ph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dedemorton for all this work on improving it! This will help us grow our inputs on the filebeat side.

A Few answer from the issues description:

Add docs for the TCP input (see #6266). @ph If your PR is going into 6.3, I can add the docs to my PR.
It wont be in 6.3, with vacation/eah I will miss the date.

The IIS and Nginx module descriptions in filebeat.reference.yml still refer to prospector. Should I fix?
The manifest for IIS and nginx still refer to prospector. Should I fix?
Yes they should use the new inputs.

Few others points:

  • In some filebeat specific docs we are using the asciidoc variable and some not, it is normal?
  • More of a reflexion, are we thinking about redirection on the docs page for some content?

Filebeat consists of two main components: <<prospector,prospectors>> and <<harvester,harvesters>>. These components work together to tail files and send event data to the output that you specify.
In this topic, you learn about the key building blocks of {beatname_uc} and how they work together. Understanding these concepts will help you make informed decisions about configuring {beatname_uc} for specific use cases.

{beatname_uc} consists of two main components: <<input,inputs>> and <<harvester,harvesters>>. These components work together to tail files and send event data to the output that you specify.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agreed with @urso, this is implementation specific as more inputs are create in filebeat each input will need some kind of details on the implementation.

is combined into a single line before the lines are filtered by `exclude_lines`.

// REVIEWERS: Do I need to make examples like this one more generic to work with
// all the input types where this description will appear, or is this OK?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe its OK.

is `/var/lib/docker/containers`.

// REVIEWERS: Not clear to me if this setting accepts an array or a string. If
// it's just a string, why is it paths (plural)?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dedemorton I believe you are right here, it should be singular.

https://github.com/elastic/beats/blob/master/filebeat/input/docker/config.go#L15-L21

Nothing in the code currently accept multiple paths for the docker containers.

@exekias WDYT to move it to singular?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the setting key is path, this looks like a typo in docs: https://github.com/elastic/beats/blob/master/filebeat/input/docker/config.go#L17

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @exekias

//REVIEWERS: PH: What other options need to be documented here? I don't see host
//and port listed in the reference.yml. Are the host and port separate options?
//What guidance should we provide to help users set these options? Config ex
//above should include this as an eample, too.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dedemorton Its currently a single option.

It should be:

filebeat.inputs:
- type: udp
  max_message_size: 10240
  host: "localhost:8080"

The default value is "localhost:8080", To be honest, I think we should not provide a default for it and enforce user to set it, so making sure the example provide the host will help us move forward.

FYI, I would like to make it mandatory for 7.0

@@ -96,7 +96,7 @@ section to understand the Filebeat options.
[float]
=== Migrate the "files" section

To migrate the `files` section from the Logstash Forwarder configuration, create a `prospectors` section in the Filebeat config file. For example, assuming that you start
To migrate the `files` section from the Logstash Forwarder configuration, create an `inputs` section in the Filebeat config file. For example, assuming that you start
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double space between an and input


[source,yaml]
-------------------------------------------------------------------------------------
filebeat.prospectors:
filebeat.inputs:
- type: log
paths:
- /var/log/*.log
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use the asciidoc variable for all the filebeat reference in this file? ({beatname_uc})

Copy link
Contributor Author

@dedemorton dedemorton Mar 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes...not urgent, but over time I'm trying to replace references to product names with asciidoc variables whenever I open up a file

@@ -17,7 +17,7 @@ Also read <<yaml-tips>> and <<regexp-support>> to avoid common mistakes.
[[multiline]]
=== Configuration options

You can specify the following options in the `filebeat.prospectors` section of
You can specify the following options in the `filebeat.inputs` section of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use asciidoc variable here? {beatname_uc}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup. I've fixed the references in this file, too. I'm doing these changes in small batches with other changes because it's not always a simple text replacement. Formatting affects how the attributes are interpreted.


Filebeat is a https://www.elastic.co/products/beats[Beat], and it is based on the libbeat framework.
General information about libbeat and setting up Elasticsearch, Logstash, and Kibana are covered in the {libbeat}/index.html[Beats Platform Reference].
{beatname_uc} is a https://www.elastic.co/prproducts/beats[Beat], and it is based on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in the URL should be https://www.elastic.co/products/beats

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My cat did that.

each log that {beatname_uc} locates, {beatname_uc} starts a harvester. Each
harvester reads a single log for new content and sends the new log data to
libbeat, which aggregates the events and sends the aggregated data to the output
that you've configured for {beatname_uc}.

image:./images/filebeat.png[Beats design]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have replaced Filebeat with the variable should we do the same with this path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in this case because the actual filename on disk also needs to be changed.

@dedemorton
Copy link
Contributor Author

dedemorton commented Mar 23, 2018

@ph I think I've addressed all of your issues.

Here are a few things left to do/decide that, IMO, can wait until after this thing is merged. I've opened issues for all the other stuff I found.

  • Do we have the correct options listed for all of the input types? https://docs.google.com/spreadsheets/d/1ixRNLA22LiBAgXGZzDBPxz-7kflaSn-JOQkSIcU-d78/edit#gid=0
  • Do we need to have the source for the input docs in _meta files under the inputs? I'd rather not because it complicates everything (including the doc build). I vote to leave as-is for the initial merge.
  • Todo: DeDe to set up redirects. We can't redirect links to anchors, but any existing links to the config options will point to the log input type, or the container topic for all input types.

@ph
Copy link
Contributor

ph commented Mar 23, 2018

I believe you have all the options, from what I've read.

Concerning the location of the input doc, I agree to keep them where they are currently. We can always move the doc in the _meta folder later if we wish to have module more self contained.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants