Refactor input docs #6537

dedemorton · 2018-03-13T00:39:02Z

Resolves doc issue: #6375

I still have quite a few questions, which I've added as comments flagged for REVIEWERS. Please read and respond to the questions.

Here's what the TOC will look like when the book is built:

Changes:

Adds docs about the UDP input for "UDP prospector type with plain harvester" (#4452)
Renames prospector > inputs for "[WIP] Rename prospector to input" (#5944) and "Meta: Refactor prospector to input" (#6122)
Also does some light cleanup to use an attribute for the Filebeat name (more work is required, but it's not urgent)

To do after merging this PR:

Create issue to track: The IIS and Nginx module descriptions in filebeat.reference.yml still refer to prospector. Should I fix? Created: Some files in the IIS and Nginx modules show "prospector" instead of "input" (#6642)
Create issue to track: The manifests for IIS and Nginx still refer to prospector. Created: Some files in the IIS and Nginx modules show "prospector" instead of "input" (#6642)
Create issue to track: Rework the topic about how Filebeat works to be more generic, and perhaps move file-specific info to the topic about the log input, if relevant. Created: Make How Filebeat Works description more generic (#6572)

dedemorton · 2018-03-13T00:41:47Z

cc @ruflin Didn't want to flag you as a reviewer, but just so you know this is up.

urso · 2018-03-13T14:51:03Z

filebeat/docs/docker-input.asciidoc

+:type: docker
+
+[[docker-input]]
+=== Docker input


Hmmm... docker input/prospector is available since 6.1. Only found it mentioned here.

As this is a new file, maybe we want to have an extra PR for it?

Maybe we want to have a separate doc per input type?

How about naming the file input-docker.asciidoc. This way, all input related docs will be displayed together in the file browser.

With inputs/modules being mostly self-contained, I wonder if we want to add the docs to the modules/inputs _meta path.

@urso

re docker prospector for 6.1: Getting the prospector options broken into separate topics has been on my to-do list for awhile. I probably should have done a separate PR for the structural changes to make it easier to backport them. But since I didn't, I'll just have to create a separate PR for 6.1/6.2 that has relevant changes (if we decide to backport). First I want to make sure this gets into 6.3.

This PR does add a separate doc for each input.

I'll rename the files as you've suggested

If we use _meta folders, I'd like to keep it simple and add an include rather than doing a copy to the docs dir. I've asked ph for his input.

I confirmed that the docker options are covered in 6.1, just not ideally organized. Backporting this work to 6.2 would be a pretty big effort. The reorg isn't straightforward because it involves moving a lot of content around. I'll revisit the decision later if it makes sense, but for now, I'd like to keep this in 6.3 and not backport.

urso · 2018-03-13T14:51:26Z

filebeat/docs/docker-input.asciidoc

+
+// REVIEWERS: The filebeat.reference.yml file does not show a docker input type
+// so I'm not sure if the following config is correct. docker only shows up under
+// autodiscover.


@exekias can you have a look?

The example configuration is correct 👍

urso · 2018-03-13T14:58:18Z

filebeat/docs/filebeat-options.asciidoc

-Options that control how Filebeat deals with log messages that span multiple lines. See <<multiline-examples>> for more information about configuring multiline options.
+//REVIEWERS: I'm using the same example here that's used in the topic about the
+//log input. I wonder if we have a better one that shows a couple of different
+//input types?


I would say it's good enough, as you're mostly setting generic configs. Maybe @exekias @ph have some more examples available?

urso · 2018-03-13T14:58:59Z

filebeat/docs/filebeat-options.asciidoc


-experimental[]
+// REVIWERS: I would like to make the following list generated, if possible. Can
+// we reuse the logic that we use for generating the modules list?


This would be awesome.

Hmm...I looked at the logic that we use to create the list of modules for the docs, and I don't like that it recurses the directory where the source files are stored. I'd rather have something that looks at the hierarchy defined by the asciidoc markup (rather than relying on the source files to be in a particular directory).

Our doc tools really should support a macro or something for generating "jump" lists or tables. This is not a new concept. I could do it in FrameMaker 15 years ago. Rather than coming up with a solution just for Beats, I'm going to open a request in the docs repo.

I've created this issue to track the enhancement request: elastic/docs#324
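
To make the idea of building the list from the asciidoc markup (rather than from the directory layout) a bit more concrete, here is a minimal sketch in Go. It is not part of the docs tooling or of this PR; the function name `jumpList` and the rule "an anchor line directly followed by a section heading" are assumptions made for illustration only.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// anchorRe matches an AsciiDoc block anchor on its own line, e.g. [[docker-input]].
var anchorRe = regexp.MustCompile(`^\[\[([A-Za-z0-9_.:-]+)\]\]\s*$`)

// headingRe matches a section title such as "=== Docker input".
var headingRe = regexp.MustCompile(`^=+\s+\S`)

// jumpList returns one "* <<anchor>>" entry for every anchor that is
// immediately followed by a section heading in the given AsciiDoc source.
// (Hypothetical helper; the real docs build has no such function.)
func jumpList(src string) []string {
	var entries []string
	pending := "" // anchor seen on the previous line, if any
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := sc.Text()
		if m := anchorRe.FindStringSubmatch(line); m != nil {
			pending = m[1]
			continue
		}
		if pending != "" && headingRe.MatchString(line) {
			entries = append(entries, "* <<"+pending+">>")
		}
		pending = ""
	}
	return entries
}

func main() {
	src := ":type: docker\n\n[[docker-input]]\n=== Docker input\n\nSome text.\n\n[[udp-input]]\n=== UDP input\n"
	for _, e := range jumpList(src) {
		fmt.Println(e)
	}
}
```

Because the sketch only looks at anchors and headings, it would keep working if the source files moved to another directory. Anchors that use attributes such as `{type}` are deliberately not handled here.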

urso · 2018-03-13T15:00:30Z

filebeat/docs/filebeat-options.asciidoc

+* <<stdin-input>>
+* <<redis-input>>
+* <<udp-input>>
+* <<docker-input>>


How about renaming the anchors to filebeat-input-<service>? Also name the files input-<service>.asciidoc. That way we get some grouping and reduce the chance of naming conflicts.

urso · 2018-03-13T15:02:51Z

filebeat/docs/how-filebeat-works.asciidoc

+// I backed out my changes, and I'm looking for alternatives. Maybe we should
+// make this description more generic, and move the details about reading files
+// to the new topic about the log input. WDYT? I might do that in a separate
+// PR so we can get the refactoring merged.


+1 on a more generic description. As of now Filebeat is about collecting logs only, but given that logs can be anything and we might widen the scope of Filebeat in the future, events or documents might be more appropriate.

TBH, I think this whole topic needs a rewrite. I'd like to defer this to another PR. I've created an issue to track the work here: #6572

urso · 2018-03-13T15:13:03Z

filebeat/docs/how-filebeat-works.asciidoc

-Filebeat consists of two main components: <<prospector,prospectors>> and <<harvester,harvesters>>. These components work together to tail files and send event data to the output that you specify.
+In this topic, you learn about the key building blocks of {beatname_uc} and how they work together. Understanding these concepts will help you make informed decisions about configuring {beatname_uc} for specific use cases.
+
+{beatname_uc} consists of two main components: <<input,inputs>> and <<harvester,harvesters>>. These components work together to tail files and send event data to the output that you specify.


TBH I don't like this documentation file. The point is, the presence of a harvester and a particular harvester implementation is 'implementation specific'. A harvester is basically just a worker, allowing for concurrent collection of logs.

I understand the need for this doc, but in the end, we might have inputs and harvesters/workers documented for each input type.

Different input types might have different strategies for collecting logs. E.g. for stdin there is no real need for a harvester/workers.

We're also about untangling the harvesters, to have per input type worker implementations. The log harvester is overkill for some use-cases, but re-use 'complicates' the logic.

I agree with @urso; this is implementation specific. As more inputs are created in Filebeat, each input will need some details about its implementation.

Yah, I agree. I'm going to track this work in a separate issue: #6572

urso · 2018-03-13T15:19:17Z

filebeat/docs/how-filebeat-works.asciidoc


-Filebeat keeps the state of each file and frequently flushes the state to disk in the registry file. The state is used to remember the last offset a harvester was reading from and to ensure all log lines are sent. If the output, such as Elasticsearch or Logstash, is not reachable, Filebeat keeps track of the last lines sent and will continue reading the files as soon as the output becomes available again. While Filebeat is running, the state information is also kept in memory by each prospector. When Filebeat is restarted, data from the registry file is used to rebuild the state, and Filebeat continues each harvester at the last known position.
+{beatname_uc} keeps the state of each file and frequently flushes the state to disk in the registry file. The state is used to remember the last offset a harvester was reading from and to ensure all log lines are sent. If the output, such as Elasticsearch or Logstash, is not reachable, {beatname_uc} keeps track of the last lines sent and will continue reading the files as soon as the output becomes available again. While {beatname_uc} is running, the state information is also kept in memory for each input. When {beatname_uc} is restarted, data from the registry file is used to rebuild the state, and {beatname_uc} continues each harvester at the last known position.


The 'flush' is the operation. One can change this to:
Filebeat keeps the file states in memory and writes a snapshot to the registry file whenever events are acknowledged by the outputs or when the registry flush timer is triggered. The flush timer is only active if some state has changed.

@urso I think you're proposing that I replace the entire line with your description, right?
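
For readers who want to picture the behavior described in this thread (state kept in memory, a snapshot written on acknowledgement, and a flush timer that only matters when something changed), here is a minimal sketch in Go. It is not Filebeat's registrar code: the `registry` type, its method names, and the JSON snapshot format are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for Filebeat's registrar. It keeps
// per-file offsets in memory and serializes them through write.
type registry struct {
	mu      sync.Mutex
	offsets map[string]int64   // file path -> last known offset
	dirty   bool               // true when memory differs from the last snapshot
	write   func([]byte) error // assumed sink, e.g. rewrites the registry file
}

func newRegistry(write func([]byte) error) *registry {
	return &registry{offsets: make(map[string]int64), write: write}
}

// update changes in-memory state only and marks the registry as dirty.
func (r *registry) update(path string, offset int64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.offsets[path] = offset
	r.dirty = true
}

// onACK is called when the outputs acknowledge events: record the offset
// and write a snapshot right away.
func (r *registry) onACK(path string, offset int64) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.offsets[path] = offset
	r.dirty = true
	return r.snapshotLocked()
}

// onFlushTimer is called when the flush timer fires. It writes a snapshot
// only if some state changed since the last one.
func (r *registry) onFlushTimer() error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.dirty {
		return nil
	}
	return r.snapshotLocked()
}

// snapshotLocked serializes the full state. The caller must hold r.mu.
func (r *registry) snapshotLocked() error {
	data, err := json.Marshal(r.offsets)
	if err != nil {
		return err
	}
	if err := r.write(data); err != nil {
		return err
	}
	r.dirty = false
	return nil
}

func main() {
	r := newRegistry(func(b []byte) error {
		fmt.Printf("snapshot: %s\n", b)
		return nil
	})
	r.update("/var/log/app.log", 120)
	_ = r.onFlushTimer()                 // state changed, so a snapshot is written
	_ = r.onFlushTimer()                 // nothing changed, so nothing is written
	_ = r.onACK("/var/log/app.log", 480) // acknowledged events trigger a snapshot
}
```

The point of the `dirty` flag is the one made in the review comment: a timer tick with no state change should not touch the registry file.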

urso · 2018-03-13T15:24:36Z

filebeat/docs/overview.asciidoc


-Here's how Filebeat works: When you start Filebeat, it starts one or more prospectors that look in the local paths you've specified for log files. For each log file that the prospector locates, Filebeat starts a harvester. Each harvester reads a single log file for new content and sends the new log data to libbeat, which aggregates the events and sends the aggregated data to the output that you've configured for Filebeat.
+// REVIEWERS: Is it still accurate to say "tails the logs" or is that a file
+// concept that doesn't make sense for some of the newer inputs that we've added.


kind of depends on the input type. In a 'pull' based input, we will try to 'tail' in a sense of trying to collect new logs only. We also have 'push' based inputs (e.g. syslog or plain UDP/TCP), this would make sense for.

How about 'collect'? Instead of logs we might have events or documents. With the introduction of UDP/TCP, we have already widened the scope.

I've made this description more generic.

urso · 2018-03-13T15:25:29Z

filebeat/docs/redis-input.asciidoc

+
+// REVIEWERS: Which options should I show in the example config? Incidentally,
+// in the reference.yml, password shows up twice in the same namespace. This
+// needs to be fixed in the reference.yml file


Do we have a ticket for the option being shown twice?

I like only displaying a very minimal config.

Created a ticket here: #6573

urso · 2018-03-13T15:26:51Z

filebeat/docs/redis-input.asciidoc

+The `redis` input supports the following configuration options plus the
+<<{type}-input-common-options>> described later.
+
+//REVIEWERS: These descriptions are new, so please review carefully.


urso · 2018-03-13T15:30:58Z

filebeat/docs/redis-input.asciidoc

+Redis as frequently as possible without causing {beatname_uc} to scan too
+frequently. Do not set this value to less than `1s`.
+
+The default is `10s`.


Maybe we want to add a warning here. Redis slow logs are not permanent. We have to connect and query the logs. If Redis collects more slow logs than can be buffered within the scan_frequency interval, logs will be missing.
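
A rough way to reason about this warning: Redis keeps only a bounded number of slow log entries (its `slowlog-max-len` setting, 128 by default), so anything produced beyond that bound between two scans is gone before it can be read. The Go sketch below only does that arithmetic. The function name and the steady-rate assumption are illustrative, and it says nothing about how the input actually queries Redis.

```go
package main

import "fmt"

// missedEntries estimates how many slow log entries are lost per scan when
// Redis records entriesPerSecond on average and retains at most maxLen
// entries. It assumes a steady rate, which real workloads rarely have.
func missedEntries(entriesPerSecond, scanSeconds float64, maxLen int) int {
	produced := int(entriesPerSecond * scanSeconds)
	if produced <= maxLen {
		return 0
	}
	return produced - maxLen
}

func main() {
	// 20 slow queries per second, scanned every 10 seconds, buffer of 128.
	fmt.Println(missedEntries(20, 10, 128))
	// 5 slow queries per second fits in the buffer between scans.
	fmt.Println(missedEntries(5, 10, 128))
}
```

Under these assumptions, the options are to scan more often or to raise the buffer size on the Redis side.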

urso · 2018-03-13T15:52:13Z

filebeat/docs/redis-input.asciidoc

+
+The network type to use for the Redis connection. The default is `tcp`.
+
+//REVIEWERS: What (if any) other network types are supported here?


Hmm... the Beats code does not validate the network type, and the Redis client in use does no validation either.

The stdlib docs say:

Known networks are "tcp", "tcp4" (IPv4-only), "tcp6" (IPv6-only), "udp", "udp4" (IPv4-only), "udp6" (IPv6-only), "ip", "ip4" (IPv4-only), "ip6" (IPv6-only), "unix", "unixgram" and "unixpacket".

The ipX settings do not make sense.

Redis requires TCP -> drop UDP settings.

While Redis (theoretically) supports unix sockets, I'm not sure about "unixgram" and "unixpacket".
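
If validation were added for this option, it could look roughly like the following Go sketch. This is not existing Beats code: the function `resolveNetwork` and the allow-list (TCP variants plus `unix`, following the reasoning above) are assumptions, and whether `unix` should really be accepted is exactly the open question in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedNetworks is an assumed allow-list: the TCP variants plus unix
// sockets. UDP, ip*, unixgram and unixpacket are left out on purpose.
var allowedNetworks = map[string]bool{
	"tcp":  true,
	"tcp4": true,
	"tcp6": true,
	"unix": true,
}

// resolveNetwork returns the network type to use for the Redis connection.
// An empty value falls back to the documented default, "tcp".
func resolveNetwork(network string) (string, error) {
	n := strings.ToLower(strings.TrimSpace(network))
	if n == "" {
		return "tcp", nil
	}
	if !allowedNetworks[n] {
		return "", fmt.Errorf("unsupported network type %q for the redis input", network)
	}
	return n, nil
}

func main() {
	for _, candidate := range []string{"", "tcp6", "udp"} {
		n, err := resolveNetwork(candidate)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("using network:", n)
	}
}
```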

urso · 2018-03-13T15:57:52Z

filebeat/docs/udp-input.asciidoc

+//REVIEWERS: What other options need to be documented here? I don't see host
+//and port listed in the reference.yml. Are the host and port separate options?
+//What guidance should we provide to help users set these options? Config ex
+//above should include this as an eample, too.


@ph can you have a look?

urso · 2018-03-13T15:59:53Z

filebeat/docs/stdin-input.asciidoc

+----
+
+// REVIEWERS: Are there any stdin-specific config options, or are all the config
+// options inherited from libbeat?


Right now it supports all harvester settings (mostly the same ones that are available for the log input type). Some settings just don't make sense for stdin.

@urso Can you look through the document I created and tell me which options make sense for stdin? I put a question mark for the options I think are applicable. https://docs.google.com/spreadsheets/d/1ixRNLA22LiBAgXGZzDBPxz-7kflaSn-JOQkSIcU-d78/edit#gid=0

exekias · 2018-03-14T00:19:58Z

filebeat/docs/docker-input.asciidoc

+// have a realistic example here.
+
+// REVIEWERS: Sounds like I definitely need to add multiline to the docs here.
+// which other config options from the harvester are used by the docker input?


Docker input should support all the same settings as input. paths would be the only one we don't want to expose here, because internally it's defined by containers.ids.

exekias · 2018-03-14T00:22:03Z

filebeat/docs/docker-input.asciidoc

+
+// REVIEWERS: I'm guessing where these settings live in the config 
+// because it's not clear from the docs or the reference.yml. Would be nice to
+// have a realistic example here.


A full example could be:

    - type: docker
      containers:
        paths: "/var/lib/docker/containers"
        stream: "stdout"
        ids:
          - "*"

This config will read the stdout stream from all containers under the default docker containers path.

ph

Thanks @dedemorton for all this work on improving it! This will help us grow our inputs on the filebeat side.

A few answers to the questions in the issue description:

Add docs for the TCP input (see #6266). @ph If your PR is going into 6.3, I can add the docs to my PR.
It won't be in 6.3; with vacation/eah I will miss the date.

The IIS and Nginx module descriptions in filebeat.reference.yml still refer to prospector. Should I fix?
The manifest for IIS and nginx still refer to prospector. Should I fix?
Yes, they should use the new inputs.

A few other points:

In some Filebeat-specific docs we are using the asciidoc variable and in some we are not. Is that normal?
More of a reflection: are we thinking about redirects on the docs page for some content?

ph · 2018-03-22T17:53:31Z

filebeat/docs/how-filebeat-works.asciidoc

-Filebeat consists of two main components: <<prospector,prospectors>> and <<harvester,harvesters>>. These components work together to tail files and send event data to the output that you specify.
+In this topic, you learn about the key building blocks of {beatname_uc} and how they work together. Understanding these concepts will help you make informed decisions about configuring {beatname_uc} for specific use cases.
+
+{beatname_uc} consists of two main components: <<input,inputs>> and <<harvester,harvesters>>. These components work together to tail files and send event data to the output that you specify.


I agree with @urso; this is implementation specific. As more inputs are created in Filebeat, each input will need some details about its implementation.

ph · 2018-03-22T17:55:37Z

filebeat/docs/inputs/input-common-harvester-options.asciidoc

+is combined into a single line before the lines are filtered by `exclude_lines`.
+
+// REVIEWERS: Do I need to make examples like this one more generic to work with
+// all the input types where this description will appear, or is this OK?


I believe it's OK.

ph · 2018-03-22T18:00:09Z

filebeat/docs/inputs/input-docker.asciidoc

+is `/var/lib/docker/containers`.
+
+// REVIEWERS: Not clear to me if this setting accepts an array or a string. If
+// it's just a string, why is it paths (plural)?


@dedemorton I believe you are right here, it should be singular.

https://github.com/elastic/beats/blob/master/filebeat/input/docker/config.go#L15-L21

Nothing in the code currently accepts multiple paths for the docker containers.

@exekias WDYT about moving it to singular?

Actually, the setting key is path, this looks like a typo in docs: https://github.com/elastic/beats/blob/master/filebeat/input/docker/config.go#L17

Good catch @exekias

ph · 2018-03-22T18:06:47Z

filebeat/docs/inputs/input-udp.asciidoc

+//REVIEWERS: PH: What other options need to be documented here? I don't see host
+//and port listed in the reference.yml. Are the host and port separate options?
+//What guidance should we provide to help users set these options? Config ex
+//above should include this as an eample, too.


@dedemorton It's currently a single option.

It should be:

    filebeat.inputs:
    - type: udp
      max_message_size: 10240
      host: "localhost:8080"

The default value is "localhost:8080". To be honest, I think we should not provide a default for it and should require the user to set it, so making sure the example provides the host will help us move forward.

FYI, I would like to make it mandatory for 7.0
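
Since `host` is a single "address:port" value rather than two options, guidance for users could mention how such a value is interpreted. The Go sketch below shows one plausible way to split and check it using the standard library. The helper name `splitHost` and the port range check are assumptions for illustration, not the UDP input's actual code.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// splitHost parses a combined "address:port" value, such as the host
// setting in the example above, into its two parts.
// (Hypothetical helper; the input itself may handle this differently.)
func splitHost(host string) (string, int, error) {
	addr, portStr, err := net.SplitHostPort(host)
	if err != nil {
		return "", 0, err
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return "", 0, fmt.Errorf("invalid port %q: %w", portStr, err)
	}
	if port < 1 || port > 65535 {
		return "", 0, fmt.Errorf("port %d is out of range", port)
	}
	return addr, port, nil
}

func main() {
	addr, port, err := splitHost("localhost:8080")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("address=%s port=%d\n", addr, port)
}
```

A value without a port, such as `localhost`, is rejected by `net.SplitHostPort`, which is one argument for always showing the full `host` value in the example config.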

ph · 2018-03-22T18:07:25Z

filebeat/docs/migration.asciidoc

@@ -96,7 +96,7 @@ section to understand the Filebeat options.
 [float]
 === Migrate the "files" section

-To migrate the `files` section from the Logstash Forwarder configuration, create a  `prospectors` section in the Filebeat config file. For example, assuming that you start
+To migrate the `files` section from the Logstash Forwarder configuration, create an  `inputs` section in the Filebeat config file. For example, assuming that you start


Double space between "an" and "inputs".

ph · 2018-03-22T18:08:38Z

filebeat/docs/migration.asciidoc


 [source,yaml]
 -------------------------------------------------------------------------------------
-filebeat.prospectors:
+filebeat.inputs:
 - type: log
  paths:
    - /var/log/*.log


Should we use the asciidoc variable for all the Filebeat references in this file? ({beatname_uc})

Yes...not urgent, but over time I'm trying to replace references to product names with asciidoc variables whenever I open up a file

ph · 2018-03-22T18:09:23Z

filebeat/docs/multiline.asciidoc

@@ -17,7 +17,7 @@ Also read <<yaml-tips>> and <<regexp-support>> to avoid common mistakes.
 [[multiline]]
 === Configuration options

-You can specify the following options in the `filebeat.prospectors` section of
+You can specify the following options in the `filebeat.inputs` section of


Should we use asciidoc variable here? {beatname_uc}

yup. I've fixed the references in this file, too. I'm doing these changes in small batches with other changes because it's not always a simple text replacement. Formatting affects how the attributes are interpreted.

ph · 2018-03-22T18:10:17Z

filebeat/docs/overview.asciidoc


-Filebeat is a https://www.elastic.co/products/beats[Beat], and it is based on the libbeat framework.
-General information about libbeat and setting up Elasticsearch, Logstash, and Kibana are covered in the {libbeat}/index.html[Beats Platform Reference].
+{beatname_uc} is a https://www.elastic.co/prproducts/beats[Beat], and it is based on


Typo in the URL; it should be https://www.elastic.co/products/beats

My cat did that.

ph · 2018-03-22T18:10:45Z

filebeat/docs/overview.asciidoc

+each log that {beatname_uc} locates, {beatname_uc} starts a harvester. Each
+harvester reads a single log for new content and sends the new log data to
+libbeat, which aggregates the events and sends the aggregated data to the output
+that you've configured for {beatname_uc}.

 image:./images/filebeat.png[Beats design]


We have replaced Filebeat with the variable. Should we do the same with this path?

Not in this case because the actual filename on disk also needs to be changed.

dedemorton · 2018-03-23T00:10:46Z

@ph I think I've addressed all of your issues.

Here are a few things left to do/decide that, IMO, can wait until after this thing is merged. I've opened issues for all the other stuff I found.

Do we have the correct options listed for all of the input types? https://docs.google.com/spreadsheets/d/1ixRNLA22LiBAgXGZzDBPxz-7kflaSn-JOQkSIcU-d78/edit#gid=0
Do we need to have the source for the input docs in _meta files under the inputs? I'd rather not because it complicates everything (including the doc build). I vote to leave as-is for the initial merge.
Todo: DeDe to set up redirects. We can't redirect links to anchors, but any existing links to the config options will point to the log input type, or the container topic for all input types.

ph · 2018-03-23T00:16:08Z

I believe you have all the options, from what I've read.

Concerning the location of the input docs, I agree to keep them where they are currently. We can always move the docs into the _meta folder later if we wish to have modules more self-contained.

dedemorton added docs review labels Mar 13, 2018

dedemorton requested review from ph and urso March 13, 2018 00:40

urso reviewed Mar 13, 2018

exekias reviewed Mar 14, 2018

Refactor input docs

7d9d1ed

dedemorton force-pushed the refactor_prospectors branch 3 times, most recently from 4e2746c to 46ed769 Compare March 21, 2018 19:33

Changes from the review

cf69b58

dedemorton force-pushed the refactor_prospectors branch from 46ed769 to cf69b58 Compare March 22, 2018 00:43

Reset type attribute at the end of each file

57ac7e8

ph suggested changes Mar 22, 2018

Add changes from 2nd review

889de6e

ph approved these changes Mar 23, 2018

dedemorton merged commit f01b064 into elastic:master Mar 23, 2018

dedemorton mentioned this pull request Mar 23, 2018

Update docs to reflect prospector > input refactoring #6375

Closed

