-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor input docs #6537
Refactor input docs #6537
Conversation
cc @ruflin Didn't want to flag you as a reviewer, but just so you know this is up. |
filebeat/docs/docker-input.asciidoc
Outdated
:type: docker | ||
|
||
[[docker-input]] | ||
=== Docker input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... docker input/prospector is available since 6.1. Only found it mentioned here.
As this is a new file, maybe we want to have an extra PR for it?
Maybe we want to have a separate doc per input type?
How about naming the file input-docker.asciidoc
. This way, all input related docs will be displayed together in the file browser.
With inputs/modules being mostly self-contained, I wonder if we want to add the docs to the modules/inputs _meta
path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- re docker prospector for 6.1: Getting the prospector options broken into separate topics has been on my to-do list for awhile. I probably should have done a separate PR for the structural changes to make it easier to backport them. But since I didn't, I'll just have to create a separate PR for 6.1/6.2 that has relevant changes (if we decide to backport). First I want to make sure this gets into 6.3.
- This PR does add a separate doc for each input.
- I'll rename the files as you've suggested
- If we use _meta folders, I'd like to keep it simple and add an include rather than doing a copy to the docs dir. I've asked ph for his input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirmed that the docker options are covered in 6.1, just not ideally organized. Backporting this work to 6.2 would be a pretty big effort. The reorg isn't straightforward because it involves moving a lot of content around. I'll revisit the decision later if it makes sense, but for now, I'd like to keep this in 6.3 and not backport.
filebeat/docs/docker-input.asciidoc
Outdated
|
||
// REVIEWERS: The filebeat.reference.yml file does not show a docker input type | ||
// so I'm not sure if the following config is correct. docker only shows up under | ||
// autodiscover. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@exekias can you have a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The example configuration is correct 👍
Options that control how Filebeat deals with log messages that span multiple lines. See <<multiline-examples>> for more information about configuring multiline options. | ||
//REVIEWERS: I'm using the same example here that's used in the topic about the | ||
//log input. I wonder if we have a better one that shows a couple of different | ||
//input types? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
experimental[] | ||
// REVIWERS: I would like to make the following list generated, if possible. Can | ||
// we reuse the logic that we use for generating the modules list? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be awesome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm...I looked at the logic that we use to create the list of modules for the docs, and I don't like that it recurses the directory where the source files are stored. I'd rather have something that looks at the hierarchy defined by the asciidoc markup (rather than relying on the source files to be in a particular directory).
Our doc tools really should support a macro or something for generating "jump" lists or tables. This is not a new concept. I could do it in FrameMaker 15 years ago. Rather than coming up with a solution just for Beats, I'm going to open a request in the docs repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created this issue to track the enhancement request: elastic/docs#324
* <<stdin-input>> | ||
* <<redis-input>> | ||
* <<udp-input>> | ||
* <<docker-input>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about renaming the anchors to filebeat-input-<service>
. Also name the files input-<service>.asciidoc
. So we have some grouping and reduce chance of naming conflicts.
// I backed out my changes, and I'm looking for alternatives. Maybe we should | ||
// make this description more generic, and move the details about reading files | ||
// to the new topic about the log input. WDYT? I might do that in a separate | ||
// PR so we can get the refactoring merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on more generic description. As of now filebeat is about collecting logs only, but given as logs can be anything + we might widen the scope of filebeat in the future, events
or documents
might be more appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I think this whole topic needs a rewrite. I'd like to defer this to another PR. I've created an issue to track the work here: #6572
Filebeat consists of two main components: <<prospector,prospectors>> and <<harvester,harvesters>>. These components work together to tail files and send event data to the output that you specify. | ||
In this topic, you learn about the key building blocks of {beatname_uc} and how they work together. Understanding these concepts will help you make informed decisions about configuring {beatname_uc} for specific use cases. | ||
|
||
{beatname_uc} consists of two main components: <<input,inputs>> and <<harvester,harvesters>>. These components work together to tail files and send event data to the output that you specify. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH I don't like this documentation file. The point is, the presence of a harvester and a particular harvester implementation is 'implementation specific'. A harvester is basically just a worker, allowing for concurrent collection of logs.
I understand the need of this doc, but in the end, we might have inputs and harvesters/workers documented for each input type.
Different input types might have different strategies for collecting logs. E.g. for stdin there is no real need for an harvester/workers.
We're also about untangling the harvesters, to have per input type worker implementations. The log harvester is overkill for some use-cases, but re-use 'complicates' the logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agreed with @urso, this is implementation specific as more inputs
are create in filebeat each input will need some kind of details on the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah, I agree. I'm going to track this work in a separate issue: #6572
|
||
Filebeat keeps the state of each file and frequently flushes the state to disk in the registry file. The state is used to remember the last offset a harvester was reading from and to ensure all log lines are sent. If the output, such as Elasticsearch or Logstash, is not reachable, Filebeat keeps track of the last lines sent and will continue reading the files as soon as the output becomes available again. While Filebeat is running, the state information is also kept in memory by each prospector. When Filebeat is restarted, data from the registry file is used to rebuild the state, and Filebeat continues each harvester at the last known position. | ||
{beatname_uc} keeps the state of each file and frequently flushes the state to disk in the registry file. The state is used to remember the last offset a harvester was reading from and to ensure all log lines are sent. If the output, such as Elasticsearch or Logstash, is not reachable, {beatname_uc} keeps track of the last lines sent and will continue reading the files as soon as the output becomes available again. While {beatname_uc} is running, the state information is also kept in memory for each input. When {beatname_uc} is restarted, data from the registry file is used to rebuild the state, and {beatname_uc} continues each harvester at the last known position. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 'flush' is the operation. One can change this to:
filebeat keeps the file states in memory and writes a snapshot to the registry file whenever events are acknowledged by the outputs or when the registry flush timer is trigger. The flush timer is only active if some state has been changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@urso I think you're proposing that I replace the entire line with your description, right?
filebeat/docs/overview.asciidoc
Outdated
|
||
Here's how Filebeat works: When you start Filebeat, it starts one or more prospectors that look in the local paths you've specified for log files. For each log file that the prospector locates, Filebeat starts a harvester. Each harvester reads a single log file for new content and sends the new log data to libbeat, which aggregates the events and sends the aggregated data to the output that you've configured for Filebeat. | ||
// REVIEWERS: Is it still accurate to say "tails the logs" or is that a file | ||
// concept that doesn't make sense for some of the newer inputs that we've added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kind of depends on the input type. In a 'pull' based input, we will try to 'tail' in a sense of trying to collect new logs only. We also have 'push' based inputs (e.g. syslog or plain UDP/TCP), this would make sense for.
How about 'collect'. Instead of logs
we might have events
or documents
. With introduction of UDP/TCP, we already widened the scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made this description more generic.
filebeat/docs/redis-input.asciidoc
Outdated
|
||
// REVIEWERS: Which options should I show in the example config? Incidentally, | ||
// in the reference.yml, password shows up twice in the same namespace. This | ||
// needs to be fixed in the reference.yml file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a ticket for the option being shown twice?
I like only displaying a very minimal config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created a ticket here: #6573
filebeat/docs/redis-input.asciidoc
Outdated
The `redis` input supports the following configuration options plus the | ||
<<{type}-input-common-options>> described later. | ||
|
||
//REVIEWERS: These descriptions are new, so please review carefully. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
filebeat/docs/redis-input.asciidoc
Outdated
Redis as frequently as possible without causing {beatname_uc} to scan too | ||
frequently. Do not set this value to less than `1s`. | ||
|
||
The default is `10s`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we want to add a warning here. Redis slow logs are not permanent. We have to connect and query logs. If redis did collect more slow logs then can be buffered within the interval of scan_frequency
, logs will be missing.
filebeat/docs/redis-input.asciidoc
Outdated
|
||
The network type to use for the Redis connection. The default is `tcp`. | ||
|
||
//REVIEWERS: What (if any) other network types are supported here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... the beats code is not doing validating the network type. And the redis client use does no validation as well.
The stdlib docs say:
Known networks are "tcp", "tcp4" (IPv4-only), "tcp6" (IPv6-only), "udp", "udp4" (IPv4-only), "udp6" (IPv6-only), "ip", "ip4" (IPv4-only), "ip6" (IPv6-only), "unix", "unixgram" and "unixpacket".
The ipX
settings do not make sense.
Redis requires TCP -> drop UDP settings.
While redis (theoretical) support unix sockets, I'm not sure about "unixgram" and "unixpacket"
.
filebeat/docs/udp-input.asciidoc
Outdated
//REVIEWERS: What other options need to be documented here? I don't see host | ||
//and port listed in the reference.yml. Are the host and port separate options? | ||
//What guidance should we provide to help users set these options? Config ex | ||
//above should include this as an eample, too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ph can you have a look?
filebeat/docs/stdin-input.asciidoc
Outdated
---- | ||
|
||
// REVIEWERS: Are there any stdin-specific config options, or are all the config | ||
// options inherited from libbeat? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right now it supports all harvester settings (mostly the same as are available for log
input type). Some settings just don't make sense for stdin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@urso Can you look through the document I created and tell me which options make sense for stdin? I put a question mark for the options I think are applicable. https://docs.google.com/spreadsheets/d/1ixRNLA22LiBAgXGZzDBPxz-7kflaSn-JOQkSIcU-d78/edit#gid=0
filebeat/docs/docker-input.asciidoc
Outdated
// have a realistic example here. | ||
|
||
// REVIEWERS: Sounds like I definitely need to add multiline to the docs here. | ||
// which other config options from the harvester are used by the docker input? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docker input should support all the same settings as input, paths
would be the only one we don't want to expose here, as internally it's defined by containers.ids
filebeat/docs/docker-input.asciidoc
Outdated
|
||
// REVIEWERS: I'm guessing where these settings live in the config | ||
// because it's not clear from the docs or the reference.yml. Would be nice to | ||
// have a realistic example here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A full example could be:
- type: docker
containers:
paths: "/var/lib/docker/containers"
stream: "stdout"
ids:
- "*"
This config will read stdout stream from all containers under the default docker containers path
4e2746c
to
46ed769
Compare
46ed769
to
cf69b58
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dedemorton for all this work on improving it! This will help us grow our inputs on the filebeat side.
A Few answer from the issues description:
Add docs for the TCP input (see #6266). @ph If your PR is going into 6.3, I can add the docs to my PR.
It wont be in 6.3, with vacation/eah I will miss the date.
The IIS and Nginx module descriptions in filebeat.reference.yml still refer to prospector. Should I fix?
The manifest for IIS and nginx still refer to prospector. Should I fix?
Yes they should use the new inputs.
Few others points:
- In some filebeat specific docs we are using the asciidoc variable and some not, it is normal?
- More of a reflexion, are we thinking about redirection on the docs page for some content?
Filebeat consists of two main components: <<prospector,prospectors>> and <<harvester,harvesters>>. These components work together to tail files and send event data to the output that you specify. | ||
In this topic, you learn about the key building blocks of {beatname_uc} and how they work together. Understanding these concepts will help you make informed decisions about configuring {beatname_uc} for specific use cases. | ||
|
||
{beatname_uc} consists of two main components: <<input,inputs>> and <<harvester,harvesters>>. These components work together to tail files and send event data to the output that you specify. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agreed with @urso, this is implementation specific as more inputs
are create in filebeat each input will need some kind of details on the implementation.
is combined into a single line before the lines are filtered by `exclude_lines`. | ||
|
||
// REVIEWERS: Do I need to make examples like this one more generic to work with | ||
// all the input types where this description will appear, or is this OK? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe its OK.
is `/var/lib/docker/containers`. | ||
|
||
// REVIEWERS: Not clear to me if this setting accepts an array or a string. If | ||
// it's just a string, why is it paths (plural)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dedemorton I believe you are right here, it should be singular.
https://github.com/elastic/beats/blob/master/filebeat/input/docker/config.go#L15-L21
Nothing in the code currently accept multiple paths for the docker containers.
@exekias WDYT to move it to singular?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the setting key is path
, this looks like a typo in docs: https://github.com/elastic/beats/blob/master/filebeat/input/docker/config.go#L17
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch @exekias
//REVIEWERS: PH: What other options need to be documented here? I don't see host | ||
//and port listed in the reference.yml. Are the host and port separate options? | ||
//What guidance should we provide to help users set these options? Config ex | ||
//above should include this as an eample, too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dedemorton Its currently a single option.
It should be:
filebeat.inputs:
- type: udp
max_message_size: 10240
host: "localhost:8080"
The default value is "localhost:8080", To be honest, I think we should not provide a default for it and enforce user to set it, so making sure the example provide the host will help us move forward.
FYI, I would like to make it mandatory for 7.0
filebeat/docs/migration.asciidoc
Outdated
@@ -96,7 +96,7 @@ section to understand the Filebeat options. | |||
[float] | |||
=== Migrate the "files" section | |||
|
|||
To migrate the `files` section from the Logstash Forwarder configuration, create a `prospectors` section in the Filebeat config file. For example, assuming that you start | |||
To migrate the `files` section from the Logstash Forwarder configuration, create an `inputs` section in the Filebeat config file. For example, assuming that you start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double space between an
and input
filebeat/docs/migration.asciidoc
Outdated
|
||
[source,yaml] | ||
------------------------------------------------------------------------------------- | ||
filebeat.prospectors: | ||
filebeat.inputs: | ||
- type: log | ||
paths: | ||
- /var/log/*.log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use the asciidoc variable for all the filebeat
reference in this file? ({beatname_uc})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes...not urgent, but over time I'm trying to replace references to product names with asciidoc variables whenever I open up a file
filebeat/docs/multiline.asciidoc
Outdated
@@ -17,7 +17,7 @@ Also read <<yaml-tips>> and <<regexp-support>> to avoid common mistakes. | |||
[[multiline]] | |||
=== Configuration options | |||
|
|||
You can specify the following options in the `filebeat.prospectors` section of | |||
You can specify the following options in the `filebeat.inputs` section of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use asciidoc variable here? {beatname_uc}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup. I've fixed the references in this file, too. I'm doing these changes in small batches with other changes because it's not always a simple text replacement. Formatting affects how the attributes are interpreted.
filebeat/docs/overview.asciidoc
Outdated
|
||
Filebeat is a https://www.elastic.co/products/beats[Beat], and it is based on the libbeat framework. | ||
General information about libbeat and setting up Elasticsearch, Logstash, and Kibana are covered in the {libbeat}/index.html[Beats Platform Reference]. | ||
{beatname_uc} is a https://www.elastic.co/prproducts/beats[Beat], and it is based on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo in the URL should be https://www.elastic.co/products/beats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My cat did that.
each log that {beatname_uc} locates, {beatname_uc} starts a harvester. Each | ||
harvester reads a single log for new content and sends the new log data to | ||
libbeat, which aggregates the events and sends the aggregated data to the output | ||
that you've configured for {beatname_uc}. | ||
|
||
image:./images/filebeat.png[Beats design] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have replaced Filebeat
with the variable should we do the same with this path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in this case because the actual filename on disk also needs to be changed.
@ph I think I've addressed all of your issues. Here are a few things left to do/decide that, IMO, can wait until after this thing is merged. I've opened issues for all the other stuff I found.
|
I believe you have all the options, from what I've read. Concerning the location of the input doc, I agree to keep them where they are currently. We can always move the doc in the |
Resolves doc issue: #6375
I still have a quite a few questions, which I've added as comments flagged for REVIEWERS. Please read and respond to questions.
Here's what the TOC will look like when the book is built:
Changes:
To do after merging this PR: