Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update output config docs #2597

Merged
merged 3 commits into from
Sep 22, 2016
Merged

Update output config docs #2597

merged 3 commits into from
Sep 22, 2016

Conversation

urso
Copy link

@urso urso commented Sep 19, 2016

Update output config docs to match most recent changes in all outputs removing
settings not being available, updating docs for changed fields and adding
missing fields.

Add output plugin settings for: #2482 and #2074

@urso urso added docs review needs_backport PR is waiting to be backported to other branches. v5.0.0-beta1 labels Sep 19, 2016
@urso
Copy link
Author

urso commented Sep 19, 2016

successfully build docs via: make docs

Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left some minor comments.

@@ -106,10 +106,6 @@ The number of workers per configured host publishing events to Elasticsearch. Th
is best used with load balancing mode enabled. Example: If you have 2 hosts and
3 workers, in total 6 workers are started (3 for each host).

===== port
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More as a historical remark: port was removed as it would not work will with hosts.

The index root name to write events to. The default is the Beat name.
For example "{beatname_lc}" generates "[{beatname_lc}-]YYYY.MM.DD" indexes (for example,
"{beatname_lc}-2015.04.26").
The index name to write events to. The default is "%\{{beatname_lc}-%\{yyyy.MM.dd}" (for example, "{beatname_lc}-2015.04.26").
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the effect of %\ here? Do you have on purpose {{ before beatname_lc?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Array of pipeline selector configurations supporting conditionals, *Format String*
based field access and name mappings. The first rule matching will be used to
set the `pipeline` for the event to be published. If `pipelines` is missing or no
rule matches, the `pipeline` field will be used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar example as with indices would be nice here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too much repetition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like repetition either, but there's no guarantee with reference info that users will read from top to bottom, so some repetition might be necessary (or at least a pointer to the supporting info).


*`default`*: Default string value if `mapping` does not find a match.

*`when`*: Condition which must succeed in order to execute the current rule.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can have a general example for topics, pipelines, indices we can link to?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 We should have a usage type topic that describes, for example, how to set the pipeline dynamically. Having the reference content is good enough for Beta, though.


===== key

The Kafka event key. The event key must be unique and can be extracted from the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if no key is set? Is it by default auto generated by Kafka?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keys in kafka are optional. if key is set and reused, the old content will be overwritten. By never changing content, keys can be used for some artificial kind of deduplication.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a note that the key is optional. From the current description it looks like it is required.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

`round_robin`, or `hash`. By default the `hash` partitioner is used.

*`random.group_events`*: Select a new partition by random every `group_events`
events being published. The default value is 1.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value is 1 which means after each event a new parition is picked randomly

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@monicasarbu
Copy link
Contributor

LGTM

Update output config docs to match most recent changes in all outputs removing
settings not being available, updating docs for changed fields and adding
missing fields.
Kafka output broker event partitioning strategy. Must be one of `random`,
`round_robin`, or `hash`. By default the `hash` partitioner is used.

*`random.group_events`*: Select a new partition by random every `group_events`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't parse this sentence. Maybe there is a missing word?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No missing word. Using group_events is supposed to use the name as numeric replacement. I want to express that the number configured in group_events sets how many consecutive published events will be send to the same partition, before the partitioner selects a new partition to use for the next set events being published.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sort of substitution works better when the name of the option clearly indicates what it represents. In this case, it's not clear by the name that group_events will resolve to a number, so some readers will stumble over how the sentence is constructed. I realize you're trying to avoid using a lot of words, but it's better to be explicit like you've been in your explanation to me. Also, unless you're telling the user to do something, avoid using the imperative. When someone reads "select", they think they need to select something. It might take them a few seconds to realize that you're essentially saying "configure this option so that the software selects." Does that make sense?

events being published. The default value is 1 meaning after each event a new
parition is picked randomly.

*`round_robin.group_events`*: Select the next partition every `group_events`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as line 599

*`hash.random`*: Randomly distribute events if no hash or key value can be computed.

All partitioners will try to publish events to all partitions by default. If a
partitions leader becomes unreachable for the beat, the output might block. All
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partition's leader

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

partitions leader becomes unreachable for the beat, the output might block. All
partitioners support setting `reachable_only` to overwrite this
behavior. If `reachable_only` is set to `true`, events will be published to
available partitions only. Note: publishing to the subset of available partitions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this a proper note by starting it on a separate line and beginning with "NOTE: ". Also, saying "only" makes the sentence grammatically ambiguous in English. I would say something like:

NOTE: Publishing to a subset of available partitions potentially increases resource usage because events may become unevenly distributed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


===== indices

Array of index selector rules supporting conditionals, *Format String*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description (and the other ones like it in the doc) needs work. It took me a couple of times reading it before I realized that you're describing how to set the index dynamically. I think you're also saying that the index will be set based on the first item in the array that results in a match, but I am not sure.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy'n past ;)

I think you're also saying that the index will be set based on the first item in the array that results in a match
correct.

I did try to make it 'short, as these settings are quite 'complicated' in comparison to most other settings. indices (and the others) is an array of rules executed one after another. The first rule matching (conditionals match and format string doesn't fail) in the array of rules will set index the event is published to. If one rule did match, no other rule in the array will be check to not generate ambiguities in case of 2 rules potentially matching. If no rule did match, the index field will be evaluated (which is basically another rule). Truth is index is just another rule appended to the list of rules defined in indices.


Rule settings:

*`index`*: The index *Format String* to use. If the fields used are missing, the rule fails.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Format String shouldn't be capitalized because it's not a product component. Do you plan to point to the reference content that you wrote about format strings? I'm thinking about relocating the content about the config file to the guides for individual Beats, but I won't do that for now, so use the {libbeat}/filename.html[link text] convention to add links for now. I will clean up the links when I move the content. If you want, I can add links after this PR is merged.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing all *Format String* to format string.

My idea with documenting all types in "config file format" includes (long-term goal) adding the type to all settings in the reference docs by name + link to the definition of said type. Having some docs in libbeat only and others reused among multiple beats I did struggle with the execution, though.

@tsg tsg merged commit b72a891 into elastic:master Sep 22, 2016
@dedemorton dedemorton mentioned this pull request Oct 4, 2016
50 tasks
dedemorton pushed a commit to dedemorton/beats that referenced this pull request Oct 5, 2016
Update output config docs to match most recent changes in all outputs removing
settings not being available, updating docs for changed fields and adding
missing fields.
@dedemorton dedemorton mentioned this pull request Nov 16, 2016
9 tasks
@dedemorton dedemorton removed the needs_backport PR is waiting to be backported to other branches. label Dec 20, 2016
@dedemorton dedemorton mentioned this pull request Jan 10, 2017
6 tasks
@dedemorton dedemorton mentioned this pull request Feb 14, 2017
12 tasks
@dedemorton dedemorton mentioned this pull request Apr 15, 2017
9 tasks
@dedemorton dedemorton mentioned this pull request May 3, 2017
5 tasks
@dedemorton dedemorton mentioned this pull request Sep 20, 2017
42 tasks
@dedemorton dedemorton mentioned this pull request Nov 17, 2017
37 tasks
@urso urso deleted the doc/update-output-docs branch February 19, 2019 18:46
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
Update output config docs to match most recent changes in all outputs removing
settings not being available, updating docs for changed fields and adding
missing fields.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants