
Add queue settings to docs #4884

Merged · 6 commits · Oct 9, 2017

Changes from 1 commit
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
67 changes: 67 additions & 0 deletions libbeat/docs/generalconfig.asciidoc
@@ -103,3 +103,70 @@ processors in your config.

Sets the maximum number of CPUs that can be executing simultaneously. The
default is the number of logical CPUs available in the system.

[float]
[[configuration-internal-queue]]
=== Configure the internal queue
Contributor

@urso I think my answer in slack must have been confusing. Probably my fault because I was not fully caffeinated.

If this topic is something that users are likely to look for in the TOC, then I would bump it up a level and make it a separate topic. That means putting the content in a separate asciidoc file, removing the float tag, making the heading a level 2 (==), and using asciidoc include statements to include the file in each of the configuring-howto.asciidoc files. For example: https://github.com/elastic/beats/blob/master/packetbeat/docs/configuring-howto.asciidoc.

On the other hand, if users are unlikely to want to change these settings, it's probably OK to bury them under general settings, but in that case, you could change the heading to "Internal queue configuration options" to make it parallel with the other headings at this level.

Author

Thanks. Will move it to the TOC


The `queue` namespace configures the queue type to be used.
Contributor

I'd change this to say,

You can configure the type and behavior of the internal queue by setting options in the `queue` section of the +{beatname_lc}.yml+ config file.

Also see my comment about moving this to appear after the description of what the queue does.

Author

Done


The Elastic Beats employ an internal queue for events to be published. The
Contributor

I'd actually put this paragraph before the sentence on line 111.

Author

Done.

Contributor

Crap weasel. Looks like some of my comments didn't get saved when I submitted the review. I'd change this to say:

{beatname_uc} uses an internal queue to store events before publishing them. 

Author

Done

queue is responsible for buffering and combining events into batches, which can
Contributor

Change to: "...into batches that can be consumed by the outputs" [we use "that" for restrictive clauses, "which" for non-restrictive]

Author

Done. I'm a little confused here. Isn't "which" used for restrictive clauses?

Contributor

No, it's the other way around. You use "that" for restrictive clauses. By restrictive, we mean a clause that gives information essential to the meaning of the sentence. In this case, the clause "that can be consumed by the outputs" is restrictive. Sometimes it's a judgment call. TBH, the only people who really care about this distinction are writers and editors. :-) :-)

be consumed by the outputs. The outputs will use bulk operations to send a
batch of events in one transaction.


Example configuration:

[source,yaml]
------------------------------------------------------------------------------
queue.mem:
events: 4096
------------------------------------------------------------------------------

==== Configure the Memory Qeueue
Contributor

Misspelling. This should say "Configure the memory queue". However, I don't think you need a separate section here (at least, not until we add support for more queue types--but even then, I'm not sure your need a separate section). It sounds like the memory queue is the only one that is supported right now? If that's true, we should state that.

Author

Support for a disk-based queue with very different settings will be added in the future. Right now we only have the mem queue, but I'd keep it a separate section for now.

[[configuration-internal-queue-memory]]

The memory queue keeps all events in memory. By default no flush interval is
configured. All events published to this queue will be directly consumed by the
outputs. An output will consume up to the outputs `bulk_max_size` events at once.
Contributor

This sentence is a bit hard to parse: "An output will consume up to the outputs bulk_max_size events at once."

How about this? "An output will consume events in batches based on the output's bulk_max_size setting."

Author

I'm using "up to", as it can be anything between 1 and bulk_max_size. By setting bulk_max_size: -1, it can be anything between 1 and queue.mem.events. I struggled quite a bit trying to express this in a concise manner. But "based on the output's bulk_max_size" would have me expect it is exactly this number of events (which is not guaranteed by this queue at all).

Contributor

I like when we can be concise, but not if we're obfuscating meaning. In this case, I think the meaning is not clear and the grammatical structure is questionable because you are leaving words out. What you're really saying here is that "An output will consume up to the number of events specified by the output's bulk_max_size setting." You're using shorthand that's common for developers to use, but will probably seem off to some readers. They won't make the same substitution that you make in your head.

Anyhow, for now, at least make "output" possessive so that you have "the output's bulk_max_size".


Only after an event is dropped or acknowledged by the output, a new slot will
Contributor

If the output is reading in batches, what do we mean by a slot? Does the slot open up before the batch is fully read? Maybe I'm not fully understanding the behavior here.

Author

Yeah, slot is quite confusing here :(

By slot I mean space that is freed for accepting a new event in the queue, if and only if the output ACKs previously published events. The queue.mem.events setting dictates the total number of events buffered by the queue and outputs.

With outputs potentially operating on batches of sizes 1 to the max queue size, it is the number of events ACKed by the output that can be read/accepted by the queue after the ACK.

be available in the queue.
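
To illustrate the sizing relationship described above, here is a hedged sketch (illustrative only, not part of this diff; the Elasticsearch output and its `bulk_max_size` option are assumed purely for the example): the queue buffers up to 4096 events in total, each bulk request from the output drains between 1 and 512 of them, and space for new events is only freed once those events are ACKed.

[source,yaml]
------------------------------------------------------------------------------
queue.mem:
  events: 4096               # total events buffered by the queue and outputs
output.elasticsearch:        # assumed output, shown here for illustration only
  hosts: ["localhost:9200"]
  bulk_max_size: 512         # each bulk request consumes up to 512 queued events
------------------------------------------------------------------------------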

By setting `flush.min_events` and `flush.timeout`, spooling in the queue is
Contributor

Would be more direct and say, "To enforce spooling in the queue, set the flush.min_events and flush.timeout options."

Author

👍

enforced. In this case the output has to wait for a minimal number of events
to be available.

This sample configuration forwards events to the output if 512 events are
available or the oldest available event has been waiting in the queue for 5s:

[source,yaml]
------------------------------------------------------------------------------
queue.mem:
events: 4096
flush.min_events: 512
flush.timeout: 5s
------------------------------------------------------------------------------

===== `events`

Number of events the queue can store.

The default value is 4096 events.

===== `flush.min_events`

Minimum number of events required for publishing. If this value is set to 0, the
output can start publishing events without additional waiting times. Otherwise
the output has to wait for more events to become available.

The default value is 0.

===== `flush.timeout`

Maximum wait time for `flush.min_events` to be fulfilled. If set to 0s, events
will be immediately available for consumption.

The default value is 0s.
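
For contrast with the spooling example above, a minimal sketch (illustrative only, not part of the diff) that spells out these defaults explicitly looks like this; with `flush.min_events: 0` and `flush.timeout: 0s`, events are handed to the output as soon as they arrive:

[source,yaml]
------------------------------------------------------------------------------
queue.mem:
  events: 4096         # default queue size
  flush.min_events: 0  # default: publish without waiting for a minimum batch
  flush.timeout: 0s    # default: events are immediately available to the output
------------------------------------------------------------------------------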