Metricbeat: store only top N processes by CPU / memory #4126

Closed
tsg opened this issue Apr 27, 2017 · 1 comment
Labels: discuss, enhancement, Metricbeat

Comments

@tsg
Contributor

tsg commented Apr 27, 2017

As part of #4112, we want to add a feature where only the top N processes by CPU and/or memory are included in the reports created by the Metricbeat system module. Optionally, for the processes that drop out of the top N, we could poll them less frequently instead of dropping them completely.

My current plan is to implement this as a feature of the system.process metricset. I considered doing this as a processor but a processor would need to collect all processes from an interval in order to sort & compute the "top". This seems to me like a lot of state to put in a processor, plus it's hard for a processor to know when the list is "done".

Configuration-wise, I'm thinking:

  process.include_top:
    enabled: true
    cpu.total.pct: 5         # include top 5 by CPU
    memory.rss.bytes: 5      # include top 5 by memory

Explanations:

  • include_top because we use include to mean "filter everything but these" in other places in Beats
  • enabled: true to have an easy and clear way to opt in / opt out of this feature.
  • cpu.total.pct: 5 means "record the top 5 processes by the cpu.total.pct field".
  • memory.rss.bytes: 5 means "record the top 5 processes by the memory.rss.bytes field".
  • If either cpu.total.pct or memory.rss.bytes is set to 0, it means "match no processes" for that field. So cpu.total.pct: 5 with memory.rss.bytes: 0 will only look at CPU.
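To make the intended semantics concrete, here is a minimal Go sketch of the selection described in the bullets above: keep a process if it is in the top N by CPU or in the top N by memory, with a limit of 0 matching nothing for that field. The `procStat` type and the `includeTop` function are hypothetical names used only for illustration; this is not the actual metricset code.

```go
package main

import (
	"fmt"
	"sort"
)

// procStat is a hypothetical stand-in for one process event.
type procStat struct {
	Name     string
	CPUPct   float64 // stands in for cpu.total.pct
	RSSBytes uint64  // stands in for memory.rss.bytes
}

// includeTop keeps a process if it is among the top byCPU processes by CPU
// or among the top byMem processes by resident memory. A limit of 0 (or
// less) matches no processes for that criterion.
func includeTop(procs []procStat, byCPU, byMem int) []procStat {
	keep := make(map[int]bool) // indexes into procs that should be kept

	// mark sorts the indexes with the given ordering and keeps the first limit.
	mark := func(limit int, less func(a, b int) bool) {
		if limit < 0 {
			limit = 0
		}
		if limit > len(procs) {
			limit = len(procs)
		}
		idx := make([]int, len(procs))
		for i := range idx {
			idx[i] = i
		}
		sort.SliceStable(idx, func(i, j int) bool { return less(idx[i], idx[j]) })
		for _, i := range idx[:limit] {
			keep[i] = true
		}
	}

	mark(byCPU, func(a, b int) bool { return procs[a].CPUPct > procs[b].CPUPct })
	mark(byMem, func(a, b int) bool { return procs[a].RSSBytes > procs[b].RSSBytes })

	// Emit the kept processes in their original order.
	var out []procStat
	for i, p := range procs {
		if keep[i] {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	procs := []procStat{
		{"a", 0.50, 100},
		{"b", 0.10, 900},
		{"c", 0.05, 50},
		{"d", 0.30, 200},
	}
	// Top 1 by CPU plus top 1 by memory.
	for _, p := range includeTop(procs, 1, 1) {
		fmt.Println(p.Name)
	}
}
```

Note that the result can contain up to byCPU + byMem processes, since the two top lists are merged rather than intersected.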

Do we want to support sorting by fields other than these two? Make it generic? That would require a more generic (and more CPU intensive) implementation.

In the above, events for processes below the threshold are dropped. If the user wants to store them, just at a reduced resolution, the following config could be added:

  process.include_top:
    enabled: true
    cpu.total.pct: 5
    memory.rss.bytes: 5
    period_multiplier: 3    # downsample for processes not in top

Here, period_multiplier: 3 means that we will publish only one in every 3 events for the processes outside the top N. Since we report counters and not rates, that should work fine.
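A rough Go sketch of how the downsampling could work, assuming a single counter of collection periods shared by all processes. The `sampler` type and its methods are hypothetical, and treating a missing multiplier as "drop" is an assumption based on the default behavior described earlier.

```go
package main

import "fmt"

// sampler is a hypothetical helper that decides whether an event is
// published in the current collection period.
type sampler struct {
	multiplier int // stands in for period_multiplier
	cycle      int // number of completed collection periods
}

// tick advances the sampler to the next collection period.
func (s *sampler) tick() { s.cycle++ }

// shouldPublish reports whether an event should be published now.
// Processes in the top N are always published. Other processes are
// published once every multiplier periods.
func (s *sampler) shouldPublish(inTop bool) bool {
	if inTop {
		return true
	}
	if s.multiplier <= 0 {
		// Assumption: without a multiplier, events outside the top are dropped.
		return false
	}
	return s.cycle%s.multiplier == 0
}

func main() {
	s := &sampler{multiplier: 3}
	for period := 0; period < 6; period++ {
		fmt.Printf("period %d: top=%v other=%v\n",
			period, s.shouldPublish(true), s.shouldPublish(false))
		s.tick()
	}
}
```

A shared counter keeps the state small, at the cost that all downsampled processes are published in the same period rather than being spread out.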

@tsg added the discuss, enhancement, and Metricbeat labels Apr 27, 2017
@ruflin
Member

ruflin commented Apr 27, 2017

For the processor, I'm not sure I understand how its implementation would differ from the one proposed above. The processor could look as follows: top(events []common.MapStr, field string, limit int, asc bool) []common.MapStr. It gets a list of events, the field to sort on, the number of events to return, and whether to sort ascending or descending. I don't think the processor should keep any state. Isn't the implementation in the process metricset going to look very similar?
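As an illustration of that signature, a stateless version could look roughly like the Go sketch below. It uses a local `MapStr` type as a stand-in for common.MapStr, and it assumes flat keys and float64 values, which is a simplification made only for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// MapStr is a local stand-in for common.MapStr (assumption: flat keys only).
type MapStr map[string]interface{}

// top returns up to limit events ordered by the numeric value of field.
// Events that lack the field, or whose value is not a float64, are skipped.
// The function keeps no state between calls.
func top(events []MapStr, field string, limit int, asc bool) []MapStr {
	type scored struct {
		ev  MapStr
		val float64
	}
	var s []scored
	for _, ev := range events {
		if v, ok := ev[field].(float64); ok {
			s = append(s, scored{ev, v})
		}
	}
	sort.SliceStable(s, func(i, j int) bool {
		if asc {
			return s[i].val < s[j].val
		}
		return s[i].val > s[j].val
	})
	if limit < 0 {
		limit = 0
	}
	if limit > len(s) {
		limit = len(s)
	}
	out := make([]MapStr, 0, limit)
	for _, x := range s[:limit] {
		out = append(out, x.ev)
	}
	return out
}

func main() {
	events := []MapStr{
		{"name": "a", "cpu.total.pct": 0.5},
		{"name": "b", "cpu.total.pct": 0.1},
		{"name": "c"}, // no CPU value, so it is skipped
		{"name": "d", "cpu.total.pct": 0.3},
	}
	for _, ev := range top(events, "cpu.total.pct", 2, false) {
		fmt.Println(ev["name"])
	}
}
```

The function only works if the caller already holds the complete list of events for one interval, which is the point raised in the original proposal about where that list gets collected.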

I would not mix this with the period_multiplier, as I think sampling is a different feature. I'm also not sure yet whether we should introduce sampling.

tsg pushed a commit to tsg/beats that referenced this issue May 2, 2017
This adds the option to only report on the top N processes by CPU and/or
memory. It is useful because storing metrics about each and every process from
every host can be fairly expensive from the storage point of view. Previously
it was possible to filter processes by name, which was useful if one knew in
advance which are the most interesting processes. This adds a new option which
should be quite convenient in practice, because the number of per-process
documents gets limited while still allowing to display the top processes.

Closes elastic#4126.
ruflin pushed a commit that referenced this issue May 2, 2017

Closes #4126.