As part of #4112, we want to add a feature where only the top N processes by CPU and/or memory are included in the reports created by the Metricbeat system module. Optionally, for the processes that drop out of the top N, we could reduce their polling frequency instead of dropping them completely.
My current plan is to implement this as a feature of the system.process metricset. I considered doing this as a processor but a processor would need to collect all processes from an interval in order to sort & compute the "top". This seems to me like a lot of state to put in a processor, plus it's hard for a processor to know when the list is "done".
Configuration-wise, I'm thinking:
process.include_top:
  enabled: true
  cpu.total.pct: 5      # include top 5 by CPU
  memory.rss.bytes: 5   # include top 5 by memory
Explanations:
include_top, because we use include to mean "filter everything but these" in other places in Beats.
enabled: true, to have an easy and clear way to opt in to or out of this feature.
cpu.total.pct: 5 means "record the top 5 processes by the cpu.total.pct field".
memory.rss.bytes: 5 means "record the top 5 processes by the memory.rss.bytes field".
If either cpu.total.pct or memory.rss.bytes is set to 0, it means "match no processes" by that field. So cpu.total.pct: 5 with memory.rss.bytes: 0 will only look at CPU.
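To make the proposed semantics concrete, here is a minimal Go sketch of how the selection could work: a process is kept if it is in the top N by CPU or in the top N by memory, and a value of 0 disables that criterion. The Process and IncludeTop types and the FilterTop function are hypothetical names invented for illustration; they are not the metricset's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// Process is a hypothetical, simplified stand-in for one process event.
type Process struct {
	Name     string
	CPUPct   float64 // stands in for cpu.total.pct
	RSSBytes uint64  // stands in for memory.rss.bytes
}

// IncludeTop mirrors the proposed process.include_top settings (assumed shape).
type IncludeTop struct {
	Enabled  bool
	ByCPU    int // 0 means "match no processes" by CPU
	ByMemory int // 0 means "match no processes" by memory
}

// topIndexes returns the indexes of the n highest-ranked items out of total,
// where greater(i, j) reports whether item i ranks above item j.
func topIndexes(n, total int, greater func(i, j int) bool) []int {
	idx := make([]int, total)
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return greater(idx[a], idx[b]) })
	if n < 0 {
		n = 0
	}
	if n > total {
		n = total
	}
	return idx[:n]
}

// FilterTop keeps the processes that are in the top N by CPU or in the
// top N by memory, preserving the input order.
func FilterTop(procs []Process, cfg IncludeTop) []Process {
	if !cfg.Enabled {
		return procs
	}
	keep := make(map[int]bool)
	for _, i := range topIndexes(cfg.ByCPU, len(procs), func(i, j int) bool {
		return procs[i].CPUPct > procs[j].CPUPct
	}) {
		keep[i] = true
	}
	for _, i := range topIndexes(cfg.ByMemory, len(procs), func(i, j int) bool {
		return procs[i].RSSBytes > procs[j].RSSBytes
	}) {
		keep[i] = true
	}
	out := make([]Process, 0, len(keep))
	for i, p := range procs {
		if keep[i] {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	procs := []Process{
		{"a", 0.50, 100},
		{"b", 0.10, 900},
		{"c", 0.30, 300},
		{"d", 0.05, 50},
	}
	// Top 1 by CPU is "a", top 1 by memory is "b"; both are kept.
	for _, p := range FilterTop(procs, IncludeTop{Enabled: true, ByCPU: 1, ByMemory: 1}) {
		fmt.Println(p.Name)
	}
}
```

Because the two criteria are combined as a union, up to ByCPU + ByMemory processes can be reported per period when the two top lists do not overlap.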
Do we want to support sorting by fields other than these two, i.e. make it generic? That would require a more generic (and more CPU-intensive) implementation.
In the above, events for processes below the threshold are dropped. If the user wants to store them, just at a reduced resolution, the following config could be added:
process.include_top:
  enabled: true
  cpu.total.pct: 5
  memory.rss.bytes: 5
  period_multiplier: 3  # downsample for processes not in top
Here, period_multiplier: 3 means that we will only publish one in every 3 events for the processes outside the top N. Since we report counters and not rates, that should work fine.
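One possible reading of period_multiplier, sketched in Go: count collection periods, and publish processes outside the top N only on every Nth period, while top processes are published every time. The Downsampler type, NewDownsampler and ShouldPublish are hypothetical names, and using a single per-metricset counter (rather than one counter per process) is an assumption of this sketch, not something the proposal specifies.

```go
package main

import "fmt"

// Downsampler is a hypothetical helper that decides, once per collection
// period, whether processes outside the top N should be published.
type Downsampler struct {
	multiplier int
	tick       int // position within the current cycle, in [0, multiplier)
}

// NewDownsampler treats any multiplier below 1 as 1 (no downsampling).
func NewDownsampler(multiplier int) *Downsampler {
	if multiplier < 1 {
		multiplier = 1
	}
	return &Downsampler{multiplier: multiplier}
}

// Next advances by one collection period and reports whether processes
// outside the top N should be published in that period.
func (d *Downsampler) Next() bool {
	publish := d.tick == 0
	d.tick = (d.tick + 1) % d.multiplier
	return publish
}

// ShouldPublish: top processes are always published; the others only on
// low-resolution periods.
func ShouldPublish(inTop, lowResPeriod bool) bool {
	return inTop || lowResPeriod
}

func main() {
	d := NewDownsampler(3)
	for period := 0; period < 6; period++ {
		lowRes := d.Next()
		fmt.Printf("period %d: top=%v other=%v\n",
			period, ShouldPublish(true, lowRes), ShouldPublish(false, lowRes))
	}
}
```

With a multiplier of 3, processes outside the top are published on periods 0 and 3 in this run and skipped otherwise.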
For the processor, I'm not sure I understand how it would differ from the implementation proposed above. The processor could look as follows: top(events []common.MapStr, field string, limit int, asc bool) []common.MapStr. It gets a list of events, the field to sort on, the number of events to keep, and whether to sort ascending or descending. I don't think the processor should keep any state. Isn't the implementation in the process metricset going to look very similar?
I would not mix this with period_multiplier, as I think sampling is a different feature. I'm also not sure yet whether we should introduce sampling.
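A minimal Go sketch of the stateless top helper with the signature suggested in this comment. To stay self-contained it uses a local MapStr type as a stand-in for libbeat's common.MapStr and only looks up flat keys; skipping events whose field is missing or non-numeric is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// MapStr is a local stand-in for libbeat's common.MapStr (flat keys only).
type MapStr map[string]interface{}

// asFloat converts a few common numeric types to float64.
// ok is false when the value is missing or not numeric.
func asFloat(v interface{}) (f float64, ok bool) {
	switch n := v.(type) {
	case float64:
		return n, true
	case int:
		return float64(n), true
	case int64:
		return float64(n), true
	case uint64:
		return float64(n), true
	}
	return 0, false
}

// top returns at most limit events ordered by field, ascending or
// descending. It keeps no state and does not modify the input slice.
func top(events []MapStr, field string, limit int, asc bool) []MapStr {
	sorted := make([]MapStr, 0, len(events))
	for _, e := range events {
		if _, ok := asFloat(e[field]); ok {
			sorted = append(sorted, e)
		}
	}
	sort.SliceStable(sorted, func(i, j int) bool {
		a, _ := asFloat(sorted[i][field])
		b, _ := asFloat(sorted[j][field])
		if asc {
			return a < b
		}
		return a > b
	})
	if limit < 0 {
		limit = 0
	}
	if limit < len(sorted) {
		sorted = sorted[:limit]
	}
	return sorted
}

func main() {
	events := []MapStr{
		{"name": "a", "cpu.total.pct": 0.2},
		{"name": "b", "cpu.total.pct": 0.9},
		{"name": "c"}, // no CPU value, skipped
		{"name": "d", "cpu.total.pct": 0.5},
	}
	for _, e := range top(events, "cpu.total.pct", 2, false) {
		fmt.Println(e["name"], e["cpu.total.pct"])
	}
}
```

The helper itself is stateless, but the caller still has to hand it the complete list of events for one collection period, which is the point raised in the issue description about a processor not knowing when the list is "done".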
tsg pushed a commit to tsg/beats that referenced this issue on May 2, 2017
This adds the option to only report on the top N processes by CPU and/or
memory. It is useful because storing metrics about each and every process from
every host can be fairly expensive from the storage point of view. Previously
it was possible to filter processes by name, which was useful if one knew in
advance which are the most interesting processes. This adds a new option which
should be quite convenient in practice, because the number of per-process
documents gets limited while still allowing to display the top processes.
Closes elastic#4126.