Enhanced process filtering for HostMetricsReceiver #8188
Comments
cc @dmitryax |
The host metrics receiver's disk scraper has include/exclude capabilities. I suggest applying a similar approach to the processes scraper. |
I can work on this. |
It looks like the disk scraper only allows strict and regex match, but the proposal for processes also includes wildcard. Should we drop the wildcard requirement? I do like the idea of using |
I believe our standard include/exclude configuration allows using regex matching. We should use that instead of wildcards |
The filesystem scraper separates its different includes/excludes into different config sections. Should we have an include/exclude for each of pid, executable_name, executable_path, command, command_line, and owner? Also, I believe the disk scraper gives precedence to excludes, meaning that if both an include rule and exclude rule match the same device the device is excluded. Should we keep that requirement as well? |
We should pick the ones that have less overlap. @davidmirza408 we are going to introduce filtering by attribute name for consistency with other scrapers. I don't think we need filters for all of them since most of them overlap. Let us know which attribute you want to filter by to solve your problem. I think adding a filter for the following attributes should be enough:
|
@dmitryax I think it would also be helpful to filter by process owner and PID. I have an implementation that I am working on here https://github.com/davidmirza408/opentelemetry-collector-contrib/pull/1 |
Upon further thought I think we should drop the wildcard as well. |
@dmitryax since @davidmirza408 is working on an implementation can you assign him to this issue? I'll work on something else. |
Sounds good. reassigned |
@dmitryax In a previous PR you mentioned you would like to see filtering closer to what is being done in filesystem or disk. What do you think of the format below?
The above config has two process filters. The first one would match executables that are named ("otel-collector" OR "collector2") AND that do NOT have a string starting with "test" in the command line. |
So all rules within a filter are applied with AND operation, but filters themselves are applied with OR operation, right? So the config that you posted can be read as: If that's the idea, it sounds good to me. |
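The AND/OR semantics agreed on here can be sketched in Go as follows. All type and function names (`process`, `rule`, `filter`, `nameIn`, `noArgWithPrefix`) are hypothetical and exist only for this illustration. The sketch reads "a string that starts with test in the command line" as "any whitespace-separated argument starts with test", which is one possible interpretation. The second filter is invented, since only the first one is described in the thread.

```go
package main

import (
	"fmt"
	"strings"
)

// process holds the two attributes used in this sketch.
type process struct {
	executableName string
	commandLine    string
}

// rule is a hypothetical predicate over a single process attribute.
type rule func(p process) bool

// filter is a list of rules that must all hold (AND).
type filter []rule

// matches reports whether every rule in the filter accepts p.
func (f filter) matches(p process) bool {
	for _, r := range f {
		if !r(p) {
			return false
		}
	}
	return true
}

// anyMatches reports whether at least one filter matches p (OR).
func anyMatches(filters []filter, p process) bool {
	for _, f := range filters {
		if f.matches(p) {
			return true
		}
	}
	return false
}

// nameIn accepts processes whose executable name equals one of names.
func nameIn(names ...string) rule {
	return func(p process) bool {
		for _, n := range names {
			if p.executableName == n {
				return true
			}
		}
		return false
	}
}

// noArgWithPrefix rejects processes with any argument starting with prefix.
func noArgWithPrefix(prefix string) rule {
	return func(p process) bool {
		for _, arg := range strings.Fields(p.commandLine) {
			if strings.HasPrefix(arg, prefix) {
				return false
			}
		}
		return true
	}
}

func main() {
	filters := []filter{
		{nameIn("otel-collector", "collector2"), noArgWithPrefix("test")},
		{nameIn("nginx")}, // invented second filter, for illustration only
	}
	procs := []process{
		{"otel-collector", "otel-collector --config prod.yaml"},
		{"collector2", "collector2 test-run"},
		{"nginx", "nginx -g daemon"},
	}
	for _, p := range procs {
		fmt.Printf("%-15s reported=%v\n", p.executableName, anyMatches(filters, p))
	}
}
```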
Yes, that's what I had in mind. Thanks for the feedback. |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
I am busy on another project right now and don't have time to look at this. @dmitryax I'm ok with somebody else taking over this PR if they are interested. |
I can work on this. To summarize the discussion so far, there's consensus that we would like the following attributes to have filters:
I'm not sure whether there is consensus on adding filters for
|
@evan-bradley, sorry, I somehow missed your question. I think we can introduce a generic filtering solution built into mdatagen. We already have an option to disable resource attributes like this:

    resource_attributes:
      process.command_line:
        enabled: false

We can add an option to filter resource metrics in the same configuration section, so users can do the following:

    resource_attributes:
      process.command_line:
        include:
          - "/home/dir/*"
      process.owner:
        exclude:
          - "root"

If we really need regex support, we must think more about it. I like the one-line representation that @davidmirza408 suggested initially. WDYT? Do you still want to work on this? |
This issue has been closed as inactive because it has been stale for 120 days with no activity. |
When HostMetricsReceiver scrapes process metrics, the volume of data can be very large because there is no way to filter the set of monitored processes. In addition, metrics for all processes are more data than most users of the receiver need.
I would like to propose that we allow users to filter processes based on the attributes reported for a process.
Filter Settings
If no filter settings are specified for HostMetricsReceiver (HMR), it scrapes data for all processes on the system. If filters are specified, HMR takes a whitelist approach to process scraping: only processes that match a filter are reported.
All attributes collected by HMR can be used to filter the set of processes being reported. Below is the complete list of attributes that can be used in a filter:
pid - process id.
executable_name - name of the executable that was run to start the process
executable_path - fully qualified path to the executable that was run to start the process
command - command passed to the process
command_line - all command line arguments passed to the process
owner - the owner of the process
Specifying Filter Values
A filter can be specified in the following ways:
exact string - an exact string match. See "command" in the sample config below.
wild cards - standard wildcards such as * and ? can be used to whitelist processes. See "executable_path" in the example below.
regex - if the filter is preceded by "regex", the subsequent string is interpreted as a regular expression. See "executable_name" in the example below.
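The three kinds of filter value listed above could be handled by a single matcher. The Go sketch below is one possible approach rather than a proposed implementation: the `compilePattern` helper is hypothetical, and the `regex:` prefix is an assumed syntax, because the exact way a filter is marked as a regular expression is not specified here. Exact strings and wildcards are both translated into an anchored regular expression.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// regexPrefix is an assumed marker; the real syntax is not defined in the proposal.
const regexPrefix = "regex:"

// compilePattern turns one filter value into a regular expression.
// Values with the prefix are compiled as given; all other values are
// treated as exact strings in which * and ? act as wildcards.
func compilePattern(value string) (*regexp.Regexp, error) {
	if strings.HasPrefix(value, regexPrefix) {
		return regexp.Compile(strings.TrimPrefix(value, regexPrefix))
	}
	// Escape every metacharacter, then re-enable the two wildcards.
	quoted := regexp.QuoteMeta(value)
	quoted = strings.ReplaceAll(quoted, `\*`, `.*`)
	quoted = strings.ReplaceAll(quoted, `\?`, `.`)
	// Anchor so that a value without wildcards is an exact match.
	return regexp.Compile("^" + quoted + "$")
}

func main() {
	values := []string{"otelcol", "/usr/bin/*", "regex:^(otel|collector)[0-9]*$"}
	for _, v := range values {
		re, err := compilePattern(v)
		if err != nil {
			fmt.Println("invalid filter:", err)
			continue
		}
		fmt.Printf("%q -> %s\n", v, re.String())
	}
}
```

One consequence of this design is that a plain value cannot contain a literal * or ?; a user who needs that would have to fall back to the regex form. Regular expressions supplied by the user are not anchored automatically in this sketch.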
Multiple Filters
A user can enter multiple filters. If multiple filters are provided, a process's metrics are scraped as soon as any one filter matches it. See the config below for an example that contains two filters.
Sample Config