Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhanced process filtering for HostMetricsReceiver #8188

Closed
davidmirza408 opened this issue Mar 1, 2022 · 23 comments
Closed

Enhanced process filtering for HostMetricsReceiver #8188

davidmirza408 opened this issue Mar 1, 2022 · 23 comments

Comments

@davidmirza408
Copy link
Contributor

davidmirza408 commented Mar 1, 2022

When scraping metrics for processes using HostMetricsReceiver, the amount and size of data can be very large because there is no way to filter out the set of processes monitored. In addition to this scraping metrics for all processes is more data than most users of the receiver need.

I would like to propose that we provide the ability to allow users to filter based on the attributes being reported for a process.

Filter Settings
If no filter settings are specified for HMR then we will scrape data on all processes in the system. If on the other hand filters are specified then HMR will take a whitelist approach to process scraping where only processes that match a filter are reported on.

All Attributes that are collected by HMR can be used to filter the set of processes being reported on. Below is a complete list of items that can be used to filter:

pid - process id.
executable_name - name of the executable that was run to start the process
executable_path - fully qualified path to the executable that was run to start the process
command - command passed to the process
command_line - all command line arguments passed to the process
owner - the owner of the process

Specifying Filter Values
A filter can be specified in the following ways

exact string - this would be an exact string match. See "command" in the sample config below
wild cards - standard wild cards such as * and ? can be used to whilte list process. See "executable_path" in the example below.
regex - if the filter is proceeded with "regex" then the subsequent string is interpreted as a regex expression. See "executable_name" in the example below.

Multiple Filters
A user can enter multiple filters. If multiple filters are provided then we will scrape process metrics after the first filter matches a process. See the config below for an example that contains two filters.

Sample Config

receivers:
  hostmetrics:
    collection_interval: "10s"
    scrapers:
      process:
        filters:
           - executable_name: regex "^java"
             executable_path: "/usr/bin/jav*"
             command: "myJavaClass"
             command_line: "test"
             pid: 12345
             process_owner: dmirza
           - command: "/usr/bin/python2"
             command_line: "pytivo"
@jpkrohling
Copy link
Member

cc @dmitryax

@dmitryax
Copy link
Member

dmitryax commented Mar 6, 2022

Host metrics receiver -> disk scraper has include/exclude capabilities. I suggest to apply similar approach to the processes scraper

@dmitryax dmitryax closed this as completed Mar 6, 2022
@dmitryax dmitryax reopened this Mar 6, 2022
@dmitryax dmitryax added comp: receiver Receiver help wanted Extra attention is needed labels Mar 6, 2022
@TylerHelmuth
Copy link
Member

TylerHelmuth commented Mar 18, 2022

I can work on this.

@TylerHelmuth
Copy link
Member

TylerHelmuth commented Mar 18, 2022

Host metrics receiver -> disk scraper has include/exclude capabilities. I suggest to apply similar approach to the processes scraper

It looks like the disk scraper only allows strict and regex match, but the proposal for processes also includes wildcard. Should we drop the wildcard requirement? I do like the idea of using include and exclude like the other scrapers.

@dmitryax
Copy link
Member

dmitryax commented Mar 18, 2022

I believe our standard include/exclude configuration allows using regex matching. We should use that instead of wildcards

@TylerHelmuth
Copy link
Member

The filesystem scraper separates its different includes/excludes into different config sections. Should we have an include/exclude for each of pid, executable_name, executable_path, command, command_line, and owner?

Also, I believe the disk scraper gives precedence to excludes, meaning that if both an include rule and exclude rule match the same device the device is excluded. Should we keep that requirement as well?

@dmitryax
Copy link
Member

dmitryax commented Mar 18, 2022

The filesystem scraper separates its different includes/excludes into different config sections. Should we have an include/exclude for each of pid, executable_name, executable_path, command, command_line, and owner?

We should pick the ones that have less overlap.

@davidmirza408 we are going to introduce filtering by attribute name for consistency with other scrapers. I don't think we need filters for all of them since most of them are overlaping. Let us know which attribute you want to filter by to solve your problem.

I think adding a filter for the following attributes should be enough:

  • process.command_line as superset that includes values from process.executable.path, process.executable.name and process.command
  • process.executable.name for convenience

@davidmirza408
Copy link
Contributor Author

@dmitryax I think it would also be helpful to filter by process owner and PID. I have an implementation that I am working on here https://github.com/davidmirza408/opentelemetry-collector-contrib/pull/1

@davidmirza408
Copy link
Contributor Author

Host metrics receiver -> disk scraper has include/exclude capabilities. I suggest to apply similar approach to the processes scraper

It looks like the disk scraper only allows strict and regex match, but the proposal for processes also includes wildcard. Should we drop the wildcard requirement? I do like the idea of using include and exclude like the other scrapers.

Upon further thought I think we should drop the wildcard as well.

@TylerHelmuth
Copy link
Member

@dmitryax since @davidmirza408 is working on an implementation can you assign him to this issue? I'll work on something else.

@dmitryax
Copy link
Member

Sounds good. reassigned

@davidmirza408
Copy link
Contributor Author

davidmirza408 commented Mar 25, 2022

@dmitryax In a previous PR you mentioned you would like to see filtering closer to what is being done in filesystem or disk. What do you think of the below format

receivers: 
  hostmetrics: 
    collection_interval: 10s
    scrapers: 
      process: 
        filters: 
          - 
            include_executable_name:
                match_type: string
                executable_name: [otel-collector, collector2]
            exclude_process_command_line: 
                match_type: regex
                command_line: [^test]
          - 
            include_process_command:
                match_type: string
                command: [java]
            include_command_line:
                match_type: string
                command_line: [minecraft]

The above config has two process filters. The first one would match executables named ("otel-collector" OR "collector2") AND do NOT have a string that starts with "test" in the command line.

@dmitryax
Copy link
Member

dmitryax commented Mar 25, 2022

So all rules within a filter are applied with AND operation, but filters themselves are applied with OR operation, right?

So the config that you posted can be read as:
(include executable_name: [otel-collector OR collector2] AND exclude command_line: [^test])
OR
(include command: [java]) AND include command_line: [minecraft])

If that's the idea, it sounds good to me.

@davidmirza408
Copy link
Contributor Author

Yes, that's what I had in mind. Thanks for the feedback.

@github-actions
Copy link
Contributor

github-actions bot commented Nov 9, 2022

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Nov 9, 2022
@davidmirza408
Copy link
Contributor Author

I am busy on another project right now and don't have time to look at this. @dmitryax I'm ok with somebody else taking over this PR if they are interested.

@dmitryax dmitryax removed the Stale label Nov 10, 2022
@evan-bradley
Copy link
Contributor

I can work on this.

To summarize the discussion so far, there's consensus that we would like the following attributes to have filters:

  • process.command_line
  • process.executable.name

I'm not sure whether there is consensus on adding filters for process.pid or process.owner.

  • process.pid: @davidmirza408 can you describe the use case for this one? It's not immediately clear to me how this could be used, at least on Linux systems, since PIDs are generated by the kernel during process creation. The only consistent PID is for the init process, and in that case I think it would likely work just as well to filter on the command.
  • process.owner: This one seems like it would be useful in cases where the Collector has full access to /proc and we want to filter on processes started by e.g. a service user. @dmitryax what do you think?

@evan-bradley evan-bradley self-assigned this Nov 15, 2022
@evan-bradley evan-bradley added enhancement New feature or request priority:p2 Medium and removed help wanted Extra attention is needed labels Nov 15, 2022
@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Mar 20, 2023
@dmitryax
Copy link
Member

dmitryax commented Mar 23, 2023

@evan-bradley, sorry, I somehow missed your question.

I think we can introduce a generic filtering solution built-in into mdatagen. We already have an option to disable resource attributes like this:

resource_attributes:
  process.command_line:
    enabled: false

We can add an option to filter resource metrics in the same configuration section.

So user can do the following:

resource_attributes:
  process.command_line:
    include:
      - "/home/dir/*"
  process.owned:
    exclude:
      - "root"

If we really need regex support, we must think more about it. I like the one-line representation as suggested by @davidmirza408 initially like regex "^java", but we need to find a pattern that will be applicable everywhere across the collector configs. We can wrap the string with /.../ to identify regex as currently done in docker receiver. This is something that must be agreed broadly in the collector SIG.

WDYT? Do you still want to work on this?

@dmitryax dmitryax removed the Stale label Mar 23, 2023
@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label May 22, 2023
@dmitryax dmitryax removed the Stale label May 22, 2023
@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions
Copy link
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment