
Enhance windows/perfmon metricset #3828

Closed
7 of 10 tasks

ruflin opened this issue Mar 28, 2017 · 31 comments
Comments

@ruflin
Contributor

ruflin commented Mar 28, 2017

This is a meta issue to track enhancements to the windows/perfmon metricset created in #3758.

@martinscholz83
Contributor

Hi @ruflin, how do you want to proceed with this issue? Should we create a PR for every point, or just one big PR?

@ruflin
Contributor Author

ruflin commented Mar 29, 2017

@maddin2016 I'm a fan of lots of small PRs, as that makes it easier to review and get the changes in quickly, but one large PR would also work.

@martinscholz83
Contributor

I would like to start with "Improve first polling cycle". Would something like this be an option?

if firstFetch {
	// Most counters require two sample values in order to compute a
	// displayable value, so wait and then collect the second sample.
	time.Sleep(2 * time.Second)
	err = _PdhCollectQueryData(q.query)
}

@ruflin
Contributor Author

ruflin commented Mar 30, 2017

@maddin2016 The above will still block for 2 seconds. I was thinking more of finding an option to only wait until all the data is available, or even to skip it on the first run. Not sure what the best behaviour is. That is probably one of the trickiest ones in the list above.

I would suggest focusing on the cleanup part first, for example the changes to the config options.

@martinscholz83
Contributor

OK. Here is a description of the problem:

Some counters, such as rate counters, require two counter values in order to compute a displayable value. In this case you must call PdhCollectQueryData twice before calling PdhGetFormattedCounterValue. For more information, see Collecting Performance Data.

So if we don't wait, the first value is always 0. One option is to skip the first run. Maybe @andrewkroh has an idea. I'll start with the other ones.
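
A minimal sketch of the skip-first-run idea, reusing _PdhCollectQueryData and the query handle from the snippet above (the sentinel error and the function shape are assumptions, not the actual implementation):

package perfmon

import "errors"

// errSkipFirstSample is a hypothetical sentinel; the Fetch caller would
// treat it as "no event this cycle" rather than as a failure.
var errSkipFirstSample = errors.New("perfmon: first sample only primes rate counters")

// collect gathers one raw sample. On the first poll it returns the
// sentinel instead of sleeping, because rate counters need two raw
// samples before PdhGetFormattedCounterValue can produce a value.
func collect(query uintptr, firstFetch bool) error {
	if err := _PdhCollectQueryData(query); err != nil {
		return err
	}
	if firstFetch {
		return errSkipFirstSample
	}
	return nil
}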

@martinscholz83
Contributor

I have started two PRs for the config handling and the missing go generate lines. I'll wait for them to be merged before I go on with the samples for the docs and dashboards.

@andrewkroh
Member

I opened an intermediate PR to refactor the code and error handling. I made a change to remove the sleep, but the first value will always be 0. I'm thinking of moving the core PDH code into its own package.

andrewkroh added a commit to andrewkroh/beats that referenced this issue Mar 31, 2017
Refactoring of the Windows performance data code. There's more cleanup to come. See elastic#3828.

- Improve generated PDH code to return usable error codes.
- Use the standard error interface in return values.
- Use more restrictive param types in generated functions.
- Create PdhErrno that interprets the PDH error codes and returns their human readable error string.
ruflin pushed a commit that referenced this issue Apr 3, 2017
Refactoring of the Windows performance data code. There's more cleanup to come. See #3828.

- Improve generated PDH code to return usable error codes.
- Use the standard error interface in return values.
- Use more restrictive param types in generated functions.
- Create PdhErrno that interprets the PDH error codes and returns their human readable error string.
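
For context, the PdhErrno part of that refactoring could look roughly like this (a sketch assuming golang.org/x/sys/windows; the real code in the PR may differ):

package pdh

import (
	"fmt"
	"strings"

	"golang.org/x/sys/windows"
)

// PdhErrno wraps a PDH status code so it satisfies the standard error interface.
type PdhErrno uintptr

// Error looks up the human-readable message for the status code. PDH
// message strings live in pdh.dll, not in the system message table, so
// FormatMessage is pointed at that module.
func (e PdhErrno) Error() string {
	if mod, err := windows.LoadLibrary("pdh.dll"); err == nil {
		defer windows.FreeLibrary(mod)
		flags := uint32(windows.FORMAT_MESSAGE_FROM_HMODULE | windows.FORMAT_MESSAGE_IGNORE_INSERTS)
		buf := make([]uint16, 300)
		if n, err := windows.FormatMessage(flags, uintptr(mod), uint32(e), 0, buf, nil); err == nil {
			return strings.TrimSpace(windows.UTF16ToString(buf[:n]))
		}
	}
	return fmt.Sprintf("pdh error %#x", uintptr(e))
}
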
@martinscholz83
Contributor

As #3871 is now merged, I'll go on with #3854.

@andrewkroh
Member

Congrats @maddin2016, this is being released today with 6.0.0-alpha1. 🎉

Since more people will be trying it now, it would be nice to get some more examples into the docs and put some real data into the data.json.

@martinscholz83
Contributor

Great to hear @andrewkroh 🎉 Many thanks for your help getting this finished!!
Is it OK to put some data from my local machine into data.json, or does it have to be some specific data?

@ruflin
Contributor Author

ruflin commented May 10, 2017

I think it is fine as long as it is clear that it is only an example. Is there a way to auto-generate the data.json?

@martinscholz83
Contributor

@ruflin, I have just rebased my master. If I try to debug, I get the following error:

module\system\process\helper.go:398: undefined: sort.Slice
module\system\process\helper.go:405: undefined: sort.Slice

Can you confirm this?

@martinscholz83
Contributor

I use Go 1.7.3.

@martinscholz83
Contributor

🙈 Sorry. I just read that the current Go version for development is 1.8.1. Updated, and the error is gone.

@martinscholz83
Contributor

martinscholz83 commented May 10, 2017

This would be an example output. Is this OK?

{
  "@timestamp": "2017-05-10T06:42:11.339Z",
  "beat": {
    "hostname": "beathost",
    "name": "beathost"
  },
  "metricset": {
    "module": "windows",
    "name": "perfmon"
  },
  "windows": {
    "perfmon": {
      "processor": {
        "time": {
          "total": {
            "pct": 22.486702
          }
        }
      }
    }
  }
}

@ruflin
Contributor Author

ruflin commented May 11, 2017

@maddin2016 Is this auto-generated or created manually?

@martinscholz83
Contributor

It is generated manually; I just changed the hostname and name.

@ruflin
Contributor Author

ruflin commented May 11, 2017

Any chance it could be generated automatically? The reason I'm asking is that if we change something in the data model, it is then very easy for us to recreate all the files.

@martinscholz83
Contributor

I debugged the project and just copied the output. How do you normally generate these data.json files?

@ruflin
Contributor Author

ruflin commented May 11, 2017

Each metricset normally has a test called TestData that uses some mbtest helper to create the events and write them to disk. See for example: https://github.com/elastic/beats/blob/master/metricbeat/module/system/memory/memory_test.go#L11

To run this test, you have to run go test github.com/elastic/beats/metricbeat/module/your-module/... -v. In case it has an integration test dependency, you have to add -tags=integration.
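
For reference, a minimal sketch of such a TestData test for perfmon, modeled on the memory_test.go linked above (the exact perfmon.counters keys here are an assumption and should match the metricset's documented configuration):

// +build windows

package perfmon

import (
	"testing"

	mbtest "github.com/elastic/beats/metricbeat/mb/testing"
)

func TestData(t *testing.T) {
	// Hypothetical config; adjust the counter keys to the real schema.
	config := map[string]interface{}{
		"module":     "windows",
		"metricsets": []string{"perfmon"},
		"perfmon.counters": []map[string]string{
			{
				"alias": "processor.time.total.pct",
				"query": `\Processor Information(_Total)\% Processor Time`,
			},
		},
	}

	f := mbtest.NewEventFetcher(t, config)

	// With -data, this fetches one event and writes it to _meta/data.json.
	if err := mbtest.WriteEvent(f, t); err != nil {
		t.Fatal("write", err)
	}
}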

@martinscholz83
Contributor

OK, then I just have to extend my current test file. Thanks @ruflin.

@martinscholz83
Contributor

@ruflin, if I run this test, I get this:

=== RUN   TestData
--- SKIP: TestData (0.38s)
        data_generator.go:25: Skip data generation tests

@martinscholz83
Contributor

Got it. I have to add -data=true.

@martinscholz83
Contributor

I have opened #4287.

@martinscholz83
Contributor

martinscholz83 commented May 19, 2017

@ruflin, @andrewkroh, can you please add a new feature to the list above? It's about using wildcard queries to collect data from multiple instances of a counter.
For example \PhysicalDisk(*)\% Disk Time. If you have wildcards in your query, you have to call a different function, which is currently not implemented in perfmon. From MSDN:

If the counter path contains a wildcard character for the instance name, call PdhGetFormattedCounterArray to retrieve an array of formatted counter values for each instance collected.

Let's say you have 3 disks. Then the following counter instances are available:

_Total, *, 0, 1, 2

Currently we can only query _Total or select one of the three disks, but not *.

This is also needed for #3798. I will add this ASAP and open a new PR.

Thanks.

PS: Maybe the current docs should point out this missing feature.
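
For illustration, the documented double-call pattern around PdhGetFormattedCounterArray looks roughly like this in Go (PDH_MORE_DATA is the documented status code; the wrapper and type names are assumptions modeled on the existing generated bindings):

// PDH_MORE_DATA (0x800007D2) means "buffer too small"; PDH reports the
// required size through the bufSize out-parameter.
const PDH_MORE_DATA = PdhErrno(0x800007D2)

func formatWildcardCounter(counter PdhCounterHandle) ([]byte, uint32, error) {
	var bufSize, itemCount uint32

	// First call with a nil buffer to learn the required buffer size.
	err := _PdhGetFormattedCounterArray(counter, PdhFmtDouble, &bufSize, &itemCount, nil)
	if err != PDH_MORE_DATA {
		return nil, 0, err
	}

	// Second call fills the buffer with itemCount
	// PDH_FMT_COUNTERVALUE_ITEM structs, one per matched instance
	// (e.g. one per physical disk), followed by the instance names.
	buf := make([]byte, bufSize)
	if err := _PdhGetFormattedCounterArray(counter, PdhFmtDouble, &bufSize, &itemCount, &buf[0]); err != nil {
		return nil, 0, err
	}
	return buf, itemCount, nil
}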

@krasekhi
Copy link

krasekhi commented Feb 7, 2018

Is this enhancement still being considered for a future release?
Add config option to ignore failures for non-existent counters

Currently we have Metricbeat deployed to over 10,000 Windows endpoints, and when one of the defined WMI counters no longer exists, the service stops. Being able to ignore the failures for non-existent counters would be a massive improvement, so we don't lose all data from the endpoint.

Thanks!

@ruflin
Contributor Author

ruflin commented Feb 8, 2018

@krasekhi Let's open a separate issue for this one to track it better, instead of just the list above. Could you open one?

@martinscholz83
Contributor

Hi @ruflin, @andrewkroh, for my return (😄) to Beats, I think this is a good start. I would just catch the case where a counter does not exist and write a warning to the log.

@martinscholz83
Contributor

And of course make this behaviour optional via a config option.
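
A minimal sketch of that idea, assuming a hypothetical ignore_non_existent_counters config flag, a Query type with assumed fields, and wrappers named like the existing generated bindings (the PDH_CSTATUS_* values are the documented status codes):

// Documented PDH status codes for a missing object or counter.
const (
	PDH_CSTATUS_NO_OBJECT  = PdhErrno(0xC0000BB8)
	PDH_CSTATUS_NO_COUNTER = PdhErrno(0xC0000BB9)
)

// addCounters registers each counter path with the query. With
// ignoreMissing set (from the hypothetical config flag), non-existent
// counters are logged as warnings instead of failing the metricset.
func (q *Query) addCounters(paths []string, ignoreMissing bool) error {
	for _, path := range paths {
		var handle PdhCounterHandle
		err := _PdhAddCounter(q.query, path, 0, &handle)
		switch {
		case err == nil:
			q.counters = append(q.counters, handle)
		case ignoreMissing && (err == PDH_CSTATUS_NO_COUNTER || err == PDH_CSTATUS_NO_OBJECT):
			logp.Warn("perfmon: ignoring non-existent counter %q: %v", path, err)
		default:
			return err
		}
	}
	return nil
}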

@ruflin
Contributor Author

ruflin commented Feb 12, 2018

@martinscholz83 You have a PR in mind? :-D

@ruflin
Contributor Author

ruflin commented Feb 26, 2018

I'm closing this meta issue as we have covered almost all the points above. For all the new things to come, we should open separate issues / PRs.

@martinscholz83 Thanks a lot for getting us to this point.

ruflin closed this as completed Feb 26, 2018