New windows service metricset #5332

Merged: andrewkroh merged 60 commits into elastic:master on Nov 2, 2017

martinscholz83
Contributor

As discussed in #5256, I created a new metricset to collect information about Windows services.
To do

  • uptime for services

@elasticmachine
Collaborator

Can one of the admins verify this patch?

@martinscholz83
Contributor Author

@andrewkroh, currently I can find no way to open the processes to get the uptime without running as administrator or granting the current user SeDebugPrivilege. Even with PROCESS_QUERY_LIMITED_INFORMATION I can't call OpenProcess. Do you have any other ideas?

Member

@andrewkroh andrewkroh left a comment

Looks like you are off to a good start. I left comments and will take a closer look after you do some more cleanup and refactoring. Some tests would be helpful too.

@@ -0,0 +1,61 @@
// +build ignore
Member

Rather than duplicating this can you move it up a level and share it between both metricsets.

func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
	logp.Warn("EXPERIMENTAL: The windows services metricset is experimental")

	config := struct{}{}
Member

If it's not used it should be removed.

}

reader, err := NewServiceReader()

Member

It seems to be conventional in Go to put the error check on the immediately following line. Please remove the newline separator.

// It returns the event which is then forward to the output. In case of an error, a
// descriptive error must be returned.
func (m *MetricSet) Fetch() ([]common.MapStr, error) {

Member

Please don't start methods with a newline. We try to not do this in Beats (with one exception for functions where the params span multiple lines).

"unicode/utf16"
"unsafe"

"github.com/elastic/beats/libbeat/common"
Member

All github.com/elastic imports should be in the third group (it goes stdlib, third-party, elastic).

//sys _CloseServiceHandle(handle uintptr) (err error) = advapi32.CloseServiceHandle

var (
sizeOfEnumServiceStatusProcess = (int)(unsafe.Sizeof(EnumServiceStatusProcess{}))
Member

Writing sizeof instead of sizeOf is most common (b/c that's the name of the C operator that generates the value).

return serviceHandle, nil
}

func getServiceStates(handle ServiceDatabaseHandle, state ServiceEnumState) ([]ServiceStatus, error) {
Member

This function is quite long. I recommend breaking some of the pieces into smaller functions then calling those from this function.

Member

It's definitely better now, but it still scores a 13 by gocyclo (<= 10 is what I would aim for). Can you see if you can do a bit more refactoring to this method. Thanks

//Get uptime for service
if ServiceState(serviceTemp.ServiceStatusProcess.DwCurrentState) != ServiceStopped {

	var processCreationTime syscall.Filetime
Member

This is what the system process metricset does, so you can reuse the same code. See ProcTime.Get() in elastic/gosigar. The system module has logic for attempting to enable the SeDebugPrivilege. We'll need to figure out how to reuse that code and only activate it once. But if the system module is used alongside this metricset it should work (assuming the appropriate privileges exist).

Contributor Author

What about moving the logic from system_windows.go into a helper class ...\metricbeat\helper\windows.go?

return nil, err
}

if err := syscall.OpenProcessToken(currentProcess, syscall.TOKEN_ADJUST_PRIVILEGES, &token); err != nil {
Member

@andrewkroh andrewkroh Oct 5, 2017

This probably shouldn't hard fail on a permission issue. I think it would be better to gracefully degrade such that the uptime info isn't available.
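
A minimal sketch of the graceful degradation suggested above; it is not the code from this PR. The names lookupUptime, buildEvent, errAccessDenied, and the uptime_ms key are hypothetical and exist only for illustration. The stub fails for one PID to simulate a permission error.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errAccessDenied is a hypothetical error standing in for a failed
// OpenProcess or OpenProcessToken call.
var errAccessDenied = errors.New("access denied")

// lookupUptime is a stub for the privileged lookup. It fails for pid 4 to
// simulate a process that cannot be opened.
func lookupUptime(pid uint32) (time.Duration, error) {
	if pid == 4 {
		return 0, errAccessDenied
	}
	return 42 * time.Second, nil
}

// buildEvent always reports the service name and only adds the uptime when
// the lookup succeeded, instead of failing the whole fetch.
func buildEvent(name string, pid uint32) map[string]interface{} {
	ev := map[string]interface{}{"name": name}
	if uptime, err := lookupUptime(pid); err == nil {
		ev["uptime_ms"] = int64(uptime / time.Millisecond)
	}
	return ev
}

func main() {
	fmt.Println(buildEvent("restricted", 4))
	fmt.Println(buildEvent("normal", 100))
}
```

With this shape, a permission problem costs only the uptime field; the rest of the service data is still reported.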

Member

@andrewkroh andrewkroh left a comment

Nice progress.

What about moving the logic from system_windows.go into a helper class ...\metricbeat\helper\windows.go?

Yeah, but how about privileges_windows.go? I think the helper function should have some kind of logic such that only the first call does any real work. This way, if both the system module and the windows-services metricset are used, this code only runs once.

There's a metricbeat/debug file in your PR that looks to have been accidentally added. Can you please remove that.


var (
sizeofEnumServiceStatusProcess = (int)(unsafe.Sizeof(EnumServiceStatusProcess{}))
sizeofQueryServiceConfig = (int)(unsafe.Sizeof(QueryServiceConfig{}))
Member

Unused?

Contributor Author

You are right. I'll delete this one.

sizeofQueryServiceConfig = (int)(unsafe.Sizeof(QueryServiceConfig{}))
)

type enumServiceStatusProcess struct {
Member

Unused?

return nil
}

func getServiceUptime(processId uint32) (gosigar.ProcTime, error) {
Member

I recommend encapsulating the logic for the uptime calculation into this helper function. I would change the return params to be (time.Duration, error).

Contributor Author

@martinscholz83 martinscholz83 Oct 24, 2017

You mean privileges_windows.go? Or should we create another helper function.
Sorry. I misunderstood you 🙈

"service_name": service.ServiceName,
"state": service.CurrentState,
"start_type": service.StartType,
"uptime": service.Uptime,
Member

What are the units for this? I would embed them into the field name like uptime.ms. (This needs to be a nested object rather than a field name containing a dot.)

description: >
  `services` contains the status for windows services.
fields:
  - name: status
Member

Can you please update this to include all of the fields being sent.

For an example of how to use a field formatter for the uptime field see ./metricbeat/module/system/uptime/_meta/fields.yml. This will cause the uptime field to be rendered nicely in Kibana (like "2 days").

@@ -0,0 +1,19 @@
{
Member

Can you add a TestData function (example) to the package that will generate this file when you run go test -data . from within this package's directory.

@martinscholz83
Contributor Author

I think the helper function should have some kind of logic such that only the first call does any real work.

You mean some simple bool if the check has already run?

Member

@andrewkroh andrewkroh left a comment

Looks good. I think it needs only a few more things

  • make update - To regenerate the docs.
  • make fmt - To fix the imports. Then you probably need to manually fix things to ensure they are grouped as stdlib, third-party, then elastic.


// CheckAndEnableSeDebugPrivilege checks if the process's token has the
// SeDebugPrivilege and enables it if it is disabled.
func CheckAndEnableSeDebugPrivilege() error {
Member

@andrewkroh andrewkroh Oct 24, 2017

You mean some simple bool if the check has already run?

I'm thinking this should be checkAndEnableSeDebugPrivilege() and a separate method named CheckAndEnableSeDebugPrivilege exists that calls checkAndEnableSeDebugPrivilege() only once by using sync.Once.

This way only the system module or the windows module will do the initialization, but not both.
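
The once-only wrapper described above could look roughly like this. It is a sketch under assumptions, not the merged implementation: the body of checkAndEnableSeDebugPrivilege is a stub (the real one would adjust the process token on Windows), and the variables once, onceErr, and calls are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	once    sync.Once // guards the one-time initialization
	onceErr error     // result of the first (and only) real call
	calls   int       // counts real invocations, for demonstration only
)

// checkAndEnableSeDebugPrivilege stands in for the real, Windows-specific
// logic. Here it only records that it ran.
func checkAndEnableSeDebugPrivilege() error {
	calls++
	return nil
}

// CheckAndEnableSeDebugPrivilege runs the unexported function at most once,
// no matter how many callers invoke it, and returns the cached result.
func CheckAndEnableSeDebugPrivilege() error {
	once.Do(func() {
		onceErr = checkAndEnableSeDebugPrivilege()
	})
	return onceErr
}

func main() {
	// Simulate both the system module and the windows module initializing.
	_ = CheckAndEnableSeDebugPrivilege()
	_ = CheckAndEnableSeDebugPrivilege()
	fmt.Println("real calls:", calls)
}
```

sync.Once is also safe when the two modules initialize concurrently, which a plain bool flag would not be without extra locking.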

}

tm := time.Now().UnixNano()
uptime := time.Duration((tm / int64(time.Millisecond)) - int64(processCreationTime.StartTime))
Member

I'm not sure if this is any more clear, but there is the time.Since() function for the purpose of calculating elapsed times.

uptime := time.Since(time.Unix(0, int64(processCreationTime.StartTime) * int64(time.Millisecond)))

Contributor Author

Thanks!

}

func NewServiceReader() (*ServiceReader, error) {

Member

Remove newline from beginning of method. (BTW I built a tool to find these called nonewlines.)


//var state ServiceEnumState

// configState := strings.ToLower(config.State)
Member

Can you please remove the commented out code.

@martinscholz83
Contributor Author

In the commit "make fmt" I removed the -local switch on line 111 of .../libbeat/scripts/Makefile because goimports doesn't recognize this switch. Can you confirm this?

@andrewkroh
Member

goimports has a -local flag. Try updating it with go get -u golang.org/x/tools/cmd/goimports.

Member

@andrewkroh andrewkroh left a comment

Looks good. Almost there.

I recommend trying gometalinter. It helps me catch things in my code.

go get -u github.com/alecthomas/gometalinter
gometalinter --install
cd to/your/package/directory
gometalinter --deadline=30s --disable=gotype

return time.Duration(processCreationTime.StartTime), err
}

uptime := time.Since(time.Unix(0, int64(processCreationTime.StartTime)*int64(time.Millisecond))) / time.Millisecond
Member

Don't divide by time.Millisecond. Leave the value stored as nanoseconds because this is what a time.Duration value is defined as. Do the conversion to ms later when you need to write the value to an event.

return nil
}

func CheckAndEnableSeDebugPrivilege() (err error) {
Member

We generally try to avoid named result parameters. https://github.com/golang/go/wiki/CodeReviewComments#named-result-parameters

return err
}

// CheckAndEnableSeDebugPrivilege checks if the process's token has the
Member

The function name should be lower-case now.

description: >
  `services` contains the status for windows services.
fields:
  - name: uptime
Member

This can be simplified if you use a dotted key name. The tooling will expand it for the purposes of the index template.

  fields:
    - name: uptime.ms
      type: long
      format: duration
      input_format: milliseconds
      description: >
        The service uptime in milliseconds.

    - name: service_name

"service_name": service.ServiceName,
"state": service.CurrentState,
"start_type": service.StartType,
"uptime": map[string]interface{}{
Member

If service.Uptime is 0 then let's not add it to the event. It should only be zero if an error occurred which would mean the data is invalid, right?

Contributor Author

@martinscholz83 martinscholz83 Oct 25, 2017

Or the service is stopped. So a check for 0 should help.

- name: display_name
  type: keyword
  description: >
    The display name of the service.
Member

For the fields that are enums it would be nice to document what the possible values are. I would simply add a sentence that says "The possible values are x, y, and z".

Contributor Author

@martinscholz83 martinscholz83 Oct 25, 2017

Some special format for the values like x, y and z?

Member

If you like 👍

@@ -29,8 +24,11 @@
  type: keyword
  description: >
    The start type of the service.
    The possible values are `ServiceAutoStart`, `ServiceBootStart`, `ServiceDemandStart`, `ServiceDisabled` and `ServiceSystemStart`.
Member

I believe we adhere to the "Oxford comma" rule in our documentation. As such there should be a comma before the "and". Same for the other lists.

}

if service.Uptime > 0 {
	ev.Put("uptime", map[string]interface{}{"ms": service.Uptime})
Member

Use ev.Put("uptime.ms", service.Uptime). It will automatically do the right thing.
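
To illustrate what "the right thing" means here, the sketch below shows how a dotted key can be expanded into nested maps. putDotted is a hypothetical stand-in written for this example and is not the libbeat implementation; the sample values are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// putDotted is a hypothetical stand-in for a Put method that accepts dotted
// keys: it creates intermediate maps for each path segment and stores the
// value at the last segment.
func putDotted(m map[string]interface{}, key string, value interface{}) {
	parts := strings.Split(key, ".")
	for _, p := range parts[:len(parts)-1] {
		next, ok := m[p].(map[string]interface{})
		if !ok {
			next = map[string]interface{}{}
			m[p] = next
		}
		m = next
	}
	m[parts[len(parts)-1]] = value
}

func main() {
	ev := map[string]interface{}{"state": "ServiceRunning"}
	// Hypothetical uptime; zero would mean "no valid uptime" and be skipped.
	uptimeMs := int64(7201500)
	if uptimeMs > 0 {
		putDotted(ev, "uptime.ms", uptimeMs)
	}
	fmt.Println(ev["uptime"])
}
```

The resulting event has the same nested shape as the explicit map literal, so the index template and the dotted field name in fields.yml stay consistent.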

"windows": {
"services": {
"uptime": {
"ms": 0
Member

I assume this needs to be updated now that 0 values are filtered.

@andrewkroh
Member

andrewkroh commented Oct 25, 2017

There are some formatting issues that Travis CI is detecting. See https://travis-ci.org/elastic/beats/jobs/292592948#L590-L593

Also can you please squash your commits. Then rebase it on master (make sure your master is up-to-date with the remote first). There were some changes in master that are missing from your branch to allow Jenkins CI to run. Let me know if you need help with the git stuff.

@martinscholz83
Contributor Author

These formatting errors are really frustrating. I have run gometalinter and goimports over these files. @andrewkroh, can you take a look at what's wrong? Thank you!

@andrewkroh
Member

andrewkroh commented Oct 26, 2017

Use make fmt then view the git diff locally. After you commit the changes do one last make check.

@martinscholz83
Contributor Author

I've figured out why goimports was not working. I had the Ubuntu package golang-golang-x-tools installed 😒

@andrewkroh
Member

jenkins, test it

@martinscholz83
Contributor Author

martinscholz83 commented Oct 27, 2017

@andrewkroh, before you merge: one problem I actually run into is the part here. Sometimes I get an error from sys.UTF16BytesToUTF8Bytes: input buffer must have an even length. This differs from machine to machine. That's why I have to do lastOffset = uintptr(len(servicesBuffer)) - 1 to make it even. Do you have any ideas? From my understanding, it's the same as in pdh_windows.go: the service names are appended to the end of the buffer.

Member

@andrewkroh andrewkroh left a comment

Sometimes I get an error from sys.UTF16BytesToUTF8Bytes: input buffer must have an even length

It looks like Windows appends a single byte 0x00 to ensure the buffer is null-terminated. This is why the overall buffer length is odd. I made some changes in andrewkroh@7ea1e2f. Can you pull that commit into this branch or apply the change yourself.
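
The odd-length case can be reproduced and handled in isolation. The sketch below is only an illustration of the idea and is not the change in the referenced commit: decodeUTF16LE is a hypothetical helper built on the standard library, and the sample buffer is made up.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16LE decodes little-endian UTF-16 bytes into a Go string. If the
// buffer has an odd length (for example because of a single trailing 0x00),
// the final byte is dropped before decoding. Decoding stops at the first
// 0x0000 code unit.
func decodeUTF16LE(b []byte) string {
	if len(b)%2 != 0 {
		b = b[:len(b)-1]
	}
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		u := binary.LittleEndian.Uint16(b[i:])
		if u == 0 {
			break
		}
		units = append(units, u)
	}
	return string(utf16.Decode(units))
}

func main() {
	// "svc" in UTF-16LE followed by one stray 0x00 byte: 7 bytes in total.
	buf := []byte{'s', 0, 'v', 0, 'c', 0, 0}
	fmt.Printf("%q\n", decodeUTF16LE(buf))
}
```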



[float]
=== `windows.services.service_name`
Member

As I read through these keys I am thinking we should call this the windows service metricset (not plural). Similar to how the system process metricset is not plural. Then the field names here make more sense.

Additionally, if it is renamed then I would change windows.services.service_name to windows.service.name to remove the stuttering.

andrewkroh and others added 7 commits November 2, 2017 08:28
Made change to account for null-terminators that are 0x00 when we were expecting 0x0000.

Fix misspelling in ServiceRunning
Member

@andrewkroh andrewkroh left a comment

LGTM. Can you please fix the one typo I just found then we can merge this. 🎉

  type: keyword
  description: >
    The actual state of the service.
    The possible values are `ServiceContinuePending`, `ServicePausePending`, `ServicePaused`, `ServiceRuning`, `ServiceStartPending`,
Member

s/ServiceRuning/ServiceRunning/

@andrewkroh
Member

You'll need a make update to rebuild the docs from fields.yml.

@martinscholz83
Contributor Author

Yep. Switching to my Linux machine.

@andrewkroh
Member

jenkins, test it

@andrewkroh andrewkroh merged commit 7b4dd0d into elastic:master Nov 2, 2017
@andrewkroh
Member

@maddin2016 Thanks for another great Windows monitoring contribution!

@martinscholz83
Contributor Author

Anytime! 👍
