-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New windows service metricset #5332
New windows service metricset #5332
Conversation
Can one of the admins verify this patch? |
@andrewkroh, currently i find no way to open the processes to get the uptime without to run as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you are off to a good start. I left comments and will take a closer look after you do some more cleanup and refactoring. Some tests would be helpful too.
@@ -0,0 +1,61 @@ | |||
// +build ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than duplicating this can you move it up a level and share it between both metricsets.
func New(base mb.BaseMetricSet) (mb.MetricSet, error) { | ||
logp.Warn("EXPERIMENTAL: The windows services metricset is experimental") | ||
|
||
config := struct{}{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's not used it should be removed.
} | ||
|
||
reader, err := NewServiceReader() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be conventional in Go to put the error check on the immediately following line. Please remove the newline separator.
// It returns the event which is then forward to the output. In case of an error, a | ||
// descriptive error must be returned. | ||
func (m *MetricSet) Fetch() ([]common.MapStr, error) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't start methods with a newline. We try to not do this in Beats (with one exception for functions where the params span multiple lines).
"unicode/utf16" | ||
"unsafe" | ||
|
||
"github.com/elastic/beats/libbeat/common" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All github.com/elastic
imports should be in the third group (it goes stdlib, third-party, elastic).
//sys _CloseServiceHandle(handle uintptr) (err error) = advapi32.CloseServiceHandle | ||
|
||
var ( | ||
sizeOfEnumServiceStatusProcess = (int)(unsafe.Sizeof(EnumServiceStatusProcess{})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing sizeof
instead of sizeOf
is most common (b/c that's the name of the C operator that generates the value).
return serviceHandle, nil | ||
} | ||
|
||
func getServiceStates(handle ServiceDatabaseHandle, state ServiceEnumState) ([]ServiceStatus, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is quite long. I recommend breaking some of the pieces into smaller functions then calling those from this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's definitely better now, but it still scores a 13 by gocyclo (<= 10 is what I would aim for). Can you see if you can do a bit more refactoring to this method. Thanks
//Get uptime for service | ||
if ServiceState(serviceTemp.ServiceStatusProcess.DwCurrentState) != ServiceStopped { | ||
|
||
var processCreationTime syscall.Filetime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what the system process metricset does so you can reuses the same code. See ProcTime.Get()
in elastic/gosigar. They system module has logic for attempting to enable the SeDebugPrivilege
. We'll need to figure out how to reuse code and only activate it once. But if the system module is used along side this metricset it should work (assuming the appropriate privs exist).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about to move the logic from system_windows.go
into a helper class ...\metricebeat\helper\windows.go
?
return nil, err | ||
} | ||
|
||
if err := syscall.OpenProcessToken(currentProcess, syscall.TOKEN_ADJUST_PRIVILEGES, &token); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably shouldn't hard fail on a permission issue. I think it would be better to gracefully degrade such that the uptime info isn't available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice progress.
What about to move the logic from system_windows.go into a helper class ...\metricebeat\helper\windows.go?
Yeah, but how about privileges_windows.go. I think the helper function should have some kind of logic such that only the first call does any real work. This way if both the system module and the windows-services metricset are used that this code only runs once.
There's a metricbeat/debug file in your PR that looks to have been accidentally added. Can you please remove that.
|
||
var ( | ||
sizeofEnumServiceStatusProcess = (int)(unsafe.Sizeof(EnumServiceStatusProcess{})) | ||
sizeofQueryServiceConfig = (int)(unsafe.Sizeof(QueryServiceConfig{})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I delete this one
sizeofQueryServiceConfig = (int)(unsafe.Sizeof(QueryServiceConfig{})) | ||
) | ||
|
||
type enumServiceStatusProcess struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused?
return nil | ||
} | ||
|
||
func getServiceUptime(processId uint32) (gosigar.ProcTime, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend encapsulating the logic for the uptime calculation into this helper function. I would change the return params to be (time.Duration, error)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean privileges_windows.go
? Or should we create another helper function.
Sorry. I misunderstood you 🙈
return serviceHandle, nil | ||
} | ||
|
||
func getServiceStates(handle ServiceDatabaseHandle, state ServiceEnumState) ([]ServiceStatus, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's definitely better now, but it still scores a 13 by gocyclo (<= 10 is what I would aim for). Can you see if you can do a bit more refactoring to this method. Thanks
"service_name": service.ServiceName, | ||
"state": service.CurrentState, | ||
"start_type": service.StartType, | ||
"uptime": service.Uptime, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the units for this? I would embed them into the field name like uptime.ms
. (This needs to be a nested object rather than a field name containing a dot.)
description: > | ||
`services` contains the status for windows services. | ||
fields: | ||
- name: status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please update this to include all of the fields being sent.
For an example of how to use a field formatter for the uptime field see ./metricbeat/module/system/uptime/_meta/fields.yml. This will cause the uptime field to be rendered nicely in Kibana (like "2 days").
@@ -0,0 +1,19 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a TestData
function (example) to the package that will generate this file when you run go test -data .
from within this package's directory.
You mean some simple bool if the check has already run? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I think it needs only a few more things
make update
- To regenerate the docs.make fmt
- To fix the imports. Then you probably need to manually fix things to ensure they are grouped as stdlib, third-party, then elastic.
|
||
// CheckAndEnableSeDebugPrivilege checks if the process's token has the | ||
// SeDebugPrivilege and enables it if it is disabled. | ||
func CheckAndEnableSeDebugPrivilege() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean some simple bool if the check has already run?
I'm thinking this should be checkAndEnableSeDebugPrivilege()
and a separate method named CheckAndEnableSeDebugPrivilege
exists that calls checkAndEnableSeDebugPrivilege()
only once by using sync.Once
.
This way only the system module or the windows module will do the initialization, but not both.
} | ||
|
||
tm := time.Now().UnixNano() | ||
uptime := time.Duration((tm / int64(time.Millisecond)) - int64(processCreationTime.StartTime)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is any more clear, but there is the time.Since()
function for the purpose of calculating elapsed times.
uptime := time.Since(time.Unix(0, int64(processCreationTime.StartTime) * int64(time.Millisecond)))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
} | ||
|
||
func NewServiceReader() (*ServiceReader, error) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove newline from beginning of method. (BTW I built a tool to find these called nonewlines.)
|
||
//var state ServiceEnumState | ||
|
||
// configState := strings.ToLower(config.State) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please remove the commented out code.
In commit make fmt i removed in line 111 the |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Almost there.
I recommend trying gometalinter. It helps me catch things in my code.
go get -u github.com/alecthomas/gometalinter
gometalinter --install
cd to/your/package/directory
gometalinter --deadline=30s --disable=gotype
return time.Duration(processCreationTime.StartTime), err | ||
} | ||
|
||
uptime := time.Since(time.Unix(0, int64(processCreationTime.StartTime)*int64(time.Millisecond))) / time.Millisecond |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't divide by time.Milliseconds
. Leave the value stored as nanoseconds because this is what a time.Duration
value is defined as. Do the conversion to ms
later when you need to write the value to an event.
return nil | ||
} | ||
|
||
func CheckAndEnableSeDebugPrivilege() (err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally try to avoid named result parameters. https://github.com/golang/go/wiki/CodeReviewComments#named-result-parameters
return err | ||
} | ||
|
||
// CheckAndEnableSeDebugPrivilege checks if the process's token has the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function name should be lower-case now.
description: > | ||
`services` contains the status for windows services. | ||
fields: | ||
- name: uptime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplified if you use a dotted key name. The tooling with expand it for the purposes of the index template.
fields:
- name: uptime.ms
type: long
format: duration
input_format: milliseconds
description: >
The service uptime in milliseconds.
- name: service_name
"service_name": service.ServiceName, | ||
"state": service.CurrentState, | ||
"start_type": service.StartType, | ||
"uptime": map[string]interface{}{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If service.Uptime
is 0 then let's not add it to the event. It should only be zero if an error occurred which would mean the data is invalid, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or the service is stopped. So maybe a check for 0
should help.
- name: display_name | ||
type: keyword | ||
description: > | ||
The display name of the service. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the fields that are enums it would be nice to document what the possible values are. I would simply add a sentence that says "The possible values are x, y, and z".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some special format for the values like x
, y
and z
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you like 👍
@@ -29,8 +24,11 @@ | |||
type: keyword | |||
description: > | |||
The start type of the service. | |||
The possible values are `ServiceAutoStart`, `ServiceBootStart`, `ServiceDemandStart`, `ServiceDisabled` and `ServiceSystemStart`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we adhere to the "Oxford comma" rule in our documentation. As such there should be a comma before the "and". Same for the other lists.
} | ||
|
||
if service.Uptime > 0 { | ||
ev.Put("uptime", map[string]interface{}{"ms": service.Uptime}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use ev.Put("uptime.ms", service.Uptime)
. It will automatically do the right thing.
"windows": { | ||
"services": { | ||
"uptime": { | ||
"ms": 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this needs to be updated now that 0 values are filtered.
There are some formatting issues that Travis CI is detecting. See https://travis-ci.org/elastic/beats/jobs/292592948#L590-L593 Also can you please squash your commits. Then rebase it on master (make sure your master is up-to-date with the remote first). There were some changes in master that are missing from your branch to allow Jenkins CI to run. Let me know if you need help with the git stuff. |
These formatting errors are really frustating. I have run |
Use |
I've figured out why |
jenkins, test it |
@andrewkroh, before you merge. One problem i actually run into is the part here. Sometimes i get an error from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes i get an error from sys.UTF16BytesToUTF8Bytes input buffer must have an even length
It looks like Windows appends a single byte 0x00 to ensure the buffer is null-terminated. This is why the overall buffer length is odd. I made some changes in andrewkroh@7ea1e2f. Can you pull that commit into this branch or apply the change yourself.
metricbeat/docs/fields.asciidoc
Outdated
|
||
|
||
[float] | ||
=== `windows.services.service_name` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I read through these keys I am thinking we should call this the windows service metricset (not plural). Similar to how the system process metricset is not plural. Then the field names here make more sense.
Additionally, if it is renamed then I would change windows.services.service_name
to windows.service.name
to removing the stuttering.
Made change to account for null-terminators that are 0x00 when we were expecting 0x0000. Fix misspelling in ServiceRunning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Can you please fix the one typo I just found then we can merge this. 🎉
type: keyword | ||
description: > | ||
The actual state of the service. | ||
The possible values are `ServiceContinuePending`, `ServicePausePending`, `ServicePaused`, `ServiceRuning`, `ServiceStartPending`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/ServiceRuning/ServiceRunning/
You'll need a |
Jap. Switch to my linux machine |
jenkins, test it |
@maddin2016 Thanks for another great Windows monitoring contribution! |
Anytime! 👍 |
As discussed in #5256, i created a new metricset to collect information about windows services.
To do